Monday, August 15, 2022

Capacitor removed from Arduino and secondary SPI clock is clearly working properly


I found the capacitor which had one end connected to the AREF external connector of the board. Given its relatively large capacitance, its impedance at the clock frequency of my SPI link was nearly zero. 


It was quickly desoldered. I took the repaired board to the bench and connected the scope to the AREF external connector pin, which was now wired to the SPI clock for USART 2 on the microprocessor. If the clock was now working, the scheme I had put in place should produce bursts of 16 clock cycles every second or two as my Arduino looped, sending alternating commands to the FPGA.


I also routed the clock through the level shifter boards I will be using, as I needed to be convinced that the shifters could operate with a 4 MHz signal. The results were great! The clock from the SPI link clearly showed up on the scope in bursts of 8 cycles, with the idle state high.

The clock produced by the Arduino swung between 0 and +5 V, while the output of the shifter followed it between 0 and +3.3 V. The waveform showed little distortion, which would have been likely had the shifter been unable to handle the 250 ns cycle time and the 125 ns 'on' phase of the clock. Signal quality and levels were both good.


What was immediately evident, however, was that I was NOT getting the 16 bits I expected during each SPI transaction. The Atmel documentation made it sound as if the buffered USART would accept two bytes and send both out in one transaction, but that is not what I see on the scope. This could be an error in my C code or a misunderstanding of the documentation.

In addition, several other things must be verified to ensure that both ends are able to communicate: clock polarity, clock phase, and the bit order (endianness) of the transmission. I can change these at the Arduino to match the FPGA, but they must be verified and adjusted if necessary to have a good link.

Polarity for an SPI clock is whether the idle state of the clock is high or low. I want high and that is what I see on the scope, so clock polarity passes muster. 

Clock phase is a design choice about which clock edge, rising or falling, is used to sample the data, and consequently on which edge the data may change from one bit's value to the next. I can only verify this by watching the MOSI line, where the master (Arduino) sets up the 1 and 0 values of the bits in relation to the clock.

The Arduino takes a parallel word handed to the SPI channel and shifts out its bits one by one. What I need to determine is whether it should put the most significant or the least significant bit out first. The goal is for the word assembled in the FPGA to match what we expect, not to arrive swapped back to front. Again, I can observe this by watching what the SPI channel outputs on MOSI and SCLK for the pattern I am writing.
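To make the bit-order question concrete, here is a small Python model of how a 16-bit word would look on MOSI. It is a sketch, not the Arduino code. It assumes the USART sends the word as two 8-bit frames, high byte first, and that the FPGA shifts each received bit in at the low end. The byte order, the MSB-first receiver and the helper names are all assumptions for illustration. A distinctive pattern such as 0x1234 makes any swap easy to spot.

```python
def mosi_bits(value, width, lsb_first=False):
    """Order in which the bits of `value` would appear on MOSI."""
    positions = range(width) if lsb_first else range(width - 1, -1, -1)
    return [(value >> i) & 1 for i in positions]


def send_word_as_two_frames(word, lsb_first=False):
    """Assumed scheme: a 16-bit word goes out as two 8-bit frames, high byte first."""
    high, low = (word >> 8) & 0xFF, word & 0xFF
    return mosi_bits(high, 8, lsb_first) + mosi_bits(low, 8, lsb_first)


def fpga_assemble(bits):
    """Assumed FPGA receiver: each new bit enters at the low end, so the first bit ends up as the MSB."""
    word = 0
    for bit in bits:
        word = ((word << 1) | bit) & 0xFFFF
    return word


pattern = 0x1234
print(hex(fpga_assemble(send_word_as_two_frames(pattern))))                  # 0x1234
print(hex(fpga_assemble(send_word_as_two_frames(pattern, lsb_first=True))))  # 0x482c
```

With MSB-first frames the word arrives intact. With LSB-first frames each byte comes through bit-reversed (0x12 becomes 0x48 and 0x34 becomes 0x2C), which is the kind of mismatch that watching MOSI would reveal.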

Sunday, August 14, 2022

Of all the pins I could have chosen to repurpose for the secondary SPI channel clock . . .


I discovered alternative schematic files for the Arduino Mega 2560 that conflict with the ones provided by the official site. In particular, the official diagram shows the AREF signal connected solely to the connector pin labeled AREF, while the alternatives show a capacitor on that signal with its other end tied to ground.

Not a good thing to have on a multi-megahertz clock line. At 4 MHz this capacitor has an impedance of 0.4 ohms, essentially a dead short for the clock signal. Even if my setup of the secondary SPI channel on USART 2 is correct, I wouldn't have much of a clock signal available.
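The 0.4 ohm figure follows from the capacitive reactance formula, Xc = 1 / (2πfC). Below is a quick Python check. It assumes a 100 nF capacitor. The post doesn't state the value, but 100 nF is the size that gives 0.4 ohms at 4 MHz.

```python
import math


def capacitive_reactance(capacitance_farads, frequency_hz):
    """Magnitude of a capacitor's impedance in ohms: Xc = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * frequency_hz * capacitance_farads)


ASSUMED_AREF_CAP = 100e-9   # assumption: 100 nF, not stated in the post
SPI_CLOCK_HZ = 4e6          # SPI clock rate used on the link

print(f"{capacitive_reactance(ASSUMED_AREF_CAP, SPI_CLOCK_HZ):.2f} ohms")   # 0.40 ohms
```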


Reworking to pull that capacitor off the board was quick and easy once I knew it had to be done. If by chance the USART clock was damaged by the short, I have three other USARTs I can switch over to on this board. 

First testing of SPI link between the Arduino and the FPGA - SPI is not trying


The FPGA input/output pins are set to LVCMOS 3.3 V logic levels, but the Atmel ATmega2560 processor operates with 5 V TTL signal levels. I therefore have to insert a level shifting circuit in the SPI link lines MOSI, MISO, SCK and Slave Select, as well as in the SPI transaction flag line.

When I build the final production version of the Virtual 2315 Cartridge box, I will have a printed circuit board manufactured to hold the level shifters, which are also needed for the thirteen lines between the IBM 1130 disk drive and my FPGA board.


I tied the grounds of the two boards together and then snaked the signal lines through the level shifter breadboard. The Arduino code was modified to loop, sending two commands over the SPI link: one turning RAM access over to drive mode and one switching it back to SPI mode. Success will be immediately visible on color LED 0, which changes state as the board switches between SPI mode and drive mode.


The clock of the secondary SPI channel never oscillated, nor did I see any activity on the MOSI line. This tells me that the channel was not initialized properly or there is some other defect. I verified this on the TTL (+5 V) side as well, so it is not a level shifter failure.


I need to walk through all the documentation from Atmel on setting up the USART for SPI Master mode as something is definitely not right with my Arduino sketch. 
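As a starting point for that review, here is a Python sketch that computes the register values for a USART in MSPIM mode on a 16 MHz Mega 2560. The bit positions and the clock formula, SCK = fosc / (2 × (UBRRn + 1)), are taken from the MSPIM section of the ATmega2560 datasheet and are worth double-checking there. In UCSRnC, UMSELn1:0 occupy bits 7:6, UDORDn is bit 2, UCPHAn is bit 1 and UCPOLn is bit 0. The CPHA value shown is only an example. The datasheet's initialization sequence also matters:

1. Set UBRRn to 0.
2. Make the XCKn pin an output. XCK2 is port pin PH2, the pin 14 lead rewired here.
3. Write UCSRnC.
4. Enable the receiver and transmitter in UCSRnB.
5. Write the baud value to UBRRn.

```python
F_CPU_HZ = 16_000_000  # Arduino Mega 2560 system clock

# UCSRnC bit positions in Master SPI Mode (check against the datasheet)
UMSEL_N1, UMSEL_N0 = 7, 6
UDORD_N, UCPHA_N, UCPOL_N = 2, 1, 0


def mspim_ucsrc(cpol, cpha, lsb_first):
    """UCSRnC value: UMSELn1:0 = 0b11 selects MSPIM, plus data order, phase and polarity."""
    value = (1 << UMSEL_N1) | (1 << UMSEL_N0)
    if lsb_first:
        value |= 1 << UDORD_N
    value |= (cpha & 1) << UCPHA_N
    value |= (cpol & 1) << UCPOL_N
    return value


def mspim_ubrr(sck_hz, f_cpu_hz=F_CPU_HZ):
    """Solve SCK = f_cpu / (2 * (UBRR + 1)) for UBRR, insisting on an exact rate."""
    divisor, remainder = divmod(f_cpu_hz, 2 * sck_hz)
    if remainder or divisor < 1:
        raise ValueError(f"{sck_hz} Hz is not exactly reachable from {f_cpu_hz} Hz")
    return divisor - 1


# Idle-high clock (CPOL = 1), example CPHA = 1, MSB first, 4 MHz
print(hex(mspim_ucsrc(cpol=1, cpha=1, lsb_first=False)))  # 0xc3
print(mspim_ubrr(4_000_000))                               # 1
```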

Friday, August 12, 2022

Modified Arduino Mega 2560 to enable second SPI channel


The chip on the Arduino has powerful serial channels called USARTs, which do more than the usual asynchronous serial protocols. Atmel supports a pseudo SPI mode on these serial ports, which they call Master SPI Mode (MSPIM). It works as a master only and can't act as a slave. The chip has four of these USART ports.

One is connected to the USB link for serial communications as well as programming the board. The other three are accessible for normal serial communications, bringing their Tx and Rx lines to external pins on the board, but the clock pin of those serial ports is not brought out for connection. This limits the use of certain clocked serial modes and more importantly it does not allow the use of the port as an SPI link. 

Those signals are on leads of the Atmel ATmega2560 chip, which sits right there on the board in its 100-pin TQFP package with the leads around the rim. If it had been a BGA or similar package with the connections underneath, it would not have been as easy to gain access to the signal.

Time to whip out the stereo microscope, rework tools and modify this board to allow the clock signal of one of the USARTs to be connected via one of the external connectors on the board. All the external connector pins are wired to other signal pins on the chip or to other circuits, but not every pin is important for my planned use. 


For my sacrificial external pin I chose the AREF signal, which is used with the analog inputs and which I don't need for this project. It is at pin 98 of the chip and runs only to the AREF connector pin, which sits on the other side of the ground pin from Arduino pin 13. That will now be my SCLK pin for USART 2. The processor lead was lifted off its pad to disconnect it, and I verified the lack of connectivity with a VOM.

Pin 98 lifted off pad


USART for serial port 2 has the clock pin at position 14 on the chip, which is currently unconnected. A wire tacked onto that lead of the chip is the next step of this modification. 

Pin 14 is XCLK2 - clock for USART 2

I used some 30 gauge wire as it is nice and thin, appropriate to the width of the processor lead to which it is soldered.

Wire tacked onto pin 14


The wire was routed through a mounting hole and down to the underside where I connected it to the bottom of the connector block for the AREF pin. This is now a SCLK pin for USART channel 2. 

Routing wire to underside

Soldered to AREF pin of connector


I discovered that Seeed Studio produces a variant of the Arduino Mega 2560, called the Seeeduino Mega 2560, which among other differences brings out more signals to external pins. Very significantly, it brings the serial port 2 clock to a connector. That is the same signal I had to hack to access on a standard Arduino. Seeeduino Mega description

It is on order and should arrive next week, allowing me to skip the hacks when building more of these. For now I can begin to debug the Arduino to FPGA communications with the hacked board while I wait for the shipment that is already delayed by UPS for 'operational reasons' which I guess is code for 'my bad'. 

Feedback and diagnostic information brought to LEDs on the Virtual 2315 Cartridge FPGA board


The FPGA board I use for this project has a number of user inputs and outputs that I decided to put to use. There are four slide switches, four pushbuttons, four white LEDs and two tri-color LEDs on the board. 

The FPGA board


As I was limited on output indicators, I chose to use the four slide switches to route signals to the four LEDs. For each of the sixteen binary values set on the switches, I route a different set of four signals to the LEDs.


The board has two tricolor LEDs, with individually controlled red, green and blue LEDs inside each package. While these can be set to a wide range of colors by modulating the brightness of the three colors, interpreting subtle shade differences didn't seem useful.

As a result, I set these up so that in most cases they glow in one distinct color to highlight various conditions. One of them displays normal status information, while the other indicates high-level error conditions.

The status LED is blue when the board is set to communicate with the Arduino and off when set to operate with the drive. It is green during reading and red during writing.

The error LED is green when no error exists. If we caught an error during write and locked up the drive, it is red, while if there have been errors on the SPI link to the Arduino we light it blue. 
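The two color rules above amount to a small priority encoder, sketched below in Python. The function and signal names are hypothetical. The post doesn't say which condition wins in some cases, such as a write lockup together with a latched SPI error. Where that happens, the ordering in the code is an assumption.

```python
from enum import Enum


class Color(Enum):
    OFF = "off"
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def status_color(spi_mode, reading, writing):
    """Status LED: blue in SPI mode; in drive mode red while writing, green while reading, else off."""
    if spi_mode:
        return Color.BLUE
    if writing:
        return Color.RED
    if reading:
        return Color.GREEN
    return Color.OFF


def error_color(write_lockup, spi_error_latched):
    """Error LED: red for a write lockup (assumed to outrank SPI errors), blue for SPI errors, else green."""
    if write_lockup:
        return Color.RED
    if spi_error_latched:
        return Color.BLUE
    return Color.GREEN


print(status_color(spi_mode=False, reading=True, writing=False).value)   # green
print(error_color(write_lockup=False, spi_error_latched=True).value)     # blue
```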


There are four pushbuttons plus the reset button on the FPGA board. One button reasserts the power-on reset condition to put all the logic back in its initial state. Another resets the latch that is set whenever there has been an SPI link error. By pushing that button and observing how long the error lamp stays green instead of blue, the user can get an idea of the rate of errors on the link.

The other two buttons have no real function. For convenience I wired one to pulse the Access Go signal, thus changing the cylinder number in accordance with the step size and movement direction signals. The other button triggers a read from RAM at whatever address happens to be set up, primarily to allow observation of a RAM access cycle. I will change these to something more useful when I come up with a use.

Wednesday, August 10, 2022

Finished the debugging of the Arduino link on the FPGA side using Vivado simulation


Just as I had yesterday with the unload transaction, I closely examined the behavior of my logic when a simulated SPI link stream gave it a properly formatted transaction with a stream of data to be loaded into RAM. After chasing some issues, I was satisfied that it was performing as designed. 


Sector load
The trace above shows the start of a load transaction. It addresses a given sector on the virtual disk cartridge and then passes in the new content, 321 words of 16 bits, to be loaded into RAM at the addresses assigned to those words in that sector. It accumulates a checksum value, which is tested against the checksum transmitted in the transaction, then returns a flag indicating that all error checking was passed.
One word of the transaction
Above is a smaller section of the transaction, with one word shown being shifted in from the SPI link and stored away in RAM. SPI is always bidirectional: we shift out the bits of an outgoing word while simultaneously shifting in bits from the remote end to form an inbound word. The load transaction sends words of x0000 to the other end while receiving the contents to be written into RAM. The exception is the checksum word, the penultimate word in the transaction, during which we ship our calculated checksum at the same time that we receive the checksum from the Arduino.

RAM activity during load
Further zooming shows the action after we have acquired the inbound word. We trigger the push of this word into the FIFO to the RAM engine. Next we see the signal to the memory controller that writes the data into RAM. Finally we see what is pushed into the response FIFO to indicate the memory activity is complete.

I moved on to the unload transaction. This one reads all the words from RAM for a given cylinder, head and sector location, passing them back to the Arduino one by one. In order to simulate, I substituted a fake memory module for the actual memory controller. It provided a known sequence of 8 words that repeated, which let me validate that the logic sent those up to the Arduino correctly.

A sector unload transaction begins
The unload transaction above involves fetching words from memory before each word is exchanged on the SPI link, whereas the load transaction above first grabbed a word and then sent it to memory. The relative timing is visible in the above and the following traces.

A word being unloaded
Narrowing the focus lets us see one word which is fetched from RAM on the left then toggled out on the SPI link. The checksum at the bottom is updated based on this word, giving us a running checksum that covers the entire 321 words of the sector. 

RAM during unload
Our command sequence for the memory access is appropriate for a read. We pass along the address of the 8-byte group that is read as a burst by the DDR3 memory controller, then wait for the data valid signal from the controller, which indicates we can now grab the contents off the app_rd_data bus. For simplicity of my design, I put the same data in all four words (8 bytes) and pick off just one of the duplicated words on a read.
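The replicate-and-pick-one approach is easy to model. The Python sketch below treats the 8-byte group as a single 64-bit value with four 16-bit slots. That width is an assumption made to match the description above; the actual width of app_rd_data depends on how the memory controller is configured.

```python
SLOTS = 4  # four 16-bit words in one 8-byte group


def replicate_word(word):
    """Write side: place the same 16-bit word in all four slots of the 64-bit group."""
    word &= 0xFFFF
    group = 0
    for slot in range(SLOTS):
        group |= word << (16 * slot)
    return group


def pick_word(group, slot=0):
    """Read side: any slot will do, because all four hold the same value."""
    return (group >> (16 * slot)) & 0xFFFF


group = replicate_word(0xBEEF)
print(hex(group))                                         # 0xbeefbeefbeefbeef
print({hex(pick_word(group, s)) for s in range(SLOTS)})   # {'0xbeef'}
```

The trade is capacity for simplicity. Each 16-bit word occupies a full 8-byte group, but the logic never has to work out which slot within the burst holds a given word.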

I then simulated a sequence of two transactions, one each unload and load, to verify that the state machines restore properly to handle multiple transactions from the Arduino. 

Two transactions serially

At this granularity you can't see the individual words, but the start and end of the successive transactions are evident. This ends my simulation testing of the FPGA unless I discover issues once I connect it to the Arduino over SPI and to the disk drive electronics in the IBM 1130.


The Arduino Mega 2560 has a chip with a primary SPI channel that can be either a master or a slave. In addition, the chip has four serial port channels that can be operated as SPI masters, but not as slaves. It is a simple matter to set up a port for this mode and to exchange words using it, except for one minor issue.

The designers of the Arduino board did NOT bring the clock pin of any of those serial channels to any pin on the board. Thus, while the port can be made into an SPI channel, it can't be used because there is no way to connect a wire to its Sclk pin.

I worked out a way to tack a wire onto the pin of the chip that carries Sclk for one of the serial ports, then hook the other end to a connector pin on the board. To do this, I need to disconnect the original signal line for the pin I choose, then tack on the wire from the Sclk signal. I have worked out a candidate but haven't yet implemented this.

When I am next in the shop I will use the microscope to break the sacrificial connection, then tack on the jumper wire to bring Sclk to that external connector pin. This will allow me to use one of the other serial port channels for an SPI link to the FPGA, reserving the primary SPI link for its connection to the SD card shield where I host the virtual cartridge images.

The software that accesses the SD card shield assumes it is the only device on the SPI link and does not play well if a second device shares the channel. This is true in spite of the use of individual Slave Select lines for each of the devices sharing an SPI channel, since it is the software, not the hardware, that gets confused.

Tuesday, August 9, 2022

Arduino link state machine written and in debug


The connection between the FPGA and the Arduino is a four wire SPI (Serial Peripheral Interface) link with the Arduino as the controller. Every transaction is initiated by the Arduino, first turning on a fifth wire that marks a transaction, then sending 325 words. Each word is 16 bits and the Slave Select line of the SPI link is switched on and off for each word. This makes it easy to detect the incoming words.

The link sends a five-bit command code plus the cylinder, head and sector address as the first word, the command word. The second word sent is the same word with every bit inverted. This serves as a kind of error check on the transaction.

Next we send 321 words, one for each word of the sector from 0 to the end. As the words are processed, a running checksum is accumulated. It is simply the exclusive-OR of each of the 16 bits of each word with the corresponding bits of the checksum. The initial value or seed is x1234 to ensure that a completely open link doesn't produce a seemingly valid checksum at the end.

After the last sector word is sent, the checksum is transmitted as the penultimate word of the transaction. If the checksum calculated matches the checksum received over the link, and if the inverted command word was correct, then the pattern xA5A5 is transmitted as the final word, a flag. If any error occurred the flag is set to x0000 which is also the value that would be received on an open SPI link.
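The framing and error checks described above can be sketched in Python as follows. The field layout in the hypothetical `pack_command` is an assumption, chosen only because it fills exactly 16 bits: a 5-bit command, then 8 bits of cylinder, 1 of head and 2 of sector. The dummy word the Arduino is assumed to send while the FPGA returns its flag is also an assumption. As described, the checksum covers only the 321 sector words.

```python
SECTOR_WORDS = 321
CHECKSUM_SEED = 0x1234
FLAG_OK = 0xA5A5
FLAG_ERROR = 0x0000  # also what an open link reads as


def pack_command(cmd, cylinder, head, sector):
    """Hypothetical command-word layout: cmd[15:11] cylinder[10:3] head[2] sector[1:0]."""
    if not (0 <= cmd < 32 and 0 <= cylinder < 256 and head in (0, 1) and 0 <= sector < 4):
        raise ValueError("field out of range")
    return (cmd << 11) | (cylinder << 3) | (head << 2) | sector


def checksum(words):
    """Running XOR of all sector words, seeded so an all-zero link can't look valid."""
    value = CHECKSUM_SEED
    for w in words:
        value ^= w & 0xFFFF
    return value


def build_load_transaction(command, sector_words):
    """The 325 words the Arduino sends for a load."""
    if len(sector_words) != SECTOR_WORDS:
        raise ValueError("a sector is 321 words")
    words = [command & 0xFFFF, ~command & 0xFFFF]
    words += [w & 0xFFFF for w in sector_words]
    words.append(checksum(sector_words))
    words.append(0x0000)  # assumed dummy word clocked out while the FPGA returns its flag
    return words


def fpga_flag(words):
    """FPGA side: check the inverted command word and the checksum, return the flag word."""
    if len(words) != SECTOR_WORDS + 4:
        return FLAG_ERROR
    command, inverted = words[0], words[1]
    sector_words = words[2:2 + SECTOR_WORDS]
    received = words[2 + SECTOR_WORDS]
    ok = inverted == (~command & 0xFFFF) and checksum(sector_words) == received
    return FLAG_OK if ok else FLAG_ERROR


tx = build_load_transaction(pack_command(1, 100, 0, 2), list(range(SECTOR_WORDS)))
print(len(tx), hex(fpga_flag(tx)))        # 325 0xa5a5
tx[50] ^= 0x0004                          # corrupt one sector word
print(hex(fpga_flag(tx)))                 # 0x0
print(hex(fpga_flag([0x0000] * 325)))     # 0x0 (open link)
```

The open-link case fails twice over. Its inverted command word is 0x0000 rather than 0xFFFF, and its checksum is 0x0000 rather than the seed 0x1234.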

All error recovery is the responsibility of the Arduino, generally by resending the same sector down until it is successful. The FPGA won't respond to the drive electronics until a transaction is sent with a command word whose command bits request a switchover to drive mode. Drive mode also allows the drive modeling logic in the FPGA to control RAM access.

This can be rescinded, for example when the heads are unloaded at spin down, by sending a different command bit pattern that returns control of RAM to the SPI link machine. Initially the system is in SPI link mode, as that is how we load a virtual cartridge image into RAM.


When a transaction word begins, the data to be sent out to the Arduino must already be set up in the output word. The Slave Select is activated then the SPI clock toggles 16 times to shift 16 bits across the link, taking one bit off the output on each clock pulse and sending it out over the MISO line. At the same time, it is grabbing bits from the Arduino off the MOSI line and shifting them into a 16 bit input word. Thus at the end of the word, as Slave Select is turned off, we have exchanged two words.

If one wishes to respond to the last exchange of words based on the contents sent down from the Arduino, the output word must be set up before the Slave Select activates for the next exchange. If it is set up too late we have the wrong bits going up to the Arduino. 
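The exchange can be modeled as two 16-bit shift registers joined in a ring, as in the Python sketch below. It assumes MSB first in both directions. The slave's outgoing word is loaded before the first clock, which is why it has to be ready before Slave Select activates: once clocking starts, its top bit is already on MISO.

```python
def spi_exchange(master_word, slave_word, bits=16):
    """Full-duplex SPI exchange of one word, MSB first in both directions.

    On each clock the master drives its top bit onto MOSI and the slave drives
    its top bit onto MISO; both registers then shift left, taking in the bit
    the other side just sent.
    """
    mask = (1 << bits) - 1
    master, slave = master_word & mask, slave_word & mask
    for _ in range(bits):
        mosi = (master >> (bits - 1)) & 1
        miso = (slave >> (bits - 1)) & 1
        master = ((master << 1) | miso) & mask
        slave = ((slave << 1) | mosi) & mask
    return master, slave  # each side now holds the other's original word


received_by_arduino, received_by_fpga = spi_exchange(0x1234, 0xABCD)
print(hex(received_by_arduino), hex(received_by_fpga))  # 0xabcd 0x1234
```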

The SPI link operates at 4 MHz, so it clocks at 250 ns per bit. The FPGA main logic switches with a 50 ns clock, which gives us a bare minimum of five clock cycles even if the Arduino were blazingly fast at reasserting Slave Select.
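The short Python calculation below reproduces those numbers. It also gives a lower bound on how long a full 325-word transaction takes. The bound ignores the gaps between words, while Slave Select toggles and the Arduino code runs, so real transactions will take longer.

```python
SPI_CLOCK_HZ = 4_000_000
FPGA_CLOCK_PERIOD_NS = 50
BITS_PER_WORD = 16
WORDS_PER_TRANSACTION = 325

spi_bit_ns = 1e9 / SPI_CLOCK_HZ                              # 250 ns per SPI clock
fpga_cycles_per_bit = spi_bit_ns / FPGA_CLOCK_PERIOD_NS      # 5 FPGA cycles per SPI clock
word_us = BITS_PER_WORD * spi_bit_ns / 1000                  # 4 us to shift one word
min_transaction_ms = WORDS_PER_TRANSACTION * word_us / 1000  # 1.3 ms, excluding gaps between words

print(spi_bit_ns, fpga_cycles_per_bit, word_us, round(min_transaction_ms, 3))
```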


I put together a testbench procedure to switch the SPI MOSI, clock and Slave Select lines to shift out one word from the simulated Arduino. Stringing 325 of these together, bounded by the SPI transaction signal I defined, drives the SPI slave module and the state machine that handles the transaction.


I started with the Unload command, which sends a cylinder, head and sector address to the FPGA along with 321 words of all zero. The state machine, after validating the command words, will read RAM for each of the word addresses and send out the contents of RAM on the output word for the same exchange - zeros in, RAM content out. 

I could indeed see the logic managing the stream, verifying the command and its inverse, reading and sending out all the words, then transmitting the checksum and a flag word. The flag word will be a given pattern if nothing went wrong, otherwise it is x0000 to indicate an error.

I found a few small issues and fixed them up, moving on to fully validating all the error checking behavior of the link for this command type. I did test the two trivial commands - switch the RAM control to the drive circuits and switch it to the SPI link engine, which worked exactly as designed.


The major work for tomorrow will be debugging the Load command. This is a transfer from the Arduino to the FPGA of 321 words of content, each of which must be written to RAM at the proper address, added to the running checksum, and then at the end we must have a good compare of our calculated checksum with the value sent down by the Arduino in order to give a good flag value indicating success. 

Successfully pushing words back into RAM from the write stream coming from 1130 CPU - in simulation


Using the edges of the 720 kHz write oscillator phases emitted by the drive, my state machine properly scans for pulses during the phase B intervals and uses them to differentiate a 1 data bit from a 0. It shifts those into an accumulated word as it records the number of 1 bits seen in this word.

The final four bits, the error checking ones, are also recorded into the count of 1 bits so that at the end of the word we can do an error check. The final counter must be b00, thus the count of 1 bits must be evenly divisible by four. If this fails we lock up with a Write Select error setting the drive Not Ready and informing the software on the 1130. 

After the word is accumulated, the data and address must be combined along with a command indication of 'write' to push into the RAM request FIFO. A trigger pulse is emitted which causes the request to push into the FIFO then we wait for the indication from the memory controller that our RAM write is complete. Once that indication arrives, we bump the word address and go back unless we have captured the 321st word. If at any time WriteGate is de-asserted we go back to idle immediately. 


I observed that the signals are properly set up with enough hold time before clock edges, that the control signals are asserted at the correct time and for the appropriate number of cycles and that we wait for the memory controller to inform us when the write has completed to the memory chip. 


My logic for controlling the RAM interlocks on the app_rdy signal from the memory controller. This goes not ready after we request the write, while the controller is putting the data away into the DDR3 chips. Once the physical writing is complete, the memory controller reasserts app_rdy. My fake memory controller module emits the app_rdy signal to complete the interlocked transaction with the FPGA logic. I could see this working properly in the simulation traces.


Once I had the logic working as it should to capture words when the CPU is writing to the disk drive, given perfect data, I set up one of the words with an incorrect set of check bits to verify that the error is detected and the drive is stopped from further writes. It stopped at the end of the word as I expected.


The data patterns I created let me see the edge cases of the first and last data bits of the incoming word. Having patterns where those bits are 1 and others where they are 0 gave me confidence that the logic captures the word correctly. The simulation showed this working as intended.

As an additional test, I had the testbench produce more than 321 words of data, allowing me to validate that I ignore the extraneous data. As an acid test on this, I wrote bad error checking bits for the 322nd word, which were ignored - I didn't want to trigger a write select error spuriously that turned off the drive ready status. 


It is legitimate to write less than the full 321 words of a sector and this does occur in software that runs on IBM 1130 systems. When the word count of the XIO Initiate Write command in the CPU is exhausted, the device controller drops WriteGate which should turn off the erase and write head operation. 

My logic has to handle this gracefully. In other words, it has to stop looking for 720 kHz clock pulses and write data bits. I set up my simulation testbench with a short count to verify that this works properly.

shutdown after word 321


Onward to finishing up the SPI link protocol logic and debugging it from this side. 

Sunday, August 7, 2022

Debugging my way through the capture of writes to the disk drive and proper storage onto the virtual cartridge image


The pattern for a new sector is as follows: zeroes are written for 250 µs after the sector mark pulse, then one sync word establishes the timing of clock versus data pulses and the start of each word, then up to 321 words of 20 bits each follow. The timing of clock versus data pulses is handled for me. The drive emits the two phases of the 720 kHz write oscillator, and the CPU sends data bits only during phase B.
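The Python sketch below puts numbers to that pattern. It assumes one bit cell per cycle of the 720 kHz oscillator, with the clock pulse in phase A and data in phase B. The 6,440-cell count includes the 20-cell sync word.

```python
BIT_CELL_HZ = 720_000   # assumption: one bit cell per 720 kHz cycle
PREAMBLE_US = 250
BITS_PER_WORD = 20      # 16 data bits + 4 check bits
DATA_WORDS = 321

preamble_cells = PREAMBLE_US * BIT_CELL_HZ // 1_000_000   # 180 zero bit cells
sector_bits = (1 + DATA_WORDS) * BITS_PER_WORD            # sync word + data = 6,440 cells
cell_ns = 1e9 / BIT_CELL_HZ                               # about 1,389 ns per cell
sector_ms = sector_bits * cell_ns / 1e6                   # about 8.94 ms after the preamble

print(preamble_cells, sector_bits, round(cell_ns), round(sector_ms, 2))
```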

However, it is important to spot the sync word, as that tells us how to split a long stream of 6,440 data bits into words. My logic begins watching during the early part of the sector, when only clock pulses are written, in other words when the bit cells all have data value 0. As soon as I see a 1 bit written from the CPU, I assume it is the sync word.

Each word from the IBM 1130 is 16 bits long, written back to front on the disk, with four check bits appended. Thus the sync word appears to be a string of fifteen 0 bits, a 1 bit, then the appropriate four check bit pattern of 1110. When I see the 1 bit, I test that the next three bits are 1 and the last one is 0. If this is true then we have correctly synced up and know that the immediate next bit is word 1, bit 15.
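Below is a Python model of that sync search. It works on one 0/1 value per phase B window. The function name is illustrative, and so is the use of an exception where the FPGA would raise the write-unsafe signal.

```python
SYNC_CHECK_BITS = [1, 1, 1, 0]  # must follow the lone 1 bit of the x8000 sync word


def first_data_cell(cells):
    """Return the index of word 1, bit 15, in a list of bit cells that starts in the zero preamble.

    Raises ValueError where the FPGA would raise the write-unsafe condition.
    The hardware simply keeps waiting for a 1; the model gives up at the end of the list.
    """
    for i, bit in enumerate(cells):
        if bit:
            check = list(cells[i + 1:i + 5])
            if check != SYNC_CHECK_BITS:
                raise ValueError(f"bad sync check bits after cell {i}: {check}")
            return i + 5
    raise ValueError("no sync word found")


stream = [0] * 195 + [1, 1, 1, 1, 0] + [1, 0, 1]   # zeros, tail of the sync word, then data
print(first_data_cell(stream))                      # 200
```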

If this is not true, I turn on a signal that the drive normally generates to report a writing error in its circuitry that makes it unsafe to write on the disk. This signal makes the drive go "Not Ready" and reports the error status to the CPU. It can only be reset by spinning the drive down to zero and turning it back on.

I tested this with a variety of error conditions as well as the proper bit pattern for the sync word and am satisfied that this works properly.

Test catching and validating the sync word

In the simulation run above you can see the two clock phases A and B, the pulses on the Write Data Bit line from the CPU that would be detected and the state machines for disk rotational modeling at the top and the write capture machine below the two clocks. In between the clocks and data bits you can see the bit counter that tracks the number of 1 bits in a word in order to validate the check bit pattern.


Up next is the testing of the logic that accepts 16 bits, 15 down to 0, turning them into a parallel word with the bits 0 to 15 from left to right, then verifies the four check bits of the 20 bit recorded word. Any error in the check bits will be cause for me to turn on the same write unsafe error condition to make the drive Not Ready, as this blocks any further words from being written onto the virtual cartridge. 

I have already set up the testbench with four words of data following the sync word, all properly formatted with the appropriate check bit patterns. It will allow me to verify the operation of the logic to extract words and ask the memory controller to write them to RAM. 

Adding logic to snag words being written to the drive from the IBM 1130


Other uses of the same mechanism, e.g. the Diablo drive on the Xerox Alto or the DEC RK-05, have the processor's device controller generate the clock pulses along with the data bit pulses. Snagging any data written by the CPU therefore requires capturing asynchronous incoming pulses and separating data from clock.

Fortunately, since the IBM 1800 and IBM 1130 were the first uses of this mechanism before it was widely licensed to other vendors, they did things a bit differently. Sometimes the newer ways were an improvement, but I am glad for one of the original design choices.

In later versions of the drive, the CPU sent the target cylinder number to the drive which moved as many cylinders as needed in a single operation, while the 1130 implementation of the drive only moves 1 or 2 cylinders per operation. This puts more burden on the device driver software or user application but otherwise has no impact on me.

In later versions, the data to be written was delivered on a single line called Write Data and Clock, carrying the raw stream that would be amplified and turned into flux reversals by the write head. If I intercepted such a line, I would need to shadow the clock to tell three cases apart: clock pulses, which are to be stripped; pulses that are 1 data bits; and intervals with no pulse, which are 0 data bits.

In the original 1800/1130 version, the drive produces a 720 kHz clock whose normal and inverted states are output as 700KCphaseA and 700KCphaseB. The CPU device interface logic uses this to time the emission of data bits in phase B, leaving the drive to generate the clock pulses during phase A. The only bits I see on the Write Data line are data bits that are 1. If there is no bit during a phase B, I infer a data bit value of 0.


The sector begins after the sector mark ends with a stream of zero bit cells for 250 us. Since every data bit value is 0, there will be no pulses coming out of the Write Data line during this time, although I will see the 700KCphaseA and 700KCphaseB signals alternating. 

The next word is a sync word, a word with value x8000 whose pattern on disk including check bits is b00000000000000011110. Thus, when I see the first 1 bit coming out of Write Data, I know I am near the end of the sync word.

I will check to be sure that the next four bits following the one I just detected are 1, 1, 1 and 0 because that proves I am seeing a sync word being written. If not, I will force the Write Error state to turn on which will stop the CPU from writing and block any further operations to the disk. 

Once I consumed that stream of 1, 1, 1, 1, 0 the very next bit cell will be for bit 15 of the first of 321 words in the sector. It is possible that the CPU may write fewer than 321 words, thus I am always prepared to safely cease the capture of written data. I then use the 700KCphaseB as the indicator of each bit cell, counting out 20 per word and then 321 groups of 20 to complete a sector. 


Each occurrence of phase B opens a window where I look for any 1 value coming from Write Data. I latch that in as a data bit value 1, but if the phase ends without such an incoming value, I have detected a data bit value of 0. At the phase A time following, I shift the bit into a shift register to assemble a word. 

The bits on the disk surface start with the low order bit of the data word, bit 15, and continue leftward to bit 0, after which there are four check bits. I have to inject the bits into the bit 0 position and shift right on each cycle to end up with a 16 bit value in the proper orientation for the 1130.

I begin the write request to put that newly captured word into RAM at the address associated with the cylinder, head, sector and word count. I bump the word count and move on to error checking the final four bits.

During the capture of the data I increment a two bit binary counter whenever I have captured a data bit value of 1. As each of the four check bits is captured, it also increments the bit counter. A proper check value causes the counter to end up with the value b00 indicating that our written word has a number of 1 bits that is evenly divisible by four, the error detection algorithm.

If the bit counter is not b00 then I force the Write Error state. This again stops the write from the CPU and blocks the software from further disk access.
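The word assembly and the two-bit counter can be modeled in a few lines of Python. Cells arrive low-order bit first. IBM bit 15 is the least significant bit, so each new bit is injected at the top of the register and the register shifts right. The function name and the (word, ok) return convention are illustrative.

```python
def capture_word(cells):
    """Assemble one 20-cell disk word; return (word, check_ok).

    cells[0:16] carry IBM bits 15 down to 0 (least significant first); each is
    injected at the top (IBM bit 0) of a 16-bit register that shifts right.
    cells[16:20] are the four check bits.
    """
    if len(cells) != 20:
        raise ValueError("a disk word is 20 bit cells")
    word = 0
    ones = 0  # two-bit counter: wraps modulo 4 like the hardware
    for bit in cells[:16]:
        word = (word >> 1) | (bit << 15)
        ones = (ones + bit) & 0b11
    for bit in cells[16:]:
        ones = (ones + bit) & 0b11
    return word, ones == 0


sync = [0] * 15 + [1] + [1, 1, 1, 0]
word, ok = capture_word(sync)
print(hex(word), ok)   # 0x8000 True
```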

When the WriteGate control signal is switched off, we are done writing words. My logic should go back to the idle state where it waits for the next write from the CPU. 


For generating the bit stream during reading, I have to model the actions during the disk rotation very faithfully, deciding where each clock pulse and data pulse is placed. I have to generate 250 us of zero data bits and the sync word, then spit out all 321 words along with clock pulses and the correct four check bits. 

When writing, I don't have the same control over timing and won't be able to predict where the clock pulse is or where the sync word begins with its first 0 data bit. Fortunately I don't have to. I only need to know in sequence that

  • we have Write Gate on and are seeing 700KCphaseA and 700KCphaseB
  • a bit on Write Data while in 700KCphaseB means we are inside the sync word
  • proper verification that the bits I see next are the right check bits 1,1,1,0
  • every 700KCphaseB is a new bit cell of the 20 that make up a word
  • every 20 bit cells I begin a new word
  • when Write Gate is dropped or I have finished capturing 321 words, we are at the end
  • bad check bits or a sector mark with Write Gate active is an error I inject back
Thus my rotational modeling isn't needed at all except as a simple indication that we moved beyond the fall of the sector mark, easily gleaned from the modeling circuit's output. 
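The sequence in the list above can be modeled as a simple framer. This Python sketch assumes the capture logic has already reduced each phase B window to one data value per bit cell. It treats the end of the input list as Write Gate dropping. `frame_write` is a hypothetical name, and the sector mark error case is not modeled.

```python
SYNC_TAIL = [1, 1, 1, 0]     # check bits that must follow the sync word's single 1
WORDS_PER_SECTOR = 321
CELLS_PER_WORD = 20


def frame_write(cells, max_words=WORDS_PER_SECTOR):
    """Split captured bit cell values into 20 cell words after the sync word.

    cells: one 0/1 data value per phase B window while Write Gate is on.
    Reaching the end of the list models Write Gate being dropped.
    Returns (words, write_error).
    """
    i = 0
    # Zero bits precede sync; the first 1 seen in phase B is inside the sync word.
    while i < len(cells) and cells[i] == 0:
        i += 1
    if i == len(cells):
        return [], False          # gate dropped before any sync word appeared
    i += 1
    # The next four cells must be the sync word's check bits 1, 1, 1, 0.
    if cells[i:i + 4] != SYNC_TAIL:
        return [], True           # force the Write Error state
    i += 4
    # Every 20 cells thereafter is one word, bit 15 first.
    words = []
    while len(words) < max_words and i + CELLS_PER_WORD <= len(cells):
        words.append(cells[i:i + CELLS_PER_WORD])
        i += CELLS_PER_WORD
    # A partial word left over when the gate drops is simply discarded here.
    return words, False
```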

Saturday, August 6, 2022

Locking down the read bit stream performance for the Virtual 2315 Cartridge system


There are a number of state machines involved in generating the read bit streams. One covers the overall sector timing from the sector marker pulse: initial zero stream, sync word and then 321 data words of 20 bits each. Another walks through the timing of the bit cells for each word, spitting out clocks and data pulses at the proper time, counting data bits that are 1 and producing the appropriate final four check bit cells of the word.

In addition, there are machines that pull the next RAM word and ready it for the machine that times each word. If the word isn't ready in time, or arrives too early and steps on the value still being output, we would not have proper output. If the word machine doesn't begin at the proper point when each of the 321 words begins, we have distorted initial and final bit cell patterns. 

I worked out a way to get these operating together rather than attempting to make the individual word machine cycle in perfect alignment with the sector machine that defines the start of each word. The individual word machine waits to be triggered by the sector level machine, thus I could adjust the timing to get distortion free transitions. 


From some maintenance documents I uncovered, I found that the pulses produced by the read amplifier for each flux transition were shorter than the pulses I initially designed into my device. I adjusted the VHDL to make my pulses 200 nanoseconds long, which is in line with the scope images shown in the maintenance document. 

They must be long enough to trigger the drive circuits to emit a properly timed pulse out of the data separator. They also must be short enough to avoid false detection of 1 bits. With a bit cell of 1.4 us and a clock separator circuit that generates a 600 ns window to detect the data bit pulse, my original pulse width of 400ns might allow a clock pulse to slop over into the data bit window time. 


The final simulation run showed me ideal results - the bit cells were all exactly 70 cycles long, or 1.4 microseconds, with the clock pulses all in the proper place and pulses for data bits of 1 exactly where they should be. 

I checked that the RAM addresses being requested were consistent with my scheme for storing the virtual cartridge images. I verified that the logic switched properly from sector to sector and wrapped around to zero after a rotation. 

Start of a sector - zeros, sync, words

zoom in on a few words

multiple sectors, sector marks at top

Friday, August 5, 2022

Feeling good about generating a read head stream of pulses but not done yet


I made changes to the fake memory module such that it would produce a known stream of word values that appear to be the data read out of RAM. Eight values were chosen with an eye to validating that words are produced correctly and that the first and last bits of adjacent words are not corrupted due to poor timing of the read results.


I had to find a way to trigger the read of RAM late in the bit generation of a word in order to have the data available for the next word to start its output. Since the last four bit cells are a check word whose value is already calculated after finishing the 16th data bit, I could issue the trigger at this point with no consequence for when the RAM result is latched in. 

That new value from RAM can appear anytime from the start of the first check bit all the way to the beginning of the first data bit of the next word. The first value from RAM has to appear before word 0 starts its generation, thus I found a time during the latter part of the sync word output where I could trigger the read. Since sync is a fixed pattern, it does not matter when the RAM data arrives as long as it is by the time word 0 begins. 


To assist with this study, I emitted a pulse at the start of each bit cell, which should neatly delineate the pulse or pulse pair that represents a 0 or a 1 value. It also allows me to visually look for incorrect bit cell durations. Coupled with the edges where the word number changes, this gave me all the tools I needed to verify that my circuit was operating as intended.

I did find a number of suboptimal situations which I am working through, adjusting here and there. Hopefully I will be happy tomorrow and ready to move on to capturing sector writes. 

Thursday, August 4, 2022

Substituted fake memory module for the real memory interface - speeding debugging


I took the interface design of the memory controller module and used that to build a false memory module. It was easy to stick that into my design, although I did have to drive a few signals and provide the clock that the rest of the system depends upon. 

A clock module was built into the fake memory module to provide the 50MHz main clock for my FPGA logic and I forced the app_rdy signal high to indicate that the controller is ready for commands. I then worked up a delayed response to signal that the data was available from the RAM read, sending the data valid signal four cycles later to ensure that my driving logic is sound.
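Here is a cycle-level sketch of the same idea in Python, for illustration only. `app_rdy` is always true, and every accepted read produces a data valid indication four clock ticks later. The class name, the `contents` dictionary and the `tick` interface are assumptions. The real fake module is VHDL and also generates the 50MHz clock, which this model does not attempt.

```python
class FakeMemory:
    """Cycle-level stand-in for the memory controller interface (illustrative).

    app_rdy is tied high, and each read command accepted on a clock edge
    produces a data valid indication exactly READ_LATENCY cycles later.
    """
    READ_LATENCY = 4

    def __init__(self, contents):
        self.contents = contents      # hypothetical: address -> 16 bit word
        self.pending = []             # (cycle when data becomes valid, data)
        self.cycle = 0

    def tick(self, app_en=False, read=False, addr=0):
        """Advance one clock. Returns (app_rdy, data_valid, data)."""
        app_rdy = True                # controller always ready in the fake
        if app_en and read:
            self.pending.append((self.cycle + self.READ_LATENCY,
                                 self.contents.get(addr, 0)))
        data_valid, data = False, 0
        if self.pending and self.pending[0][0] == self.cycle:
            _, data = self.pending.pop(0)
            data_valid = True
        self.cycle += 1
        return app_rdy, data_valid, data
```

A fixed latency keeps the driving logic honest: it cannot assume data comes back on the same cycle, yet runs are fully repeatable.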


My VHDL is simulating much more rapidly now that it isn't trying to represent all the gates inside a memory controller. I believe it is at least ten times faster, perhaps as much as 15 times faster, which makes a simulation run bearable as I iterate testing and repairing my logic. 


The high level flow of my logic is that as we begin on the next word being read in a sector, we push a 48 bit request that contains the address in the RAM and a bit that requests a read (16 bits are unused for reads but contain new data on a write). 
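To show what a 48 bit request might look like, here is a Python packing sketch. The field layout is purely hypothetical: data in bits 0 to 15, the read flag in bit 16 and the address in the remaining 31 bits. The text only says the request carries the RAM address, a read flag and 16 bits of data.

```python
READ_FLAG_SHIFT = 16
ADDR_SHIFT = 17
ADDR_BITS = 48 - ADDR_SHIFT      # 31 bits left for the address in this layout


def pack_request(addr, read, data=0):
    """Pack one RAM request into a 48 bit FIFO word (hypothetical layout)."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit")
    return (addr << ADDR_SHIFT) | (int(read) << READ_FLAG_SHIFT) | (data & 0xFFFF)


def unpack_request(word):
    """Inverse of pack_request, as the FIFO emptying side would decode it."""
    return (word >> ADDR_SHIFT,
            bool((word >> READ_FLAG_SHIFT) & 1),
            word & 0xFFFF)
```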

The FIFO into which the request was inserted alerts the FIFO emptying process to pull the request, which then triggers the RAM engine. I correctly set up the addresses, the read command byte and the enable flag for the two cycles needed to cause the memory controller to begin a read. Several cycles later the RAM has completed the read and turns on the ram_done flag.

At this point, the response FIFO filling process would package up the word just returned and push it into the queue. The FIFO emptying process will load that content into the active word register which is used to shift out the 16 bit cells plus the four check bit cells. 

I verified up to the activation of ram_done at the proper point. Tomorrow I will watch the response FIFO work and check to see whether the new word is in place fast enough for the generation of the first bit cell.

Triggering the RAM engine to read each word

Quick update on simulating DDR3 memory interface


The models that accurately simulate the DDR3 memory interface I generated, and that would permit me to verify that my driving logic is correct, were built for Verilog based projects. Very approximately, you can talk about hardware description languages as being in two camps. The government established VHDL as a standard and toolmakers support this language in order to participate in the funds that flow from defense projects. Various toolmakers promote an alternative, Verilog. 

Xilinx is a toolmaker. Many hardware device makers are also aligned with the Verilog camp. The result is that the only simulation models that can be used for the DDR3 memory in the Xilinx toolchain are based on Verilog. 

The simulation models have a handy parameter - SIM_BYPASS_INITIAL_CAL - which will turn off the modeling of the lengthy process by which the memory controller has to write and check all bits in order to adjust timings for the most reliable operation, as part of its initialization. This process is cumbersome and not practical for simulations, so they provide options such as FAST and NONE to speed up this stage and allow the designer to focus on their interaction with the memory controller and not on how the chips work internally.

This sounds great, but there is no method I can discover, either by myself or on the interwebs, by which a VHDL based design can access that parameter to modify it. Were I to have written the project in Verilog and implemented the memory interface in Verilog, I would be merrily debugging. But No! Not for you, lowly VHDL oriented designer.

Time to devise a workaround - either a dummy module to substitute for the memory or some other convoluted hacking of my code - since I really need to see waveforms to be comfortable that my read and write requests are set up properly. They have to happen in the right dance of interlocking signals, occur at the correct timing and with all the necessary setup and hold timing so that the memory will work properly. 

More learning curve - memory interface generator, DDR3 memory and FIFO startup requirements


When I began my debugging of the RAM access mechanisms that are driven by the disk modeling to fetch words of cartridge data, I saw that the FIFO is marked full at startup, which blocks me from pushing in any requests for RAM access.

The reset of the FIFO should result in the flag being off so that I can write a request into the queue. I began monitoring all the signals involved in the FIFO and the reset logic, where I saw the flag sitting at full. Some investigation shows that the reset of the FIFO must occur after its clocks are operating in order to properly reset. Perhaps when my power on reset is asserted the clocks weren't running, so I planned for a FIFO reset signal that would blip on for a few cycles well after my reset is complete.
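The delayed reset can be modeled as a counter that starts when power-on reset is released. In this Python sketch the `delay` and `width` defaults are illustrative assumptions, not values from the design, and `fifo_reset_pulse` is a hypothetical name. It assumes the clock is already running while it counts, which is the whole reason for delaying the FIFO reset.

```python
def fifo_reset_pulse(por, delay=8, width=4):
    """Generate a FIFO reset that blips on after power-on reset has ended.

    por: power-on reset level per clock cycle (True = in reset).
    The FIFO reset goes high `delay` cycles after POR deasserts and stays
    high for `width` cycles. Both values are illustrative.
    """
    out = []
    count = None                     # cycles since POR released; None while in reset
    for level in por:
        if level:
            count = None
        else:
            count = 0 if count is None else count + 1
        out.append(count is not None and delay <= count < delay + width)
    return out
```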


The FIFO reset was generated as I expected but I still didn't see the full flag go off. Tracking the clock produced by the memory interface circuits that is used to empty the request FIFO and feed the response FIFO was illuminating - no clock coming out of the memory interface. 

A bit of reading highlighted the need for the interface to complete calibration before the DDR3 memory is ready for access, with completion signaled by a calibration complete signal out of the memory interface. I set up a run to watch for the completion signal and the appearance of my FIFO clock. 


I ran the simulation for almost half a second but never saw the calibration complete nor the clock begin operating. This led me to look closer at the reset for the memory interface generator. It was active low - thus I had to invert my usual power on reset signal to cause the memory to be properly reset. 

Once the memory interface was properly initialized and the FIFOs were ready, the number of changing signals that had to be simulated shot way, way up. The speed of simulation slowed to a crawl from its already pokey rate. A quick estimate was 0.2 milliseconds of simulation for each elapsed minute. 

To see the signals I care about I think I need to get to around 25 milliseconds total time in the simulation. That means something in the vicinity of 125 minutes to reach that point. Yikes - the investigations are going to be painful if every small insight requires 2 or more hours to check out once the issue is resolved. 


After waiting several hours I was able to see that at least my front end of the RAM access mechanism is working. I saw the trigger cause the request to be pushed into the FIFO, it was then pulled by the RAM engine and that engine advanced to the next step of its state machine. Unfortunately, it did not proceed any further and that meant we would not generate the ram_done signal to push a response into the second FIFO. 

Looking at the logic, it was immediately apparent that the problem was that the memory interface was not activating its app_rdy signal to indicate it was ready to accept commands. This was consistent with the initial calibration completion signal remaining off. 

I need to look very closely at how to get the memory to simulate through to a completion of calibration as that is an essential prerequisite to the interface becoming ready for operation. Since each attempt to verify a fix will take more than two hours, I have to be more careful and complete in my studying to avoid wasting huge portions of a day. 

One way I can shorten the wait is to fast track the disk rotational model - right now it starts with a SM, then a SM + IM, after which it takes two more SM before it can begin to request data. Each SM interval is 5 milliseconds, so that burns 20 milliseconds when no useful information is being collected. I will start the testbench for simulation at the index mark, this should cut the delay about in half. 

I will also be looking at a parameter for the memory interface which will produce a fast calibration during simulation - hopefully that will make the memory interface happy. 

Wednesday, August 3, 2022

Bitstream generation and disk modelling verified through simulation


As you remember from prior posts, each bit written on the disk is an approximately 1.4 microsecond interval called a bit cell which is divided into two halves that can record a pulse by switching the magnetic field direction. No switch means nothing is written and nothing is detected by the read head.

The first half provides the clock and the second half has a pulse if the data value is '1', while if it is absent we impute a data value of '0'. There is always a pulse in the clock half of the bit cell, this is how the drive achieves self-clocking.  A long train of words of all zero provides nothing but the clock pulses and trains a data separator to recognize which is the clock half and which is the data half, so that the pulses can be separated and sent out their own clock and data signal lines. 

Thus, we need to have a way of training the data separator and that is achieved by a fixed format for the sector. Each sector begins with a sector marker, a pulse that is 160 us long. Having masked every other physical sector marker, we see only four of the eight and that defines the four sectors following the index marker pulse. 

At the falling edge of the sector marker pulse, we begin writing all zeroes, a clock pulse followed by no pulse in each bit cell. This is produced by the IBM 1130 device controller circuits for 250 microseconds, after which a sync word is written. This is a word whose high order bit is 1 - 0x8000 - with its proper error checking bits. Immediately after this sync word of twenty bit cells is written the device controller commences writing the 321 words that fill this sector. 

Each word of 16 bits from the CPU is augmented with four error checking bits at the end to yield 20 bit cells going onto the disk. The bits stream onto the platter from the low order bit up to the high order bit then the four check bits are written. There is no delimiter between words, we have only a continual stream of bit cells and depend on the device controller logic to break them into words of 20 bit cells and then data words of 16 bits going to the computer. 

From the moment the sector marker pulse trailing edge is seen, we have a stream of about 6,620 bit cells that must be divided into 321 words of 20 bits each. The sync word pattern following the stream of about 180 zero bits allows us to know that the very next bit cell is the low order bit of the first word, and that every 20 bit cells thereafter is the low order bit of the next word until we have read or written all 321 words.

A sector is nominally 10 milliseconds long at the 1500 RPM rotation speed of the disk platter, minus the 160 microsecond duration of the sector marker pulse. Our pattern of zeroes, sync word and 321 data words burns up about 9,268 microseconds of the 9,840 us available in a sector. We need some safety buffer because the disk rotation speed can vary in the real world, the physical slot that produces the sector marker might be inaccurate, the length of the generated sector marker pulse can vary, plus the oscillators generating the bit cells can vary a bit. 

In an ideal world we could have fit another 20 words into the sector but if the sector marker pulse rises before we have processed the last bits of the sector we generate an overrun error and have to abort the read or write.  
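The timing budget above can be checked with a few lines of Python. All figures come from the text. 180 is the approximate count of zero bit cells in the 250 us preamble.

```python
RPM = 1500
SECTORS = 4                  # logical sectors per revolution on the IBM 1130
SECTOR_MARK_US = 160
BIT_CELL_US = 1.4
ZERO_CELLS = 180             # about 250 us of zero bits
SYNC_CELLS = 20
WORDS, CELLS_PER_WORD = 321, 20

rotation_us = 60_000_000 / RPM                         # 40,000 us per revolution
available_us = rotation_us / SECTORS - SECTOR_MARK_US  # 9,840 us per sector
cells = ZERO_CELLS + SYNC_CELLS + WORDS * CELLS_PER_WORD
used_us = cells * BIT_CELL_US
spare_words = (available_us - used_us) / (CELLS_PER_WORD * BIT_CELL_US)

print(f"{cells} cells use {used_us:.0f} of {available_us:.0f} us")
print(f"room for about {spare_words:.1f} more words")
```

This reproduces the 6,620 bit cells, the roughly 9,268 us used out of 9,840 us, and the room for about 20 more words mentioned above.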


Upon startup, I wait until the platter has rotated to an index marker, which sets up my logic (and sets up the IBM 1130 device controller logic) to treat the next sector marker as the beginning of sector 0. I block any bit cell generation until I have encountered the index marker.

The sector modelling logic is triggered by the sector marker, beginning to write bit cells of data value '0' after the fall of the sector marker pulse and continuing for 250 microseconds. It then writes the 20 bit sync word pattern B00000000000000011110 which is x8000 with the proper error checking bits.

I then write successive words as 20 bit cells, counting the words as I go. Each time a new word starts, a read request is pushed into the FIFO queue for the RAM with the address corresponding to the cylinder where the arm is sitting, the head selected, the current sector number, and the word number. 

I use the returned word from the RAM response FIFO to shift out the 16 data bits, bit position 15 first and continuing leftward until we get to bit position 0 at the high order end. As each bit is written, any '1' value bits are added to a running counter. When we have finished with the 16 bits of the data word we produce the appropriate four check bits based on the running counter value. 
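Here is a minimal Python sketch of the read side serialization. It assumes a word is supplied as an integer where IBM 1130 bit 0 is 0x8000 and bit 15 is 0x0001. `word_to_cells` and `CHECK_PATTERNS` are hypothetical names. Each returned value is one bit cell's data value, and the clock pulse in the first half of every cell is implied.

```python
# Check bits indexed by (count of 1 bits in the data) modulo 4.
CHECK_PATTERNS = {0: [0, 0, 0, 0], 1: [1, 1, 1, 0], 2: [1, 1, 0, 0], 3: [1, 0, 0, 0]}


def word_to_cells(word):
    """Return the 20 bit cell data values for one 16 bit IBM 1130 word.

    Bit 15 (the low order bit, 0x0001) goes out first and bit 0 (0x8000)
    last, followed by check bits that bring the count of 1s to a multiple of 4.
    """
    data = [(word >> i) & 1 for i in range(16)]   # bit 15 first ... bit 0 last
    ones = sum(data) % 4                           # running counter value
    return data + CHECK_PATTERNS[ones]
```

Feeding it x8000 yields the sync word pattern B00000000000000011110 quoted above.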

As previously mentioned, the purpose of the four final bits is to detect errors. The more obvious way it does this is by sending 1 bits until the total of all 1 valued bit cells is evenly divisible by four. Depending on the number of '1' bits in the data word itself, there can be 0, 1, 2 or 3 additional '1' bits that must be written. The device controller verifies that the full 20 bits read in have '1' bits that are 0 modulo 4 and throws up an error if this isn't true.

The second and less obvious error check, not implemented in the IBM 1130 but possible due to this error checking scheme, is to validate that the last bit written is a '0' value. It must always be '0' because the controller of the IBM 1130 writes either B0000, B1000, B1100 or B1110 to make the 1 bits a multiple of four. Three bits are sufficient to accomplish this, thus the fixed fourth bit of 0 is another kind of error checking and may play a role in ensuring that the data separator remains able to distinguish which pulses are clock and which are a 1 data value. 
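The extra check is cheap to add. In this Python sketch, which assumes a word's 20 bit cells are given as a list, `check_bits_ok` (a hypothetical name) reports two results separately. The first is the modulo 4 test the 1130 performs. The second is the final cell zero test it does not perform.

```python
def check_bits_ok(cells):
    """Return (mod4_ok, last_cell_zero) for one word of 20 bit cells."""
    if len(cells) != 20:
        raise ValueError("expected 20 bit cells")
    mod4_ok = sum(cells) % 4 == 0      # the check the IBM 1130 performs
    last_cell_zero = cells[19] == 0    # extra check the 1130 does not perform
    return mod4_ok, last_cell_zero
```

A word of zeros followed by check bits 1111 passes the modulo 4 test, since it has four 1 bits, but fails the final zero test.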

At the end of the day, I had simulator runs showing me that the stream of bits was produced exactly to this schema and at a realistic timing that matches what a real world cartridge would produce through the head. I had a signal whose level varied between +3V and 0 to feed to the disk drive electronics at a specific point. 

The drive spots magnetic flux reversals, converts them to one polarity regardless of the way the flux swings, and from that produces a pulse for each reversal. The pulse is turned into a transition of a logic signal from 1 down to 0 for the length of the pulse. Thus, my signal stream is interpreted at this point as a pulse for as long as the level is 0, and as the absence of a pulse all the time it stays up at +3V. 

The duration of my pulses is set to 0.4 microseconds, which fits within the 0.7 us bit cell half and leaves enough separation between the clock and data pulses for them to be properly separated. 

There is an esoteric effect on disk drives where the timing of the pulse detected 'shifts' based on surrounding flux reversals. The separator has to accommodate this time shifting without errors. I don't think I have to shift my own pulses but would be prepared to add this in if it becomes necessary.


Sector spanning view
The section above shows more than one sector, so that you see the sector number change and the bottom stream of pulses that are the 6,620 bit cells produced for the sector. The top line is the sector number, the next down is the word counter. The two lines between the pulse stream and the word count are the state machines involved in sector modeling and bit cell generation. 

Beginning of a sector 
I have zoomed in a bit to show you some detail in the state machine and word counter values; the pulse stream also begins to show distinct patterns as the data bit values change. 

Bit cells visible

This final screen shows more detail so that the 20 bit cells of the sync word and the bit cells of data words can be discerned. For this testing I produced a fixed word value of x5AA7 for each of the data words, which you can verify by decoding the bit cells and validating that the check bits are correct.
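For readers who want to do the decoding, this Python sketch decodes the 20 bit cells expected for x5AA7. It assumes the cell ordering described earlier: bit 15 first, then four check bits. The cell string was derived from x5AA7 rather than read off the screenshot, and `decode_cells` is a hypothetical name.

```python
# Bit cells for x5AA7 in stream order: bit 15 first, then the four check bits.
CELLS = "1110" "0101" "0101" "1010" "1110"


def decode_cells(cells):
    """Decode 20 bit cells back into (word, check_ok)."""
    bits = [int(c) for c in cells]
    word = 0
    for position, bit in enumerate(bits[:16]):
        word |= bit << position        # cell 0 is bit 15, value 0x0001
    check_ok = sum(bits) % 4 == 0 and bits[19] == 0
    return word, check_ok


word, ok = decode_cells(CELLS)
print(f"x{word:04X}, check bits ok: {ok}")   # prints x5AA7, check bits ok: True
```

Each group of four data cells is one hex digit written backwards: 1110 is 7, 0101 is A, and 1010 is 5. The data contains nine 1 bits, so the check bits are 1110, bringing the total to twelve.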


All of the above depends on the RAM having returned the proper word in time for my bit generation circuitry to turn it into bit cells. The request for the word is generated just before we begin writing the clock pulse of the bit cell. We have 35 FPGA cycles for that half of the bit cell and then 8 cycles into the next half before the data value must be present. 

The RAM itself will return the data word in about four FPGA cycles, but we also have to traverse a FIFO for the request and then a FIFO to pass back the answer - these take a few cycles each. On paper I have enough time since the RAM is doing nothing except serving up our word read requests. 
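Here is a back of the envelope version of this deadline in Python. The 35 and 8 cycle figures and the roughly four cycle RAM read come from the text. The two FIFO latencies are placeholder assumptions standing in for "a few cycles each". Everything is counted in 50MHz cycles for simplicity, even though part of the path runs in the 100MHz domain.

```python
CLOCK_MHZ = 50
CYCLE_NS = 1000 / CLOCK_MHZ      # 20 ns per FPGA cycle
CLOCK_HALF_CYCLES = 35           # first (clock) half of a 1.4 us bit cell
INTO_DATA_HALF_CYCLES = 8        # data value must be present by this point

budget = CLOCK_HALF_CYCLES + INTO_DATA_HALF_CYCLES

# Assumed latencies in 50MHz cycles; only the RAM figure comes from the text.
latency = {"request FIFO": 3, "RAM read": 4, "response FIFO": 3}
used = sum(latency.values())

print(f"budget {budget} cycles ({budget * CYCLE_NS:.0f} ns), "
      f"assumed path {used} cycles, slack {budget - used} cycles")
```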

The reason for the FIFOs, by the way, is to deal with the different clocks involved. The DDR3 RAM operates with 100MHz and 200MHz clocks, while my logic is running at 50MHz and that is not exactly in phase with the clocks of the RAM. 

The FIFO is implemented to act as a buffer to accommodate the dual clock domains, one on each side of the FIFO. It will have either zero or one item in the queue at any time, not really storing up a queue of requests. One FIFO goes from 50MHz to 100MHz for requests and then a second FIFO carries responses from 100MHz back to 50MHz. 

I won't have real data in the RAM during the simulation, but I can validate that all the triggering takes place. My sector modeling logic must trigger a request into the request FIFO before each word, the RAM side must see the request and pull it off successfully, and the data must be set up to read the RAM.

I then have to watch the state machine driving the RAM to see if it seems to toggle the control, address and data lines at the proper times. Assuming that is good, then when I see the signal from RAM that data is valid, I must see the data pushed into the response FIFO and my logic pull it out on the far end to put it into the word buffer for the bit generation circuitry. 

Once I get through all this checking, and provided the data can be written into the RAM before disk operation begins, I will have some confidence that the disk drive electronics will be seeing the right stream of pulses to turn into 321 words for the CPU. 

Tuesday, August 2, 2022

Finishing the drive modeling and simulating as I go


I put the logic through its paces with simulation to ensure that I will properly capture the position of the arm (cylinder number) at all times. During the course of this work I discovered a signal I need to capture that I had not built into the original plan. This is +Access Ready which goes low then back high for each arm movement. With that signal I can properly interlock my logic to ensure I capture exactly the right number of moves. 


As the rotational model reaches the time when each word of the 321 words in a sector is complete, it bumps the word counter and that serves as my signal to fetch the next word from RAM so that it can be fed bit by bit to the drive circuits that process the read head signals. 

From the time I see the word counter bump until the next bump, I emit the 16 bits of the data word, with the low order bit output first and moving sequentially up to bit 0, the high order bit. Each bit of the data word is sent with a 1 pulse during the first 700ns and then the value of the bit as a pulse in the second 700ns interval. Together the 1.4us represents one bit cell on disk as either a pair of pulses or a pulse followed by the absence of a pulse. 

In addition to the 16 bits of data I am writing in the 16 bit cells, there are four check bits sent at the end of each word. The error checking scheme is to have the number of 1 bits be evenly divisible by four. Thus, depending on the count of 1 bits in the word itself, we write 0000, 1000, 1100 or 1110 as the last four bit cells. 


Capturing the data involves detecting when the CPU believes it is emitting a clock bit and when it is sending a data bit, so that I can assemble these into a word and then push it into RAM. To do this, I realized I may need both clock signals the drive is generating for writing: phase A for the clock bit and phase B for the data bit. I added phase A, which was not in my original plans.

When WriteGate is activated, the drive begins to emit the clock signals and writes the clock pulse and the data 1 bits to the surface immediately. The scheme for encoding bit cells is to have the two phases, A and B. A flip of the magnetic field is always written during the A phase, but a flip is only done in the B phase if the data value is a 1. If the data value is 0, nothing happens on the disk surface. 

This makes it impossible to recognize when zero bits are being written other than to see the clock bits (A phase) and note the absence of a pulse for data. Having the two clock phases in my circuit ensures I will know for certain what pattern of bits is being written onto the disk surface.

Since, however, the heads are not loaded on the disk surface, we have to capture the bits and update RAM to keep our virtual cartridge reflecting what input-output has taken place. Writing the word to RAM will take place after the 20 bits have been captured coming from the CPU, whereas my reading logic reads the word from RAM before it begins streaming the 20 bits to the CPU. 

This implies that my state machines must issue RAM access requests at different times for read versus write. There is a similar duality with the SPI link between this FPGA and the Arduino - data downloaded from the SD card file to the RAM is captured at the end of a word transaction on SPI, while uploading changed content from the RAM back to the Arduino has to set up the word before the word transaction begins. 

Monday, August 1, 2022

Validation of drive modeling in FPGA by simulation is well underway


In order to inject the proper bits into the read circuitry of the physical drive or to snag written bits to update my virtual cartridge, I have to model the position of the read/write head very precisely. This involves two dimensions of movement - radial and rotational. 

The radial modeling captures which of the 203 cylinder locations the arm might be positioned at. This involves sensing some feedback signals and capturing the control signals which cause the arm to move in or out. 

When the arm is fully retracted to the location of cylinder zero, the home cylinder, there is a microswitch that activates and generates the +Home signal. As the arm moves, it turns off the +Access Ready signal until it is settled into its target cylinder position. 

The drive is given a direction of movement and a step size. The arm can move towards the home cylinder or away from it towards cylinder 202, depending upon the direction signal. Movement takes place in either two cylinder or one cylinder steps per movement request, controlled by the 10mil/20mil signal. Whenever the Access Go signal is seen, the mechanism moves according to the direction and step size.

My logic tracks the cylinder by accumulating the various movements, such that I should always know where the arm is positioned on the real drive. 

The disk surface is rotating continually under the heads. The disk drive picks up sector mark and index mark pulses to know what part of the circular path is under the head at any instant. My modeling is following the rotation of the real cartridge so that my virtual cartridge can access the data word associated with that spot.

A comparatively trivial part of the modeling is to watch two signals. One, the track signal, selects either the bottom or the top head on the arm since data is recorded on both surfaces of the physical platter in the cartridge. The other is the control that keeps the heads loaded on the physical cartridge and the disk drive ready for access. I don't allow the drive to actually load the heads into contact with the platter but I keep track of the signal. When it switches off, we must stop simulating disk read or write activities. 

There is a complication because the physical cartridge has eight sector marks evenly distributed around the platter, each providing an approximately 160 us pulse. In addition, there is one index mark that occurs shortly after one of the sector marks, which defines the 'start' of the track. The IBM 1130 instead treats the disk as having four equally spaced sector marks, which it accomplishes by blocking every other sector mark during operation. 

My circuitry sees the eight SM and one IM signals, at all times generating the sector number where the head is currently passing. This is used as part of the RAM addressing for the current word, which is an amalgam of the cylinder number, top or bottom head (track), sector number and then which of the 321 words in the sector is under the head. 
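One way to form that amalgam is plain bit concatenation. This Python sketch assumes field widths just large enough for 203 cylinders, 2 heads, 4 sectors and 321 words, in a hypothetical order. The actual layout used in the design is not described here, and `word_address` is a hypothetical name.

```python
CYL_BITS, HEAD_BITS, SECTOR_BITS, WORD_BITS = 8, 1, 2, 9   # 203, 2, 4, 321 values


def word_address(cylinder, head, sector, word):
    """Concatenate cylinder, head, sector and word number into a RAM word address.

    Bit widths are the minimum that hold each field; the field order is a
    hypothetical choice. The result fits in 20 bits.
    """
    if not (0 <= cylinder < 203 and head in (0, 1)
            and 0 <= sector < 4 and 0 <= word < 321):
        raise ValueError("position out of range")
    addr = cylinder
    addr = (addr << HEAD_BITS) | head
    addr = (addr << SECTOR_BITS) | sector
    addr = (addr << WORD_BITS) | word
    return addr
```

The trade off is that each sector occupies a 512 word slot of which only 321 are used. In exchange, the address is pure wiring with no multiplier.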

A related part of the circuitry has to model the sector in detail. As each sector begins, at the falling edge of the SM pulse, we will have 250 us of zero bits, a specific pattern which is the sync word, then 321 words of 20 bits before we pad the remainder of the sector with zero bits again. My circuit times from the sector mark and moves a state machine through the stages - zero bits, sync word, the 321 data words and final zeros. It generates the word number at the proper time to match what the real cartridge would have passing under the head.
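The stages can be expressed as a function of bit cells elapsed since the falling edge of the sector mark. This Python sketch counts bit cells rather than FPGA cycles. It assumes 180 zero cells as a round figure for the 250 us preamble. `sector_stage` is a hypothetical name.

```python
ZERO_CELLS = 180       # assumed round figure for 250 us of 1.4 us cells
SYNC_CELLS = 20
WORDS, CELLS_PER_WORD = 321, 20


def sector_stage(cell_index):
    """Map a bit cell count since the sector mark's falling edge to (stage, word)."""
    if cell_index < ZERO_CELLS:
        return "zeros", None
    cell_index -= ZERO_CELLS
    if cell_index < SYNC_CELLS:
        return "sync", None
    cell_index -= SYNC_CELLS
    if cell_index < WORDS * CELLS_PER_WORD:
        return "data", cell_index // CELLS_PER_WORD
    return "pad", None
```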


The head (track) selection and head loading signals worked properly. I moved on to the rotational modeling, which is now clearly working as it should. I see the stages in a sector occur at the proper time and the word number increment every 28 us which is a valid timing for a 20 bit word. The entire sector of 321 words fits nicely in the sector with a bit of zero word padding at the end. 

The sector numbers are generated correctly and synchronize properly based on the first index mark pulse received. The programmer issues an IO command which includes a two bit sector number. The drive controller logic in the CPU watches the SM and IM pulses to track the sector coming up, beginning input output operations at the sector mark after an equal compare of target and actual sector numbers. 

The cylinder movement engine is still getting some tweaking, to ensure that I properly interlock to always count the same number of steps as the real arm traverses. There are subtleties - behavior when the arm attempts to move inward past cylinder 202 or outward past the home cylinder. The mechanism reports a successful movement but in fact the arm stops either at home (0) or 202. The +Home signal helps detect this on the cylinder 0 side, but no feedback exists to indicate this for movement attempts beyond 202. 
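Here is a Python sketch of the arm bookkeeping. It assumes one call per Access Go and clamps at both ends to mirror the stopping behavior described above. `ArmModel` is a hypothetical name. The interlock on +Access Ready, which ensures each movement is counted exactly once, is not modeled.

```python
MAX_CYLINDER = 202


class ArmModel:
    """Track the arm's cylinder from Access Go requests (illustrative sketch)."""

    def __init__(self):
        self.cylinder = 0

    def access_go(self, toward_home, two_cylinders):
        """Apply one movement; the real mechanism stops at 0 or 202 regardless."""
        step = 2 if two_cylinders else 1
        if toward_home:
            self.cylinder = max(0, self.cylinder - step)
        else:
            self.cylinder = min(MAX_CYLINDER, self.cylinder + step)
        return self.cylinder

    def home_switch(self, active):
        """+Home gives real feedback on the cylinder 0 side."""
        if active:
            self.cylinder = 0
```

On the home side, +Home can resynchronize the count. No equivalent feedback exists at 202, so the clamp there relies entirely on the model.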

I should be able to finalize the operation of the arm movement modeling tomorrow and move on to the coding and testing of the bit generation itself - the delivery of the 16 bits of a data word in the proper order and the subsequent four check bits that provide error checking. These have to be timed to coincide with the data separator in the disk drive.

The separator will entrain on the long sequence of zero bits at the start of the sector - a word of all zeros consists of alternating 1 and 0 bits which are the clock and data respectively. This is cemented by the sync word pattern so that the drive is now dividing each bit cell into two phases - A and B - that represent the clock bit and the data bit. 

When phase B is active we either inject a bit if the data value is 1 or stay inactive to assert that the bit is 0. Thus, my logic watches the 700KC B signal to time when it produces the raw data stream. We always inject a 1 bit when in the midst of phase A then selectively inject the bit in B based on the data value.