Debugging the virtual 1442 functionality
I chose a lunchtime test where I would single-step the diagnostic code and see what is returned by my adapter. Checking that against the diagnostic listing, I should be able to see where my adapter behavior diverges from what is expected.
I walked through card 7 of the one card diagnostic deck, which runs with interrupts disabled and tests the ability to boot and read from the 1442. I found that my Python code gets into trouble when booting a one-card deck, since I don't gracefully handle the resulting empty file. I also see what is going wrong: my busy and busy/not ready bits (14 and 15) should be set in the DSW but are not.
I looked into the FPGA logic to see where the bad DSW contents are occurring, then looked at the Python code to fix the vulnerability. Perhaps I have a problem with the setup and hold of the DSW bits, similar to what I had a week or two ago. In that case, too, the lowest bits were not getting into memory.
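As a rough illustration of why bits 14 and 15 are "the lowest bits": the 1130 numbers bits from the most significant end, so bit 0 is the top of the 16-bit word and bit 15 is the bottom. The sketch below is not code from my adapter; the names `ibm_bit_mask`, `DSW_BUSY`, `DSW_BUSY_OR_NOT_READY` and `missing_dsw_bits` are made up for the example.

```python
from typing import List

WORD_BITS = 16  # the 1130 uses 16-bit words


def ibm_bit_mask(bit: int) -> int:
    """Return the mask for a bit in IBM numbering (bit 0 is the MSB)."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit must be in the range 0..15")
    return 1 << (WORD_BITS - 1 - bit)


# Hypothetical names for the two DSW bits discussed above.
DSW_BUSY = ibm_bit_mask(14)
DSW_BUSY_OR_NOT_READY = ibm_bit_mask(15)


def missing_dsw_bits(dsw: int) -> List[str]:
    """List which of the two expected bits are absent from a DSW value."""
    missing = []
    if not dsw & DSW_BUSY:
        missing.append("busy")
    if not dsw & DSW_BUSY_OR_NOT_READY:
        missing.append("busy/not ready")
    return missing


if __name__ == "__main__":
    # A DSW that arrives with the two lowest bits cleared reports both missing.
    print(missing_dsw_bits(0x0000))
```

A check like this, run against the word captured while single-stepping, makes it obvious when only the low-order end of the DSW is being lost.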
I did fix up the Python program but it wasn't clear what was wrong with the DSW. I tightened up logic around the busy condition and used the LEDs for various diagnostic conditions that should be lit when I am stepping through the one card diagnostic #7.
I had time for one last test before going to bed. Good news indeed. My changes fixed the DSW issue. I booted up a card deck that had a boot card, the diagnostic monitor and the 1442 functional test program.
I hit the Boot button on my GUI and it loaded core with the boot card. I reset and pushed start, and the lights flashed and flashed. I saw my input deck finish and close automatically, then the typewriter began writing away with the diagnostic messages!
I ran again with the output file opened and saw all the cards copied through as they were read. I had one small flaw in my Python when a program tries to read past the end of the input file, a race hazard I suspect. It is not a major issue and I can fix it tomorrow.
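One way to make both the empty-file case and the read-past-end case harmless is to treat "hopper empty" as an explicit state instead of an exception. The following is only a sketch under assumptions: it pretends a deck is a text file with one card image per line, which is not necessarily the format my program uses, and the class name `CardDeck` is invented for the example. It does nothing about any race between threads.

```python
import io
from typing import Optional, TextIO

CARD_COLUMNS = 80


class CardDeck:
    """Hypothetical virtual hopper: one card image per line of text."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream
        self.hopper_empty = False

    def read_card(self) -> Optional[str]:
        """Return the next 80-column card, or None once the deck is exhausted."""
        if self.hopper_empty:
            return None  # repeated reads past the end stay harmless
        line = self._stream.readline()
        if line == "":  # end of file, including a file that was empty from the start
            self.hopper_empty = True
            return None
        return line.rstrip("\n").ljust(CARD_COLUMNS)[:CARD_COLUMNS]


if __name__ == "__main__":
    deck = CardDeck(io.StringIO("BOOT CARD\n"))
    print(repr(deck.read_card()))  # the single card, padded to 80 columns
    print(deck.read_card())        # None: the hopper is now empty
```

The caller can then map `None` onto the not ready condition in the DSW rather than crashing.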
Resilient high speed SPI link with restart
A design requirement for the high speed link is that I detect errors or lack of sync between the two sides and reliably restart the link. There needed to be a way to interlock the sides, ensuring they restart in the right sequence.
Two elements help in this quest. The master SPI module has a reset input, which will force it to the initial state once reset is lifted. The master, when reset, is not selecting the slave. The slave module, when it is not selected, returns to the initial state.
I have four ways to detect a problem on the link. The input process on each side has a timer which ensures I know that something has stalled. The CRC and Hamming capability of each side can detect an uncorrectable error. Single bit errors are corrected automatically and do not cause an error because of the SECDED (single error correcting, double error detecting) scheme built into the link.
My design challenge is to take the four cases of which mechanism detects the problem first and decide what I need to do in each case so that the two SPI modules, the two input processes and the two output processes reset and release in the proper order.
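To show what SECDED means in practice, here is a toy extended Hamming code that protects 4 data bits with 4 check bits. It is only an illustration of the principle: the real link protects a much larger packet inside the FPGA, and the function names `secded_encode` and `secded_decode` are invented for this sketch.

```python
from typing import List, Optional, Tuple


def secded_encode(nibble: int) -> List[int]:
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus an overall parity bit.

    List index = bit position. Positions 1, 2 and 4 hold Hamming parity,
    positions 3, 5, 6 and 7 hold data, position 0 holds overall parity.
    """
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the whole word even parity
    return code


def secded_decode(received: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    parity_odd = sum(code) % 2 == 1
    if parity_odd:
        # Odd overall parity means a single flipped bit; the syndrome names it
        # (a syndrome of 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but a nonzero syndrome: two bits flipped, cannot repair.
        return None, "uncorrectable"
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = secded_encode(0b1011)
    word[5] ^= 1                 # one bit hit in transit
    print(secded_decode(word))   # data recovered, flagged as corrected
    word[2] ^= 1                 # a second bit hit
    print(secded_decode(word))   # detected but not repairable
```

Only the "uncorrectable" outcome needs to trigger a link restart; the "corrected" outcome is invisible to the rest of the logic.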
Because of the timeout test in the master, the slave can signal problems by freezing itself (blocking the slave select line), which the master will see and react to. Thus, whether it is an uncorrectable error or a timeout of the input, the slave forces itself to reset and wait for the slave select to go active upon restart. The slave will react to the dropping of the slave select line by resetting the module, input and output processes.
The master will reset its module, input and output processes, which drops the slave select to sync the other side, then wait long enough before restarting so that the slave will be ready. The master can react as soon as it detects an error and that will cause the slave to promptly go into reset. The case of a slave detected error requires that the time for a master detected timeout must elapse, to be sure that the slave has stopped talking to the master.
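The four cases can be written down as a small table of recovery sequences. This is just a paper model of the ordering described above, not the FPGA logic itself; the enum and function names are invented for the sketch.

```python
from enum import Enum
from typing import List


class Side(Enum):
    MASTER = "master"
    SLAVE = "slave"


class Cause(Enum):
    INPUT_TIMEOUT = "input timeout"
    UNCORRECTABLE_ERROR = "uncorrectable error"


def recovery_steps(side: Side, cause: Cause) -> List[str]:
    """Ordered recovery sequence for whichever side notices the problem first."""
    steps = []
    if side is Side.SLAVE:
        # The slave cannot reset the master directly, so it goes quiet and
        # relies on the master's own timeout to notice.
        steps.append(f"slave detects {cause.value}, freezes itself and waits for slave select")
        steps.append("master input timer expires because the slave went quiet")
    else:
        steps.append(f"master detects {cause.value}")
    steps += [
        "master resets its SPI module, input and output processes (slave select drops)",
        "slave sees slave select drop and resets its module, input and output processes",
        "master waits long enough for the slave to be ready",
        "master releases reset and the link restarts",
    ]
    return steps


if __name__ == "__main__":
    for side in Side:
        for cause in Cause:
            print(f"--- {side.value}: {cause.value}")
            for step in recovery_steps(side, cause):
                print("   ", step)
```

Written this way, it is clear that the slave detected cases are the slow ones, since they always pay for one full master timeout before the restart begins.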
A key design question is the length of the timeout test. Too long and it will introduce unacceptable delays in the high speed link that could jeopardize the behavior of higher speed physical peripherals attached to slave boards (e.g. disk drives). It should be enough longer than a real transaction to ensure we reliably catch a stall, but not much more than that.
SPI protocols add very few clock cycles surrounding an interchange, so the minimum timeout is just a bit longer than the packet size of 128 bits. The SPI clock is running slower than the FPGA clock by a ratio of 4 to 1, thus we need somewhere above 512 clocks to detect a timeout. I selected 556 clocks which is 11.6 microseconds when running on the current ztex board as master (the master generates the clock).
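The arithmetic behind those numbers, as a quick check. The FPGA clock rate is not stated above; 48 MHz is an assumption inferred from 556 clocks coming out at roughly 11.6 microseconds.

```python
PACKET_BITS = 128                  # bits per SPI transaction
FPGA_CLOCKS_PER_SPI_CLOCK = 4      # SPI clock runs 4x slower than the FPGA clock
TIMEOUT_CLOCKS = 556               # chosen timeout, in FPGA clocks
ASSUMED_FPGA_CLOCK_HZ = 48_000_000 # assumption, not stated in the text

# Fewest FPGA clocks in which a full packet can possibly move.
min_clocks = PACKET_BITS * FPGA_CLOCKS_PER_SPI_CLOCK

# Slack allowed beyond a perfect transaction before a stall is declared.
margin_clocks = TIMEOUT_CLOCKS - min_clocks

# Timeout expressed in microseconds at the assumed clock rate.
timeout_us = TIMEOUT_CLOCKS / ASSUMED_FPGA_CLOCK_HZ * 1e6

if __name__ == "__main__":
    print(f"minimum clocks per packet: {min_clocks}")
    print(f"margin: {margin_clocks} clocks")
    print(f"timeout: {timeout_us:.1f} microseconds")
```

The margin is 44 FPGA clocks, or 11 SPI clock periods, on top of the 512 clocks a clean transfer needs.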
I will accumulate a restart count in the slave and display it on the seven segment displays since my first slave board is a Nexys2 that can display four hex digits visually. That will give me a way to quickly view the quality of the link over time as I test.
My changes for the error recovery were made to both slave and master fpga boards and that would be an easy thing to test when I got out at noon. Unfortunately, work was too heavy and I did not get out for testing until the evening. The LED segments are locked on 0000 but I don't have direct evidence that the link is operational. Some diagnostic LEDs on the slave board will reflect the 1130 state and therefore should change if the link is running.
When I got out for the last test tonight, the diagnostics showed me that something is not working properly on my modified high speed link. The data pump is not delivering any live data to the slave. It will only change the signals if a transaction completes and the Hamming and CRC checks are passed. I could be timing out too early or it could be some other design flaw.
Implementing physical plotter (1627 equivalent)
The slave FPGA changes to support the 1627 were completed including the new Not Ready signal that will tell the master whether the plotter interface box is plugged into the slave board or not. The plotter is set to execute 3,000 moves per minute (50 per second) which I hope the Strobe 100 is capable of handling. A real 1627 runs four times faster.
Implementing physical paper tape reader/punch (1134 and 1055 equivalents)
I am still working on the slave side logic in light of my division of labor between master and slave. The new design with the interlocked transactions means that if the link drops while a punched character was sent, the paper tape punch will only punch a single instance and wait. If I had put the drive signal on the master side, it could have kept the reader or punch moving for as long as the link was down.
I need to study the schematics of the reader and punch in order to design any necessary interface hardware. The solenoid drivers that advance the devices certainly need it, the punch output signals probably do, and perhaps the contacts for sensing the input need something as well.
The design of the paper tape reader does not connect the eight channel contacts to the common bus while the reader is at rest. Instead, the contacts are enabled partway through the read cycle. To support this, my logic has to monitor and latch up the contacts sometime during the movement.
The user starts a read by issuing an XIO Control which clutches in the motor solenoid on the reader. My process to drive this (in the slave) holds the clutch engaged for 68 ms. I can probably trim this down to make sure I don't accidentally take a second cycle. The 68 ms is the delay before the response goes back to the master FPGA and becomes an operation complete interrupt.
As I start driving the motor of the reader, I reset the register holding the eight channels of data. Then, during the entire movement time, I continually OR the eight channels with the value in the register. Thus, any hole will at some point during the movement turn into a 1 in that channel of the register. Any channel without a hole will stay at 0. This register is the value that is sent back to the master FPGA.
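The OR-accumulate latch is easy to model in software. This is a behavioural sketch only; the real thing is a register in the slave FPGA, and the class name `ReaderLatch` and the sample values are invented for the example.

```python
from typing import Iterable


class ReaderLatch:
    """Behavioural model of the eight-channel accumulate register."""

    def __init__(self) -> None:
        self.register = 0

    def start_cycle(self) -> None:
        """Clear the register as the clutch is engaged."""
        self.register = 0

    def sample(self, contacts: int) -> None:
        """OR the current state of the eight contacts into the register."""
        self.register |= contacts & 0xFF


def read_character(latch: ReaderLatch, samples: Iterable[int]) -> int:
    """Run one read cycle over a sequence of contact samples."""
    latch.start_cycle()
    for contacts in samples:
        latch.sample(contacts)
    return latch.register


if __name__ == "__main__":
    latch = ReaderLatch()
    # Contacts are dead at rest, live partway through the cycle, then dead again.
    cycle = [0x00, 0x00, 0x15, 0x15, 0x00]
    print(hex(read_character(latch, cycle)))
```

Because the register only ever gains bits during a cycle, it does not matter exactly when the contacts are enabled or whether individual channels close at slightly different moments.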
An XIO Read from the user just fetches the current value of the register that was copied over. The user issues XIO Control and XIO Read in alternation to move, latch up the register and then store the register contents into core.
Nice - always happy when you finally register a success!
Will you be giving tours during VCF?
I am leaning toward bringing the 1130 system and SAC Interface box to VCF West, although the logistics are daunting. Certainly available for in-situ tours anytime someone is visiting, whether during VCF or otherwise.