Thursday, October 27, 2016

Debugging write sector function, almost done but timing is a bit too long to fit in sector

ALTO DISK TOOL

I set up the first combination of signals to watch on the logic analyzer, to get to the bottom of the misloading of the serializer from RAM. I also modified the WriteField logic, adding a step that loads the serializer more cleanly after the memory access has finished. The test run will show whether that extra step is actually needed.

First morning run, I captured the data and looked it over carefully. The change I made fixed the issue with reading from RAM and serializing. The output stream of transitions reflected the data put into the serializer (sync word, two header record words, checksum word). However, the WriteField FSM is stalling in the checksum writing state.

I tweaked a few signals and the FSM, but I am not exactly sure what is causing the problem. It may be a race hazard once again, this one between the serializer and this FSM. The serializer emits a signal getnewword for one cycle. A few cycles later, the FSM should issue loadword for one cycle, causing the serializer to pick up the new word to emit.

If the FSM misses the getnewword signal for some reason, or the serializer misses a loadword signal, we will stall forever, because the serializer only raises getnewword when it shifts out the 16th bit of a word that was loaded into it.
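As a rough illustration of why a single lost pulse is fatal, here is a minimal Python sketch of the handshake described above. It is a toy software model, not the actual FPGA logic: the names `Serializer`, `run`, `load_delay` and `drop_load` are hypothetical, and the three-cycle delay is just a stand-in for "a few cycles later".

```python
class Serializer:
    """Toy 16-bit shifter: pulses getnewword for one tick when bit 16 leaves."""

    def __init__(self):
        self.bits_left = 0
        self.getnewword = False
        self.words_emitted = 0

    def tick(self, loadword):
        self.getnewword = False
        if loadword:
            self.bits_left = 16          # pick up a fresh word
        elif self.bits_left > 0:
            self.bits_left -= 1          # shift one bit out
            if self.bits_left == 0:
                self.getnewword = True   # one-cycle request for the next word
                self.words_emitted += 1


def run(n_words, drop_load=None, load_delay=3, max_cycles=2000):
    """Simulate the FSM feeding n_words; drop_load=k loses the k-th loadword pulse.

    Returns the number of words the serializer actually shifted out.
    """
    ser = Serializer()
    armed = True       # the FSM owes the serializer a loadword pulse
    countdown = 0      # cycles left before that pulse is issued
    loads_issued = 0
    for _ in range(max_cycles):
        load = False
        if armed:
            if countdown == 0:
                armed = False
                loads_issued += 1
                # Assumption: a race hazard shows up as a pulse that never arrives.
                load = loads_issued != drop_load
            else:
                countdown -= 1
        ser.tick(load)
        if ser.getnewword and loads_issued < n_words:
            armed = True
            countdown = load_delay
        if ser.words_emitted == n_words:
            break
    return ser.words_emitted
```

Calling `run(4)` lets all four words through, while `run(4, drop_load=3)` stops after two words and never recovers, because nothing is left in the shifter to generate the next getnewword.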

I did a test run while my tweaks were synthesizing, hoping to spot the two signals and their timing. I know that the checksum was loaded, as I saw that on the analyzer last time, but I didn't look to see whether the zero word for the postamble was properly loaded.

This showed me a successful loadword to get the checksum into the serializer, then later the getnewword that should trigger the postamble phase. However, the WriteField FSM did not move out of the checksum step, while it should have.

What I should see is the checksum step, the load of the checksum value, the bits shifted out, and then the getnewword signal, which is supposed to trigger a move to the postamble step of the FSM. I see the getnewword, but the FSM never advances. We stay in checksum, loading words of zero in an infinite postamble.

I pored over the logic for the WriteField FSM to see if I could find a way it would malfunction as it has. I am left with a race hazard as the only conclusion. In the checksum step, I see the word to load being set to zero right after the getnewword signal is received. However, the same logic group that sets the word to zero should also emit loadword, and it doesn't, nor does it move on to the postamble step.

I now have verified operation from the request to write a sector all the way to the correct emission of the checksum word at the end of the header record (first field). What I need at this point is to get the postamble of five words writing properly and the WriteField FSM to go back to idle.

In the late morning, I had to cease work in order to get over to the CHM for the 1401 team meeting, but resumed work in the early evening and set up a clean new step in the WriteField FSM, between checksum and postamble, where I set up the zero word and load it.
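The idea of a separate step between checksum and postamble can be sketched in Python as follows. This is only a guess at the shape of the tail of the FSM, not the real implementation: the state names, the `step` function and the way the postamble loops back through the load step are all assumptions, and only the five-word postamble comes from the description above.

```python
from enum import Enum, auto

POSTAMBLE_WORDS = 5  # five trailing words of zero


class State(Enum):
    CHECKSUM = auto()
    LOAD_ZERO = auto()   # the new step: set up the zero word and load it
    POSTAMBLE = auto()
    IDLE = auto()


def step(state, getnewword, zeros_left):
    """Advance one clock. Returns (next_state, loadword, zeros_left)."""
    if state is State.CHECKSUM and getnewword:
        # Checksum fully shifted out: do nothing else this cycle but move on.
        return State.LOAD_ZERO, False, zeros_left
    if state is State.LOAD_ZERO:
        # Word is zero and stable; pulse loadword in a cycle of its own.
        return State.POSTAMBLE, True, zeros_left - 1
    if state is State.POSTAMBLE and getnewword:
        if zeros_left == 0:
            return State.IDLE, False, 0
        return State.LOAD_ZERO, False, zeros_left
    return state, False, zeros_left


def run_tail(getnewword_pulses):
    """Feed a sequence of getnewword values; return (final_state, loads)."""
    state, zeros_left, loads = State.CHECKSUM, POSTAMBLE_WORDS, 0
    for pulse in getnewword_pulses:
        state, loadword, zeros_left = step(state, pulse, zeros_left)
        loads += loadword
    return state, loads
```

Separating "decide to move on" from "pulse loadword" means the load is never issued in the same cycle in which the word register is being changed, which is the situation suspected of causing the race.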

With that change processed into a bitstream, I was ready to test again. This time it got through the header and label records, apparently fine, and was chugging through the data record of 256 words when it stalled in the postamble. The sector mark appeared and reset the WriteSector FSM, but the WriteField FSM remained stuck at postamble.

I was running various tests to look at parts of the sector and the exact behavior at the end of the data record, wanting to see it write the checksum and count through the postamble. However, at this point, the drive powered down by itself.

I touched the external power supply which provides the drive with its +15 and -15 levels, at which point it turned back on. I spun it up and tried for another test, but the trigger condition wasn't quite right. When I tried to cycle again for a new test, the drive went down and stayed down.

I will have to inspect the power being delivered by the external supply, to be sure it is good, in order to decide whether my problem is in the supply or the drive. Before I do that, I will take a quick look over the WriteSector logic to see whether I can find any reason that the data record might stall when the first two records completed just fine.

One way this can go awry is if the total time to write out the sector is longer than the time between sector marks. As a safety measure, my logic shuts down the WriteSector FSM and turns off the WriteGate when the following sector mark (SM) is detected.

If my logic for the sector was still in the process of writing the trailing five words of zero when this happened, then the overall process is taking too long. Right now I can't distinguish between this case, long but working properly, and the other case where the data record postamble is stalled.
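One way to separate "too long" from "stalled" on paper is to add up the words in each field and compare the total against the time between sector marks. The sketch below does that arithmetic in Python. Everything numeric in it is an assumption to be replaced with real figures: the post only gives the two-word header, the 256-word data record, the sync and checksum words and the five-word postamble, so the label length, any preamble, and the word and sector times passed in are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    data_words: int
    preamble_words: int = 0    # placeholder: not stated in the post
    sync_words: int = 1
    checksum_words: int = 1
    postamble_words: int = 5

    @property
    def total_words(self):
        return (self.preamble_words + self.sync_words + self.data_words
                + self.checksum_words + self.postamble_words)


# Header and data sizes come from the post; the label size is an assumption.
FIELDS = [
    Field("header", data_words=2),
    Field("label", data_words=8),
    Field("data", data_words=256),
]


def sector_budget(fields, word_time_us, sector_time_us):
    """Return (time_used_us, slack_us); negative slack means an overrun."""
    used = sum(f.total_words for f in fields) * word_time_us
    return used, sector_time_us - used


if __name__ == "__main__":
    # Placeholder timings, chosen only to exercise the function.
    used, slack = sector_budget(FIELDS, word_time_us=10.0, sector_time_us=3000.0)
    print(f"used {used} us, slack {slack} us")
```

If the slack comes out negative once measured gaps between fields and real timings are plugged in, the write is simply too long; if it is comfortably positive and the postamble still never finishes, a stall is the more likely explanation.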

The power supply appeared good by the time I tested it, and the disk began to spin up fine. I waited until my latest diagnostic trace version of the FPGA bitstream was ready, then went back to testing. What I discovered was that I am indeed taking too long to write out the sector, bumping into the next sector mark and resetting my WriteSector FSM.

It is time for me to go back to the idealized timeline and compare it with what I am writing, to figure out where I am overshooting and where I can trim some time off.
