Thursday, November 3, 2016

Inconclusive investigation of one sector that receives checksum errors

ALTO DISK TOOL

Today I am going to dig deeper into sectors that had checksum errors while being read, to see if I can spot anything in the signal which preserves the real information even though it foils correct reading. If there are conditions I can recognize, I may be able to improve the rate of successful reads, but it all depends on why these sectors fail.

I set up the logic analyzer to trigger the oscilloscope at the start of reading a sector, employing various delays to look at different portions of the incoming signal. The scope can look at the output of the differential amplifier, which should show the actual magnetic flux reversals on the disk surface. It can also show the separated data bits.

I picked one of the sectors and triggered the scope to capture what was occurring, both as ReadData bits emitted from the separator and as raw transitions from the disk surface. I found a spot in the Label record where I could see a bit that appeared or didn't over multiple reads.

I moved the scope to focus more closely on this point in time and verified that the bit appears or is hidden on different passes. It is consistently that one bit in this area that pops up or does not.

Region when bit is not detected
Same region, extra 1 bit shows up

I then looked at the raw transitions relating to this and began to study them for signs of some anomaly. Note that it is the Diablo hardware which is responsible for this bit appearing or hiding, not my logic. The signals above are sampled from the incoming ReadData line.

Anomalous region where the random bit appears


A region with all zero data bits corresponds to a flux transition every 600 ns, as shown below. The spacing is nice and regular, as are the voltage swings.

Waveform during string of 0 data bits

Next we look at a region with some 1 bits mixed in, to see what the waveforms look like. I have included a diagram of what we should see as input and the effect it has on recovered data bits.

Some 1 bits, adding transition between the 600 ns clock swings
What mixed waveforms should look like and bit recovery

Repeating our anomalous region to compare to the waveforms, we interpret it based on the midpoint of each transition. Using the centerline as time 0, the bit cell with a data value of 0 ends with the transition at 250 ns. The cell with a 1 data value has the extra transition occurring at 550 ns and is completed at 850 ns.

Repeat of suspect signal

We see another transition at 1150 ns, which is the timing for another 1 bit. The end of that bit cell is at 1500 ns, close to when it should occur. The next transition is at 2150 ns, consistent with a bit cell containing a 0 data bit.
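The reading of the transitions above can be written out as a small sketch. It assumes a nominal 600 ns bit cell (from the all-zero waveform) and a decision threshold of 450 ns that I picked for illustration; the function name and threshold are hypothetical, and this is not how the Diablo data separator is actually built.

```python
CELL_NS = 600                      # nominal bit cell, from the all-zero waveform
THRESHOLD_NS = CELL_NS * 3 // 4    # assumed decision point (450 ns), not from the Diablo docs


def decode_transitions(times_ns):
    """Turn a sorted list of flux transition times into data bits.

    The first time is taken to be a clock transition that opens a bit cell.
    A transition arriving well before a full cell has elapsed is treated as
    the mid-cell data transition of a 1 bit; the next transition then closes
    the cell. Otherwise the cell holds a 0 bit.
    """
    bits = []
    i = 0
    while i + 1 < len(times_ns):
        gap = times_ns[i + 1] - times_ns[i]
        if gap < THRESHOLD_NS:
            if i + 2 >= len(times_ns):
                break              # capture ended before the cell was closed
            bits.append(1)
            i += 2                 # skip the data transition, land on the closing clock
        else:
            bits.append(0)
            i += 1                 # next clock transition closes this cell
    return bits


# Transition midpoints read off the scope image, in ns from the centerline
suspect = [250, 550, 850, 1150, 1500, 2150]
print(decode_transitions(suspect))
```

Fed the six transition times measured above, the sketch treats 550 ns and 1150 ns as mid-cell transitions and the 1500 to 2150 ns cell as empty, which matches the reading of two 1 bits followed by a 0.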

I have to conclude that this is not an anomaly at all, but a pair of 1 bits. I was catching the wrong part of the signal on the scope. I had to retest and shift the scope around to find the location with the random bit value. It is somewhere to the right of 561.9 us, where this picture was taken.

The captured spot is a clean 1 bit, so I went back to watching the output of ReadData to see if I could spot it failing to output the 1 value.

I did see that this sector will sporadically get a Header record checksum failure, often enough to watch for differences in the delivered bits on ReadData. This is easy because one of the two words is 0000, so all I have to watch is the other header word and the checksum.

What I found was zero difference between times when the read was successful and times when it failed, at least as far as I could see. Below are the four scope images, two pairs of successful and unsuccessful reads of the header word or checksum.

First set, first photo

First set, second photo
Second set, first photo


Second set, second photo
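Comparing scope photos by eye is easy to get wrong, so a comparison like this could also be done mechanically if the ReadData bits from each pass were exported as lists. The sketch below is only an illustration of that idea: the function name is hypothetical and the bit values in the example are made up, not taken from my captures.

```python
from itertools import zip_longest


def diff_bits(good, bad):
    """List every position where two captured bit sequences disagree.

    Each entry is (index, bit_from_good, bit_from_bad). If one capture is
    shorter, the missing bits show up as None.
    """
    return [
        (i, g, b)
        for i, (g, b) in enumerate(zip_longest(good, bad))
        if g != b
    ]


# Made-up example data, standing in for one successful and one failed read
good_read = [0, 1, 1, 0, 1, 0, 0, 1]
bad_read = [0, 1, 1, 0, 1, 0, 0, 1]
print(diff_bits(good_read, bad_read))   # empty list when the captures match
```

An empty result for a pass that still failed its checksum would point away from the bits in this window and toward something earlier in the record or in the decoding logic.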

This is especially confusing. It suggests there might be a race hazard or other vulnerability in my logic that is decoding these improperly. One more thing to check - I need to watch a long stretch before this header record begins to be sure that there are no spurious 1 bits sneaking in to cause the checksum failure.

Nothing conclusive yet, but I will keep investigating in the hope that I find something which makes reading even more reliable. 
