I spent the day poking at various locations, observing the signals and trying to find a cause for the intermittent checksum failures at these sectors. Nothing jumped out at me, but I will keep looking for a clue that is reproducible and can be clearly traced to a cause.
My working hypothesis is a shift in the clock timing. There is a specific point where the clock pulses don't arrive 600 ns apart, right before two data bits that sometimes clock in as 1100 and sometimes as 0110, meaning that I sometimes skip ahead by one clock cell. The latter pattern, 0110, where my logic sees an extra 0 value that doesn't exist, is when the checksum error occurs.
|Sometimes this is captured as 01101100 and sometimes 01100110|
|Clock bits where this happens are irregularly spaced|
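To hunt for these spots in a longer capture, the clock-pulse timestamps exported from the logic analyzer could be scanned for intervals that stray from the nominal 600 ns. The sketch below is only an illustration: the function name, the 60 ns tolerance, and the sample timestamps are all assumptions of mine, not values taken from the hardware.

```python
NOMINAL_NS = 600  # clock spacing quoted in the text


def find_irregular_intervals(clock_times_ns, nominal_ns=NOMINAL_NS, tolerance_ns=60):
    """Return (index, interval) pairs where clock spacing strays from nominal.

    tolerance_ns is an assumed threshold, chosen only for illustration.
    The index is that of the later pulse in each pair.
    """
    irregular = []
    for i in range(1, len(clock_times_ns)):
        interval = clock_times_ns[i] - clock_times_ns[i - 1]
        if abs(interval - nominal_ns) > tolerance_ns:
            irregular.append((i, interval))
    return irregular


# Made-up timestamps (ns), not a real capture
sample = [0, 600, 1200, 1910, 2400, 3000]
print(find_irregular_intervals(sample))  # [(3, 710), (4, 490)]
```

Each flagged index can then be lined up against the bit position where the captured pattern goes wrong.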
Therefore my logic to recognize clock pulses, set up the data bit values, and push them into the deserializer has some weakness that is triggered by the slightly deformed timing of clock pulses. I will have to stare at the state machine, change the data being emitted to the logic analyzer, and test some more to spot where it goes awry.
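One way to narrow down where it goes awry is to compare a good capture against a bad one and check whether the bad one is simply the good one with a phantom 0 inserted. The following is a minimal sketch using the two 8-bit patterns from the captures above; the function names are hypothetical helpers of mine.

```python
def first_divergence(good, bad):
    """Index of the first differing bit, or None if the strings agree
    over their common length."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    return None


def is_inserted_zero(good, bad):
    """True if bad equals good with one extra '0' inserted at the point
    of divergence (compared over the length of bad)."""
    i = first_divergence(good, bad)
    if i is None:
        return False
    shifted = good[:i] + "0" + good[i:]
    return shifted[:len(bad)] == bad


good = "01101100"  # pattern captured when the checksum passes
bad = "01100110"   # pattern captured when the checksum fails
print(first_divergence(good, bad))  # 4
print(is_inserted_zero(good, bad))  # True
```

For these two patterns the divergence is at bit index 4, which is consistent with an extra cell being counted just before the pair of 1 bits.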
Looking at the two sides of the differential amplifier decoding the head signals, we see the following:
|one side of differential amplifier|
|other side of differential amplifier|
If the decision to count extra bit cells is somehow due to a weakness in my FPGA logic, then I could improve reading quality by redesigning this part of the hardware. I will look into a different way to handle the incoming ReadClock and ReadData pulses that might avoid this problem.
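As a thought experiment, here is a software model of one possible alternative, not my current design and not tested in the FPGA: let each ReadClock pulse open a new bit cell, and record a 1 if any ReadData pulse arrives before the next ReadClock pulse. Since cell boundaries come from the pulses themselves rather than from counting time, uneven spacing alone cannot insert an extra cell in this model. The function name and timestamps are made up for illustration.

```python
def decode_cells(clock_times_ns, data_times_ns):
    """Emit one bit per clock-to-clock interval: '1' if a data pulse
    falls inside the interval, else '0'. The final clock pulse has no
    closing pulse, so it produces no bit in this simple model."""
    data = sorted(data_times_ns)
    bits = []
    j = 0
    for start, end in zip(clock_times_ns, clock_times_ns[1:]):
        # Skip any data pulses that precede this cell
        while j < len(data) and data[j] < start:
            j += 1
        has_data = j < len(data) and data[j] < end
        bits.append("1" if has_data else "0")
        # Consume every data pulse that belongs to this cell
        while j < len(data) and data[j] < end:
            j += 1
    return "".join(bits)


# Made-up timestamps (ns): the third clock pulse arrives 110 ns late
clocks = [0, 600, 1310, 1800, 2400]
data = [300, 900]
print(decode_cells(clocks, data))  # 1100
```

The tradeoff is that a missing or spurious ReadClock pulse would drop or add a cell, so this approach depends on the clock pulses being detected reliably.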