My initial testing showed that much of the new logic seemed okay, but the critical part, which recognizes when a bit cell has ended and whether it held a 1 or a 0, does not appear to be working properly.
Fortunately, I instrumented quite a bit of the new logic, so the next step is some painstaking stepping through the logic analyzer traces while examining my state machines. The goal is to find behavior that doesn't reflect what I expect to see, then look at the logic to find the flaw.
The logic for the fuzzy recognition of ReadData and ReadClock worked - three matching readings in a row move these machines to one of the two end states, while a sequence of erratic readings is ignored. The logic to recognize the sync word worked properly.
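The three-in-a-row behavior can be modeled in software. This is only a Python sketch, not the actual hardware logic: the class name FuzzyDetector, the needed parameter, and the idea that the machine takes one raw sample per tick are assumptions made for illustration.

```python
class FuzzyDetector:
    """Hypothetical model of one fuzzy recognizer (ReadData or ReadClock)."""

    def __init__(self, needed=3):
        self.needed = needed   # consecutive opposing samples required to flip
        self.state = 0         # current settled end state: 0 or 1
        self.run = 0           # length of the current run of opposing samples

    def sample(self, raw):
        """Feed one raw 0/1 sample and return the settled state."""
        if raw == self.state:
            self.run = 0       # agreement cancels any opposing run
        else:
            self.run += 1
            if self.run >= self.needed:
                self.state = raw
                self.run = 0
        return self.state


raw_samples = [1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0]
detector = FuzzyDetector()
settled = [detector.sample(s) for s in raw_samples]
print(settled)
```

In this model the erratic 1, 0, 1 at the start never moves the settled state; only the run of three 1s and the later run of three 0s do.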
The fourth machine is the critical one that will delineate bit cells, remember whether ReadData reached the 1 state during the cell, output the value of the bit cell as 1 or 0, and trigger the deserializer to put that bit value into the shift register.
The fourth machine is not working correctly. It is stuck forever with a bit value of 1 once it finds any bit cell with a 1 inside. I added proper reset logic that sets bitvalue back to 0 and then watches for a ReadData of 1 to flip bitvalue back on. Back to testing.
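The stuck-at-1 symptom and its fix can be reproduced in a small Python model. This is a sketch under assumptions of my own: the function name decode_cells, the reset_after_emit flag, the (data, clock) sample tuples, and the choice of the rising edge of ReadClock as the cell boundary are all hypothetical and may not match the real hardware.

```python
def decode_cells(samples, reset_after_emit=True):
    """Return one bit per cell from a stream of (data, clock) samples.

    Assumption: a rising edge on clock closes the current bit cell.
    """
    bits = []
    bit_value = 0
    prev_clock = 0
    for data, clock in samples:
        if clock and not prev_clock:
            bits.append(bit_value)      # hand the cell's value to the deserializer
            if reset_after_emit:
                bit_value = 0           # the fix: start the next cell from 0
        if data:
            bit_value = 1               # remember that ReadData reached 1 in this cell
        prev_clock = clock
    return bits


# The stream starts just after a clock pulse: cells holding 1, 0, 1.
samples = [
    (0, 0), (1, 0), (0, 0),
    (0, 1), (0, 0), (0, 0), (0, 0),
    (0, 1), (0, 0), (1, 0), (0, 0),
    (0, 1),
]
fixed = decode_cells(samples)
broken = decode_cells(samples, reset_after_emit=False)
print(fixed, broken)
```

Without the reset, every cell after the first 1 reads as 1, which is the behavior I saw in the traces.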
I now find the logic is not starting properly at sync. It loads an incorrect initial bit of 0 because at that moment we are just at the end of the sync word's bit cell, not at the end of the first real word's bit cell.
The solution is to extend the synchronization state machine. It had turned on the synced state after seeing a ReadData value of 1 and then the ReadClock pulse; now it will wait until ReadClock goes back to off before flipping on sync. To be precise, it watches the fuzzy machines that handle ReadData and ReadClock.
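The extended sequence can be written out as a small state machine. This is a Python sketch, not the real logic: the state names, step_sync, and first_synced_index are hypothetical, and the data and clock inputs stand in for the outputs of the fuzzy machines.

```python
from enum import Enum


class SyncState(Enum):
    HUNTING = 0      # waiting for ReadData of 1 from the sync word
    SAW_DATA = 1     # saw the 1, waiting for ReadClock to turn on
    SAW_CLOCK = 2    # saw ReadClock on, waiting for it to go back off
    SYNCED = 3       # safe to start delineating real bit cells


def step_sync(state, data, clock):
    """Advance the sync machine by one (data, clock) sample."""
    if state is SyncState.HUNTING and data:
        return SyncState.SAW_DATA
    if state is SyncState.SAW_DATA and clock:
        return SyncState.SAW_CLOCK
    if state is SyncState.SAW_CLOCK and not clock:
        return SyncState.SYNCED      # the added step: wait for clock off
    return state


def first_synced_index(samples):
    """Return the index of the sample at which sync turns on, or None."""
    state = SyncState.HUNTING
    for i, (data, clock) in enumerate(samples):
        state = step_sync(state, data, clock)
        if state is SyncState.SYNCED:
            return i
    return None


samples = [(0, 0), (1, 0), (1, 0), (0, 0), (0, 1), (0, 1), (0, 0)]
print(first_synced_index(samples))
```

In this model the earlier design would have declared sync at the first sample with the clock on; the extra SAW_CLOCK state delays it until the clock pulse has finished.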
Testing this refinement produced a big improvement in reading accuracy. I only encountered checksum errors on 140 sectors, a rate of 2.87%. I slashed the rate of errors in half with my redesigned state machines. Some of the sectors that gave me pretty consistent errors are now reading clean every time.
Next, I will investigate any sectors that still get errors, to see if I can find anything on them that could explain the error and possibly let me further reduce the error rates.
As an experiment, I ran the ReadEntireCartridge transaction with autoretry enabled, to see what rate of errors occurs in that case. The rate dropped to 1.6%, or 78 sectors with checksum errors out of 4,872.
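As a quick check of the arithmetic, the two rates can be recomputed from the sector counts. The sketch assumes the 140-sector run covered the same 4,872 sectors as the autoretry run, which is consistent with the 2.87% figure; the function name error_rate_percent is just for illustration.

```python
TOTAL_SECTORS = 4872  # sectors read by ReadEntireCartridge


def error_rate_percent(bad_sectors, total=TOTAL_SECTORS):
    """Share of sectors with checksum errors, as a percentage."""
    return 100.0 * bad_sectors / total


without_retry = error_rate_percent(140)
with_retry = error_rate_percent(78)
print(f"{without_retry:.2f}% without retry, {with_retry:.2f}% with retry")
```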
Tomorrow I will go after the sectors with the errors, with and without retry, trying to dig into what is occurring in those cases. I will keep digging until I can find no more ways to improve the logic or compensate for errors or mitigate errors.