Saturday, December 10, 2016

Still investigating the read error cases, plus working on the new PCBs


I am still pondering the best way to handle the data separation for the Diablo drive. Yesterday I was looking at using a phase-locked loop, but I am now considering alternatives. The one point I am set on is the need to undo the disk drive's own separation and then perform the separation myself in the FPGA.
Data Separator circuit inside Diablo drive
The circuit above accomplishes data separation from the signal coming off the disk surface. In other circuitry, not shown here, the flux reversals detected by the disk head are converted into a positive-going pulse of fixed duration, somewhere between 50 and 100 ns. Thus, each reversal delivers one pulse to the circuitry above.

We will discuss its behavior proceeding from a point in time just before the arrival of a pulse that represents a clock. The routing flip-flop in the center is initially reset, thus the incoming pulse is routed to the lower right gates to become a ReadClock pulse.

The pulse that is sent as ReadClock also serves as the clock to the routing flip-flop, which has its D input tied high, so when the clock pulse arrives the flip-flop sets. It stays set until the timer circuits on the left side time out, which resets the routing flip-flop so that new pulses again go out on the ReadClock line.

However, while the routing flip-flop is set and has not yet been reset by the timer expiration, any pulse coming from the heads is routed out the upper gates as ReadData. Thus, from the arrival of a clock pulse until some time duration has elapsed, subsequent pulses are routed as data, but after the time elapses they again are processed as clock pulses.

The timer circuits on the left consist of two different timers. Both are started by a ReadClock pulse, but one lasts a short duration and the other a long one. Which of the two timers is used depends on the leftmost flip-flop, which is set when a ReadData pulse is delivered and reset when the next ReadClock pulse arrives.

Thus, the default assumption once the ReadClock pulse arrives is that this is a zero bit, which selects the long timer. The long timer chip then produces a negative-going output pulse for the duration of its interval, kicked off when the ReadClock line goes on.

However, when a data pulse arrives during the time interval, it sets the 'one bit' flip-flop on the left, which selects the short timer chip, whose negative-going output pulse occurs earlier after the clock. That flip-flop is not reset until the next clock pulse ends, so the timer started by a clock pulse is chosen by the flip-flop's prior value at the clock's rising edge, before the falling edge resets it.

The routing flip-flop in the center is switched off either by one of the timer chips emitting its negative-going pulse when it times out or by a ReadData pulse being emitted. In other words, it stops looking for data once it finds a 1 bit transition pulse or when the timer runs out.

When two transitions are close to each other, they seem to be shifted in time relative to each other, compared to the actual flux reversals written onto the media. This bit shift or pulse shift phenomenon is compensated for by the left side circuitry: if a 1 bit was detected, the routing resets earlier to look for a clock pulse.

The fault I am seeing is the arrival of the subsequent clock pulse early enough that it is routed as a data bit, not a clock bit. This is a fault of a timer that is too long. It happens sporadically and rarely, but causes a checksum validation error since the clock pulse is lost and the deserialization becomes confused.
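The routing behavior described above, and the fault that results when a timer runs too long, can be sketched as a small state machine. This is only a behavioral model in Python, not the drive's circuit and not the FPGA logic. The function name `separate`, the treatment of each pulse as an instantaneous event, and the 610 ns "too long" timer are assumptions made for illustration; 440 and 460 ns are the timer values quoted later in this post, with the shorter one assumed to be the short timer.

```python
SHORT_NS = 440  # assumed short timer: used when the previous cell held a 1
LONG_NS = 460   # assumed long timer: used when the previous cell held a 0


def separate(pulse_times_ns, short_ns=SHORT_NS, long_ns=LONG_NS):
    """Label each pulse time as 'clock' or 'data' (idealized model)."""
    routed = []
    window_end = None  # None models the routing flip-flop being reset
    one_bit = False    # models the leftmost 'one bit' flip-flop
    for t in pulse_times_ns:
        # Timer expiry resets the routing flip-flop.
        if window_end is not None and t >= window_end:
            window_end = None
        if window_end is None:
            routed.append((t, "clock"))
            # Timer choice uses the one-bit value from the previous cell,
            # then the end of the clock pulse clears that flip-flop.
            window_end = t + (short_ns if one_bit else long_ns)
            one_bit = False
        else:
            routed.append((t, "data"))
            one_bit = True
            window_end = None  # a data pulse also resets the routing
    return routed


# Bits 1, 0, 1 at a 600 ns cell: clock, data, clock, clock, data, clock.
print(separate([0, 300, 600, 1200, 1500, 1800]))

# All-zero bits, but a hypothetical timer that runs too long (610 ns):
# the clock at 600 ns is swallowed and routed as data.
print(separate([0, 600, 1200, 1800], long_ns=610))
```

In the second call the stream contains only clock pulses, yet the model reports a data pulse at 600 ns, which is the lost-clock symptom described above.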

If the timers are shortened further, there is the risk that a data pulse will appear as a spurious clock pulse, the inverse error to the one I saw. Since Xerox chose to operate the Diablo disk drives outside of their spec, which intends for a 660 ns bit cell time, timing becomes more critical.

Our big enemy is jitter - not just the pulse shifting that is inherent with magnetic reading, but jitter from other sources. The result is that the arrival timing of clock and data pulses is shifting about often, thus solutions such as fixed timing or a PLL are not perfect.

The rotation rate of the disk can vary as much as 1% per specifications, which in practice means that arrival times can be offset by up to 2% in the worst case where the writing drive is 1% off in one sense and the reading drive varies by 1% in the opposite sense. This can give us a jitter from 0 to 12 ns.

The oscillator used to write the disk is an imperfect circuit, so its nominal rate of 3.33 MHz can deviate a bit, leading to jitter in the pulse timing of a similar amount. This might be as much as 30 ns of jitter, although the rate of change is low so the impact won't affect single bit cells.

The clock rate recorded on the surface is totally determined by the writing system, in this case an Alto which wrote to the cartridge many years ago. Variations of the clock frequency are locked into the recorded signal. The receiving system (my FPGA) uses the recorded clock, not its own, so it can't contribute any significant jitter.

Still, we have a drive engineered for a 660 ns bit cell operating at 600 ns, with jitter that can run to about 45 ns if the worst case of clock rate and drive speed variations were to occur. This gives us a bit cell that could be 555 ns at worst case. Since each pulse produced, for clock or data, is 50 to 150 ns in width, we have a window of approximately 405 to 495 ns to look for the data pulse.
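The arithmetic behind those figures can be laid out as a short calculation. This is a sketch under stated assumptions: the "about 45 ns" budget is read as a rounded-up sum of the 12 ns speed term and the 30 ns oscillator term, and the 405 to 495 ns window is reproduced by subtracting the widest 150 ns pulse from the shortest and longest cell. That derivation is a guess that happens to match the numbers, and `jitter_budget` is a hypothetical helper name.

```python
def jitter_budget(nominal_cell_ns=600, speed_tol_percent=1,
                  osc_jitter_ns=30, budget_ns=45, max_pulse_ns=150):
    """Reproduce the timing figures quoted in the text (assumed derivation)."""
    # Writer and reader each off by the speed tolerance, in opposite senses.
    speed_jitter_ns = 2 * speed_tol_percent * nominal_cell_ns / 100
    summed_jitter_ns = speed_jitter_ns + osc_jitter_ns
    shortest_cell_ns = nominal_cell_ns - budget_ns
    longest_cell_ns = nominal_cell_ns + budget_ns
    # Assumed reading: cell length minus the widest pulse.
    window_ns = (shortest_cell_ns - max_pulse_ns,
                 longest_cell_ns - max_pulse_ns)
    return {
        "speed_jitter_ns": speed_jitter_ns,
        "summed_jitter_ns": summed_jitter_ns,
        "shortest_cell_ns": shortest_cell_ns,
        "longest_cell_ns": longest_cell_ns,
        "window_ns": window_ns,
    }


for name, value in jitter_budget().items():
    print(name, value)
```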

The data separator in the Diablo uses 440 ns for the short timer and 460 ns for the long one, which should just squeak by in the worst case. That worst case is a compressed 555 ns bit cell, with the prior cell containing a data value of 1 so that pulse shifting occurs.
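To see how thin "just squeak by" is, the margins on either side of the timer can be computed directly. This is a rough sketch that ignores pulse width and pulse shift and assumes a data pulse sits at mid-cell; the helper names are hypothetical, and 440 and 460 ns are taken as the short and long timers respectively.

```python
def clock_margin_ns(cell_ns, timer_ns):
    """Time from timer expiry to the next clock pulse.

    Zero or negative means the next clock arrives while the data window
    is still open and would be routed as data.
    """
    return cell_ns - timer_ns


def data_margin_ns(cell_ns, timer_ns):
    """Time from a mid-cell data pulse (assumed) to timer expiry.

    Zero or negative means a data pulse would be routed as a clock.
    """
    return timer_ns - cell_ns / 2


for cell in (600, 555):          # nominal and worst-case compressed cell
    for timer in (440, 460):     # assumed short and long timers
        print(f"cell={cell} ns timer={timer} ns "
              f"clock margin={clock_margin_ns(cell, timer)} ns "
              f"data margin={data_margin_ns(cell, timer)} ns")
```

With the compressed 555 ns cell and the 460 ns timer this leaves 95 ns before the next clock, before any allowance for pulse shift or per-cell jitter.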

Looking at the pulses separated during the anomalous period, I see a set of ReadClock pulses 600 ns apart, then the next clock pulse follows in just 300 ns. The following pulse, 600 ns later, is routed as a ReadData. 600 ns after that, and every 600 ns following, we see ReadClock pulses.

The odd issue with that sequence is that a pulse arriving 300 ns after the last clock would be a data pulse, but if it were, then the clock following it is missing because the gap is again 600 ns. Something has interrupted the pulse train, chopping 300 ns out of it. I don't understand how this happened.
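One way to hunt for such spots in a longer capture is to check the spacing between successive ReadClock pulses, which should always be one bit cell. The sketch below uses hypothetical timestamps reconstructed from the description above, not real logic analyzer data; the function name and the 45 ns tolerance are also assumptions.

```python
# Hypothetical capture: (time in ns, line the pulse appeared on).
CAPTURE = [
    (0, "clock"), (600, "clock"), (1200, "clock"),
    (1500, "clock"),   # clock only 300 ns after the previous one
    (2100, "data"),    # next pulse, 600 ns later, routed as ReadData
    (2700, "clock"), (3300, "clock"),
]


def odd_clock_gaps(events, cell_ns=600, tol_ns=45):
    """Return (start, end, gap) for clock-to-clock gaps far from one cell."""
    clocks = [t for t, line in events if line == "clock"]
    flagged = []
    for t0, t1 in zip(clocks, clocks[1:]):
        gap = t1 - t0
        if abs(gap - cell_ns) > tol_ns:
            flagged.append((t0, t1, gap))
    return flagged


for start, end, gap in odd_clock_gaps(CAPTURE):
    print(f"clock gap of {gap} ns between {start} ns and {end} ns")
```

On these reconstructed timestamps it flags a 300 ns gap followed by a 1200 ns gap, so the stretch from 1200 ns to 2700 ns spans two and a half cells rather than a whole number of them.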


I now have on hand most of the components that will be used to build the driver and emulator boards once the PCBs come in. The remaining components arrive on Monday, whereas the PCBs will take another week or so to show up.

I do have to prepare the cable for the emulator role, inserting the 40 pin connector on the cable half that runs to the Alto driver board. This has the metal ground plane bonded to the cable, just as the disk side of the cable had, and must be carefully peeled back in order to install the connector.
