Sunday, October 30, 2016

Continuing work on Write Sector, now questioning operation of disk drive circuitry

ALTO DISK TOOL

I completed the reinstrumentation to watch the behavior of WriteSector and to ensure that the transition stream matches what I expect to see emitted. I will also watch the terminator end of the cable to be sure I see the same data arriving at the drive. 

Watching the outgoing stream at the terminator gives me nice clean data matching my sector data exactly. I then switched the probe to watch the ReadData line coming in during the write, to see if it appears to be the same data we sent out.

It does appear to align during the WriteSector operation, although I saw a few funny glitches near where I had written one bits. My WriteData&Clock signal looked very clean at the terminator, with no ringing. I zoomed in on the period with the '1' bit transition to see if I could see any signal issues that might cause problems. 

Since that looked clean, I next used the ReadGate signal to trigger the scope and watched the ReadData input for the sector I had just written. I saw random bits showing up at times when I had not written anything, which was different from the ReadData line monitored during writing. 

I also looked at ReadClock and saw periods without a clock pulse. I set the scope to a point where I was seeing that dropout of the clock pulse (one time in the sequence of clock pulses), and then monitored ReadData. I was seeing quite a bit of spurious detected 1 bits surrounding the dropout. 

I am definitely going to have to move inside the Diablo during this investigation. I need to see what comes off the heads before the data and clock separator, since the separation may be malfunctioning. I also need to monitor the erase signal to be sure that the erase head is really turned on.

If the erase head is not active, I am just layering transitions atop existing transitions, which would give erratic clock pulses and bad data bit recovery. If it is active, then I have a different insidious problem. Time to dig into the schematics and finalize test points. 

I found a resistor that should show voltage drop as the erase current flows through one of the heads. I also found four test points that together will give me a view of what is coming off the heads and heading into the separator circuit.

My first test, to watch the erase current, showed that it indeed activates when WriteGate is asserted. I then switched the probe to the test point showing me the raw transitions coming in off the head while reading. I triggered on ReadGate at the start of reading sector 0, which I had rewritten.

I saw low-voltage, noise-like signals coming from the head. To compare against a good sector, I moved the arm to Cylinder 1 and did the same capture. Quite different: now I had very wide swings consistent with the flux transitions of clock pulses. 

As I looked through the schematics to see if I could spot where the problem occurred, I realized that if the erase current were off, or if the write selection current wasn't working, the drive would detect a WriteCheck condition and alert me. It hadn't. 

Heads and related circuits, top view
Select current to read/write on head, plus erase current drivers, middle of schematic
This moved me to look at the logic driving the signal on head bus A and B, which is what will actually produce the flux transition. As I looked along the left of the top schematic, I spotted a D flipflop and I could hear a huge "DOH" echoing in my brain. 

The flipflop takes pulses from the WriteData&Clock line, alternating the state of the A and B bus levels on each one. It uses the incoming pulse from the terminator to flip the state. Therefore, it is causing the transitions and I should not be doing that in my logic. Rather, I should be emitting simple pulses to cause the transitions to happen. Doh.

I had misread the spec and overly complicated my driver logic. All I had to do was emit a 100ns pulse whenever a transition should occur, not reverse directions of the output line. That would explain why my logic was working as I intended, toggling the driver line, but it wasn't writing intelligible data on the drive. 
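The difference between the two driver schemes can be shown with a toy model of the drive's toggle flip-flop, assuming (as read from the schematic) that it flips the A/B head bus state once per rising edge on WriteData&Clock. This is an illustrative sketch, not the drive's circuit or the FPGA logic; each pulse is one sample wide here, standing in for the roughly 100 ns pulse.

```python
def flux_reversals(line):
    """Count flip-flop toggles: one per rising edge on the input line."""
    reversals = 0
    prev = 0
    for sample in line:
        if sample and not prev:  # rising edge flips the A/B bus state
            reversals += 1
        prev = sample
    return reversals


def as_pulses(transition_slots, length):
    """Corrected scheme: emit a short pulse at each intended transition."""
    return [1 if i in transition_slots else 0 for i in range(length)]


def as_toggles(transition_slots, length):
    """Original scheme: invert the output line at each intended transition."""
    line, level = [], 0
    for i in range(length):
        if i in transition_slots:
            level ^= 1
        line.append(level)
    return line


slots = {0, 4, 8, 12}  # four intended flux transitions
print(flux_reversals(as_pulses(slots, 16)))   # 4
print(flux_reversals(as_toggles(slots, 16)))  # 2
```

Feeding a toggled line to a flip-flop that already toggles yields only one reversal per pair of intended transitions, so the pattern on the platter no longer matches the data.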

I dove into the fpga logic and converted the output to what it should be, pulses at the proper times. With this changed and the bitstream generated by midafternoon, I tested again, first checking carefully on logic analyzer to see that the logic is accomplishing my newly corrected intentions. 

With the timing module working properly, I appear to be emitting the proper signals but still don't get a good read. Further, when I trigger on ReadGate and look at the raw signal from the heads on testpoint 1, it still looks like noise while other cylinders have large clear swings.

The manual asserts that a head which is not selected will have its center tap sit around -1V - which I confirm - and a selected head with WriteGate on will sit at +14V - also confirmed. However, a selected head with ReadGate (or absence of WriteGate) is claimed to sit at about +1V but I instead see the noise coming in from the head, or a similar noise pattern at much lower level, closer to ground. 

I still believe there is something wrong with the write circuitry in the disk drive, since I never seem to leave any transitions on the surface that the head can pick up. I have the +14V power level and the erase function certainly seems to work, but I don't see magnetization of the domains on the platter.

I will have to hook the scope up to the two head bus lines, trigger on WriteGate and watch to see whether I am swinging the current inversely on the two lines. That would tell me that I ought to be able to write a flux reversal at that point. 

I did find a flaw in my serializer triggered by the change in my output pulse logic, requiring a quick fix but slow pass to create the new bitstream. Once it was ready, I tested again. Now I was producing exactly the pulse stream I wanted, although the outcome on the disk was still not perfect.

I am still not sure what is happening and need more investigation, probing of the drive circuitry and other testing. For example, the scope on the two head bus lines will tell me if I am truly writing or not. 

Friday, October 28, 2016

Debugging write operation to disk drive

ALTO DISK TOOL

I had to drive over and swap cars with my daughter, since hers has problems. After lunch, I got back to the project. I continued to map out test points to figure out why the WriteSector, which looks so good on the logic analyzer, failed to write on the actual disk (or wrote something so bad it remains unreadable).

The easiest spot to probe is the terminator, using micrograbbers on the resistors, but I had to have an accurate map before I could be sure I caught the right signals.

Terminator with points to capture connector signals
The terminator I used for the picture is not the same one used for either the prototype or production driver board. The positions are the same but there are more or fewer resistors on the other terminators.

I fired up the system with the scope capturing the WriteGate and WriteData&Clock signals, freezing when WriteGate first goes active. I see the write commanded and the transitions begin occurring, but the results when reading are still checksum errors on all three records in sector 0.

At this point there are two possibilities I see - the writing is not taking place due to some failure inside the Diablo drive, or I am writing something but what is read back isn't valid format. To decide between these, I will set up the logic analyzer to capture the incoming record on a read and will look to see what appears.

I had to reinstrument the signals output to the logic analyzer, plus I tweaked the startup of the transition FSM to give a 2 us delay from asserting WriteGate before I began issuing the clock and data transitions. This matches a recommendation in the Diablo manual. There is also setup time changing the logic analyzer configuration to match the emitted signals.

When the bitstream was ready I tested again. The goal here was to do a ReadSector and see what comes in from the sector that I just wrote (I think). If the data is like the last captured garbage, then I am not writing on the surface at all. If the data looks close to what I (attempted) to write, then I can debug how I write to get a clean read.

My test didn't meet my objectives because of a flaw in the instrumentation that meant I didn't have a good way to trigger at the beginning of a sector read; however, I could do some other tests with the oscilloscope while executing WriteSector.

I stuck one scope probe on the read data line coming in from the drive and used the other to trigger when WriteGate was asserted. What I should see is an absence of data pulses after I began, since I would be writing almost 337 us of zero bits before emitting a sync word. Instead, the data bits continued to stream in even after my write began.

Since ReadGate is hardwired on, it is possible that the mechanism won't sync up until it sees a long stream of zeros, but once I wrote my preamble one time, that should have existed. Still, it could be a sync issue due to the permanent ReadGate. I don't have signal wires in the main cable to drive the ReadGate pin, but I think I can engineer up a twisted pair to add this to my test setup.

Another issue, however, is definitely a flaw in my logic that I have to chase down. Even though I tracked the bits being emitted yesterday and it was all good, what I saw from the WriteData&Clock line when I wrote was an excellent pattern of zeroes and the sync word at the magic time, but then I saw the wrong data bits following - not the two words of 0000 and the checksum of 0151, but something else.

I did make the seemingly minor change to add the 2 us delay before I began emitting clock pulses and asking for data bits. I don't see how that could cause a problem with the data being produced, but somehow the data being fetched is wrong.

I resynthesized to correct the instrumentation error, so I could read what was coming in from the sector. I might be writing wrong data with bad checksums, although it is equally likely that I am not truly writing. We shall see.

The addition of the ReadGate wiring to the fpga will involve wiring up a level shifter, removing the jumper on the terminator and wiring in the twisted pair I cut to match the length of the existing cable. This will take careful work which I will do tomorrow. For now, I will focus on what I can read back from the sector that I have written.

When I ran my test with a valid trigger condition, I found that the data coming in from the surface was not what I thought I was writing. It might be the residual junk already there, but my write logic looked wrong from my earlier scope work.

I think the next two things to do are to wire up the ReadGate signal including the level shifter/driver circuit, and to go back to capturing my transitions on the logic analyzer. I will add a channel that is the received WriteData&Clock line from the terminator, to check that it corresponds to what is sent to the level shifter. Neither of these can be done in the remaining time this evening. 

Write Sector logic working properly yet drive not actually writing

ALTO DISK TOOL

Step one in tightening up the WriteSector process is to draw up a precise timeline of when each word should be written, relative to the sector mark at the beginning. I sat down with that timeline and the test bed to capture and check everything against the list.

Anywhere there is a discrepancy, I can sort out the cause and correct for it. The objective is to have the sector end 3.120 ms after the SM, which gives a comfortable margin within the 3.333 ms sector duration.

I worked through the process, trimming excess cycles and ensuring I got the data lined up as close as reasonable to the ideal times. First step was to ensure that the header record was correct: a preamble of 34 word times of zero, a sync word of 0001, two header words of 0000, a checksum word of 0151, and five postamble words of 0000.
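The header record layout above lends itself to a generated timeline to check captures against. A minimal sketch, assuming a 600 ns bit cell (the clock transition spacing scoped in the October 25 entry), 16-bit words sent back to back, and times measured from the first preamble word; the helper names are hypothetical.

```python
BIT_CELL_NS = 600           # assumption: one bit cell per 600 ns clock interval
WORD_NS = 16 * BIT_CELL_NS  # 16-bit words


def record_words(preamble, payload, checksum, postamble):
    """Ordered (label, value) pairs for one record."""
    words = [("preamble", 0)] * preamble
    words.append(("sync", 0o1))
    words += [("data", w) for w in payload]
    words.append(("checksum", checksum))
    words += [("postamble", 0)] * postamble
    return words


def timeline(words, start_ns=0):
    """(start time in ns, label, value in octal) for each word."""
    return [(start_ns + i * WORD_NS, label, f"{value:04o}")
            for i, (label, value) in enumerate(words)]


header = record_words(preamble=34, payload=[0, 0], checksum=0o151, postamble=5)
for t_ns, label, value in timeline(header):
    if label not in ("preamble", "postamble"):
        print(f"{t_ns / 1000:8.1f} us  {label:9s} {value}")
```

The same function can be called for the label and data records with their own start offsets, giving one list of expected word times for the whole sector.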

Early on, I had the data complete within the sector, but at this point I am checking even more rigorously, ensuring that the transitions match what should go out at all key points. The 1 bit that forms the sync word is occurring just a few hundred nanoseconds late compared to the ideal time, essentially spot on.

I decided that I need a status warning if the WriteSector ever does span past the end of the sector, putting bit 3 of Reg0001 to that purpose. I also set up the WriteField FSM to restore itself if the SM occurs, making this error recover the logic to accept subsequent transactions.
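The overrun handling can be modeled behaviorally: a sector mark that arrives while WriteField is not idle sets the warning bit and forces the FSM back to idle. This is a hypothetical sketch, not the FPGA source; it also assumes "bit 3" means the mask 1 << 3 with bit 0 as the least significant bit. If the register numbers bits from the most significant end, as Alto documentation does, the mask would differ.

```python
OVERRUN_BIT = 3  # assumption: LSB-0 numbering for bit 3 of Reg0001


class WriteFieldModel:
    """Hypothetical behavioral model of the WriteField FSM's SM handling."""

    def __init__(self):
        self.state = "idle"
        self.reg0001 = 0

    def start(self):
        self.state = "preamble"

    def finish(self):
        self.state = "idle"

    def sector_mark(self):
        # An SM arriving mid-write means the sector overran: flag it and
        # return to idle so later transactions are still accepted.
        if self.state != "idle":
            self.reg0001 |= 1 << OVERRUN_BIT
            self.state = "idle"


def sector_overran(reg0001):
    """Host-side test of the overrun warning bit."""
    return bool(reg0001 & (1 << OVERRUN_BIT))
```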

However, when verifying the timing, I was looking at the data coming out and saw that I was misaddressing memory, essentially reading the wrong words from RAM. I spotted the flaw and corrected it.

After lunchtime, my testing had verified the timing all the way through the end of the second (label) record of the sector, to the tenth of a microsecond of what was expected. At this point, I brushed against the button to trigger a read of a sector, which gave me the data but triggered a powerdown of the disk drive.

Thinking about this, I may be causing it by modulating the WriteData&Clock signal without WriteGate active, particularly if after the final transition I leave the line at 1 instead of zero. The rest state of that signal should always be 0, but there was a condition where the WriteGate gets turned off and the transition state engine didn't stop with a 0. I corrected that.

After the obligatory 30 minute idle time while the toolchain worked, I fired up the testbed with the corrected logic and resumed testing, this time verifying the postamble and preamble between the label and data records.

All is perfect: timing within a tenth of a microsecond and data that looks right, so it is time to allow WriteGate to switch on the write electronics and try writing the contents of sector 0 from the disk archive file. After the mandatory 30 minute wait for the toolchain, I will attempt to write the sector and read it back to see if it appears good.

I performed the WriteSector transaction with the WriteData&Clock signal modulated and WriteGate turned on, but it didn't appear to do anything. Time to check on the delivered signals on the drive, ensuring that it did see the WriteGate activate and see the incoming transitions. I will have to find an appropriate place to hook up the scope.

There is an option on some Diablo drives where the EraseGate is an independent signal, instead of tied to the WriteGate in the standard models. Diablo does not mark which options are installed, thus the only way to tell is to examine circuit boards and wiring - quite annoying.

If it is separate, then I am not energizing it with the prototype board although the production board I built will drive it. I will hold off to test this until tomorrow, since I have to study schematics and find test points inside the drive to verify whether erase current is flowing and whether the write signal is passed to the heads. 

Thursday, October 27, 2016

Debugging write sector function, almost done but timing is a bit too long to fit in sector

ALTO DISK TOOL

I set up the first combination of signals to watch on the logic analyzer to get to the bottom of this misloading of the serializer from RAM. I also modified the WriteField logic to add a step to more cleanly load the serializer after memory access has finished. I will see if this is unnecessary during the test run.

First morning run, I captured the data and looked it over carefully. The change I made fixed the issue with reading from RAM and serializing. The output stream of transitions reflected the data put into the serializer (sync word, two header record words, checksum word). However, the WriteField FSM is stalling in the checksum writing state.

I tweaked a few signals and the FSM, but I am not exactly sure what is causing the problem. It may be a race hazard once again, this one between the serializer and this FSM. The serializer emits a signal getnewword for one cycle, after which the  FSM should issue loadword for one cycle duration, a few cycles later, to cause the serializer to pick up the new word to emit.

If we are missing the getnewword signal for some reason, or the serializer missed a loadword signal, we will stall forever because we have to load the serializer in order to get the getnewword signal when it shifts out the 16th bit.
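The stall described above can be reproduced with a toy cycle-level model of the getnewword/loadword handshake. This is a simplified, hypothetical sketch rather than the FPGA logic: the FSM answers each getnewword with a one-tick loadword on the very next tick instead of a few cycles later, and missing a single request leaves the serializer empty with nothing left to raise the next request.

```python
class Serializer:
    """Shifts out a loaded 16-bit word MSB first, one bit per tick."""

    def __init__(self):
        self.shift = 0
        self.bits_left = 0

    def tick(self, loadword, word=0):
        """Advance one bit time; return (bit or None, getnewword)."""
        if loadword:
            self.shift, self.bits_left = word & 0xFFFF, 16
        if self.bits_left == 0:
            return None, False  # empty: no bit out and no new request
        bit = (self.shift >> 15) & 1
        self.shift = (self.shift << 1) & 0xFFFF
        self.bits_left -= 1
        return bit, self.bits_left == 0  # getnewword as the 16th bit leaves


def run(words, missed_request=None, max_ticks=200):
    """Drive the serializer like the WriteField FSM would: answer each
    getnewword with a one-tick loadword on the following tick.
    missed_request=n makes the FSM miss the n-th request (0-based)."""
    ser = Serializer()
    queue = list(words)
    bits = []
    load_next = True  # the FSM loads the first word to get started
    requests = 0
    for _ in range(max_ticks):
        load = load_next and bool(queue)
        word = queue.pop(0) if load else 0
        bit, getnewword = ser.tick(load, word)
        load_next = False
        if bit is not None:
            bits.append(bit)
        if getnewword:
            load_next = requests != missed_request
            requests += 1
    return bits


words = [0xFFFF, 0x0001, 0x0000]
print(len(run(words)))                    # 48: all three words go out
print(len(run(words, missed_request=0)))  # 16: stalled after the first word
```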

I did a test run while my tweaks were synthesizing, hoping to spot the two signals and their timing. I know that the checksum was loaded, as I saw that on the analyzer last time, but I didn't look to see whether the zero word for the postamble was properly loaded.

This showed me a successful loadword to get the checksum into the serializer, then later the getnewword that should trigger the postamble phase. However, the WriteField FSM did not move out of the checksum step, while it should have.

What I should see is the checksum step, the load of the checksum value, the bits shifted out and then the getnewword signal which is supposed to trigger a move to the postamble step of the FSM. I see the getnewword but it never steps. We stay in checksum, loading words of zero in an infinite postamble.

I pored over the logic for the WriteField FSM to see if I could find a way it would malfunction as it has. I am left with race hazard as the only conclusion. In the checksum step, I see the word to load being set to zero, right after the getnewword signal is received. However, in the same logic group that sets the word to zero, it also should emit loadword but it doesn't, nor does it move on to the postamble step.

I now have verified operation from the request to write a sector all the way to the correct emission of the checksum word at the end of the header record (first field). What I need at this point is to get the postamble of five words writing properly and the WriteField FSM to go back to idle.

In the late morning, I had to cease work in order to get over to the CHM for the 1401 team meeting, but resumed work in the early evening and set up a clean new step in the WriteField FSM, between checksum and postamble, where I set up the zero word and load it.

With that change processed into a bitstream, I was ready to test again. I now found myself through the header and label records, apparently fine, and chugging through the data record of 256 words when it stalled in the postamble. The sector mark appeared and reset the WriteSector FSM but the WriteField remained stuck at postamble.

I was running various tests to look at parts of the sector and the exact behavior at the end of the data record, wanting to see it write the checksum and count through the postamble. However, at this point, the drive powered down by itself.

I touched the external power supply which provides the drive with its +15 and -15 levels, at which point it turned back on. I spun it up and tried for another test, but the trigger condition wasn't quite right. When I tried to cycle again for a new test, the drive went down and stayed down.

I will have to inspect the power being delivered by the external supply, to be sure it is good, in order to decide whether my problem is in the supply or the drive. Before I do that, I will take a quick look over the WriteSector logic to see whether I can see any reason that the data record might stall when the first two records completed just fine.

One way this can go awry is if the total time to write out the sector is longer than the time between sector marks. As a safety measure, my logic will shut down the WriteSector logic and turn off the WriteGate when the following SM is detected.

If my logic for the sector was still in the process of writing the trailing five words of zero when this happened, then the overall process is taking too long. Right now I can't distinguish between this case, long but working properly, and the other case where the data record postamble is stalled.

The power supply appeared good by the time I tested it, and the disk began to spin up fine. I waited until my latest diagnostic trace version of the fpga bitstream was ready, then went back to testing. What I discovered was that indeed I am taking too long to write out the sector, bumping into the next sector mark and resetting my WriteSector FSM.

It is time for me to go back to the idealized timeline and compare it with what I am writing, to figure out where I am overshooting and trim some time off. 

Tuesday, October 25, 2016

Working through the WriteSector transaction, step by step, first part of Header record

ALTO DISK TOOL

Finally ran the tests to see what is happening when I attempt to write to RAM thru the USB transaction. I found a race hazard between my FSM that controls the write to RAM and the FSM that accesses memory, where I am moving forward too quickly. 

Changes made and back to testing after a half hour for synthesis, etc. The new interlock works great, I am now writing and reading the RAM properly from USB, with one teeny problem. My logic for writing has the two bytes in a word swapped. Should be an easy fix, although not quick due to the 30 minute cycle time for each change, plus time for lunch.
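A swapped byte pair is easy to show in isolation. A minimal sketch, assuming the host sends each 16-bit word as two bytes with the high byte first (the actual USB framing is not described here); `word_from_bytes` is a hypothetical helper.

```python
def word_from_bytes(pair: bytes, high_first: bool = True) -> int:
    """Assemble one 16-bit RAM word from two bytes received over USB."""
    return int.from_bytes(pair, "big" if high_first else "little")


checksum_bytes = bytes([0x00, 0x69])  # 0151 octal == 0x69
print(oct(word_from_bytes(checksum_bytes)))                     # 0o151
print(hex(word_from_bytes(checksum_bytes, high_first=False)))   # 0x6900, swapped
```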

It does show the value of a fully interlocked set of FSMs, which I had skipped, falsely believing that the first (read) cycle would be back to idle before I requested the second (write-back) cycle. Everything worked properly with RAM. Testing moved on to the WriteSector functionality.

My first test showed my WriteSector FSM stalling waiting for the sector to come around. I adjusted the logic in the FSM and prepared to test again in a half hour. Basically, I am running an entire WriteSector transaction and ending with a completion status. 

It turns out I have some instrumentation error in that I am decoding the FSM states wrong. When I waited on the Start state, which has the bit clock for writing begin, I saw it move into the WriteField logic for the header record.

I also put a scope on the WriteDataClock output pin to see that I am emitting flux reversals at the correct timing. I see the WriteGate signal go on and the flux reversals begin, but they are occurring every 300 ns, when they should happen only once per 600 ns. 

Effectively, I am writing a string of 1 bits to the disk when I intend to write 0 bits for the duration of the preamble. I also noted that my decoding of the WriteField FSM states is equally amiss to that of WriteSector.

I sat down with the logic and studied it to see where I am running afoul, as well as evaluating the FSM state discrepancy and the logic analyzer settings. The timing issue was the residue of a mental error I made and had caught earlier on my preambles and other wait states - the fpga clock of 50MHz I somehow began coding as 50ns, so that my timers for a given duration needed to be 2.5 times as large as I had.
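The timer arithmetic behind that mistake is quick to check. A minimal sketch, assuming timers count whole fpga clock cycles and that durations should divide evenly into the clock period; the helper name is hypothetical.

```python
CLOCK_HZ = 50_000_000
PERIOD_NS = 1_000_000_000 // CLOCK_HZ  # 20 ns per fpga clock


def cycles(duration_ns, period_ns=PERIOD_NS):
    """Timer count for a duration; insist that it divides evenly."""
    count, remainder = divmod(duration_ns, period_ns)
    if remainder:
        raise ValueError(f"{duration_ns} ns is not a whole number of {period_ns} ns cycles")
    return count


half_cell = cycles(300)            # 15 cycles at the real 20 ns period
wrong = cycles(300, period_ns=50)  # 6 cycles under the 50 ns mistake
print(half_cell / wrong)           # 2.5, the factor the timers were short by
print(cycles(2_000))               # 100 cycles for a 2 us delay
```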

With the basic bit timing adjusted, I tested again. The WriteDataClock output during the preamble was quite good and I scoped the output driven by my level shifter board which looked superb, exhibiting nice clean levels, along with fast rise and fall times. 

WriteClock&Data signal output of my level shifter/driver board
I looked forward to the point where I sent the first sync word, which seemed to be modulating the  output appropriately for the one bits (that is, adding a transition at the 300 ns point in between the clock transitions every 600 ns.)
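The clock-plus-data scheme described above can be written out as a list of transition times per word. A minimal sketch, assuming bits go out most significant first, a clock transition opens every 600 ns cell, and each 1 bit adds a transition at the 300 ns point; `encode_word` is a hypothetical name.

```python
BIT_CELL_NS = 600
HALF_CELL_NS = BIT_CELL_NS // 2


def encode_word(word, start_ns=0):
    """Transition times (ns) for one 16-bit word, MSB first: a clock
    transition at the start of every cell, plus a mid-cell data
    transition for each 1 bit."""
    times = []
    for i in range(16):
        cell_start = start_ns + i * BIT_CELL_NS
        times.append(cell_start)
        if (word >> (15 - i)) & 1:
            times.append(cell_start + HALF_CELL_NS)
    return times


sync = encode_word(0o1)
print(len(sync), sync[-1])  # 17 transitions, the data one at 9300 ns
```

A run of 1 bits puts a transition every 300 ns, about 3.3 MHz, which is the worst case the cable and driver must carry.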

However, I saw too many one bits going out; the first record written by WriteField, the two-word header, was producing enough bits for more than seven complete words by the time I ran out of logic analyzer memory. 

Thus, I must now study the logic, examining how it sets up the word count and addresses, then begin checking those out carefully. While I did this, I started thinking about my state decoding issues, which I think will be resolved in the logic analyzer setup rather than in the fpga logic.

Instrumentation corrected and I could immediately see my problem. The logic wrote out the two header words, 32 bits total after the sync bit. What it wrote is not correct, but it did write the proper two words, then it sat in the checksum writeout stage for many word times. I suspect it is stalled there.

I will begin monitoring the RAM address passed to the logic, check that the correct word is returned from RAM, and then we can see if the serializer is processing it properly. Meanwhile I will look over the checksum logic to find the flaw there.

I remembered that my logic writes the postamble - that is, five words of all zero contents that follow each record written to the sector. That means that I would legitimately write 96 bits after the two header words, which runs out just past the end of the logic analyzer capture. Therefore, I may not have the error I think.

I do have problems that I can see in the trace. I see the wrong data stuffed into the serializer, which accounts for the wrong bits coming out to the write head. This is likely an error reading and transferring data from the memory access FSM, probably a race condition I can correct with an extra step in the WriteField FSM. I have to study this and turn on recording of the memory access as a cross check. 

I have verified that my WriteField logic takes the following steps correctly and at the intended times:

  1. Waits until the sector number is matched, so that the target sector is now under the heads
  2. Turns on the WriteGate to enable writing (I have blocked this from actually reaching the drive)
  3. Begins emitting transitions with mixed clock and data values
  4. Writes the 34 word times of all zero data bits
  5. Emits the sync word 0001 and begins to write the first record
  6. Fetches location 0001 from RAM, which is the first word of the header to write
It is at this point that I no longer see correct behavior. The fetched word from RAM should be 0000 and it would be loaded into the serializer. Instead, I see a word whose lower half is 30 going into the serializer. The successive word has a lower half of 00, then the calculated checksum has 61 in its lower half. 

The serializer delivers bits, which may or may not correspond to the loaded word (since I don't capture the top half of the word), but are definitely not the 0000 0000 words I should be sending. The serializer then delivers what it thinks is the checksum and proceeds to deliver what should be five words of 0000.

I need to zero in on the loading of the serializer from the RAM fetches, to be sure this is correct. Once it is, the next check is that the serializer returns the proper bits from the word that was loaded. When that works properly, I must see the checksum and 0000 values get stuffed properly into the serializer and come out to the write head. 
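A reference checksum makes the serializer check easier. A minimal sketch, assuming the checksum is a 16-bit XOR of the record's words seeded with 0151 octal; that operation and seed are assumptions here, chosen because they reproduce the 0151 expected after two 0000 header words, and they should be confirmed against the Alto documentation.

```python
CHECKSUM_SEED = 0o151  # assumption: seeded XOR, seed 0151 octal


def record_checksum(words):
    """Checksum of one record's data words under the assumed scheme."""
    c = CHECKSUM_SEED
    for w in words:
        c ^= w & 0xFFFF
    return c


print(f"{record_checksum([0, 0]):04o}")  # 0151 for the two-word header
```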


It is getting late, thus I will work on instrumenting what needs to be captured and possibly producing the bitstream, but will cease testing for the day. 

Diablo 31 drive under test
PC talking over USB link and logic analyzer to capture status
FPGA board to left and driver role extender/level shifter board on right

Monday, October 24, 2016

Prepping for write function tests, debugging RAM load

ALTO DISK TOOL

Setting up for the disk writing tests was quite involved. I prepared a memory file of the original sector 0 value for the disk cartridge, from the archive on bitsavers. I set up the most useful signals on the logic analyzer and triple checked that my tool was not going to actually write data onto the drive during the early tests.

First up, I had to verify that the file I created was correctly loaded into RAM on the board. All my testing to date verified that I could write to RAM from data extracted off disk, but now I would use the USB link mechanism. I need a known image to write in order to validate the bitstream sent to the heads.

My logic is not updating RAM from the USB link, either from a file or via direct manipulation of the registers. I set up diagnostics to figure this out.  I didn't have much time available today but should have this working imminently.

Connector built for new disk tool board, board looks good, also beginning tests of writing to disk

ALTO DISK TOOL

I managed to peel back the insulation and expose, then bend away the copper mesh that acts as a groundplane behind all forty of the signal lines in the ribbon cable. I could then trim it away leaving just the forty signal lines into which the vampire taps will connect.

Copper mesh groundplane peeled away from signal conductor ribbon
Groundplane trimmed back leaving an area to connect press-on IDC connector
I used a press-on IDC 40 connector, whose vampire taps will bite into the forty signal lines if I align it just right. I used my clamp tool that will squeeze the connector halves together. I had failed to get the lineup close enough on my first few tries, but today I ended up with no shorts and good connectivity to every signal line.

Clamping tool plus the two halves of the IDC connector

Connector on cable, ready to plug into the driver role extension board
If I had tried to substitute a plain 40 signal ribbon cable, it would have lacked the ground plane. The solution for many is to alternate ground lines between the signal lines, but that would have forced me to upscale to an 80 signal ribbon, a bit unwieldy, or to switch to a large bundle of twisted pairs.

This cable gives a consistent 80 ohm impedance, important to handle the 3+ MHz signal generated to write clock plus data onto the drive. It must keep transitions within a 50 ns maximum interval, which is supported by high drive current and good termination.

I did work on the WriteSector logic and am preparing to run the logic analyzer to capture all the transitions I write, checking for a proper sector pattern. It will be painful, since what matters is the flip in state, not the absolute value of 1 or 0, and the relative timing of the flip compared to the 'clock' flips.

I will think a bit about how to best capture and test what occurs during the writing of a sector.
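One way to check a capture, sketched under assumptions: find every level change regardless of direction, then classify each gap after a clock edge as half a cell (a 1 bit, followed by a data edge) or a full cell (a 0 bit). The 600 ns cell, 100 ns tolerance and 50 ns sample interval are illustrative values, and all function names are hypothetical. A trailing clock edge is needed to close the last cell.

```python
def edges(samples, sample_ns):
    """Times (ns) at which the captured line changes level in either direction."""
    return [i * sample_ns for i in range(1, len(samples))
            if samples[i] != samples[i - 1]]


def decode(edge_times, cell_ns=600, tolerance_ns=100):
    """Recover data bits from transition times. Each cell opens with a
    clock edge; an edge about half a cell later is a 1, otherwise the
    next edge (a full cell later) is the next clock edge and the bit is 0."""
    bits = []
    i = 0
    while i < len(edge_times) - 1:
        gap = edge_times[i + 1] - edge_times[i]
        if abs(gap - cell_ns // 2) <= tolerance_ns:
            bits.append(1)
            i += 2  # skip the mid-cell data edge; land on the next clock edge
        elif abs(gap - cell_ns) <= tolerance_ns:
            bits.append(0)
            i += 1
        else:
            raise ValueError(f"unexpected {gap} ns gap after edge {i}")
    return bits


def capture(edge_times, total_ns, sample_ns, start_level=0):
    """Simulate a logic analyzer capture of a line toggled at each edge."""
    return [(start_level + sum(e <= t for e in edge_times)) % 2
            for t in range(0, total_ns, sample_ns)]


written = [100, 700, 1000, 1300, 1600, 1900, 2500]  # bits 0,1,1,0 + closing clock
print(decode(edges(capture(written, 2600, 50), 50)))                 # [0, 1, 1, 0]
print(decode(edges(capture(written, 2600, 50, start_level=1), 50)))  # same bits
```

Starting the line at the opposite level gives the same bits, since only the timing of the flips is used.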