Sunday, October 30, 2016

Still trying to find cause of failure to write

ALTO DISK TOOL

Some conditions are tested for by the circuitry in the Diablo and would result in a WriteCheck condition:

  • WriteGate on but no write current
  • Head current when WriteGate is off
  • No Erase current when EraseGate on
  • Erase current when EraseGate off
  • Erase current through both heads at same time
  • Write current through both heads at same time
  • Voltage dips below 13.5 V
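
The conditions above can be restated as a small checker, which makes it easier to see which combinations of gate and current would trip WriteCheck. This is only a sketch for reasoning about the list: the `HeadState` fields and the function name are hypothetical, and treating "head current" as write current is an assumption, not something taken from the Diablo schematics.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HeadState:
    """Hypothetical snapshot of the write circuitry (not from the Diablo manual)."""
    write_gate: bool
    erase_gate: bool
    write_current: Tuple[bool, bool]  # (head 0, head 1) write current flowing?
    erase_current: Tuple[bool, bool]  # (head 0, head 1) erase current flowing?
    supply_volts: float


def write_check_reasons(s: HeadState, min_volts: float = 13.5) -> List[str]:
    """Return every listed condition that the given state violates."""
    reasons = []
    if s.write_gate and not any(s.write_current):
        reasons.append("WriteGate on but no write current")
    if not s.write_gate and any(s.write_current):
        reasons.append("Head current when WriteGate is off")
    if s.erase_gate and not any(s.erase_current):
        reasons.append("No Erase current when EraseGate on")
    if not s.erase_gate and any(s.erase_current):
        reasons.append("Erase current when EraseGate off")
    if all(s.erase_current):
        reasons.append("Erase current through both heads at same time")
    if all(s.write_current):
        reasons.append("Write current through both heads at same time")
    if s.supply_volts < min_volts:
        reasons.append(f"Voltage dips below {min_volts} V")
    return reasons
```

An empty result corresponds to the situation in this post: the drive sees nothing it considers a fault, yet nothing readable ends up on the surface.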


Presumably, my problems in the Diablo drive don't include those situations, or I would have an immediate WriteCheck. I do have the sporadic situation where the drive powers down, flashing both FileReady and ReadyToSeek/Read/Write signals a couple of times. 

To recap some other symptoms observed so far:

As noted yesterday, the voltage on the center tap of the upper head should be about +1V when selected, -1V when not the selected head and +14V when writing. I confirmed both -1 and 14V levels, but the selected level is noiselike at dozens of millivolts rather than roughly 1V. 

Also noted yesterday, when I measure the output of the differential amplifier responding to the read head output (testpoint 1 on the board), I see a noiselike signal of a few dozen millivolts when reading the sectors on cylinder 0, even after I write, but I get wide multivolt swings on any other cylinder.

When I write sector 0, I see the WriteGate activate and the proper pulses delivered to the WriteData&Clock line on the terminator. Reading back the sector still has the essentially erased output, millivolt noise, but no signal swings.

Therefore I need to check step by step to verify that my write signal is delivered to the heads properly. It would be easy if I had an extender card to push card J10 back giving me direct access to the various probing points, but to use it I would also need an extender for the cable from the read/write heads. Don't have either.

First new observation - I set the scope to trigger on the WriteData&Clock signal coming from the fpga and put the other probe on testpoint 1 to observe the flux reversals. I saw an odd dip in the midst of each transition, and the pattern for when I had a 1 bit wasn't correct either. 
Top line shows flux reversals I should see on testpoint 1


Test point 1 with odd decay in each signal


Clipped signal at TP2 seems almost the reverse of what I should see - dips when bit is 0
I am writing multiplexed pulses correctly, but write current might be wrong

The testpoint 1 is at one output of the differential amplifier on the read head. The trace without the dips would be seen when reading this sector. Since the Alto docs mention leaving ReadGate on during a write and observing the written bit stream, I presume that the testpoint 1 signal should look legitimate, without the dips. The dips are bound to cause false recognition of both clock and data bits. 

Here is the sequence of observations that must be made to determine from where the malfunction stems:
  1. Scope on output of the D flipflop that causes flux reversals, triggered by WriteData&Clock
  2. Scope on inverse output of the D flipflop  triggered by WriteData&Clock
  3. Scope on head bus A, triggered by WriteData&Clock
  4. Scope on head bus B, triggered by WriteData&Clock
  5. Scope lower surface head to verify its +1, -1 and +14V behavior

I set up for test 1 and 2, putting micrograbbers on the flipflop pins. Both sides of the flipflop show transitions just as expected. I have labeled these A and B respectively in the closeup of the schematic below.
Testing at Q and notQ output of D flipflop

Expected signal at point B (notQ)
Signal at point A (Q)

Next up is to scope on head bus A and again on head bus B. The bus A signal looked reasonable, although it only swung down to +5V from +14V; I am not sure if that is correct. On to look at head bus B to compare.

Head bus A signal
The other bus looked similar, evincing swings from 5V upwards, but I can see that the upper half of the bus A waveform is clipped off compared to the bus B version. I will now look at the bus A path for components that might cause this distortion. B is nicely symmetric while A is not. 

Head bus B signal

The above two views of bus A and bus B are taken from points C and D of the schematic excerpt below. Next I moved over to points A and B to see the drivers of the two bus lines.

Probe spots for head bus A and B

The signals at points A and B above both look good, equally symmetric. Whatever is clipping the peaks of the signal at point C (my Head bus A signal above the schematic) occurs on the right hand side of diode D81.

point C - bus A driver
point D - bus B driver

Now I move to figure out what is clipping the tops of the bus A signal once it moves through diode D81. I will repeat the view of points A and C, the ones that showed clipping, but while writing on the lower surface (head 1). This will eliminate the head itself as a causative agent.

I ran the tests and saw no clipping on the head bus A or B when writing on head 1. I then switched back to head 0 (the upper surface) and captured bus A again. Now I am confused - this time I saw no clipping. 
Retest of point C on upper head - this time no clipping

Still, once the sector was written, when I tried to read it back the signal was like noise, not the magnetization level I would expect. At this point, I am still mystified. 

Musings - could the voltage swing on the bus be too small to flip magnetic domains? Seems unlikely given how symmetric the behavior is. Is something wrong with the erase winding or driver? 

One final test of the evening - monitoring the erase driver input to the drive transistor, just to be certain that it thinks it should fire. I suppose one cause for the lack of discernible transitions when reading is that the write is actually not doing an erase, thus layering so many transitions that there is no clear signal to read.
Probing point to check erase operation

Erase driver definitely turning on

Now that I see the drive voltage firing up for the erase driver, and had previously seen a current draw curve that was similar to this, I have to again assume that erasing is working properly. I remain mystified as to what is happening on the drive. 

Oddly, after I write the sector, I see checksum errors on the label and data record, but not header record when I try to read it back. Since the write is producing its own checksums, that is definitely odd.

Saturday, October 29, 2016

Continuing work on Write Sector, now questioning operation of disk drive circuitry

ALTO DISK TOOL

I completed the reinstrumentation to watch the behavior of the WriteSector and to ensure that the transition stream matches what I expect to see emitted. I will also watch at the terminator side of the cable to be sure I see the same data arriving at the drive. 

Watching the outgoing stream at the terminator gives me nice clean data matching my sector data exactly. I then switched the probe to watch the ReadData line coming in, during the write, to see if it appears to be the same data we sent out.

It does appear to align during the WriteSector operation, although I saw a few funny glitches near where I had written one bits. My WriteData&Clock signal looked very clean at the terminator, with no ringing. I zoomed in on the period with the '1' bit transition to see if I could see any signal issues that might cause problems. 

Since that looked clean, I next used the ReadGate signal to trigger the scope and watched the ReadData input for the sector I had just written. I saw random bits showing up at times when I had not written anything - different from the ReadData line monitored during writing.

I also looked at ReadClock and saw periods without a clock pulse. I set the scope to a point where I was seeing that dropout of the clock pulse (one time in the sequence of clock pulses), and then monitored ReadData. I was seeing quite a bit of spurious detected 1 bits surrounding the dropout. 

I am definitely going to have to move inside the Diablo during this investigation. I need to see what comes off the heads before the data and clock separator, since the separation may be malfunctioning. I also need to monitor the erase signal to be sure that the erase head is really turned on.

If the erase head is not active, I am just layering transitions atop existing transitions, which would give erratic clock pulses and bad data bit recovery. If it is active, then I have a different insidious problem. Time to dig into the schematics and finalize test points. 

I found a resistor that should show voltage drop as the erase current flows through one of the heads. I also found four test points that together will give me a view of what is coming off the heads and heading into the separator circuit.

My first test, to watch the erase current, showed that it indeed activates when WriteGate is asserted. I then switched the probe to the test point showing me the raw transitions coming in off the head while reading. I triggered on ReadGate, the start of reading sector 0 which I had rewritten.

I saw low voltage, basically noise-like signals coming from the head. In order to compare to a good sector, I moved the arm to Cylinder 1 and did the same capture. Quite different - now I had very wide swings consistent with the flux transitions of clock pulses.

I looked through the schematics to see if I could spot where the problem occurred. I realized that if the erase current were off, or if the write selection current wasn't working, the drive would detect a WriteCheck condition and alert me. It hadn't.

Heads and related circuits, top view
Select current to read/write on head, plus erase current drivers, middle of schematic
This moved me to look at the logic driving the signal on head bus A and B, which is what will actually produce the flux transition. As I looked along the left of the top schematic, I spotted a D flipflop and I could hear a huge "DOH" echoing in my brain. 

The flipflop takes pulses from the WriteData&Clock line, alternating the state of the A and B bus levels on each one. It uses the incoming pulse from the terminator to flip the state. Therefore, it is causing the transitions and I should not be doing that in my logic. Rather, I should be emitting simple pulses to cause the transitions to happen. Doh.

I had misread the spec and overly complicated my driver logic. All I had to do was emit a 100ns pulse whenever a transition should occur, not reverse directions of the output line. That would explain why my logic was working as I intended, toggling the driver line, but it wasn't writing intelligible data on the drive. 
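
The difference between the two driver styles can be shown with a toy model. This is a sketch under assumptions: the drive's D flipflop is modeled as toggling once per rising edge on WriteData&Clock (the edge polarity is assumed, not checked against the schematic), time is divided into slots, and a pulse is assumed to return to zero within its slot. All function names are made up for illustration.

```python
def toggle_on_pulses(pulses, initial=0):
    """Model the drive's flipflop: Q flips in every slot that carries a pulse."""
    q = initial
    levels = []
    for p in pulses:
        if p:
            q ^= 1
        levels.append(q)
    return levels


def rising_edges(levels, initial=0):
    """Return 1 for each slot where a level-style signal goes from 0 to 1."""
    prev = initial
    edges = []
    for lv in levels:
        edges.append(1 if (lv and not prev) else 0)
        prev = lv
    return edges


def count_reversals(levels, initial=0):
    """Count how many times the flipflop output changes state."""
    prev = initial
    count = 0
    for lv in levels:
        if lv != prev:
            count += 1
        prev = lv
    return count


# Slots in which a flux reversal is wanted.
wanted = [1, 0, 1, 1, 0, 1]

# Pulse-style driver: one short pulse per wanted reversal.
good_q = toggle_on_pulses(wanted)

# Level-style driver (the mistake): the FPGA itself toggles the line,
# so the drive's flipflop only sees the rising edges of that level.
fpga_level = toggle_on_pulses(wanted)
bad_q = toggle_on_pulses(rising_edges(fpga_level))
```

With the `wanted` pattern above, the pulse-style driver yields four reversals at the head while the level-style driver yields only two, which is consistent with logic that looks right at the FPGA pin but leaves unintelligible data on the platter.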

I dove into the fpga logic and converted the output to what it should be, pulses at the proper times. With this changed and the bitstream generated by midafternoon, I tested again, first checking carefully on logic analyzer to see that the logic is accomplishing my newly corrected intentions. 

With the timing module working properly, I appear to be emitting the proper signals but still don't get a good read. Further, when I trigger on ReadGate and look at the raw signal from the heads on testpoint 1, it still looks like noise while other cylinders have large clear swings.

The manual asserts that a head which is not selected will have its center tap sit around -1V - which I confirm - and a selected head with WriteGate on will sit at +14V - also confirmed. However, a selected head with ReadGate (or absence of WriteGate) is claimed to sit at about +1V but I instead see the noise coming in from the head, or a similar noise pattern at much lower level, closer to ground. 

I still believe there is something wrong with the write circuitry in the disk drive, since I never leave any transitions on the surface that will be picked up by the head. I have the +14V power level and the erase function certainly seems to function, but I don't see magnetization of the domains on the platter.

I will have to hook the scope up to the two head bus lines, trigger on WriteGate and watch to see whether I am swinging the current inversely on the two lines. That would tell me that I ought to be able to write a flux reversal at that point. 

I did find a flaw in my serializer triggered by the change in my output pulse logic, requiring a quick fix but slow pass to create the new bitstream. Once it was ready, I tested again. Now I was producing exactly the pulse stream I wanted, although the outcome on the disk was still not perfect.

I am still not sure what is happening and need more investigation, probing of the drive circuitry and other testing. For example, the scope on the two head bus lines will tell me if I am truly writing or not. 

Friday, October 28, 2016

Debugging write operation to disk drive

ALTO DISK TOOL

I had to drive over and swap cars with my daughter, hers has problems. After lunch, I got back to the project. I continued to map out test points to figure out why the WriteSector, which looks so good on the logic analyzer, failed to write on the actual disk (or wrote something so bad it remains unreadable).

The easiest spot to probe is the terminator, using micrograbbers on the resistors, but I had to have an accurate map before I could be sure I caught the right signals.

Terminator with points to capture connector signals
The terminator I used for the picture is not the same one used for either the prototype or production driver board. The positions are the same but there are more or fewer resistors on the other terminators.

I fired up the system with the scope capturing the WriteGate and WriteData&Clock signals, freezing when WriteGate first goes active. I see the write commanded and the transitions begin occurring, but the results when reading are still checksum errors on all three records in sector 0.

At this point there are two possibilities I see - the writing is not taking place due to some failure inside the Diablo drive, or I am writing something but what is read back isn't valid format. To decide between these, I will set up the logic analyzer to capture the incoming record on a read and will look to see what appears.

I had to reinstrument the signals output to the logic analyzer, plus I tweaked the startup of the transition FSM to give a 2 us delay from asserting WriteGate before I began issuing the clock and data transitions. This matches a recommendation in the Diablo manual. There is also setup time changing the logic analyzer configuration to match the emitted signals.

When the bitstream was ready I tested again. The goal here was to do a ReadSector and see what comes in from the sector that I just wrote (I think). If the data is like the last captured garbage, then I am not writing on the surface at all. If the data looks close to what I (attempted) to write, then I can debug how I write to get a clean read.

My test didn't meet my objectives because of a flaw in the instrumentation that meant I didn't have a good way to trigger at the beginning of a sector read. However, I could do some other tests with the oscilloscope while executing WriteSector.

I stuck one scope probe on the read data line coming in from the drive and the other was triggering when the WriteGate was asserted. What I should see is an absence of data pulses after I began, since I would be writing almost 337 us of zero bits before emitting a sync word. Instead, the data bits continued to stream in even after my write began.

Since ReadGate is hardwired on, it is possible that the mechanism won't sync up until it sees a long stream of zeros, but once I wrote my preamble one time, that should have existed. Still, it could be a sync issue due to the permanent ReadGate. I don't have signal wires in the main cable to drive the ReadGate pin, but I think I can engineer up a twisted pair to add this to my test setup.

Another issue, however, is definitely a flaw in my logic that I have to chase down. Even though I tracked the bits being emitted yesterday and it was all good, what I saw from the WriteData&Clock line when I wrote was an excellent pattern of zeroes and the sync word at the magic time, but then I saw the wrong data bits following - not the two words of 0000 and the checksum of 0151, but something else.

I did make the seemingly minor change to add the 2 us delay before I began emitting clock pulses and asking for data bits. I don't see how that could cause a problem with the data being produced, but somehow the data being fetched is wrong.

I resynthesized to correct the instrumentation error, so I could read what was coming in from the sector. I might be writing wrong data with bad checksums, although it is equally likely that I am not truly writing. We shall see.

The addition of the ReadGate wiring to the fpga will involve wiring up a level shifter, removing the jumper on the terminator and wiring in the twisted pair I cut to match the length of the existing cable. This will take careful work which I will do tomorrow. Instead, I will focus on what I can read back from the sector that I have written.

When I ran my test with a valid trigger condition, I found that the data coming in from the surface was not what I thought I was writing. It might be the residual junk already there, but my write logic looked wrong from my earlier scope work.

I think the next two things to do are to wire up the ReadGate signal including the level shifter/driver circuit, and to go back to capturing my transitions from the logic analyzer. I will add a channel that is the received WriteData&Clock line from the terminator, to check that it corresponds to what is sent to the level shifter. Both of these can't be done in the remaining time this evening. 

Thursday, October 27, 2016

Write Sector logic working properly yet drive not actually writing

ALTO DISK TOOL

Step one in tightening up the WriteSector process is to draw up a precise timeline of the timing of each word to be written, relative to the sector mark at the beginning. I sat down with that timing and the test bed to capture and check everything against the list.

Anywhere there is a discrepancy, I can sort out the cause and correct for it. The objective is to have the sector end 3.120 ms after the SM, which gives a comfortable margin within the 3.333 ms sector duration.

I worked through the process, trimming excess cycles and ensuring I got the data lined up as close as reasonable to the ideal times. First step was to ensure that the header record was correct, a preamble of 34 word times of zero, a sync word of 0001, two header words of 0000, a checksum word of 0151, and five postamble words of 0000.
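
The header record layout described above can be laid out as a word list to check its length and duration. This is a sketch under stated assumptions: the word values are treated as hexadecimal, the checksum is modeled as an XOR of the payload words with a seed chosen so that two zero header words give 0151 (the post only states the resulting value, not the algorithm), and the 600 ns bit cell and 16-bit words are taken from the other posts. The function names are hypothetical.

```python
BIT_CELL_US = 0.6        # one bit cell, 600 ns per the timing notes
BITS_PER_WORD = 16
CHECKSUM_SEED = 0x0151   # assumption: seed that makes an all-zero header give 0151


def checksum(words, seed=CHECKSUM_SEED):
    """Assumed checksum: XOR of the 16-bit payload words with a fixed seed."""
    value = seed
    for w in words:
        value ^= w & 0xFFFF
    return value


def build_record(payload, preamble_words=34, postamble_words=5):
    """Preamble zeros, sync word, payload, checksum, postamble zeros."""
    return (
        [0x0000] * preamble_words
        + [0x0001]
        + list(payload)
        + [checksum(payload)]
        + [0x0000] * postamble_words
    )


def duration_us(words):
    """Time on the disk surface for a list of words, in microseconds."""
    return len(words) * BITS_PER_WORD * BIT_CELL_US


header_record = build_record([0x0000, 0x0000])
```

Under these assumptions the header record is 43 words, or about 412.8 us, a small fraction of the sector time. Comparing each record's computed start and end against the logic analyzer capture is the kind of check the timeline is meant to support.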

Early on, I had the data complete within the sector, but at this point I am checking even more rigorously, ensuring that the transitions match what should go out at all key points. The 1 bit that forms the sync word is occurring just a few hundred nanoseconds late compared to the ideal time, essentially spot on.

I decided that I need a status warning if the WriteSector ever does span past the end of the sector, putting bit 3 of Reg0001 to that purpose. I also set up the WriteField FSM to restore itself if the SM occurs, making this error recover the logic to accept subsequent transactions.

However, when verifying the timing, I was looking at the data coming out and see that I am misaddressing memory, essentially reading the wrong words from RAM. I spotted the flaw and corrected it.

After lunchtime, my testing had verified the timing all the way through the end of the second (label) record of the sector, to the tenth of a microsecond of what was expected. At this point, I brushed against the button to trigger a read of a sector, which gave me the data but triggered a powerdown of the disk drive.

Thinking about this, I may be causing it by modulating the WriteData&Clock signal without WriteGate active, particularly if after the final transition I leave the line at 1 instead of zero. The rest state of that signal should always be 0, but there was a condition where the WriteGate gets turned off and the transition state engine didn't stop with a 0. I corrected that.

After the obligatory 30 minute idle time while the toolchain worked, I fired up the testbed with the corrected logic and resumed testing, this time verifying the postamble and preamble between the label and data records.

All is perfect - timing within a tenth of a microsecond, data looks perfect, time to allow WriteGate to switch on the write electronics and try writing the contents of sector 0 from the disk archive file. After the mandatory 30 minute timeout, I will attempt to write the sector and read it back to see if it appears good.

I performed the WriteSector transaction with the WriteData&Clock signal modulated and WriteGate turned on, but it didn't appear to do anything. Time to check on the delivered signals on the drive, ensuring that it did see the WriteGate activate and see the incoming transitions. I will have to find an appropriate place to hook up the scope.

There is an option on some Diablo drives where the EraseGate is an independent signal, instead of tied to the WriteGate in the standard models. Diablo does not mark which options are installed, thus the only way to tell is to examine circuit boards and wiring - quite annoying.

If it is separate, then I am not energizing it with the prototype board although the production board I built will drive it. I will hold off to test this until tomorrow, since I have to study schematics and find test points inside the drive to verify whether erase current is flowing and whether the write signal is passed to the heads. 

Wednesday, October 26, 2016

Debugging write sector function, almost done but timing is a bit too long to fit in sector

ALTO DISK TOOL

I set up the first combination of signals to watch on the logic analyzer to get to the bottom of this misloading of the serializer from RAM. I also modified the WriteField logic to add a step to more cleanly load the serializer after memory access has finished. I will see if this is unnecessary during the test run.

First morning run, I captured the data and looked it over carefully. The change I made fixed the issue with reading from RAM and serializing. The output stream of transitions reflected the data put into the serializer (sync word, two header record words, checksum word). However, the WriteField FSM is stalling in the checksum writing state.

I tweaked a few signals and the FSM, but I am not exactly sure what is causing the problem. It may be a race hazard once again, this one between the serializer and this FSM. The serializer emits a signal getnewword for one cycle, after which the  FSM should issue loadword for one cycle duration, a few cycles later, to cause the serializer to pick up the new word to emit.

If we are missing the getnewword signal for some reason, or the serializer missed a loadword signal, we will stall forever because we have to load the serializer in order to get the getnewword signal when it shifts out the 16th bit.
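
The stall described above can be reproduced in a toy simulation of the handshake. This is a simplified sketch, not the actual FPGA logic: the class and function names are invented, the load happens on the cycle right after getnewword rather than a few cycles later, bits are shifted out most significant bit first (an assumption), and a missed loadword is modeled by the `drop_load` parameter.

```python
class Serializer:
    """Toy serializer: shifts out 16 bits, then raises getnewword for one cycle."""

    def __init__(self):
        self.word = 0
        self.bits_left = 0
        self.getnewword = False

    def clock(self, loadword=False, new_word=0):
        """Advance one cycle; return the emitted bit, or None if nothing shifted."""
        self.getnewword = False
        if loadword:
            self.word = new_word & 0xFFFF
            self.bits_left = 16
            return None
        if self.bits_left == 0:
            return None  # nothing loaded: the serializer sits idle
        bit = (self.word >> 15) & 1
        self.word = (self.word << 1) & 0xFFFF
        self.bits_left -= 1
        if self.bits_left == 0:
            self.getnewword = True  # 16th bit just went out
        return bit


def run(words, cycles, drop_load=None):
    """Feed words through the serializer; optionally lose one loadword pulse."""
    ser = Serializer()
    queue = list(words)
    emitted = []
    load_count = 0
    want_load = True  # the FSM primes the first word itself
    for _ in range(cycles):
        if want_load and queue:
            word = queue.pop(0)
            delivered = load_count != drop_load
            load_count += 1
            want_load = False
            if delivered:
                ser.clock(loadword=True, new_word=word)
            else:
                ser.clock()  # loadword pulse missed by the serializer
        else:
            bit = ser.clock()
            if bit is not None:
                emitted.append(bit)
        if ser.getnewword:
            want_load = True
    return emitted
```

Running `run([0x0001, 0x0151], 40)` emits all 32 bits, while `run([0x0001, 0x0151], 40, drop_load=1)` stops after the first 16: with nothing loaded, the serializer never reaches a 16th bit, so getnewword never fires again and the FSM waits forever.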

I did a test run while my tweaks were synthesizing, hoping to spot the two signals and their timing. I know that the checksum was loaded, as I saw that on the analyzer last time, but I didn't look to see whether the zero word for the postamble was properly loaded.

This showed me a successful loadword to get the checksum into the serializer, then later the getnewword that should trigger the postamble phase. However, the WriteField FSM did not move out of the checksum step, while it should have.

What I should see is the checksum step, the load of the checksum value, the bits shifted out and then the getnewword signal which is supposed to trigger a move to the postamble step of the FSM. I see the getnewword but it never steps. We stay in checksum, loading words of zero in an infinite postamble.

I pored over the logic for the WriteField FSM to see if I could find a way it would malfunction as it has. I am left with race hazard as the only conclusion. In the checksum step, I see the word to load being set to zero, right after the getnewword signal is received. However, in the same logic group that sets the word to zero, it also should emit loadword but it doesn't, nor does it move on to the postamble step.

I now have verified operation from the request to write a sector all the way to the correct emission of the checksum word at the end of the header record (first field). What I need at this point is to get the postamble of five words writing properly and the WriteField FSM to go back to idle.

In the late morning, I had to cease work in order to get over to the CHM for the 1401 team meeting, but resumed work in the early evening and set up a clean new step in the WriteField FSM, between checksum and postamble, where I set up the zero word and load it.

With that change processed into a bitstream, I was ready to test again. I now found myself through the header and label records, apparently fine, and chugging through the data record of 256 words when it stalled in the postamble. The sector mark appeared and reset the WriteSector FSM but the WriteField remained stuck at postamble.

I was running various tests to look at parts of the sector and the exact behavior at the end of the data record, wanting to see it write the checksum and count through the postamble. However, at this point, the drive powered down by itself.

I touched the external power supply which provides the drive with its +15 and -15 levels, at which point it turned back on. I spun it up and tried for another test, but the trigger condition wasn't quite right. When I tried to cycle again for a new test, the drive went down and stayed down.

I will have to inspect the power being delivered by the external supply, to be sure it is good, in order to decide whether my problem is in the supply or the drive. Before I do that, I will take a quick look over the WriteSector logic to see whether I can see any reason that the data record might stall when the first two records completed just fine.

One way this can go awry is if the total time to write out the sector is longer than the time between sector marks. As a safety measure, my logic will shut down the WriteSector logic and turn off the WriteGate when the following SM is detected.

If my logic for the sector was still in the process of writing the trailing five words of zero when this happened, then the overall process is taking too long. Right now I can't distinguish between this case, long but working properly, and the other case where the data record postamble is stalled.

The power supply appeared good by the time I tested it, and the disk began to spin up fine. I waited to put my latest diagnostic trace version of the fpga bitstream into a test, then went back to testing. What I discovered was that indeed I am taking too long to write out the sector, bumping into the next sector mark and resetting my WriteSector FSM.

It is time for me to go back to the idealized timeline and compare what I am writing, to figure out where I am overshooting or to trim some time off. 

Tuesday, October 25, 2016

Working through the WriteSector transaction, step by step, first part of Header record

ALTO DISK TOOL

Finally ran the tests to see what is happening when I attempt to write to RAM thru the USB transaction. I found a race hazard between my FSM that controls the write to RAM and the FSM that accesses memory, where I am moving forward too quickly. 

Changes made and back to testing after a half hour for synthesis, etc. The new interlock works great, I am now writing and reading the RAM properly from USB, with one teeny problem. My logic for writing has the two bytes in a word swapped. Should be an easy fix, although not quick due to the 30 minute cycle time for each change, plus time for lunch.

It does show the value of a fully interlocked set of FSMs, which I skipped falsely believing that the first (read) cycle would be back to idle before I requested the second (write-back) cycle. Everything worked properly with RAM. Testing moved on to the WriteSector functionality.

My first test showed my WriteSector FSM stalling waiting for the sector to come around. I adjusted the logic in the FSM and prepared to test again in a half hour. Basically, I am running an entire WriteSector transaction and ending with a completion status. 

It turns out I have some instrumentation error in that I am decoding the FSM states wrong. When I waited on the Start state, which has the bit clock for writing begin, I saw it move into the WriteField logic for the header record.

I also put a scope on the WriteDataClock output pin to see that I am emitting flux reversals at the right timing. I see the WriteGate signal go on and the flux reversals begin, but they are occurring every 300 ns, when they should happen only once per 600 ns.

Effectively, I am writing a string of 1 bits to the disk when I intend to write 0 bits for the duration of the preamble. I also noted that my decoding of the WriteField FSM states is just as amiss as that of WriteSector.

I sat down with the logic and studied it to see where I am running afoul, as well as evaluating the FSM state discrepancy and the logic analyzer settings. The timing issue was the residue of a mental error I made and had caught earlier on my preambles and other wait states - the fpga clock of 50MHz I somehow began coding as 50ns, so that my timers for a given duration needed to be 2.5 times as large as I had.
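
The factor of 2.5 follows directly from the clock period. A short sketch of the arithmetic, using the 600 ns bit cell as the example duration; the helper name is made up for illustration.

```python
CLOCK_HZ = 50_000_000                # fpga clock from the post
TRUE_PERIOD_NS = 1e9 / CLOCK_HZ      # 20 ns per cycle
MISTAKEN_PERIOD_NS = 50              # the period that was coded by mistake


def cycles_for(duration_ns, period_ns):
    """Number of clock cycles a timer must count for a given duration."""
    return round(duration_ns / period_ns)


correct_count = cycles_for(600, TRUE_PERIOD_NS)       # 30 cycles
mistaken_count = cycles_for(600, MISTAKEN_PERIOD_NS)  # 12 cycles
scale = correct_count / mistaken_count                # 2.5
```

Every timer computed with the mistaken period expires early by the same ratio, so scaling the counts by 2.5 (or recomputing them from the 20 ns period) corrects them all at once.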

With the basic bit timing adjusted, I tested again. The WriteDataClock output during the preamble was quite good and I scoped the output driven by my level shifter board which looked superb, exhibiting nice clean levels, along with fast rise and fall times. 

WriteClock&Data signal output of my level shifter/driver board
I looked forward to the point where I sent the first sync word, which seemed to be modulating the  output appropriately for the one bits (that is, adding a transition at the 300 ns point in between the clock transitions every 600 ns.)
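
The pulse timing described above (a clock transition every 600 ns, plus an extra transition at the 300 ns point for each one bit) can be generated as a list of pulse times. This is a sketch: the function names are hypothetical, and sending the most significant bit of each word first is an assumption based on the sync word 0001 ending in its single 1 bit.

```python
BIT_CELL_NS = 600     # spacing of clock transitions
DATA_OFFSET_NS = 300  # a 1 bit adds a transition halfway through the cell


def word_bits(word):
    """Split a 16-bit word into bits, most significant bit first (assumed order)."""
    return [(word >> (15 - i)) & 1 for i in range(16)]


def pulse_times(bits, start_ns=0):
    """Times in ns at which a pulse should be emitted on the write line."""
    times = []
    for i, bit in enumerate(bits):
        cell_start = start_ns + i * BIT_CELL_NS
        times.append(cell_start)                       # clock transition
        if bit:
            times.append(cell_start + DATA_OFFSET_NS)  # data transition
    return times


sync_pulses = pulse_times(word_bits(0x0001))
```

For the sync word this gives 16 clock pulses plus one data pulse in the last cell, which is an easy pattern to pick out on the scope after the long run of evenly spaced preamble pulses.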

However, I saw too many one bits going out, plus the first WriteField record, the header of two words, was producing enough bits for more than seven complete words by the time I ran out of logic analyzer memory.

Thus, I must now study the logic, examining how it sets up the word count and addresses, then begin checking those out carefully. While I did this, I started thinking about my state decoding issues, which will be resolved in the logic analyzer rather than fpga logic I think.

Instrumentation corrected and I could immediately see my problem. The logic wrote out the two header words, 32 bits total after the sync bit. What it wrote is not correct, but it did write the proper two words, then it sat in the checksum writeout stage for many word times. I suspect it is stalled there.

I will begin monitoring the RAM address passed to the logic, check that the correct word is returned from RAM, and then we can see if the serializer is processing it properly. Meanwhile I will look over the checksum logic to find the flaw there.

I remembered that my logic writes the postamble - that is, five words of all zero contents that follow each record written to the sector. That means that I would legitimately write 96 bits after the two header words, which runs out just past the end of the logic analyzer capture. Therefore, I may not have the error I think.

What I do have are problems I can see from the trace. I see the wrong data stuffed into the serializer, which accounts for the wrong bits coming out to the write head. This is likely an error reading and transferring data from the memory access FSM, likely a race condition I can correct with an extra step in the WriteField FSM. I have to study this and turn on recording of the memory access as a cross check.

I have verified that my WriteField logic takes the following steps correctly and at the intended times:

  1. Waits until the sector number is matched, so that the target sector is now under the heads
  2. Turns on the WriteGate to enable writing (I have blocked this from actually reaching the drive)
  3. Begins emitting transitions with mixed clock and data values
  4. Writes the 34 word times of all zero data bits
  5. Emits the sync word 0001 and begins to write the first record
  6. Fetches location 0001 from RAM, which is the first word of the header to write
It is at this point that I no longer see correct behavior. The fetched word from RAM should be 0000 and it would be loaded into the serializer. Instead, I see a word whose lower half is 30 going into the serializer. The successive word has a lower half of 00, then the calculated checksum has 61 in its lower half. 

The serializer delivers bits, which may or may not correspond to the loaded word (since I don't capture the top half of the word), but are definitely not the 0000 0000 words I should be sending. The serializer then delivers what it thinks is the checksum and proceeds to deliver what should be five words of 0000.

I need to zero in on the loading of the serializer from the RAM fetches, to be sure this is correct. Once it is, the next check is that the serializer returns the proper bits from the word that was loaded. When that works properly, I must see the checksum and 0000 values get stuffed properly into the serializer and come out to the write head. 
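To have something to compare the trace against, the expected word stream for the first record can be sketched as below. The counts (34 zero words, sync word 0001, five zero postamble words) are from the steps above; the function name is hypothetical and the checksum is passed in by the caller rather than computed, since this sketch makes no claim about how the checksum is formed.

```python
PREAMBLE_WORDS = 34   # all-zero words written before the sync word
POSTAMBLE_WORDS = 5   # all-zero words written after the record
SYNC_WORD = 0x0001
BITS_PER_WORD = 16

def expected_record_stream(record_words, checksum):
    """Words the serializer should emit for the first record of a sector."""
    stream = [0x0000] * PREAMBLE_WORDS
    stream.append(SYNC_WORD)
    stream += [w & 0xFFFF for w in record_words]
    stream.append(checksum & 0xFFFF)
    stream += [0x0000] * POSTAMBLE_WORDS
    return stream

# Two header words of 0000; the checksum value here is only a placeholder.
header = [0x0000, 0x0000]
stream = expected_record_stream(header, checksum=0x0000)

# Bits legitimately written after the header: checksum plus postamble.
bits_after_header = (1 + POSTAMBLE_WORDS) * BITS_PER_WORD
```

The 96 bits after the header match the figure I worked out earlier, which is why the capture running out during that stretch is not by itself evidence of a stall.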


It is getting late, thus I will work on instrumenting what needs to be captured and possibly producing the bitstream, but will cease testing for the day. 

Diablo 31 drive under test
PC talking over USB link and logic analyzer to capture status
FPGA board to left and driver role extender/level shifter board on right

Monday, October 24, 2016

Prepping for write function tests, debugging RAM load

ALTO DISK TOOL

Setting up for the disk writing tests was quite involved. I prepared a memory file of the original sector 0 value for the disk cartridge, from the archive on bitsavers. I set up the most useful signals on the logic analyzer and triple checked that my tool was not going to actually write data onto the drive during the early tests.

First up, I had to verify that the file I created was correctly loaded into RAM on the board. All my testing to date verified that I could write to RAM from data extracted off disk, but now I would use the USB link mechanism. I need a known image to write in order to validate the bitstream sent to the heads.

My logic is not updating RAM from the USB link, either from a file or via direct manipulation of the registers. I set up diagnostics to figure this out.  I didn't have much time available today but should have this working imminently.

Sunday, October 23, 2016

Connector built for new disk tool board, board looks good, also beginning tests of writing to disk

ALTO DISK TOOL

I managed to peel back the insulation and expose, then bend away the copper mesh that acts as a groundplane behind all forty of the signal lines in the ribbon cable. I could then trim it away leaving just the forty signal lines into which the vampire taps will connect.

Copper mesh groundplane peeled away from signal conductor ribbon
Groundplane trimmed back leaving an area to connect press-on IDC connector
I used a press-on IDC 40 connector, whose vampire taps will bite into the forty signal lines if I align it just right. I used my clamp tool that will squeeze the connector halves together. I had failed to get the lineup close enough on my first few tries, but today I ended up with no shorts and good connectivity to every signal line.

Clamping tool plus the two halves of the IDC connector

Connector on cable, ready to plug into the driver role extension board
If I had tried to substitute a plain 40 signal ribbon cable, it would have lacked the ground plane. The solution for many is to alternate ground lines between the signal lines, but that would have forced me to upscale to an 80 signal ribbon, a bit unwieldy, or switch to a large bundle of twisted pairs.

This cable gives a consistent 80 ohm impedance, important to handle the 3+ MHz signal generated to write clock plus data onto the drive. It must keep transitions within a 50 ns maximum interval, which is supported by high drive current and good termination.

I did work on the WriteSector logic and am preparing to run the logic analyzer to capture all the transitions I write, checking for a proper sector pattern. It will be painful, since what matters is the flip in state, not the absolute value of 1 or 0, and the relative timing of the flip compared to the 'clock' flips.

I will think a bit about how to best capture and test what occurs during the writing of a sector. 

Saturday, October 22, 2016

Mod made to Diablo disk, no change in error rate, board wiring completed

ALTO DISK TOOL

I completed the modifications to the Diablo drive, such that its short and long times are now 440 ns and 460 ns, in compensation for the 'overclocking' done by Xerox Alto computers when writing 10% above the Diablo factory spec.

I ran the ReadEntireCartridge function again, both with and without the auto-retry on checksum errors, and got essentially the same results. I had opened the cartridge, inspected both platter surfaces, cleaned the disk and also cleaned the Diablo drive heads.

There remains the possibility that this drive and the one that originally wrote the data are not aligned exactly the same. The practice with drives of this era was to use a special "CE" cartridge and an oscilloscope to adjust the arm positioning of every drive, thereby hoping that two things that are closely aligned to a third thing would be closely aligned to each other.

If each adjustment is off by 1%, the two drives could be 2% different in the worst case. With the track positioning of .01" and a guard band between recorded tracks of .003" it wouldn't take much misalignment to have the head reading near the edge of a track rather than directly over the recorded signal.

This suggests to me that we may not be able to retrieve every sector of every cartridge perfectly, instead settling for > 99%. Another factor to consider is the weak error detection mechanism used, a simple XOR checksum which means any even number of bit errors in a 'column', e.g. modulo 16 in the bitstream, would falsely pass the checksum test.
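The weakness is easy to demonstrate with a small Python sketch. This is a plain 16-bit XOR over the words of a record; the `seed` parameter and the sample data are assumptions for illustration only, not the actual disk contents.

```python
from functools import reduce

def xor_checksum(words, seed=0):
    """XOR all 16-bit words together, starting from an optional seed value."""
    return reduce(lambda acc, w: acc ^ (w & 0xFFFF), words, seed & 0xFFFF)

good = [0x1234, 0x00FF, 0x8001, 0x0F0F]   # made-up record contents
stored = xor_checksum(good)

# Flip the same bit position (one 'column') in two different words.
double_error = list(good)
double_error[0] ^= 0x0008
double_error[2] ^= 0x0008

# Flip that bit in just one word.
single_error = list(good)
single_error[1] ^= 0x0008

double_passes = xor_checksum(double_error) == stored   # corrupted data slips through
single_passes = xor_checksum(single_error) == stored   # this one is caught
```

The two flips cancel in the XOR, so the corrupted record checks out as good, while a single flip in the same column is detected.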

Since the raw error rate - a single read of each sector - hit a bit over 6%, but dropped to 0.6% when retrying any bad sector up to 31 times, we may have errors in content that are misreported as good sectors.

This weakness in checksum strength was a fact of life with the Alto. Every time a user ran the system, they faced the chance that multiple bit errors would slip under the radar and deliver bad data to the user.

I worked further on the new disk driver board, such that by midafternoon I had only 7 fpga signals left to wire before I do the extensive checkout. At dinnertime, the entire board was wired and I went through a connectivity and shorts test. Everything checked out, so it is time for a function test tomorrow. 

Friday, October 21, 2016

Doing Alto modification to Diablo disk drive, working on Write logic and building up new driver board

ALTO DISK TOOL

I set in to the wiring of the 3.3V fpga signal lines to all the level shifters. By lunchtime I had 7 of the 27 wires run. When I got back from some chores in the midafternoon, I went back to working on the board. In addition, I worked through my WriteField and WriteSector logic, to begin testing the writing/updating functionality of the tool.

I will initially test the write functions with the WriteGate and EraseGate held in the off condition, so that I can watch the flux reversals being emitted without committing them to disk until I am satisfied. This required a big revamp of the logic analyzer signal assignments and trigger conditions.

By 3PM, all 12 of the input signals were wired to the fpga and the 'personality' signals, Sense20 and Sense22, were grounded to indicate that this is the Alto style driver role board, distinct from the oddball format driver role board I built around an existing cable, and distinct from the disk emulator role board.

I have 15 more 3.3V lines to wire, for the outputs from the fpga, then I can test the board carefully. I have been checking for shorts to adjacent pins on the relatively high density FX2 and IDC 40 connectors, but also on all the component connections.

Next up would be power, ground and other shorting tests. If the board seems good, I hook it up to power and test by driving various input and output lines to see that the corresponding side responds appropriately.

There will be some logic changes for the fpga, since this board has sector number data but no index marker pulses, thus I bypass my counting circuit and use the Diablo derived sector number. Those changes were made, but will have to wait until the board is ready before being tested.

I hooked up the scope to the testpoints on the disk drive board in order to test the current timings before I make any changes to the drive. I saw a nice clean pulse on TP3 but the one coming from TP5 was ugly - ringing and jitter. Timing was very hard to read but I thought I had finally figured it out.

I removed the two fixed resistors, inserted potentiometers in their place, and tried to adjust the timing properly. I found the scope display so obscure I am not sure whether I got it right. The test point is a mixture of three signals, two of them the 440 and 460 pulses. They are switched based on whether the prior bit was a 1 or a 0, to compensate for bit shift.

I saw all three signals, jumping around, and was just not sure that I saw the long time (460 ns) edge. Thus, I feel decent about the short time, 440 ns, but far less so for the other one. I am not ready to use the drive and need to get the timing more exactly adjusted.

As a consequence, I am going to add micrograbbers to the output pins of the two timing circuits, that generate the 440 and the 460 ns intervals, to directly observe them. I will either see the pulse or not, depending on whether a 1 bit or 0 bit was seen, but when it fires it will always be one duration.

With the better scope points, I was easily able to distinguish and adjust the two timers to dead on the 440 ns and 460 ns targets. I then measured the potentiometers to determine that I need a 144.5K and a 53.1K fixed resistor to make the drive run with these timings. I am off to Anchor Electronics in the morning to pick them up.

 In the early evening, I printed out the WriteField, ReadField, WriteSector and ReadSector FSMs in order to apply the learnings from the read functions to my write logic. I also worked on the board a bit more, wiring up 4 of the 15 output signals from the fpga. 

Thursday, October 20, 2016

Added automatic reread for sectors with checksum errors, discovered need for Diablo mod, working on new board

ALTO DISK TOOL

My first test of the morning showed that the USB link was no longer working, which I think I had traced to an unconnected signal from my new register logic. Half an hour later, I tested that hypothesis and the corrected code.

I retrieved the checksum status fields perfectly and began to map out the bad sectors. I noticed that head 1 (lower surface) began to get quite a few sector errors starting around cylinder 128 and continuing out to nearly 150, with most clustered around the 130s. This would be consistent with a surface defect on the disk.

Of course, cyl 0 head 0 was erased by my procedural error of the day before, but I should be able to rewrite those with my disk tool. I will go over my WriteSector logic, enable it and give a test on one of the sectors of that first track.

I made the decision to add an auto-recovery option to the ReadEntireCartridge function, one that will iterate up to 32 times if a given sector has checksum errors. If it gets a clean read, it moves on, otherwise it tries up to the maximum. This will be controlled by one of the slide switches on the board.
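In software terms the auto-recovery amounts to a bounded retry loop. The sketch below is Python rather than the fpga logic itself; `read_sector` is a hypothetical callable standing in for the ReadSector transaction, and `make_flaky_sector` is only a test stand-in.

```python
MAX_ATTEMPTS = 32  # first read plus up to 31 retries

def read_with_retry(read_sector, max_attempts=MAX_ATTEMPTS):
    """Call read_sector() until the checksums are clean or attempts run out.

    read_sector is a hypothetical callable returning (data, checksum_ok).
    Returns (data, checksum_ok, attempts_used).
    """
    data = None
    for attempt in range(1, max_attempts + 1):
        data, ok = read_sector()
        if ok:
            return data, True, attempt   # clean read, move on
    return data, False, max_attempts     # still bad after every attempt

def make_flaky_sector(good_on_attempt):
    """Fake sector that only reads cleanly from a given attempt onward."""
    state = {"calls": 0}
    def read_sector():
        state["calls"] += 1
        return b"sector", state["calls"] >= good_on_attempt
    return read_sector

recovered = read_with_retry(make_flaky_sector(3))
unrecovered = read_with_retry(make_flaky_sector(100))
```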

Initially this was not included on the recommendation of one of the team, based on his experience where lingering over a bad spot resulted in the failure increasing in seriousness. After some analysis of the practical effect with and without the auto-recovery, I decided to go ahead.

The disk head flies over the surface of the platter, thus lingering will only be a problem if the surface is raised enough to impinge on the head or disturb its air cushion. Thus, lingering adds no risk for any defect that does not involve a substantially raised surface (relatively speaking, since we are talking about a flying height of 7 thousandths of an inch).

Next, the reality is that the head is flying somewhere on the disk from the moment the heads are loaded at startup until it is switched off. In practice, the head flies above the last cylinder that the arm did a seek to, until the next time a seek is performed. It does this whether or not we read anything.

The cylinders are roughly 1/100 of an inch apart, so that the low spot on the head where contact might occur is wide enough to span over a dozen or so cylinders. Thus, lingering anywhere in that 12-ish cylinder zone around the high spot is a risk.

Whether one reads the sector 1000 times in a row or not, the arm is sitting at some cylinder and the disk is spinning under the flying head. From a risk standpoint, the only way we lower the risk is by moving the arm away from the defect. The arm must be more than 12 cylinders away to be safe, thus the risk zone is 24-ish cylinders wide or about 12% of the entire disk.

My ReadEntireCartridge  function takes only a few seconds to complete. We need no more than four rotations, 1/10 second, to read all 24 sectors, plus the time to seek to the next cylinder. If it were to reread 5% of all the sectors, and those were to require 20 retries each, it would only double the linger time.

Finally, consider that manual rereading involves seeking to the cylinder, where the head will fly for much longer than the time to auto-recover and move on. The most serious risks are when the raised flaw is within the first 12 or last 12 cylinders, as the arm hovers at 0 when loaded and hovers at 202 when my function is completed.

No amount of caution will protect against damage if the flaw is in those two critical areas. Practically speaking, a user attempting to recover sectors missed by a one-pass, non recovering function would park the head in the remaining 179 cylinders far longer than would the autorecover version.

I completed the change to the logic, which was more minor than I expected, and set in to test in the late afternoon. The logic ran through all the sectors, pausing for noticeable time to handle the retries, and ultimately completing. The checksum status vector was dumped and shows 30 sectors that weren't recovered even with 31 retries apiece, but that is a failure rate of 0.615%, an order of magnitude better than the one-pass method.
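The failure rate is simple arithmetic on the cartridge geometry, worked here as a short Python check. The cylinder, head and sector counts are the ones used throughout these notes; the variable names are just for illustration.

```python
CYLINDERS = 203          # cylinders 0 through 202
HEADS = 2
SECTORS_PER_TRACK = 12

total_sectors = CYLINDERS * HEADS * SECTORS_PER_TRACK   # 4872
unrecovered = 30                                        # still bad after 31 retries

failure_rate_pct = 100.0 * unrecovered / total_sectors  # roughly 0.6 percent
```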

Al raised the point of the timing board modification made to Diablo 31 drives for use with the Alto - two resistors, F28 and H53, on board J10 are replaced with values that accommodate the slightly faster bit rate of the Alto compared to the Diablo spec.

I looked at the board and the resistors don't look reworked, they seem to be the original versions. Every drive has hand selected resistors for these two components, making it impossible to check against a schematic, but my guess is this drive is not from an Alto.

Al confirmed that this is a stock Diablo drive, which could explain the errors I am getting on reading. I could modify this drive or just accept the error rates for now and see how well things read when on the drive attached to the Alto.

I have ordered a pair of 200K potentiometers, which are required to tune the Diablo board to the window duration needed for the Alto - 440 to 460ns - rather than the factory default 450 to 470ns. They will arrive tomorrow, when I can determine the needed resistances, then pick up fixed resistors of those values and solder them onto the board.

I continued wiring up the new disk driver board, with all the ground wires now added and four of the last 15 signal wires from the Diablo cable wired in by later afternoon. After dinner, I had another four of the Diablo signal wires hooked up, just seven to go.

By bedtime, all the +5V lines were completed and all that is left are the 3.3V lines between the fpga connector and the level shifter components.

I continue to have issues with the really hard, stiff ribbon cable connected to the Diablo connector. It is way too tough for press-on IDC connectors. Still thinking about the best way to deal with this. I have a female Diablo terminator that could be wired to a regular ribbon cable - thus have ordered the parts to try this.

Wednesday, October 19, 2016

Wiring new board, creating checksum status upload function, issue with track 0

Today is the day I spend with the 1401 restoration team at CHM. Everything was working well, so I left early and returned to work on the disk tool.

ALTO DISK TOOL

I built up a wiring plan for the new disk driver role extension board, the one that will go to Al Kossow for use in archiving Alto cartridges. This involved signal assignments to various level shifter components and careful recording of the IDC40 pins for each signal.

I am more and more convinced that I need an easy way to display or extract the checksum error status bits that reflect whether a checksum verification error occurred in each of the three records in each of the sectors. With 4,872 sectors and three records apiece, it is far more information than I can readily display with the LEDs and seven-segment displays on the fpga board.

I had hoped to record some of the status on the VGA display, although even here there is no reasonable way to display 14,616 separate status bits on a single screen, yet be able to see where the errors occurred.

With 8 lights per seven-segment display, four digits of those, and eight separate LEDs, I could display only 40 bits of data at a time, needing to cycle through that 366 times to see the entire cartridge results.
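The count of display passes works out as follows, shown as a quick Python calculation using only the figures above (the variable names are mine):

```python
import math

SECTORS = 4872
RECORDS_PER_SECTOR = 3
status_bits = SECTORS * RECORDS_PER_SECTOR        # one checksum status bit per record

SEGMENT_DIGITS = 4
LIGHTS_PER_DIGIT = 8                              # seven segments plus the decimal point
DISCRETE_LEDS = 8
bits_per_view = SEGMENT_DIGITS * LIGHTS_PER_DIGIT + DISCRETE_LEDS

views_needed = math.ceil(status_bits / bits_per_view)
```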

I have a switch option to stop on the first error during the ReadEntireCartridge function, allowing me to potentially run with stops for each bad sector. I will test this right away, but I think I need a more permanent record anyhow, to go along with the extracted disk files I produce.

My test showed that the logic will stop when one or more of the checksum errors are detected, and will restart with the next sector when the transaction button is pushed again. While doing this, I discovered that some sectors that had been reading clean were now getting errors.

I found some with errors that would read good sometimes, bad sometimes. Others, including sector 0 which I have been using quite a bit, now read bad on all three records and do so consistently. Worried about the condition of the heads and platter, I shut down and inspected.

The disk platter appears superficially good, but I will disassemble it to look more thoroughly. The bottom head is easy to see and looks okay, while the top head is nearly impossible to image clearly with a camera. I applied a hand mirror and both heads look very clean.

Only three possibilities left to explain why S0 might be bad now - one is that the arm is out of alignment and increasing its deviance, which seems unlikely, the other two are that somehow the write gate is turning on and spewing zeroes over the first track, cyl 0 head 0, for all 12 sectors. The latter two may be due to procedural errors.

If it is the latter, I can rewrite the track using the stored image from bitsavers, also debugging my write sector logic, but first I have to figure out what caused the problem and prevent it from ever happening again.

I suppose it could have been procedural, when I used the Adept utility to dump memory with a method that loads its own bitstream to the fpga, overwriting what is there, and resetting the board to run that new logic. If I left the drive spinning when this happened, it could have turned on the write and erase gates, affecting the current cylinder and head.

I will piggyback on the File I/O capability of the Adept utility, adding another set of memory interface registers 16 to 22 which will read out the sector status bits, one byte per sector. I just have to copy, paste and edit the existing logic to create the new functionality. Reading the disk buffer RAM uses registers 8 to 14, with the utility set to 14, thus reading the status vector needs to set the utility to register 22 which is the corresponding mechanism.

I loaded the new bitstream with the checkbit vector upload functionality and did some testing. While I was at it, I hooked the scope to the ReadData and ReadClock lines to see what is coming in from track 0, which I suspect was erased.

The data is definitely irregular, especially the clock signal, which tells me that the track was definitely erased. I can use my WriteSector logic to restore the contents, although I am not yet ready to test that out.

The checksum bit reading logic did not appear to work on my first test, after having read the entire cartridge and observed a number of checksum error flickers on the LEDs. I fired up for another test, manually working the registers to see if the transaction appears to be working.

After a bit of debugging, I figured out the problem and created a new bitstream, but it was time for bed. Tomorrow morning I will test again and expect this to work.

Working on the new board, I ran all the +5V and +3.3V power wiring, then began on the ground wiring. I have all the input signals routed from the IDC connector to the level shifters. More tomorrow. 

Tuesday, October 18, 2016

Read entire cartridge works, file produced as desired

ALTO DISK TOOL

My address incrementing during the ReadEntireCartridge transaction is working properly and the stall condition is fixed now that the sector matching FSM is always producing results. At worst, we will miss one rotation of the disk rather than stall.

I am now verifying the addresses being used to write extracted words to RAM, to be sure my changes that reverse word order and that use concatenation instead of addition/subtraction are all working properly.

I am setting up the proper addresses to the ReadField transaction, based on what I see from the logic analyzer trace, but now I need to zoom in on the memory addresses going to the RAM access FSM to be sure that the decrementing makes sense and addresses are proper.

I ran tests after lunch, found a few defects and pressed on. Eventually, I noticed three odd things. First, I saw two cycles of the memory request trigger signal in some cases, one cycle in others. Second, the logic analyzer only captured 10 or so memory writes in a sector, then it failed to spot others. Third, the label field in memory seemed to have written the checksum as the first word.

I needed to understand these conditions, requiring a mix of pondering, logic inspection and testing. I decided that it would be better to convert the memory access FSM and the four FSMs that call it to use an interlocked protocol - hold the request high until the memory FSM is done, then drop the request allowing the memory machine to go idle.
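The interlocked protocol can be modeled in a few lines of Python to convince myself it cannot double-trigger. This is a behavioral sketch only; the class names, state names and latency value are assumptions, not the actual fpga design.

```python
class MemoryFSM:
    """Memory access machine: IDLE -> BUSY -> DONE, back to IDLE when request drops."""
    def __init__(self, latency=2):
        self.state, self.latency, self.count = "IDLE", latency, 0
        self.done, self.accesses = False, 0

    def clock(self, request):
        if self.state == "IDLE":
            if request:
                self.state, self.count = "BUSY", self.latency
        elif self.state == "BUSY":
            self.count -= 1
            if self.count == 0:
                self.accesses += 1               # exactly one access per request
                self.done, self.state = True, "DONE"
        elif self.state == "DONE" and not request:
            self.done, self.state = False, "IDLE"  # requester released us

class Requester:
    """Caller: holds request high until done is seen, then drops it."""
    def __init__(self, n_requests):
        self.remaining, self.request = n_requests, False

    def clock(self, done):
        if self.request:
            if done:
                self.request = False
                self.remaining -= 1
        elif self.remaining > 0 and not done:     # wait for done to clear first
            self.request = True

def run(n_requests, latency=2, max_cycles=1000):
    mem, req = MemoryFSM(latency), Requester(n_requests)
    for _ in range(max_cycles):
        # Each machine sees the other's output as registered on the prior cycle.
        request_seen, done_seen = req.request, mem.done
        mem.clock(request_seen)
        req.clock(done_seen)
        if req.remaining == 0 and mem.state == "IDLE" and not req.request:
            return mem.accesses
    raise RuntimeError("handshake stalled")
```

Because the requester will not raise a new request until done has dropped, the memory machine performs exactly one access per request regardless of latency.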

I completed those changes, built a bitstream and tested in the evening after dinner. It is behaving better but I still have some issues with the countdown logic. I changed some more logic, burned another half hour and tested.

I spotted another flaw in my logic and corrected it, built a bitstream by 9PM and did my final testing of the day. Good news - it worked well. The files produced are exactly what I expect to see and match to the extent I can check random sectors to the xmsmall.dsk file.

I did see some checksum error lights flicker - perhaps 1-2% of the sectors had an error. I need to build out a transaction to fetch the error bit vector into a file, allowing someone to know which sectors had problems and in what record. Potentially we can reread problem sectors several times until we get a good version of the sector.

I installed all the terminator resistors for the inbound signals from the Diablo to the fpga board, on the new second driver board I am building. I have a ribbon cable with the disk connector on one end, to which I tried to fit an IDC 40 cable connector on the other side. This type of connector should pierce the cable to make a 'vampire' connection to its intended wire.

After I pressed the connector onto the wire, I used the VOM to check the wiring connectivity to the pins of the disk connector, but quickly found that all pins are shorted together end to end. The plastic cable is apparently too tough for the vampire metal to cut down into, forcing it to spread as it bit into the cable and to short all adjacent connections.

I tried a second time, same result. It might be a misalignment problem, but I may have to consider replacing the entire ribbon cable. I disassembled the disk connector side but the cable connects to the PC board with three rows, not two rows, blocking me from installing an IDC40 socket.

Tried another connector to press on, but got the same result of complete shorting. I suspect I am going to have to separate the individual leads in this ribbon cable and a length of more typical cable, graft the ends together and thereby get the IDC connector on the line. 

Monday, October 17, 2016

Debugging of ReadEntireCartridge transaction of disk tool, beginning build of second disk tool

ALTO DISK TOOL

Ken Shirriff has put together a Python program to transform the PC files captured from my tool into the disk archive format used on bitsavers and with the Contralto simulator. This saves me from having to do the same thing and lets me focus on other tasks still required to complete the disk tool.

This morning I completed the refinement of the RAM addressing change (using assembly of fixed strings rather than addition/subtraction) and some diagnostic outputs to allow me to watch the incrementing of cylinder, head and sector values during the ReadEntireCartridge transaction.
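The idea of building an address by concatenating fixed-width fields, rather than by arithmetic, can be sketched in Python as below. The field widths and their ordering are assumptions chosen only so that cylinder 0-202, head 0-1, sector 0-11 and a 266-word sector all fit; the real layout in the fpga may differ.

```python
# Hypothetical field widths, high-order field first.
CYL_BITS, HEAD_BITS, SECTOR_BITS, WORD_BITS = 8, 1, 4, 9

def ram_address(cylinder, head, sector, word):
    """Concatenate the fields into one address: cylinder | head | sector | word."""
    fields = ((cylinder, CYL_BITS), (head, HEAD_BITS),
              (sector, SECTOR_BITS), (word, WORD_BITS))
    address = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        address = (address << width) | value   # shift in the next field
    return address

# Each sector gets its own 512-word slot; no adders or multipliers needed.
next_sector = ram_address(0, 0, 1, 0)
next_cylinder = ram_address(1, 0, 0, 0)
```

The cost of this approach is some unused RAM in each slot, in exchange for addressing that is pure wiring.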

I had to rewire the probes to the logic analyzer to suit the major change in status signals needed to watch the address incrementing. With all that done, it was time to fire up the testbed. I did several runs.

The results were inconclusive; I need to monitor other signals first to see what it is doing during the incrementing of the registers. Of course this will cause another 30 minute pass to generate the bitstream, but at least I can eat lunch during the delay.

My addressing changes did appear to have broken the write into RAM, or stored data somewhere different, so that is another aspect to debug. The results of the new testing showed me a flaw in my ReadEntireCartridge transaction where I was not sufficiently interlocked with the lower level ReadSector and Seek FSMs.

I redesigned the high level FSM, keeping in mind that there is a one cycle delay in actions that are taken to change the Regxxxx registers including triggering low level transactions, seeing completion and resetting completion. I am working my way through debugging the new design, which is more interlocked and therefore can stall if I get things wrong.

My new interlocking ran afoul of a change I made a few days ago to allow me to reset transactions that stalled. The way I attempted to handle the reset was triggered by the operation of the ReadEntireCartridge  transaction with its new interlocking.

The changes were backed out for the stall reset logic, keeping the new high level FSM design, and I spent a quick 30 minutes preparing for the last test of the evening. What I see is that I am now processing the bump of the address correctly, incrementing by one.

The stall occurs when trying to read after completing a seek. Sometimes it gets through several seeks correctly, other times it fails on the first seek. I believe that I am either failing to trigger the FSM that waits for a target sector to come under the heads, or failing to see the completion signal when the sector is matched.

I will work out an interlocking between the sector match FSM and the transactions that use it, build those into the logic tomorrow and then test. It makes sense that I will change out the diagnostic signals to the logic analyzer, in anticipation of the areas I am checking out tomorrow. I am done with testing for today.

I began constructing the driver role board for Al Kossow's version of the tool, now that I am comfortable that all the analog stuff works well for reading and archiving packs. I will make use of a ribbon cable and connector for the Alto to Diablo link that he provided to me, wiring it to the board and hooking up the fpga board he purchased.

I think I will make use of an IDE style socket, although pin 20 of the cable is used while the classic IDE socket has a blank in that position to ensure the cable is not reversed. Fortunately, I have a pure socket with 2 x 20 female positions which provides the needed pin 20 signal.

Supporting his cable means that I have to deal with the signals as they exist on the cable, which are different from the ones on my disk cable on the prototype. I went through all the documentation and decided that I had to produce 16 output signals and handle 12 input signals when using the Alto-Diablo cable.

Sunday, October 16, 2016

Read Sector working well with data in (un)reversed order, working on reading entire cartridge

ALTO DISK TOOL

I have confirmed that the data I extracted to a PC file from cylinder 0, head 0 and sector 0 of the real disk cartridge exactly matches the data on the disk, since it matches the archive file of that disk that exists on bitsavers.org.

My data is stored in disk order - where the data is stored with the last word of a record first, reversed so that the last word to come off disk is the first word of the record. The Alto loads the data into memory in reverse order, starting at the highest address in the buffer for the first word received and ending with the last word from disk placed at the lowest address, what we think of as the first word of that buffer.

The Alto uses that fact cleverly during boot. The boot firmware puts a spin instruction at location 0 which just loops to itself, then initiates a read from disk of cyl 0, head 0, sector 0 to place the 256 word data record of that sector in memory starting at location 0000.

The disk contents are stored in memory, starting with the first word from the disk at address 255 decimal, and proceeding down to lower addresses. As the disk read proceeds, location 0 is still a jump to itself keeping the processor in a tight loop. When the final word from disk is stored in memory, at location 0000, it overwrites the jump instruction and thus begins executing instructions from the disk sector from 0 onwards.
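The reversed load is easy to picture with a small Python model. The memory list, the "SPIN" placeholder for the jump-to-self instruction, and the sample word values are all stand-ins for illustration.

```python
def load_sector_reversed(memory, disk_words, base=0):
    """Store words as they arrive: first word at the top address, last word at base.

    Returns the index of the arriving word that overwrote the base location.
    """
    top = base + len(disk_words) - 1
    overwritten_at = None
    for i, word in enumerate(disk_words):
        address = top - i
        memory[address] = word
        if address == base:
            overwritten_at = i   # only now does the spin instruction disappear
    return overwritten_at

memory = ["SPIN"] + [0] * 255               # location 0 holds the jump to itself
disk_words = list(range(1000, 1256))        # 256 made-up words in disk order

last_index = load_sector_reversed(memory, disk_words)
```

Location 0 is untouched until the 256th word arrives, so the processor keeps spinning safely for the whole transfer and only then falls into the freshly loaded code.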

I modified the operation of my ReadField and WriteField FSMs to match the memory ordering, reversing the storing of words from disk. This should produce an exact match for the 266 words of each sector between what is on a disk archive file and what I produce.

Testing with my disk cartridge, I was able to match it up byte for byte with the bitsavers disk archive named xmsmall.dsk which does match the label on the physical cartridge. Other than the extraneous word at the front, it is now a match for the content of the sectors.

I had also instrumented the LEDs to show me any FSM that was not at its idle position, so that when the disk tool stalls again I can immediately spot which FSM(s) are confused. I set up for a test this morning with the new bitstream.

It never stalled - evoking the old adage that a watched pot never boils - but it did give me the data as I wished, in reversed word order, to match online disk archives and what the programmer would see in memory after issuing a read.

I do see that the online disk archive files have an extraneous word in front of the sector contents, which needs explanation before I can transform my output files into an acceptable archive file. The Contralto simulator uses the files and it simply ignores the first word from the file.

I moved on to test out my transaction to read an entire cartridge into RAM. I discovered two problems with this: first, it halts as soon as the finger is off the button that triggers the transaction, and second, it is not setting up the memory address properly, so that I see the last cylinder's contents written in the position for cylinder 0.

I did a test to see whether the read field logic is not properly addressing cylinders above 0, or whether this is restricted to the read cartridge transaction. I saw that data was stored in the appropriate locations in RAM, so this was an artifact induced in the 'read entire cartridge' function.

I found the cause of the FSM needing button 1 to be depressed for it to continue up to cylinder 202, corrected the problem, and made a change to add an additional cycle for the bump of cylinder, head and sector numbers before triggering the read of the next sector. Half an hour later, the bitstream was ready for a test.

Nothing was working properly, all out of proportion to the two small changes I had made, indicating this is one of those odd times when the toolchain produces useless dreck as a bitstream. I made some other changes and synthesized again, now approaching late afternoon.

I also decided to swap the fpga board from the 1200K gate version down to the 500K gate variant that will be used for the tool in production, both the one used with the Alto restoration and a second tool to be used by Al Kossow to archive the contents of various disk cartridges.

I had to turn off the video display, which used quite a few gates, in order to fit in the 500K capacity. Other changes were also necessary. All in all, the rework burned up a few hours.

When I did my evening testing, all appeared to be back to where I expected. The ReadSector function works fine, the Seek function works fine, and the ReadEntireCartridge function now runs to completion (cylinder 202) without having to hold button1 down continuously.

I dumped the file and checked, but found that only the even numbered sectors were read, implying I have a flaw in the incrementing logic for the ReadEntireCartridge function. It seemed that the logic stepped through every single cylinder, but I couldn't see how it alternated the head and sectors.
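For reference, here is a sketch of the stepping that a full cartridge read needs to perform. It is a software model, not my VHDL. The 12 sectors per track and the sector-then-head-then-cylinder ordering are assumptions of this sketch, and `next_address` and `all_addresses` are hypothetical names.

```python
CYLINDERS = 203   # cylinders 0 through 202
HEADS = 2
SECTORS = 12      # assumption: 12 sectors per track


def next_address(cyl, head, sector):
    """Advance to the following sector, or return None past the end."""
    sector += 1
    if sector == SECTORS:         # sector wraps, so move to the other head
        sector = 0
        head += 1
        if head == HEADS:         # both heads done, so step the cylinder
            head = 0
            cyl += 1
    if cyl == CYLINDERS:
        return None
    return cyl, head, sector


def all_addresses():
    """Yield every (cylinder, head, sector) on the cartridge in order."""
    addr = (0, 0, 0)
    while addr is not None:
        yield addr
        addr = next_address(*addr)
```

A dump produced by correct stepping should contain every sector number on every track, so a file with only even numbered sectors points at the increment step rather than at the read logic.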

I did think of a way to improve some of my logic. The way I wrote the VHDL, it uses a chain of two adders to calculate the starting memory address as S + C - 1 (S is high order bits from cyl, head and sector and C is the word count), but I can assemble this without using an adder.

I have a memory address that has a high part generated from cylinder, head and sector values. It has a low part that is 9 bits representing the word within a sector. The high part is assembled from 8 bits of cylinder, 1 bit of head and 4 bits of sector number. All together, 22 bits of address.

I realized that the low part of the memory address is one of three constant values, x0002 for the header record, x000A for the label record and x010A for the data record. All three fit within the 9 low bits, so there are no carries that will propagate to the high 13 bits of the address.
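The reason no adder is needed can be shown with a small model. This is a sketch under assumptions: the order of the cylinder, head and sector fields inside the high part is my guess for illustration, and `high_part` and `address` are hypothetical names.

```python
WORD_BITS = 9      # low part: word within the sector
SECTOR_BITS = 4
HEAD_BITS = 1
CYL_BITS = 8


def high_part(cyl, head, sector):
    """Pack cylinder, head and sector into the 13 high-order bits."""
    return (cyl << (HEAD_BITS + SECTOR_BITS)) | (head << SECTOR_BITS) | sector


def address(cyl, head, sector, low):
    """Join the high part and a 9-bit low part by concatenation only."""
    if not 0 <= low < (1 << WORD_BITS):
        raise ValueError("low part must fit in 9 bits")
    # The low 9 bits of the shifted high part are all zero, so OR and
    # addition give the same result and no carry can reach the high bits.
    return (high_part(cyl, head, sector) << WORD_BITS) | low
```

Because the shifted high part always has zeros in its low 9 bits, the OR is equivalent to an addition, which in hardware is just wiring the two fields side by side.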

I will also change instrumentation to let me watch, with the logic analyzer, as the ReadEntireCartridge function advances through all the sectors. That way, when I test tomorrow, I can fix up this last problem and be ready to capture the contents of the four cartridges we have with the Alto and the two that I have at home.

Saturday, October 15, 2016

Read sector functionality of the Alto-Diablo disk tool is working well, minor issues to address overall

Morning spent as Volunteer Examiner administering ham radio license tests, back to tool in the afternoon.

ALTO DISK TOOL

I came up with a new scheme for the deserializer operation during idle time at the exam session, then implemented it when I got back to my house. It has separated the bit timing into discrete state machines.

One FSM is tracking the timing of the Read Clock pulses, another will look for a '1' data bit value on Read Data only at the appropriate portion of the Read Clock cycle, and the third will handle recognition of the sync word 0000000000000001 when beginning to read any of the three records in a sector.
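The sync recognition and word assembly can be sketched in software. This is not the VHDL, and it collapses the three state machines into one loop. The assumptions here are that bits arrive one per Read Clock cycle as 0 or 1 and that words are assembled most significant bit first. `deserialize` and `to_bits` are hypothetical names.

```python
def to_bits(word, width=16):
    """Helper: split a word into bits, most significant first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def deserialize(bits, word_bits=16):
    """Wait for the sync word (15 or more zeros, then a 1), then pack words."""
    zeros = 0          # run of consecutive zero bits seen while unsynced
    synced = False
    words = []
    current = 0
    count = 0
    for bit in bits:
        if not synced:
            if bit == 0:
                zeros += 1
            else:
                # A 1 only counts as sync after enough leading zeros.
                if zeros >= word_bits - 1:
                    synced = True
                zeros = 0
            continue
        current = (current << 1) | bit
        count += 1
        if count == word_bits:
            words.append(current)
            current = 0
            count = 0
    return words
```

For example, `deserialize([0] * 40 + [1] + to_bits(0xBEEF))` yields `[0xBEEF]`, while a stray 1 that arrives before fifteen zeros have been seen is ignored.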

I trigger the deserializer to collect bit values with some combinatorial logic that looks at the timing in a Read Clock cycle and the sync or unsynced state. This should tie into the deserializer module itself without requiring a rewrite.

Another thing I had suspected from the data recorded on the disk is that the Head selection signal is inverted (logically) compared to what the software requests. That is, when the disk interface signal is on (0V level), the top head is selected.

This would normally represent a '1' on the Head line, but an extra inverter is introduced on the Alto disk controller card (via an Intel 3404 latch) such that the software uses a '0' to drive a '1' on the head line which means it drops to 0V and the drive uses the top head. I have to invert my logic in the disk tool to match this usage.

After it was turned into a bitstream, I fired up the testbed and read a sector again. The logic seems to be working better, but I have a problem with timing, between when I sync and when I record bits for deserialization.

I added an 'armed' stop to the sync state FSM as a way of blocking that bad behavior. I ran again and did indeed sync properly and deserialize the data words properly. The flaw I am currently working on is that the ReadSector FSM is jumping to read the middle record, the eight word label record, rather than knowing that it is handling the two word header record.

This flaw causes it to read the checksum word correctly but it is not recognizing it as the checksum nor doing the comparison for validity. I have to debug this condition but feel that the basics of extracting parallel words from records is in good shape.

What I saw was the logic trying to sync up at less than 80 us into the sector, far too early. It should be obeying a preamble delay of 201.6 us before looking at any incoming data bits. I have to correct this. I do see the proper sync word at around the nominal 336 us into the sector.

I found the problem. Doh. For some reason, when I calculated the number of fpga cycles to produce given durations, I was using 50ns as the cycle time but this is running at 20ns. That means my delay counts need to be 2 1/2 times as big to produce the intended delay. I made the corrections, invested the half hour in producing a bitstream, and tested again.
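The arithmetic behind the mistake is easy to reproduce. This sketch uses only the figures quoted above; `cycles_for` and `actual_delay_us` are hypothetical helper names.

```python
def cycles_for(duration_us, clock_ns):
    """Number of clock cycles needed to span a duration."""
    return round(duration_us * 1000 / clock_ns)


def actual_delay_us(cycles, clock_ns):
    """Delay that a given cycle count really produces."""
    return cycles * clock_ns / 1000


wrong = cycles_for(201.6, 50)   # count computed with the mistaken 50 ns
right = cycles_for(201.6, 20)   # count for the real 20 ns clock
print(wrong, right, actual_delay_us(wrong, 20))
```

A count computed for a 50 ns cycle but run on a 20 ns clock yields a preamble delay in the neighborhood of 80 us instead of 201.6 us, which lines up with the premature sync attempt I saw.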

The first time I read a sector after the disk tool is reset, it appears to read the entire sector properly with no checksum errors. I verified that the first two records were properly read and their checksum matched what was read from disk, but I didn't go all the way to the end of the sector to check the tail of the long data record.

However, each subsequent try seemed to ignore the work I did on the long preamble, instead syncing on the earliest 1 bit it found regardless of the preamble. I have some defect in how one or more state machines resets when it completes one cycle.

I had made great strides by dinner time, with a few bugs left to iron out and some steps left to check but significant parts of the functionality working properly.

After dinner, I looked over logic and set up various diagnostic signals to help me sort out what was happening. I discovered the counter was too small inside the ReadField state machine to handle the new, larger and now correct counts being used for delays.
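A quick way to see how a corrected count can outgrow a counter is below. It is a sketch only. The two counts are the preamble delay of 201.6 us expressed at 50 ns and at 20 ns per cycle, and they serve purely as an illustration; the actual width of the counter in ReadField is not something this example knows.

```python
def counter_bits(max_count):
    """Smallest counter width, in bits, that can hold max_count."""
    return max(1, max_count.bit_length())


old_count = 4032     # 201.6 us at the mistaken 50 ns per cycle
new_count = 10080    # 201.6 us at the real 20 ns per cycle

old_width = counter_bits(old_count)
new_width = counter_bits(new_count)

# A counter sized for the old count wraps before reaching the new one.
wrapped = new_count % (1 << old_width)
print(old_width, new_width, wrapped)
```

In this illustration the larger count needs two more bits, and a counter left at the old width would silently wrap rather than reach the intended delay.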

It was also helpful to build up a table of cycle counts from the sector mark to various words that were part of the sector being read, so that I could program in reasonable delay counts to the logic analyzer trigger to observe any 80 us wide interval I wished. There are roughly 42 such intervals in one sector, stacked end to end, making the observations slightly tedious. 
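The table of observation windows can be generated rather than worked out by hand. This sketch assumes the delay is counted in 20 ns fpga cycles from the sector mark and that windows are laid end to end with no overlap; `trigger_windows` is a hypothetical name.

```python
import math

SECTOR_US = 3333    # time available in one sector
WINDOW_US = 80      # width of one logic analyzer capture
CLOCK_NS = 20       # fpga cycle time


def trigger_windows():
    """List (index, start in us, start in cycles) for each capture window."""
    count = math.ceil(SECTOR_US / WINDOW_US)
    return [
        (k, k * WINDOW_US, k * WINDOW_US * 1000 // CLOCK_NS)
        for k in range(count)
    ]


for index, start_us, start_cycles in trigger_windows()[:3]:
    print(index, start_us, start_cycles)
```

With these figures each successive window starts 4000 cycles after the previous one, and 42 windows cover the whole sector.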

I worked through several sectors and can now consistently read the entire sector correctly, matching checksums, storing contents in memory and returning to a state where I can read another sector. I checked against the full set of items to verify and have met all of them:
  1. Readsector logic is waiting for a sector match
  2. Indexmarker occurs, signalling that next sector mark is for sector 0
  3. Sectormark occurs, thus we are in sector 0 now
  4. Gotsector is emitted to indicate we have a match
  5. Readsector logic moves on to setup for record 1 of the sector
  6. Readfield logic is triggered
  7. Readfield waits for approximate 200 us preamble before looking at incoming data
  8. Roughly 120 us of zero data bits are seen
  9. A 1 data bit completes the sync word
  10. The logic recognizes the synced state
  11. Two words of the header record are deserialized, extracted and saved
  12. The checksum is calculated correctly for the two words of the header record
  13. The next word is deserialized, extracted and used as a checksum
  14. The checksum verification test occurs properly
  15. The readfield logic completes
  16. The deserializer goes to the unsynchronized state
  17. The readfield logic for the next record, label, begins
  18. The appropriate preamble is passed before we look for sync
  19. Enough zero bits are read to properly set up sync logic
  20. A 1 bit is read and the sync condition is attained
  21. Eight words of the label record are deserialized, extracted and saved
  22. The checksum is properly calculated for the 8 label words
  23. The next word is deserialized, extracted and used as a checksum
  24. The checksum verification test is done correctly
  25. The readfield logic ends
  26. Sync is dropped
  27. The readfield logic is entered for the data record
  28. A suitable preamble is passed before attempting sync
  29. Enough zeroes are read for the sync engine to work properly
  30. A 1 bit arrives and we attain the sync condition
  31. 256 words are deserialized, extracted and saved as the data record
  32. The checksum is properly calculated
  33. The next word is deserialized, extracted and saved as the checksum
  34. The final checksum test is done properly
  35. Readfield logic ends
  36. Sync is dropped
  37. The readsector logic completes
  38. Appropriate completion status is set in Reg0001
  39. The next sectormark does not occur until after step 36
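The record handling in the checklist above boils down to the same pattern repeated three times: collect the record's words, take one more word as the checksum, compare. Here is a sketch of that pattern working on already deserialized words. The XOR checksum with a seed of zero is a placeholder assumption and not necessarily the function the Alto uses, and `read_sector` and `xor_checksum` are hypothetical names.

```python
RECORDS = (("header", 2), ("label", 8), ("data", 256))


def xor_checksum(words, seed=0):
    """Placeholder checksum: XOR of all words (assumed, not confirmed)."""
    result = seed
    for word in words:
        result ^= word
    return result & 0xFFFF


def read_sector(word_stream, checksum=xor_checksum):
    """Split a stream of words into the three records and verify each.

    Returns a dict mapping record name to (words, checksum_ok).
    """
    stream = iter(word_stream)
    result = {}
    for name, count in RECORDS:
        words = [next(stream) for _ in range(count)]
        stored = next(stream)          # the word following the record
        result[name] = (words, stored == checksum(words))
    return result
```

Passing the checksum function in as a parameter keeps the record walking separate from whatever the real checksum turns out to be.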
I observed one problem and one discrepancy from the model of a disk sector to which I am engineering. The discrepancy is that we seem to be displaced about 60-70 us from the times I would have expected from the Alto microcode. The problem is that the disktool sometimes gets into a state where it will no longer read sectors, although disk seeks are still functional.

The length of a sector mark is only 5 us, far too short to account for the added time. I am forced to speculate that the word task is doing other things or the execution time of the sector task is long enough to result in this delay from the 'ideal' timing.

Ideally, a sync word will exist right around the 34-35th word time, roughly 330 us into the sector, but real data on the drive has the sync word in the 390 to 400 us zone. Everything from that point is correspondingly shifted later in the sector, but it all fits neatly into the 3,333 us available.

The long preambles at the beginning of a sector will accommodate these delays in the start of a sector when reading, and real Alto systems should be able to tolerate an earlier sync word being written by this tool. Therefore, I won't do anything based on this observation.

The hang is troubling. It occurs some random time after the tool is initialized, such that I am doing a number of seeks when abruptly the button does not respond any more. I believe that I have one or more of the state machines wedged in a non-idle state. I will test this with new instrumentation displaying the idle status of eight of my FSMs; any light that is out indicates a wedged machine. 

As a final test, I examined a file dump after having read sector 0, but all I can see is that the data recorded is plausible. I see the proper disk address in the header words and have eight label words that look very much like those on the label field of the two online disk images I grabbed from bitsavers. 

I found two online images, one named diags.dsk and the other xmsmall.dsk, but the contents of cyl 0, head 0, sector 0, which should be the boot sector, do not match what I see on my disk. The sector 0 contents of those two images are different from each other, too, which adds to the mystery.

I will pass the data along to Ken to see if he can interpret it. Certainly the header and label fields should pass some clear validity checks, and then he can determine if the data record looks like well formed Nova instructions.

I ran out of energy as it got late, but will work on the interpretation of contents, tracking the wedged state machine(s), and move forward on the high level 'Read entire cartridge' state machine. I am pleased that every sector I have read has come through clean, with all checksums matching.