Saturday, October 22, 2016

Mod made to Diablo disk, no change in error rate, board wiring completed


I completed the modifications to the Diablo drive, such that its short and long times are now 440 ns and 460 ns, in compensation for the 'overclocking' done by Xerox Alto computers when writing 10% above the Diablo factory spec.

I ran the ReadEntireCartridge function again, both with and without the auto-retry on checksum errors, and got essentially the same results. I had opened the cartridge, inspected both platter surfaces, cleaned the disk and also cleaned the Diablo drive heads.

There remains the possibility that this drive and the one that originally wrote the data are not aligned exactly the same. The practice with drives of this era was to use a special "CE" cartridge and an oscilloscope to adjust the arm positioning of every drive, thereby hoping that two things that are closely aligned to a third thing would be closely aligned to each other.

If each adjustment is off by 1%, the two drives could be 2% different in the worst case. With the track positioning of .01" and a guard band between recorded tracks of .003" it wouldn't take much misalignment to have the head reading near the edge of a track rather than directly over the recorded signal.

This suggests to me that we may not be able to retrieve every sector of every cartridge perfectly, instead settling for > 99%. Another factor to consider is the weak error detection mechanism used, a simple XOR checksum, which means any even number of bit errors in a 'column', i.e. at the same bit position modulo 16 in the bitstream, would falsely pass the checksum test.

Since the raw error rate - a single read of each sector - hit a bit over 6%, but dropped to 0.6% when any bad sector was retried up to 31 times, some of the sectors reported as good may actually contain errors.

This weakness in checksum strength was a fact of life with the Alto. Every time a user ran the system, they faced the chance that multiple bit errors would slip under the radar and deliver bad data to the user.
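To make the weakness concrete, here is a minimal Python sketch of a plain XOR checksum over 16-bit words. It is an illustration only: the record contents are made up, and it does not try to reproduce the Alto's exact checksum procedure, just the general XOR property described above.

```python
from functools import reduce


def xor_checksum(words):
    """Fold a sequence of 16-bit words into a single 16-bit XOR checksum."""
    return reduce(lambda acc, w: acc ^ (w & 0xFFFF), words, 0)


# Made-up record contents, purely for illustration.
record = [0x1234, 0xABCD, 0x0F0F, 0x8001]
good = xor_checksum(record)

# Flip bit 3 in two different words: an even number of errors in one column.
double_error = list(record)
double_error[0] ^= 0x0008
double_error[2] ^= 0x0008

# Flip bit 3 in only one word: an odd number of errors in that column.
single_error = list(record)
single_error[1] ^= 0x0008

print(xor_checksum(double_error) == good)  # True: the corruption is missed
print(xor_checksum(single_error) == good)  # False: the corruption is caught
```

The two flipped bits cancel each other in the XOR, so the corrupted record carries the same checksum as the good one.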

I worked further on the new disk driver board, such that by midafternoon I had only 7 fpga signals left to wire before I do the extensive checkout. At dinnertime, the entire board was wired and I went through a connectivity and shorts test. Everything checked out, so it is time for a function test tomorrow. 

Friday, October 21, 2016

Doing Alto modification to Diablo disk drive, working on Write logic and building up new driver board


I set in to the wiring of the 3.3V fpga signal lines to all the level shifters. By lunchtime I had 7 of the 27 wires run. When I got back from some chores in the midafternoon, I went back to working on the board. In addition, I worked through my WriteField and WriteSector logic, to begin testing the writing/updating functionality of the tool.

I will initially test the write functions with the WriteGate and EraseGate held in the off condition, so that I can watch the flux reversals being emitted without committing them to disk until I am satisfied. This required a big revamp of the logic analyzer signal assignments and trigger conditions.

By 3PM, all 12 of the input signals were wired to the fpga and the 'personality' signals, Sense20 and Sense22, were grounded to indicate that this is the Alto style driver role board, distinct from the oddball format driver role board I built around an existing cable, and distinct from the disk emulator role board.

I have 15 more 3.3V lines to wire, for the outputs from the fpga, then I can test the board carefully. I have been checking for shorts to adjacent pins on the relatively high density FX2 and IDC 40 connectors, and also on all the component connections.

Next up would be power, ground and other shorting tests. If the board seems good, I will hook it up to power and test by driving various input and output lines to see that the corresponding side responds appropriately.

There will be some logic changes for the fpga, since this board has sector number data but no index marker pulses, thus I bypass my counting circuit and use the Diablo derived sector number. Those changes were made, but will have to wait until the board is ready before being tested.

I hooked up the scope to the testpoints on the disk drive board in order to test the current timings before I make any changes to the drive. I saw a nice clean pulse on TP3 but the one coming from TP5 was ugly - ringing and jitter. Timing was very hard to read but I thought I had finally figured it out.

I removed the two fixed resistors, inserted potentiometers in their place, and tried to adjust the timing properly. I found the scope display so obscure I am not sure whether I got it right. The test point is a mixture of three signals, two of them the 440 and 460 pulses. They are switched based on whether the prior bit was a 1 or a 0, to compensate for bit shift.

I saw all three signals, jumping around, and was just not sure that I saw the long time (460 ns) edge. Thus, I feel decent about the short time, 440 ns, but far less so for the other one. I am not ready to use the drive and need to get the timing more exactly adjusted.

As a consequence, I am going to add micrograbbers to the output pins of the two timing circuits, that generate the 440 and the 460 ns intervals, to directly observe them. I will either see the pulse or not, depending on whether a 1 bit or 0 bit was seen, but when it fires it will always be one duration.

With the better scope points, I was easily able to distinguish and adjust the two timers to dead on the 440 ns and 460 ns targets. I then measured the potentiometers to determine that I need a 144.5K and a 53.1K fixed resistor to make the drive run with these timings. I am off to Anchor Electronics in the morning to pick them up.

In the early evening, I printed out the WriteField, ReadField, WriteSector and ReadSector FSMs in order to apply the learnings from the read functions to my write logic. I also worked on the board a bit more, wiring up 4 of the 15 output signals from the fpga.

Thursday, October 20, 2016

Added automatic reread for sectors with checksum errors, discovered need for Diablo mod, working on new board


My first test of the morning showed that the USB link was no longer working, which I think I had traced to an unconnected signal from my new register logic. Half an hour later, I tested that hypothesis and the corrected code.

I retrieved the checksum status fields perfectly and began to map out the bad sectors. I noticed that head 1 (lower surface) began to get quite a few sector errors starting around cylinder 128 and continuing out to nearly 150, with most of them clustered in the 130s. This would be consistent with a surface defect on the disk.

Of course, cyl 0 head 0 was erased by my procedural error of the day before, but I should be able to rewrite those with my disk tool. I will go over my WriteSector logic, enable it and give a test on one of the sectors of that first track.

I made the decision to add an auto-recovery option to the ReadEntireCartridge function, one that will iterate up to 32 times if a given sector has checksum errors. If it gets a clean read, it moves on, otherwise it tries up to the maximum. This will be controlled by one of the slide switches on the board.
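The retry policy can be sketched in a few lines of Python. This is only a software analogy of what the fpga logic does; `read_sector` is a hypothetical callable standing in for the hardware read, and the flaky-sector helper exists purely to exercise the loop.

```python
def read_with_retry(read_sector, max_attempts=32):
    """Read one sector, repeating until the checksum passes or attempts run out.

    read_sector is a hypothetical callable returning (data, checksum_ok).
    Returns (data, checksum_ok, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    data, ok, attempts = None, False, 0
    for attempts in range(1, max_attempts + 1):
        data, ok = read_sector()
        if ok:
            break  # clean read, move on to the next sector
    return data, ok, attempts


def make_flaky_sector(bad_reads):
    """Build a fake sector whose first `bad_reads` reads fail the checksum."""
    state = {"calls": 0}

    def read_sector():
        state["calls"] += 1
        return b"payload", state["calls"] > bad_reads

    return read_sector


print(read_with_retry(make_flaky_sector(3)))    # recovers on the 4th read
print(read_with_retry(make_flaky_sector(100)))  # gives up after 32 reads
```

A limit of 32 attempts corresponds to one initial read plus up to 31 retries.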

Initially this was not included on the recommendation of one of the team, based on his experience where lingering over a bad spot resulted in the failure increasing in seriousness. After some analysis of the practical effect with and without the auto-recovery, I decided to go ahead.

The disk head flies over the surface of the platter, thus lingering will only be a problem if the surface is raised enough to impinge on the head or disturb its air cushion. Thus, any defect that does not involve a substantially raised surface (relatively speaking, since we are talking about a flying height of 7 thousandths of an inch) should not be made worse by lingering.

Next, the reality is that the head is flying somewhere on the disk from the moment the heads are loaded at startup until it is switched off. In practice, the head flies above the last cylinder that the arm did a seek to, until the next time a seek is performed. It does this whether or not we read anything.

The cylinders are roughly 1/100 of an inch apart, so that the low spot on the head where contact might occur is wide enough to span over a dozen or so cylinders. Thus, lingering anywhere in that 12-ish cylinder zone around the high spot is a risk.

Whether one reads the sector 1000 times in a row or not, the arm is sitting at some cylinder and the disk is spinning under the flying head. From a risk standpoint, the only way we lower the risk is by moving the arm away from the defect. The arm must be more than 12 cylinders away to be safe, thus the risk zone is 24-ish cylinders wide or about 12% of the entire disk.

My ReadEntireCartridge  function takes only a few seconds to complete. We need no more than four rotations, 1/10 second, to read all 24 sectors, plus the time to seek to the next cylinder. If it were to reread 5% of all the sectors, and those were to require 20 retries each, it would only double the linger time.
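The "only double" estimate is simple arithmetic, sketched below in Python. It assumes, as a simplification, that linger time scales with the number of sector reads; the function name is made up for the illustration.

```python
def relative_read_count(percent_bad, retries_per_bad):
    """Total sector reads relative to a single pass over the cartridge.

    Every sector is read once; `percent_bad` percent of them are then
    reread `retries_per_bad` more times each.
    """
    return 1 + percent_bad * retries_per_bad / 100


# 5% of sectors needing 20 retries each adds as many reads as the first pass.
print(relative_read_count(5, 20))  # 2.0
```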

Finally, consider that manual rereading involves seeking to the cylinder, where the head will fly for much longer than the time needed to auto-recover and move on. The most serious risks are when the raised flaw is within the first 12 or last 12 cylinders, as the arm hovers at 0 when loaded and hovers at 202 when my function is completed.

No amount of caution will protect against damage if the flaw is in those two critical areas. Practically speaking, a user attempting to recover sectors missed by a one-pass, non recovering function would park the head in the remaining 179 cylinders far longer than would the autorecover version.

I completed the change to the logic, which was more minor than I expected, and set in to test in the late afternoon. The logic ran through all the sectors, pausing for noticeable time to handle the retries, and ultimately completing. The checksum status vector was dumped and shows 30 sectors that weren't recovered even with 31 retries apiece, but that is a failure rate of 0.615%, an order of magnitude better than the one-pass method.

Al raised the point of the timing board modification made to Diablo 31 drives for use with the Alto - two resistors, F28 and H53, on board J10 are replaced with values that accommodate the slightly faster bit rate of the Alto compared to the Diablo spec.

I looked at the board and the resistors don't look reworked; they seem to be the original versions. Every drive has hand selected resistors for these two components, making it impossible to check against a schematic, but my guess is this drive is not from an Alto.

Al confirmed that this is a stock Diablo drive, which could explain the errors I am getting on reading. I could modify this drive or just accept the error rates for now and see how well things read when on the drive attached to the Alto.

I have ordered a pair of 200K potentiometers, which are required to tune the Diablo board to the window duration needed for the Alto - 440 to 460ns - rather than the factory default 450 to 470ns. They will arrive tomorrow, when I can determine the needed resistances, then pick up fixed resistors of those values and solder them onto the board.

I continued wiring up the new disk driver board, with all the ground wires now added and four of the last 15 signal wires from the Diablo cable wired in by later afternoon. After dinner, I had another four of the Diablo signal wires hooked up, just seven to go.

By bedtime, all the +5V lines were completed and all that is left are the 3.3V lines between the fpga connector and the level shifter components.

I continue to have issues with the really hard, stiff ribbon cable connected to the Diablo connector. It is way too tough for press-on IDC connectors. Still thinking about the best way to deal with this. I have a female Diablo terminator that could be wired to a regular ribbon cable - thus have ordered the parts to try this.

Wednesday, October 19, 2016

Wiring new board, creating checksum status upload function, issue with track 0

Today is the day I spend with the 1401 restoration team at CHM. Everything was working well, so I left early and returned to work on the disk tool.


I built up a wiring plan for the new disk driver role extension board, the one that will go to Al Kossow for use in archiving Alto cartridges. This involved signal assignments to various level shifter components and careful recording of the IDC40 pins for each signal.

I am more and more convinced that I need an easy way to display or extract the checksum error status bits that reflect whether a checksum verification error occurred in each of the three records in each of the sectors. With 4,872 sectors and three records apiece, it is far more information than I can readily display with the LEDs and seven-segment displays on the fpga board.

I had hoped to record some of the status on the VGA display, although even here there is no reasonable way to display 14,616 separate status bits on a single screen, yet be able to see where the errors occurred.

With 8 lights per seven-segment display, four digits of those, and eight separate LEDs, I could display only 40 bits of data at a time, needing to cycle through that 366 times to see the entire cartridge results.
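The numbers in the last few paragraphs follow directly from the cartridge geometry. The short Python sketch below reproduces them; the constant names are mine, and the geometry (cylinders 0 to 202, two heads, 12 sectors per track) is taken from the figures used elsewhere in these notes.

```python
import math

CYLINDERS = 203          # cylinders 0 through 202
HEADS = 2
SECTORS_PER_TRACK = 12
RECORDS_PER_SECTOR = 3   # header, label and data each have a checksum

sectors = CYLINDERS * HEADS * SECTORS_PER_TRACK
status_bits = sectors * RECORDS_PER_SECTOR

# Four seven-segment digits with 8 lights each, plus 8 discrete LEDs.
bits_per_view = 4 * 8 + 8
views_needed = math.ceil(status_bits / bits_per_view)

print(sectors)        # 4872
print(status_bits)    # 14616
print(bits_per_view)  # 40
print(views_needed)   # 366
```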

I have a switch option to stop on the first error during the ReadEntireCartridge function, allowing me to potentially run with stops for each bad sector. I will test this right away, but I think I need a more permanent record anyhow, to go along with the extracted disk files I produce.

My test showed that the logic will stop when one or more of the checksum errors are detected, and will restart with the next sector when the transaction button is pushed again. While doing this, I discovered that some sectors that had been reading clean were now getting errors.

I found some with errors that would read good sometimes, bad sometimes. Others, including sector 0 which I have been using quite a bit, now read bad on all three records and do so consistently. Worried about the condition of the heads and platter, I shut down and inspected.

The disk platter appears superficially good, but I will disassemble it to look more thoroughly. The bottom head is easy to see and looks okay, while the top head is nearly impossible to image clearly with a camera. I applied a hand mirror and both heads look very clean.

Only three possibilities left to explain why S0 might be bad now - one is that the arm is out of alignment and increasing its deviance, which seems unlikely, the other two are that somehow the write gate is turning on and spewing zeroes over the first track, cyl 0 head 0, for all 12 sectors. The latter two may be due to procedural errors.

If it is the latter, I can rewrite the track using the stored image from bitsavers, also debugging my write sector logic, but first I have to figure out what caused the problem and prevent it from ever happening again.

I suppose it could have been procedural, when I used the Adept utility to dump memory with a method that loads its own bitstream to the fpga, overwriting what is there, and resetting the board to run that new logic. If I left the drive spinning when this happened, it could have turned on the write and erase gates, affecting the current cylinder and head.

I will piggyback on the File I/O capability of the Adept utility, adding another set of memory interface registers 16 to 22 which will read out the sector status bits, one byte per sector. I just have to copy, paste and edit the existing logic to create the new functionality. Reading the disk buffer RAM uses registers 8 to 14, with the utility set to 14, thus reading the status vector needs to set the utility to register 22 which is the corresponding mechanism.

I loaded the new bitstream with the checkbit vector upload functionality and did some testing. While I was at it, I hooked the scope to the ReadData and ReadClock lines to see what is coming in from track 0, which I suspect was erased.

The data is definitely irregular, especially the clock signal, which tells me that the track was definitely erased. I can use my WriteSector logic to restore the contents, although I am not yet ready to test that out.

The checksum bit reading logic did not appear to work on my first test, after having read the entire cartridge and observed a number of checksum error flickers on the LEDs. I fired up for another test, manually working the registers to see if the transaction appears to be working.

After a bit of debugging, I figured out the problem and created a new bitstream, but it was time for bed. Tomorrow morning I will test again and expect this to work.

Working on the new board, I ran all the +5V and +3.3V power wiring, then began on the ground wiring. I have all the input signals routed from the IDC connector to the level shifters. More tomorrow. 

Tuesday, October 18, 2016

Read entire cartridge works, file produced as desired


My address incrementing during the ReadEntireCartridge transaction is working properly and the stall condition is fixed now that the sector matching FSM is always producing results. At worst, we will miss one rotation of the disk rather than stall.

I am now verifying the addresses being used to write extracted words to RAM, to be sure my changes that reverse word order and that use concatenation instead of addition/subtraction are all working properly.

I am setting up the proper addresses to the ReadField transaction, based on what I see from the logic analyzer trace, but now I need to zoom in on the memory addresses going to the RAM access FSM to be sure that the decrementing makes sense and addresses are proper.

I ran tests after lunch, found a few defects and pressed on. Eventually, I noticed three odd things. First, I saw two cycles of the memory request trigger signal in some cases, one cycle in others. Second, the logic analyzer only captured 10 or so memory writes in a sector, then it failed to spot others. Third, the label field in memory seemed to have written the checksum as the first word.

I needed to understand these conditions, requiring a mix of pondering, logic inspection and testing. I decided that it would be better to convert the memory access FSM and the four FSMs that call it to use an interlocked protocol - hold the request high until the memory FSM is done, then drop the request allowing the memory machine to go idle.
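The interlocked protocol can be modeled as a small cycle-by-cycle simulation. The Python below is a rough sketch of the idea, not the VHDL in the tool: the state names, the three-cycle memory latency and the `run_writes` helper are all assumptions made for the illustration.

```python
class MemoryFSM:
    """Toy memory access machine: IDLE -> BUSY -> DONE -> IDLE."""

    def __init__(self, latency=3):
        self.state = "IDLE"
        self.done = False
        self.latency = latency
        self.remaining = 0
        self.completed = 0  # number of memory operations performed

    def step(self, request):
        if self.state == "IDLE":
            if request:
                self.state = "BUSY"
                self.remaining = self.latency
        elif self.state == "BUSY":
            self.remaining -= 1
            if self.remaining == 0:
                self.state = "DONE"
                self.done = True
                self.completed += 1
        elif self.state == "DONE":
            # Stay here, holding done high, until the caller drops request.
            if not request:
                self.state = "IDLE"
                self.done = False


def run_writes(n_writes, latency=3, max_cycles=10_000):
    """Issue n_writes requests using the interlocked handshake."""
    mem = MemoryFSM(latency)
    request = False
    issued = 0
    for _ in range(max_cycles):
        done_seen = mem.done          # registered value from the prior cycle
        mem.step(request)
        if request and done_seen:
            request = False           # memory finished, release the request
        elif not request and not done_seen and issued < n_writes:
            request = True            # memory is idle, raise a new request
            issued += 1
        if issued == n_writes and not request and mem.state == "IDLE":
            return mem.completed
    raise RuntimeError("handshake stalled")


print(run_writes(5))  # 5: exactly one memory operation per request
```

Because the request is held until done is seen and is not raised again until done has dropped, the length of the request pulse no longer matters, which is the property a fixed one- or two-cycle trigger lacks.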

I completed those changes, built a bitstream and tested in the evening after dinner. It is behaving better but I still have some issues with the countdown logic. I changed some more logic, burned another half hour and tested.

I spotted another flaw in my logic and corrected it, built a bitstream by 9PM and did my final testing of the day. Good news - it worked well. The files produced are exactly what I expect to see and match to the extent I can check random sectors to the xmsmall.dsk file.

I did see some checksum error lights flicker - perhaps 1-2% of the sectors had an error. I need to build out a transaction to fetch the error bit vector into a file, allowing someone to know which sectors had problems and in what record. Potentially we can reread problem sectors several times until we get a good version of the sector.

I installed all the terminator resistors for the inbound signals from the Diablo to the fpga board, on the new second driver board I am building. I have a ribbon cable with the disk connector on one end, to which I tried to fit an IDC 40 cable connector on the other side. This type of connector should pierce the cable to make a 'vampire' connection to its intended wire.

After I pressed the connector onto the wire, I used the VOM to check the wiring connectivity to the pins of the disk connector, but quickly found that all pins are shorted together end to end. The plastic cable is apparently too tough for the vampire metal to cut down into, forcing it to spread as it bit into the cable and to short all adjacent connections.

I tried a second time, same result. It might be a misalignment problem, but I may have to consider replacing the entire ribbon cable. I disassembled the disk connector side but the cable connects to the PC board with three rows, not two rows, blocking me from installing an IDC40 socket.

Tried another connector to press on, but got the same result of complete shorting. I suspect I am going to have to separate the individual leads in this ribbon cable and a length of more typical cable, graft the ends together and thereby get the IDC connector on the line. 

Monday, October 17, 2016

Debugging of ReadEntireCartridge transaction of disk tool, beginning build of second disk tool


Ken Shirriff has put together a Python program to transform the PC files captured from my tool into the disk archive format used on bitsavers and with the Contralto simulator. This saves me from having to do the same thing and lets me focus on other tasks still required to complete the disk tool.

This morning I completed the refinement of the RAM addressing change (using assembly of fixed strings rather than addition/subtraction) and some diagnostic outputs to allow me to watch the incrementing of cylinder, head and sector values during the ReadEntireCartridge transaction.

I had to rewire the probes to the logic analyzer to suit the major change in status signals needed to watch the address incrementing. With all that done, it was time to fire up the testbed. I did several runs.

The results were inconclusive; I need to monitor other signals first to see what it is doing during the incrementing of the registers. Of course this will cause another 30 minute pass to generate the bitstream, but at least I can eat lunch during the delay.

My addressing changes did appear to have broken the write into RAM, or stored data somewhere different, so that is another aspect to debug. The results of the new testing showed me a flaw in my ReadEntireCartridge transaction where I was not sufficiently interlocked with the lower level ReadSector and Seek FSMs.

I redesigned the high level FSM, keeping in mind that there is a one cycle delay in actions that are taken to change the Regxxxx registers including triggering low level transactions, seeing completion and resetting completion. I am working my way through debugging the new design, which is more interlocked and therefore can stall if I get things wrong.

My new interlocking ran afoul of a change I made a few days ago to allow me to reset transactions that stalled. The way I attempted to handle the reset was triggered by the operation of the ReadEntireCartridge  transaction with its new interlocking.

The changes were backed out for the stall reset logic, keeping the new high level FSM design, and I spent a quick 30 minutes preparing for the last test of the evening. What I see is that I am now processing the bump of the address correctly, incrementing by one.

The stall occurs when trying to read after completing a seek. Sometimes it gets through several seeks correctly, other times it fails on the first seek. I believe that I am either failing to trigger the FSM that waits for a target sector to come under the heads, or failing to see the completion signal when the sector is matched.

I will work out an interlocking between the sector match FSM and the transactions that use it, build those into the logic tomorrow and then test. It makes sense that I will change out the diagnostic signals to the logic analyzer, in anticipation of the areas I am checking out tomorrow. I am done with testing for today.

I began constructing the driver role board for Al Kossow's version of the tool, now that I am comfortable that all the analog stuff works well for reading and archiving packs. I will make use of a ribbon cable and connector for the Alto to Diablo link that he provided to me, wiring it to the board and hooking up the fpga board he purchased.

I think I will make use of an IDE style socket, although pin 20 of the cable is used while the classic IDE socket has a blank in that position to ensure the cable is not reversed. Fortunately, I have a pure socket with 2 x 20 female positions which provides the needed pin 20 signal.

Supporting his cable means that I have to deal with the signals as they exist on the cable, which are different from the ones on my disk cable on the prototype. I went through all the documentation and decided that I had to produce 16 output signals and handle 12 input signals when using the Alto-Diablo cable.

Sunday, October 16, 2016

Read Sector working well with data in (un)reversed order, working on reading entire cartridge


I have confirmed that the data I extracted to a PC file from cylinder 0, head 0 and sector 0 of the real disk cartridge exactly matches the data on the disk since it matches the archive file of that disk that exists on

My data is stored in disk order - where the data is stored with the last word of a record first, reversed so that the last word to come off disk is the first word of the record. The Alto loads the data into memory in reverse order, starting at the highest address in the buffer for the first word received and ending with the last word from disk placed at the lowest address, what we think of as the first word of that buffer.

The Alto uses that fact cleverly during boot. The boot firmware puts a spin instruction at location 0 which just loops to itself, then initiates a read from disk of cyl 0, head 0, sector 0 to place the 256 word data record of that sector in memory starting at location 0000.

The disk contents are stored in memory, starting with the first word from the disk at address 255 decimal, and proceeding down to lower addresses. As the disk read proceeds, location 0 is still a jump to itself keeping the processor in a tight loop. When the final word from disk is stored in memory, at location 0000, it overwrites the jump instruction and thus begins executing instructions from the disk sector from 0 onwards.
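The boot trick is easy to see in a toy model. The Python sketch below uses strings as stand-ins for Alto instruction words and a made-up `SPIN` marker for the jump-to-self; it only illustrates the top-down fill order and is not a model of the real boot firmware.

```python
SECTOR_WORDS = 256
SPIN = "JMP 0"  # stand-in for the spin instruction that loops to itself


def load_boot_sector(disk_words):
    """Store words from disk at descending addresses, 255 down to 0.

    Returns the final memory image and a log of what location 0 held
    after each word was stored.
    """
    memory = [None] * SECTOR_WORDS
    memory[0] = SPIN
    location0_log = []
    address = SECTOR_WORDS - 1
    for word in disk_words:
        memory[address] = word
        location0_log.append(memory[0])
        address -= 1
    return memory, location0_log


# Label each word by the order in which it comes off the disk.
disk_words = [f"w{i}" for i in range(SECTOR_WORDS)]
memory, location0_log = load_boot_sector(disk_words)

print(memory[255])         # w0: first word off the disk, highest address
print(location0_log[-2])   # JMP 0: still spinning with one word to go
print(memory[0])           # w255: last word off the disk replaces the spin
```

The processor keeps looping at location 0 until the very last store, so it can never start running a partially loaded sector.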

I modified the operation of my ReadField and WriteField FSMs to match the memory ordering, reversing the storing of words from disk. This should produce an exact match for the 266 words of each sector between what is on a disk archive file and what I produce.

Testing with my disk cartridge, I was able to match it up byte for byte with the bitsavers disk archive named xmsmall.dsk which does match the label on the physical cartridge. Other than the extraneous word at the front, it is now a match for the content of the sectors.

I had also instrumented the LEDs to show me any FSM that was not at its idle position, so that when the disk tool stalls again I can immediately spot which FSM(s) are confused. I set up for a test this morning with the new bitstream.

It never stalled - evoking the old adage that a watched pot never boils - but it did give me the data as I wished, in reversed word order, to match online disk archives and what the programmer would see in memory after issuing a read.

I do see that the online disk archive files have an extraneous word in front of the sector contents, which needs explanation before I can transform my output files into an acceptable archive file. The Contralto simulator uses the files and it simply ignores the first word from the file.

I moved on to test out my transaction to read an entire cartridge into RAM. I discovered two problems with this - first, it halts as soon as the finger is off the button that triggers the transaction, and second it is not setting up the memory address properly so that I see the last cylinder's contents written in the position for cylinder 0.

I did a test to see whether the read field logic is not properly addressing cylinders above 0, or whether this is restricted to the read cartridge transaction. I saw that data was stored in the appropriate locations in RAM, so this was an artifact induced in the 'read entire cartridge' function.

I found the cause of the FSM needing button 1 to be depressed for it to continue up to cylinder 202, corrected the problem, and made a change to add an additional cycle for the bump of cylinder, head and sector numbers before triggering the read of the next sector. Half an hour later, the bitstream was ready for a test.

Nothing was working properly, all out of proportion to the two small changes I had made, indicating this is one of those odd times when the toolchain produces useless dreck as a bitstream. I made some other changes and synthesized again, now approaching late afternoon.

I also decided to swap the fpga board from the 1200K gate version down to the 500K gate variant that will be used for the tool in production, both the one used with the Alto restoration and a second tool to be used by Al Kossow to archive the contents of various disk cartridges.

I had to turn off the video display, which used quite a few gates, in order to fit in the 500K capacity. Other changes were also necessary. All in all, the rework burned up a few hours.

When I did my evening testing, all appeared to be back to where I expected. The ReadSector function works fine, the Seek function works fine, and the ReadEntireCartridge function now runs to completion (cylinder 202) without having to hold button1 down continuously.

I dumped the file and checked, but found that only the even numbered sectors were read, implying I have a flaw in the incrementing logic for the ReadEntireCartridge function. It seemed that the logic stepped through every single cylinder, but I couldn't see how it alternated the head and sectors.

I did think of a way to improve some of my logic. The way I wrote the VHDL, it uses a chain of two adders to calculate the starting memory address as S + C - 1 (S is high order bits from cyl, head and sector and C is the word count), but I can assemble this without using an adder.

I have a memory address that has a high part generated from cylinder, head and sector values. It has a low part that is 9 bits representing the word within a sector. The high part is assembled from 8 bits of cylinder, 1 bit of head and 4 bits of sector number. All together, 22 bits of address.

I realized that the low part of the memory address is one of three constant values, x0002 for the header record, x000A for the label record and x010A for the data record. There are no carries that will propagate to the high 13 bits of the address.
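The same idea can be shown in Python with shifts and ORs standing in for VHDL concatenation. This is a sketch under assumptions: the field order (cylinder, head, sector, then word offset) follows the description above, and the low-part constants are taken to be the cumulative word counts 2, 10 and 266 (2 + 8 + 256), which is my reading of the three values and fits within 9 bits.

```python
LOW_BITS = 9  # word offset within a sector

# Assumed low-part constants: cumulative word counts of the three records.
RECORD_OFFSET = {"header": 0x002, "label": 0x00A, "data": 0x10A}


def sector_index(cylinder, head, sector):
    """Pack 8 bits of cylinder, 1 bit of head and 4 bits of sector."""
    if not (0 <= cylinder < 256 and head in (0, 1) and 0 <= sector < 16):
        raise ValueError("field out of range")
    return (cylinder << 5) | (head << 4) | sector


def record_address(cylinder, head, sector, record):
    """Build the 22-bit address by concatenation, with no adder involved."""
    high = sector_index(cylinder, head, sector)
    return (high << LOW_BITS) | RECORD_OFFSET[record]


addr = record_address(202, 1, 11, "data")
print(hex(addr))
# The high 13 bits are untouched by the low part, so no carry can occur.
print(addr >> LOW_BITS == sector_index(202, 1, 11))  # True
```

Since every constant is below 512, ORing it into the low 9 bits gives the same result as adding it, which is why the adder chain can be dropped.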

I will also change instrumentation to let me watch, with the logic analyzer, as the  ReadEntireCartridge  function advances through all the sectors. That way, when I test tomorrow, I can fix up this last problem and be ready to capture the contents of the four cartridges we have with the Alto and the two that I have at home.