Saturday, October 15, 2016

Read sector functionality of the Alto-Diablo disk tool is working well, minor issues to address overall

Morning spent as Volunteer Examiner administering ham radio license tests, back to tool in the afternoon.

ALTO DISK TOOL

I came up with a new scheme for the deserializer operation during idle time at the exam session, then implemented it when I got back to my house. It has separated the bit timing into discrete state machines.

One FSM is tracking the timing of the Read Clock pulses, another will look for a '1' data bit value on Read Data only at the appropriate portion of the Read Clock cycle, and the third will handle recognition of the sync word 0000000000000001 when beginning to read any of the three records in a sector.

I trigger the deserializer to collect bit values with some combinatorial logic that looks at the timing in a Read Clock cycle and the sync or unsynced state. This should tie into the deserializer module itself without requiring a rewrite.
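The idea of sampling Read Data only inside part of each Read Clock cycle, and then syncing on the first 1 bit after a run of zeros, can be sketched in software. This is a minimal Python model, not the FPGA logic itself: the 600 ns bit cell follows from the 9.6 us word time implied by these notes, while the sampling window, the MSB-first bit order and the function names are assumptions made for illustration.

```python
BIT_CELL_NS = 600        # 9.6 us word time / 16 bits
WINDOW_NS = (150, 450)   # hypothetical sampling window after each clock pulse


def bits_from_pulses(clock_times, data_times, window=WINDOW_NS):
    """Return one bit per Read Clock pulse: 1 if a Read Data pulse lands in the window."""
    lo, hi = window
    bits = []
    for t in clock_times:
        hit = any(t + lo <= d <= t + hi for d in data_times)
        bits.append(1 if hit else 0)
    return bits


def deserialize(bits, word_count, min_zeros=15):
    """Sync on a 1 bit that follows at least min_zeros zeros, then pack 16-bit words."""
    zeros = 0
    start = None
    for i, b in enumerate(bits):
        if b == 0:
            zeros += 1
        elif zeros >= min_zeros:
            start = i + 1        # first data bit follows the sync bit
            break
        else:
            zeros = 0            # a 1 bit arrived too early: start counting again
    if start is None:
        return None              # never reached the synced state
    words = []
    for w in range(word_count):
        chunk = bits[start + 16 * w:start + 16 * (w + 1)]
        if len(chunk) < 16:
            return None          # ran out of bits mid-word
        value = 0
        for b in chunk:
            value = (value << 1) | b   # assumed MSB-first bit order
        words.append(value)
    return words


# Example: a glitch just after clock 3 falls outside the window and is ignored.
clock_times = [i * BIT_CELL_NS for i in range(48)]
data_times = [3 * BIT_CELL_NS + 20]                                   # glitch
data_times += [i * BIT_CELL_NS + 300 for i in (15, 26, 28, 29)]       # sync bit, then 002C
words = deserialize(bits_from_pulses(clock_times, data_times), 2)
```

Keeping the window check separate from the sync and shift logic mirrors the split into separate state machines described above.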

Another thing I had suspected from the data recorded on the disk is that the Head selection signal is inverted (logically) compared to what the software requests. That is, when the disk interface signal is on (0V level), the top head is selected.

This would normally represent a '1' on the Head line, but an extra inverter is introduced on the Alto disk controller card (via an Intel 3404 latch) such that the software uses a '0' to drive a '1' on the head line which means it drops to 0V and the drive uses the top head. I have to invert my logic in the disk tool to match this usage.

After the design was built into a bitstream, I fired up the testbed and read a sector again. The logic seems to be working better, but I have a problem with the timing between when I sync and when I record bits for deserialization.

I added an 'armed' stop to the sync state FSM as a way of blocking that bad behavior. I ran again and did indeed sync properly and deserialize the data words properly. The flaw I am currently working on is that the ReadSector FSM is jumping to read the middle record, the eight word label record, rather than knowing that it is handling the two word header record.

This flaw causes it to read the checksum word correctly but it is not recognizing it as the checksum nor doing the comparison for validity. I have to debug this condition but feel that the basics of extracting parallel words from records is in good shape.

What I saw was the logic trying to sync up at less than 80 us into the sector, far too early. It should be obeying a preamble delay of 201.6 us before looking at any incoming data bits. I have to correct this. I do see the proper sync word at around the nominal 336 us into the sector.

I found the problem. Doh. For some reason, when I calculated the number of fpga cycles to produce given durations, I was using 50ns as the cycle time but this is running at 20ns. That means my delay counts need to be 2 1/2 times as big to produce the intended delay. I made the corrections, invested the half hour in producing a bitstream, and tested again.
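The arithmetic behind this mistake is easy to check. A minimal Python sketch, using the 201.6 us preamble and the 50 ns and 20 ns cycle times mentioned above (the function name is just for illustration):

```python
def cycles_for(delay_us, clock_period_ns):
    """Number of clock cycles needed to span delay_us at the given clock period."""
    return round(delay_us * 1000 / clock_period_ns)


PREAMBLE_US = 201.6

wrong_count = cycles_for(PREAMBLE_US, 50)   # count computed for an assumed 50 ns clock
right_count = cycles_for(PREAMBLE_US, 20)   # count needed on the actual 20 ns clock

# Delay the wrong count actually produces when clocked at 20 ns.
actual_delay_us = wrong_count * 20 / 1000

print(wrong_count, right_count, actual_delay_us)
```

The wrong count yields a delay of about 80 us, in line with the early sync attempts seen on the logic analyzer.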

The first time I read a sector after the disk tool is reset, it appears to read the entire sector properly with no checksum errors. I verified that the first two records were properly read and their checksum matched what was read from disk, but I didn't go all the way to the end of the sector to check the tail of the long data record.

However, each subsequent try seemed to ignore the work I did on the long preamble, instead syncing on the earliest 1 bit it found regardless of the preamble. I have some defect in how one or more state machines resets when it completes one cycle.

I had made great strides by dinner time, with a few bugs left to iron out and some steps left to check but significant parts of the functionality working properly.

After dinner, I looked over the logic and set up various diagnostic signals to help me sort out what was happening. I discovered the counter was too small inside the ReadField state machine to handle the new, larger and now correct counts being used for delays.
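A quick way to size such a counter is to compute the bits needed for the largest count it must hold. This Python sketch uses the preamble counts as the example; the actual counter width in the ReadField logic is not stated here, so the numbers only illustrate the kind of overflow involved.

```python
def counter_bits(max_count):
    """Minimum register width, in bits, able to hold max_count."""
    return max(1, int(max_count).bit_length())


old_width = counter_bits(4032)    # preamble count computed for a 50 ns clock
new_width = counter_bits(10080)   # same preamble at the real 20 ns clock

print(old_width, new_width)
```

A counter sized for the old value would wrap before reaching the new one, so the delay would end early.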

It was also helpful to build up a table of cycle counts from the sector mark to various words that were part of the sector being read, so that I could program in reasonable delay counts to the logic analyzer trigger to observe any 80 us wide interval I wished. There are roughly 42 such intervals in one sector, stacked end to end, making the observations slightly tedious. 
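A table like this can be generated rather than worked out by hand. The Python sketch below assumes a 9.6 us word time, the 20 ns FPGA clock, an 80 us observation window and a 3,333 us sector, all taken from figures in these notes; the function names are hypothetical.

```python
WORD_TIME_NS = 9_600      # one 16-bit word on disk
FPGA_CLOCK_NS = 20        # FPGA cycle time
WINDOW_NS = 80_000        # width of one logic analyzer observation
SECTOR_NS = 3_333_000     # duration of one sector


def trigger_delay_cycles(word_index):
    """FPGA cycles from the sector mark to the start of the given word time."""
    return word_index * WORD_TIME_NS // FPGA_CLOCK_NS


def window_index(word_index):
    """Which 80 us window (counting from 0) contains the start of the word."""
    return word_index * WORD_TIME_NS // WINDOW_NS


def windows_per_sector():
    """Number of windows needed to cover a whole sector, rounded up."""
    return -(-SECTOR_NS // WINDOW_NS)


# Small table: word time -> (trigger delay in cycles, window number)
table = {w: (trigger_delay_cycles(w), window_index(w)) for w in (0, 21, 35)}
print(table, windows_per_sector())
```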

I worked through several sectors and can now consistently read the entire sector correctly, matching checksums, storing contents in memory and restoring to a state where I can read another sector. I checked against the full set of items to verify and have met all of them:
  1. Readsector logic is waiting for a sector match
  2. Indexmarker occurs, signalling that next sector mark is for sector 0
  3. Sectormark occurs, thus we are in sector 0 now
  4. Gotsector is emitted to indicate we have a match
  5. Readsector logic moves on to setup for record 1 of the sector
  6. Readfield logic is triggered
  7. Readfield waits for approximate 200 us preamble before looking at incoming data
  8. Roughly 120 us of zero data bits are seen
  9. A 1 data bit completes the sync word
  10. the logic recognizes the synced state
  11. Two words of the header record are deserialized, extracted and saved
  12. The checksum is calculated correctly for the two words of the header record
  13. The next word is deserialized, extracted and used as a checksum
  14. The checksum verification test occurs properly
  15. The readfield logic completes
  16. The deserializer goes to the unsynchronized state
  17. The readfield logic for the next record, label, begins
  18. The appropriate preamble is passed before we look for sync
  19. Enough zero bits are read to properly set up sync logic
  20. A 1 bit is read and the sync condition is attained
  21. Eight words of the label record are deserialized, extracted and saved
  22. The checksum is properly calculated for the 8 label words
  23. The next word is deserialized, extracted and used as a checksum
  24. The checksum verification test is done correctly
  25. The readfield logic ends
  26. Sync is dropped
  27. The readfield logic is entered for the data record
  28. A suitable preamble is passed before attempting sync
  29. Enough zeroes are read for the sync engine to work properly
  30. A 1 bit arrives and we attain the sync condition
  31. 256 words are deserialized, extracted and saved as the data record
  32. The checksum is properly calculated
  33. The next word is deserialized, extracted and saved as the checksum
  34. The final checksum test is done properly
  35. Readfield logic ends
  36. Sync is dropped
  37. The readsector logic completes
  38. Appropriate completion status is set in Reg0001
  39. The next sectormark does not occur until after step 36
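The record sequence in the checklist above (two header words, eight label words and 256 data words, each followed by a checksum word) can be sketched as a software model. This is only an illustration: the checksum algorithm is not spelled out in these notes, so the XOR-with-seed function and the seed value used in the example are assumptions, and all names are hypothetical.

```python
RECORDS = (("header", 2), ("label", 8), ("data", 256))


def xor_checksum(words, seed):
    """Assumed checksum: XOR of all payload words with a seed constant."""
    total = seed
    for w in words:
        total ^= w
    return total & 0xFFFF


def split_sector(words, seed):
    """Split a flat list of sector words into records and check each checksum."""
    result = {}
    pos = 0
    for name, length in RECORDS:
        payload = words[pos:pos + length]
        stored = words[pos + length]          # checksum word follows the payload
        result[name] = {
            "words": payload,
            "checksum_ok": xor_checksum(payload, seed) == stored,
        }
        pos += length + 1
    return result


# Example with a made-up seed and made-up contents.
SEED = 0x1234                                  # hypothetical value
header = [0x0000, 0x0000]
label = list(range(8))
data = [0x00FF] * 256
flat = []
for payload in (header, label, data):
    flat += payload + [xor_checksum(payload, SEED)]
sector = split_sector(flat, SEED)
```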
I observed one problem and one discrepancy from the model of a disk sector to which I am engineering. The discrepancy is that we seem to be displaced about 60-70 us from the times I would have expected from the Alto microcode. The problem is that the disktool sometimes gets into a state where it will no longer read sectors, although disk seeks are still functional.

The length of a sector mark is only 5 us, far too short to account for the added time. I am forced to speculate that the word task is doing other things or the execution time of the sector task is long enough to result in this delay from the 'ideal' timing.

Ideally, a sync word will exist right around the 34-35th word time, roughly 330 us into the sector, but real data on the drive has the sync word in the 390 to 400 us zone. Everything from that point is correspondingly shifted later in the sector, but it all fits neatly into the 3,333 us available.

The long preambles at the beginning of a sector will accommodate these delays in the start of a sector when reading and real Alto systems should be able to tolerate an earlier sync word being written by this tool. Therefore, I won't do anything based on this observation.

The hang is troubling. It occurs some random time after the tool is initialized, such that I am doing a number of seeks when abruptly the button does not respond any more. I believe that I have one or more of the state machines wedged in a non-idle state. I will test this with new instrumentation displaying the idle status of eight of my FSMs; any light that is out indicates a wedged machine. 

As a final test, I examined a file dump after having read sector 0, but all I can see is that the data recorded is plausible. I see the proper disk address in the header words and have eight label words that look very much like those on the label field of the two online disk images I grabbed from bitsavers. 

I found two online images, one named diags.dsk and the other xmsmall.dsk, but the contents of cyl 0, head 0, sector 0, which should be the boot sector, do not match what I see on my disk. The sector 0 contents of those two images are different from each other, too, which adds to the mystery.

I will pass the data along to Ken to see if he can interpret it. Certainly the header and label fields should pass some clear validity checks, and then he can determine if the data sector looks like well formed Nova instructions.

I ran out of energy as it got late, but will work on interpreting the contents, tracking down the wedged state machine(s), and moving forward on the high level 'Read entire cartridge' state machine. I am pleased that every sector I have read has come through clean, with all checksums matching.

Friday, October 14, 2016

All apps and diagnostics working on Xerox Alto II, redesigning key component of disk tool

RESTORATION OF XEROX ALTO II

We spent the morning working on the Alto. The machine, as we received it, had a 1K CRAM (RAM for microcode) board and a 2K/3K Control board attached to it. The Alto can be configured with 1K of microcode ROM, 1K of ROM and 1K of RAM, 2K of ROM and 1K of RAM, or 1K of ROM and 3K of RAM.

With the Control board wired for 3K but only a 1K CRAM board installed, we had incompatibilities, including one address control line that was hooked directly to ground, making the 1K of RAM inoperative. If the control board had been only a 2K or 1K control board, it would have been compatible with the 1K RAM board.

We switched the original boards for a pair of boards we received, a 3K control board and a 3K CRAM board. Everything worked perfectly, we could run the diagnostics and all the applications with no failures.

With that fixed, the two remaining bits of hardware to work on were the ethernet board and the mouse. The optical mouse seemed in good shape, but the Alto didn't detect any movements or clicks. We disconnected the mouse cable from its connector on the keyboard and checked for availability of +5V and ground.

Immediately, we saw the problem. The connector on the mouse cable is a DB-9 which has a row of 5 pins over a row of 4 pins. The female connector on the keyboard is a DE-19, which has a top row of 6 receptacle pins, a middle row of 7 and a bottom row of 6 pins. Even though the connectors mated physically and could be screwed together, the pins didn't line up properly.

We will need to do some work to make the mouse work, with several possible approaches:

  1. Get our hands on an Alto mouse with the proper DE19 connector
  2. Make a cable converter to connect the 9 active wires from the DB9 to the DE19
  3. Add a secondary connector to the keyboard, a DB9 one, to use the original mouse
  4. Find a DE19 connector and rewire the mouse
  5. Replace the DE19 connector on the keyboard with a female DB9

ALTO DISK TOOL

The steps while reading a sector must be verified, with completed steps in gray:
  1. Readsector logic is waiting for a sector match
  2. Indexmarker occurs, signalling that next sector mark is for sector 0
  3. Sectormark occurs, thus we are in sector 0 now
  4. Gotsector is emitted to indicate we have a match
  5. Readsector logic moves on to setup for record 1 of the sector
  6. Readfield logic is triggered
  7. Readfield waits for approximate 200 us preamble before looking at incoming data
  8. Roughly 120 us of zero data bits are seen
  9. A 1 data bit completes the sync word
  10. the logic recognizes the synced state
  11. Two words of the header record are deserialized, extracted and saved
  12. The checksum is calculated correctly for the two words of the header record
  13. The next word is deserialized, extracted and used as a checksum
  14. The checksum verification test occurs properly
  15. The readfield logic completes
  16. The deserializer goes to the unsynchronized state
  17. The readfield logic for the next record, label, begins
  18. The appropriate preamble is passed before we look for sync
  19. Enough zero bits are read to properly set up sync logic
  20. A 1 bit is read and the sync condition is attained
  21. Eight words of the label record are deserialized, extracted and saved
  22. The checksum is properly calculated for the 8 label words
  23. The next word is deserialized, extracted and used as a checksum
  24. The checksum verification test is done correctly
  25. The readfield logic ends
  26. Sync is dropped
  27. The readfield logic is entered for the data record
  28. A suitable preamble is passed before attempting sync
  29. Enough zeroes are read for the sync engine to work properly
  30. A 1 bit arrives and we attain the sync condition
  31. 256 words are deserialized, extracted and saved as the data record
  32. The checksum is properly calculated
  33. The next word is deserialized, extracted and saved as the checksum
  34. The final checksum test is done properly
  35. Readfield logic ends
  36. Sync is dropped
  37. The readsector logic completes
  38. Appropriate completion status is set in Reg0001
  39. The next sectormark does not occur until after step 36
To complete step 11, I need to redesign the deserializer function, since it is sometimes detecting 1 bits when they did not occur in the incoming data stream. This is a problem with glitching/bouncing of the incoming line, which did not occur when I generated a perfectly spaced test pattern but happens with the real-life signals and their associated jitter. 

Verified data and formatting on disk cartridge, working through my logic to read the header field

ALTO DISK TOOL

The steps that will occur and must be verified, with completed steps in gray:
  1. Readsector logic is waiting for a sector match
  2. Indexmarker occurs, signalling that next sector mark is for sector 0
  3. Sectormark occurs, thus we are in sector 0 now
  4. Gotsector is emitted to indicate we have a match
  5. Readsector logic moves on to setup for record 1 of the sector
  6. Readfield logic is triggered
  7. Readfield waits for approximate 200 us preamble before looking at incoming data
  8. Roughly 120 us of zero data bits are seen
  9. A 1 data bit completes the sync word
  10. the logic recognizes the synced state
  11. Two words of the header record are deserialized, extracted and saved
  12. The checksum is calculated correctly for the two words of the header record
  13. The next word is deserialized, extracted and used as a checksum
  14. The checksum verification test occurs properly
  15. The readfield logic completes
  16. The deserializer goes to the unsynchronized state
  17. The readfield logic for the next record, label, begins
  18. The appropriate preamble is passed before we look for sync
  19. Enough zero bits are read to properly set up sync logic
  20. A 1 bit is read and the sync condition is attained
  21. Eight words of the label record are deserialized, extracted and saved
  22. The checksum is properly calculated for the 8 label words
  23. The next word is deserialized, extracted and used as a checksum
  24. The checksum verification test is done correctly
  25. The readfield logic ends
  26. Sync is dropped
  27. The readfield logic is entered for the data record
  28. A suitable preamble is passed before attempting sync
  29. Enough zeroes are read for the sync engine to work properly
  30. A 1 bit arrives and we attain the sync condition
  31. 256 words are deserialized, extracted and saved as the data record
  32. The checksum is properly calculated
  33. The next word is deserialized, extracted and saved as the checksum
  34. The final checksum test is done properly
  35. Readfield logic ends
  36. Sync is dropped
  37. The readsector logic completes
  38. Appropriate completion status is set in Reg0001
  39. The next sectormark does not occur until after step 36
I brought out several more signals and wired them to the logic analyzer, before running more tests. They helped quite a bit in testing the next few items on the list above. I came to a few immediate conclusions:
  • the checksum word is not read properly from disk - comes in as 0000
  • i am matching on the proper sector number
  • my logic to save the checksum status in Reg0001 is not correct
The signal patterns coming in from disk as Read Clock and Read Data are well formed. They have transitions much faster than the 50ns maximum and are a nominal 100ns wide. The clock exhibits some jitter from pulse to pulse, which is understandable.

All of my testing to date is predicated on the belief that the disk cartridge I received, labeled Alto Smalltalk, is in fact written by an ALTO and in the proper three record per sector format of that machine. 

I decided to record all the data bits on sector zero and see if they match up to what I should be seeing. I checked cylinders 0, 5 and 10 with sector 0, all good. I will check a few other cylinders and sectors for their formatting. What I noticed is that when I set Head to 0, I see a header field with the bit x0004 set, and Head of 1 gets x0000, so I probably have the sense of Head inverted.
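The head observation can be expressed as a small check. In this Python sketch only the x0004 header bit comes from the observation above; the helper names, and the stand-in function that mimics what the drive returned, are hypothetical.

```python
HEAD_BIT = 0x0004   # header bit that tracked the head in the observation above


def head_from_header(word):
    """Head number recorded in a header word."""
    return 1 if word & HEAD_BIT else 0


def observed_header(head_setting):
    """Stand-in for the drive: a Head setting of 0 returned a header with x0004 set."""
    return HEAD_BIT if head_setting == 0 else 0x0000


def head_line_for(requested_head):
    """Invert the requested head so the drive selects the head that was meant."""
    return requested_head ^ 1


for head in (0, 1):
    uncorrected = head_from_header(observed_header(head))
    corrected = head_from_header(observed_header(head_line_for(head)))
    print(head, uncorrected, corrected)
```

Without the inversion the header reports the opposite head. With it, the requested and recorded heads agree.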

I made some tweaks to deal with the checksum error status recording and to reset all transactions anytime the command register (Reg0000) goes to zero. Time to resume testing. It does appear I am recording the checksum errors now.

I see that my deserializer is not properly handling the serial stream. Where the disk has 002C my extraction was 602C. It began with spurious bits, but when I look on the logic analyzer the Read Data line was always 0 and should not have triggered a 1 into the shift register.

I have not completed step 11 above, but I am feeling good about the progress. I am going to stare at my deserializer and think about what I can do to improve its accuracy, perhaps by recoding.

Thursday, October 13, 2016

Working carefully through entire read sector using logic analyser

Today I spent time with the rest of the 1401 Restoration Team at CHM.

ALTO DISK TOOL

I have begun a methodical check of the entire process, the 40ms in the life of a read sector transaction. For each step, I will set up the logic analyzer to verify that things occur as I expect them to. For example, I need to verify that my 'got sector' is emitted for the correct sector number, not just that I get that signal during a rotation.

The steps that will occur and must be verified:
  1. Readsector logic is waiting for a sector match
  2. Indexmarker occurs, signalling that next sector mark is for sector 0
  3. Sectormark occurs, thus we are in sector 0 now
  4. Gotsector is emitted to indicate we have a match
  5. Readsector logic moves on to setup for record 1 of the sector
  6. Readfield logic is triggered
  7. Readfield waits for approximate 200 us preamble before looking at incoming data
  8. Roughly 120 us of zero data bits are seen
  9. A 1 data bit completes the sync word
  10. the logic recognizes the synced state
  11. Two words of the header record are deserialized, extracted and saved
  12. The checksum is calculated correctly for the two words of the header record
  13. The next word is deserialized, extracted and used as a checksum
  14. The checksum verification test occurs properly
  15. The readfield logic completes
  16. The deserializer goes to the unsynchronized state
  17. The readfield logic for the next record, label, begins
  18. The appropriate preamble is passed before we look for sync
  19. Enough zero bits are read to properly set up sync logic
  20. A 1 bit is read and the sync condition is attained
  21. Eight words of the label record are deserialized, extracted and saved
  22. The checksum is properly calculated for the 8 label words
  23. The next word is deserialized, extracted and used as a checksum
  24. The checksum verification test is done correctly
  25. The readfield logic ends
  26. Sync is dropped
  27. The readfield logic is entered for the data record
  28. A suitable preamble is passed before attempting sync
  29. Enough zeroes are read for the sync engine to work properly
  30. A 1 bit arrives and we attain the sync condition
  31. 256 words are deserialized, extracted and saved as the data record
  32. The checksum is properly calculated
  33. The next word is deserialized, extracted and saved as the checksum
  34. The final checksum test is done properly
  35. Readfield logic ends
  36. Sync is dropped
  37. The readsector logic completes
  38. Appropriate completion status is set in Reg0001
  39. The next sectormark does not occur until after step 36
Each set of testing and each tweaking of logic takes about 30 minutes through the toolchain. In some cases, the 4K cycles of a logic analyzer capture, accounting for about 204 us, will let me check off multiple steps. The entire trace buffer is less than 22 words of deserialized data, and preamble and sync times account for multiple word times as well. 

Thus, we might view the entire header record in one shot, but the label record may take two capture intervals. The data record, needing almost 270 word times of data, either needs a dozen captures or some faith that allows capture of just the beginning and end. 
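The capture budget can be estimated with a few lines of Python. The figures are the ones used in this entry (4K samples, 50 ns per sample, and a 9.6 us word time implied by 21 word times per 201.6 us); the function name is hypothetical.

```python
SAMPLES = 4096          # logic analyzer depth
SAMPLE_NS = 50          # sample interval assumed in this entry
WORD_TIME_NS = 9_600    # one 16-bit word on disk

window_ns = SAMPLES * SAMPLE_NS              # duration of one capture
words_per_window = window_ns / WORD_TIME_NS  # word times visible per capture


def captures_needed(word_times):
    """Back-to-back captures needed to cover the given number of word times."""
    total_ns = word_times * WORD_TIME_NS
    return -(-total_ns // window_ns)         # ceiling division


print(window_ns / 1000, words_per_window, captures_needed(270))
```

Under these assumptions the data record comes out at roughly a dozen captures; rounding up gives 13.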

By the time I left for the CHM, I had verified steps 1 to 4 of the list and was attempting to work through the looooong preamble before the header record. I see that we are good through part of step 7, in that I see spurious data bits arriving in the first 30 us of the sector, and they are indeed being ignored because we are still in the preamble state. 

I set the logic analyzer to wait 4032 fpga clock cycles after the sector mark occurs, then watch for the synchronizing behavior. This is the preamble duration of the Alto microcode, 21 word times.  I am seeing the sync bit arrive just about where expected. Further, the collected data from the disk does appear to match the two header words I would expect, with 0000 0000 from sector 0.

The scope pattern is consistent with the checksum I would expect, 0141 in hex, but I don't yet have a good logic analyzer signal to validate the checksum testing results. That will be the last set of tests for tonight. I am comfortable that steps 1 to 11 are verified and probably that the next couple are good, but time will tell. 

I made a one line change to route a different signal to an LED and the logic to match sectors stopped working, such that my logic analyzer trigger no longer was recording. I suspect this is one of those random Xilinx abject failure modes where some other set of trivial and unrelated changes will result in a bitstream that once again works properly. Time to sleep, since progress for tonight is blocked. 

Tuesday, October 11, 2016

Have matching board sets for Alto, debugging read sector operation on real disk drive

ALTO II RESTORATION

Our working hypothesis for our current problem with the Alto is that our CRAM board is mismatched with our Control board - having a 1K CRAM but a control board that is 2K wired with a 3K engineering change. 

Al Kossow loaned us four 3K CRAM and 3K Control board sets, plus some spare PROMs for the control boards. We are pretty certain to have a properly working and matching pair, which should resolve our current problem.

The Alto fails whenever a program tries to execute microcode that was written into the CRAM, because the mode switch from ROM to RAM fails to take. This is due to a grounding of the control pulse as it travels over a cable between the two cards. We suspect that the 1K CRAM board has the pin grounded that would be used to accept the signal on a 3K board. 

ALTO DISK TOOL

Today I cleaned up some signals, wired up the logic analyzer and began collecting traces to debug the disk tool executing its 'read sector' transaction. When I run the 'read entire cartridge' transaction, it stalls at some random cylinder location, apparently because the seek didn't complete properly in my state machine.

The first look at the data being captured looks plausible, but frankly until I look more closely I won't be certain. Ideally I would have a known data pattern to match. The good news is the repeatability of the data that is captured, but it has to be the actual data.

I completed the wiring of the logic analyzer signals to the fpga board so that I can look more granularly at the behavior of the 'read sector' FSM. I am slogging through the states checking that things are occurring as I expect.

With only 4K entries in the logic analyzer, each snapshot at 50ns intervals, I can only see 200 microseconds of activity. A single sector read occurs over 3,333 microseconds. I need to be extremely clever in triggering conditions to spot the portion of the read that I want.

I worked until the evening, without spotting any obvious problems but I know I have conditions where a 'read sector' will stall and of course the stalls that happen trying to read an entire cartridge in one transaction.

This portion of the testing will focus on glitches, race hazards and other situations that can cause my logic to malfunction. I entered this with logic that worked properly with the built-in pattern generator, but that delivers pulses with an orderliness that doesn't exist on a real disk drive. Erratic timing is what can bring out the latent defects.

New level shifting board appears sound, testing of read sector logic underway

ALTO DISK TOOL

I began the morning with a careful wiring check of the new level shifter/driver board, ensuring no solder bridges, all signals wired correctly and that outputs corresponded to the intended inputs. With that done, it was time to set up the testbed.

My tests with the drive switched off looked good - I saw the signals pulled nice and low, with decent speed edges. As a safety measure, I am going to check the critical signals - Write Gate and Write Data plus Clock, as these must absolutely remain high to protect the disk cartridge from inadvertent writing/erasure.

Once that was confirmed, it was time to spin up the disk, try some seeks and attempt to read a sector. The seeks appeared to work perfectly, including errors if I attempted an invalid cylinder, one above 202.

An afternoon test will check to verify the positioner indicator on the drive against the cylinder I attempted, to test that all the address bits work and are in the proper position. I found that cylinder bits 64 and 128 were swapped, which was easy to correct in the fpga logic, otherwise seeking works exactly as desired.
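The correction amounts to exchanging two bit positions in the cylinder address. A minimal Python model of that fix, together with the range check implied by the invalid-cylinder test (cylinders above 202 are rejected); the function names are hypothetical and the real fix was made in the fpga logic:

```python
MAX_CYLINDER = 202


def swap_bits(value, a, b):
    """Exchange bit positions a and b of value."""
    if ((value >> a) & 1) != ((value >> b) & 1):
        value ^= (1 << a) | (1 << b)
    return value


def cylinder_to_lines(cylinder):
    """Validate a cylinder number and swap the 64 and 128 bits (positions 6 and 7)."""
    if not 0 <= cylinder <= MAX_CYLINDER:
        raise ValueError("invalid cylinder: %d" % cylinder)
    return swap_bits(cylinder, 6, 7)


print(cylinder_to_lines(64), cylinder_to_lines(128), cylinder_to_lines(5))
```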

The read test results were mixed. It appears to read sector 0 of each cylinder, although I have yet to validate the data recorded by the logic. If I select any non-zero sector number, it hangs, indicating that my logic to count sector numbers and match is going awry.
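The sector matching that is going wrong here can be modeled simply: the index marker announces that the next sector mark is sector 0, and each later sector mark advances the count. The Python sketch below assumes 12 sectors per revolution (a 40 ms rotation divided into 3,333 us sectors); the class and function names are hypothetical and this is not the FPGA logic itself.

```python
SECTORS_PER_TRACK = 12   # assumed: 40 ms rotation / 3,333 us per sector


class SectorCounter:
    def __init__(self):
        self.current = None        # unknown until the first index marker
        self._index_seen = False

    def index_mark(self):
        """The next sector mark will be for sector 0."""
        self._index_seen = True

    def sector_mark(self):
        """Advance the count; returns the sector now under the head, or None."""
        if self._index_seen:
            self.current = 0
            self._index_seen = False
        elif self.current is not None:
            self.current = (self.current + 1) % SECTORS_PER_TRACK
        return self.current


def wait_for_sector(events, target):
    """Return the position in events where the target sector starts, or None."""
    counter = SectorCounter()
    for n, event in enumerate(events):
        if event == "index":
            counter.index_mark()
        elif event == "sector" and counter.sector_mark() == target:
            return n
    return None


events = ["sector", "sector", "index"] + ["sector"] * 12
print(wait_for_sector(events, 0), wait_for_sector(events, 5))
```

A match on sector 0 only needs the index marker, which is why a counting bug can still let sector 0 reads succeed while any other sector hangs.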

My late afternoon testing session would monitor the Read Data and Read Clock signals, as well as the sector marker and index marker inputs, to be sure they were being read properly. I will use the oscilloscope for this task.

The Read Data and Read Clock signals looked good. I am less sure about the sector and index marks. This will be tested more completely tomorrow.

I will also hook the logic analyzer back to the fpga board, so that I can record what the logic believes is being decoded and stored, comparing it to what is written to the file and sanity-testing it against what is likely to be on the real disk sector. 

Sunday, October 9, 2016

Analog challenge addressed in Disk tool to Diablo interfacing

ALTO DISK TOOL

Level shifting circuits still not performing to my satisfaction. The reference driver logic from Diablo is a pair of MC858 NAND gates in tandem to sink 95 mA to pull a signal down to near 0V. The various gates I have tried don't have the drive power and few of the usual logic chips do.

I am going to take my time and try out various circuits until I have one that clearly produces the right results as measured by scope and voltmeter. I looked to see what circuits were used by other makers of drive interfaces for the same generation of drives.

DEC used a high current driver chip that could sink up to 300 mA and I will try a similar chip, the SN75452 dual NAND driver, which seems to provide similar or better performance. Fortunately I found I had 10 of them in stock, allowing me to breadboard one chip and do tests right away.

I am going to prototype and try each potential circuit on the real disk drive to ensure that my fpga, level shifter, cable and Diablo drive will produce the pulse voltages and shapes that I need. First up was the 75452B chip, which is an inverter thus requiring minor changes to the fpga for the two signal lines it will control.

With the logic inverted, I put on the scope and voltmeter, brought up the testbed and triggered a seek with those two cylinder values logically on. This will pulse them from ground to high when the seek machine fires, giving me a drop of the output from its high level to near ground.

It worked spectacularly, pulling the disk interface line down to near ground sharply. Time to rewire the board to use these chips for all the output lines and remove the level shifter circuits that were there. This will take a bit of time but I should be ready for real testing by tomorrow morning latest.

Having the potential that a max of 10 of the 13 lines could be swinging low at any time, I know I will be requiring about 1A of power supply to the driver board. I had to beef up the power supply wiring to handle the current.
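The supply estimate is a simple worst-case sum. A short Python check, assuming each active line sinks the 95 mA quoted for the reference Diablo driver (the function name is hypothetical):

```python
SINK_MA_PER_LINE = 95   # current sunk per line pulled low, from the reference driver


def worst_case_supply_ma(lines_low, per_line_ma=SINK_MA_PER_LINE):
    """Total current when the given number of lines are pulled low at once."""
    return lines_low * per_line_ma


budget_ma = worst_case_supply_ma(10)   # at most 10 of the 13 lines low together
print(budget_ma)
```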