Thursday, May 18, 2017

Issues found and hopefully fixed in disktool writing, plus good day with the 1401 computers

ALTO DISK TOOL

Counting clocks and laying out the data bits into words was the key to spotting problems. Immediately, I saw a few issues in the stream I wrote. I have to work through the logic in the tool, instrument some signals to help me debug it with a logic analyzer and resolve these.

First, I see that the bits that looked correct for the header record were actually incorrect. The header record consists of the sync bit, two words of zero (32 bits of 0), and a checksum word of 0x0151, then there will be ten words of zero following it. 

The checksum on the disk is a bitwise XOR of each word with the running total, initially seeded with 0x0151. Thus, if we start with 0x0151 and XOR by 0x0000 twice, we still have 0x0151 in the running total. The checksum word on disk, 0x0151, is compared to the running total 0x0151 and we have a validation of the record integrity.

However, I found that I wrote the sync bit, only one word or 16 bits of zero, 0x0151, and a stream of zero words after. I dropped one of the two words of the header (0000 instead of 0000 0000). This coincidentally passes the checksum validation (!) thus giving the misleading symptom that my reading problems only occurred on the following label record.

To see why this still passes muster, consider the process in the microcode. After it finds the sync bit, it reads two words in and XORs them to calculate the checksum. It then reads the third word from disk and compares to the running total. Here is what happens with my abbreviated record:

  • Initial seed for checksum sets it to 0x0151
  • First word of zero is read, XOR leaves running total at 0x0151
  • Second word is 0x0151 since I omitted the true second word
  • XOR of second word with running total sets it to 0x0000
  • Third word read is really one of the zeroes that sit between records
  • Check of the third word - 0x0000 - with the running total 0x0000 yields equality
  • Checksum appears to validate integrity of the record
I have a miscount in writing out the data, omitting one of the header record words of 0x0000. That must be addressed, But wait, there is more! The gap between the header and label records should be exactly ten words of 0x0000 followed by the sync word of 0x0001 but the timing shows there are a few extra clock cycles or bits of zero in the gap. This must be addressed too,

Looking at the label record, which should be a sync bit, eight words of the label data, a checksum for the record and another gap filling set of ten words of zero, I find that the first data word is correct but the second is not right. Therefore, the problems are not simply a failure to write all N words of a record, but are worse. 

My logic is fetching words from RAM to fill a shift register, shifting the bits out serially to output each word of a record. Either I didn't fetch the second and ensuing words properly or there is some other failure in coordination between shifting, reading, loading and managing the record. 

The first word of each record comes out right, which makes the results on disk superficially appear correct. It is only when carefully counting more than 5000 clock cycles and laying out the words they represent that the problems are clearly visible. 

I think I fixed several of the problems through a refactoring of the write field logic in the disktool, but I also set up key signals to be monitored on the four PMOD connectors on the fpga board. I can hook these to the logic analyzer and debug more deeply, if what I see on the scope doesn't match what should be written.

IBM 1401 RESTORATION

I rebuilt four more of the power sequencing SMS cards today, replacing three resistors with two of different values and one transistor from type 26 to type 30. They were all tested in the Connecticut machine and performed perfectly. We now have both machines with two new cards in the sequencing gate, plus two spares placed into our inventory.

We seem to have a problem on one of the 026 keypunches which were converted from selenium rectifiers to modern silicon diodes. We think that at least one of the six diodes is malfunctioning, although the full wave bridge is working fine. More diagnosis needed to find and replace the pesky critter.

A new volunteer, Nora, is working on organizing all the parts in the workroom. It will be a delight to have everything well labeled and easily accessible. The challenge in front of her is formidable, as we have many boxes or bags that contain a huge jumble of components.

Imaging a few hundred assorted transistors in a plastic baggy, then multiply that by many such collections. Finding a part of a given type, such as the type 30 transistors I needed, is a painstaking process thumbing through everything. When Nora is done, we will have these sorted, separated and labeled.

One of the 1401 systems forgot how to stop today. A program would be started but none of the Stop buttons on the CPU or peripherals would cause it to stop the execution. Bill traced it to a failed Stop Latch and replaced the card. I will diagnose and repair the card itself next week and put it back into spares. 

No comments:

Post a Comment