Saturday, December 31, 2016

Friday, December 30, 2016

Testing of disk tool on Alto system Diablo drive highlights remaining issues


I brought the new board, fpga and related objects over to the Alto, hooked it up to the Diablo on that system and read some cartridges for archival purposes. Unfortunately, I was still seeing sectors with errors.

Something is going wrong, leading to errors which we are not experiencing when the Alto reads the same cartridges. Unless the Alto is ignoring errors and is never affected by any incorrectly read sectors, there is something I can improve in my fpga logic.

We did boot up all the other cartridges we had, which all ran well, but that was with the Alto driving the Diablo. The original disk cartridge we received had been cleared by a utility that overwrote the media with a pseudo-random stream of words. This seemed ideal to use to write an archived disk image which had many Alto games.

This was the first time we tried to use the WriteEntireCartridge transaction, which appeared to work. However, when we tried to boot the cartridge on the Alto, we heard it seek and read from two locations but then stop. Clearly the data was not written well enough to work properly.

My wishful thinking yesterday, that the surface scratching on the cartridge at my house was the sole reason for checksum errors while reading, is proven wrong. I have to keep working until my tool reads my cartridge entirely correctly (and writes sectors properly). 

Thursday, December 29, 2016

Validated new board and connectors, ready to be used on the Alto system to archive disk images tomorrow


I set up the testbed and began to check the functionality and signal integrity of the new board and connectors. Seeking was checked to be sure all the address bits were working - but in fact the 4 and 16 lines were not working. I found that the chip pins involved were not well soldered, fixed it and went back to test.

In addition, I have a Digilent Nexys2 FPGA board that was purchased by Al Kossow in order to give him a working disk driver and emulator. It had connectivity problems in the USB connector, with the connector itself bent and some other problems where it was attached. I used my heat gun to remove the connector, bought replacements and will surface mount the new part when it arrives.

The disk drive testing resumed and my seek addresses were all working correctly. Next up is getting the ReadGate and other signals tested. I am emitting '1' to ReadGate which should pull the output of chip U2 side 1 down to ground, thus activating pin 39 on the ribbon cable to signal pin E.

However, when I monitor pin E on the terminator, the line is not going down to ground. I have continuity from the fpga connector pin to chip U2, from the chip output to pin 39 on the connector, and all the way through the cable and drive to the terminator.

I might have a flaw with chip U2 such that it is not working, even though all the connections to and from it are good. I have to do a power-on test and verify the voltage levels at the chip. Watching the scope, I could see that the signal at the terminator changes as I probe the chip pins, indicating I probably have a poor solder contact there.

I decided to remove the chip entirely with my hot air rework gun, then resolder it properly. With that done, I verified good solid behavior, including the Seek, ReadSector and ReadEntireCartridge transactions. I uploaded the contents of the pack, along with the checksum validation array, so that I could check that the data was extracted properly.

The results were just as before, so I now have the production level board tested and ready to use tomorrow on the drive attached to the Alto and the other cartridges we have there.

As I packed up everything for transport, I decided to inspect the disk cartridge surface and the heads. I can see scraping and discoloration at the low cylinder numbers near where I am experiencing almost all of the checksum errors reading that cartridge.

Surface marking that is probably the cause of my errors reading some sectors

Surface marking on the other side of the platter
I will tentatively ascribe the errors to the surface condition, thus hope that my tool works well with undamaged cartridges on the other Diablo drive, the one that came with the Alto. We will know more tomorrow after we read and upload all the content. 

Saturday, December 24, 2016

Holiday downtime

Due to the holidays, my test setup is stored away and nothing can be done on the project. Back soon!

Thursday, December 22, 2016

Changed emulator board design and sent to fab


My early testing with the Emulator role PCB shows that it is not going to work properly. The signals going from the fpga to the Alto require pull-up and pull-down resistors to terminate the ribbon cable. These resistors should have been on the Alto side, but they are not; instead, the signals that run from the Diablo to the Alto are terminated by resistors on the drive side.

I need to expand the PCB to add a pair of resistors per outgoing signal, a total of 22 added resistors. Good thing I only ordered one board for $66, of course with shipping and tax on top, as it is wasted. Time to get to the designing (and to order more components).

I had the 1401 restoration team meeting yesterday and spent quite a bit of time chasing down a flaw that resulted in card reader error stops. Greatly improved now, so that programs which use the read then print instruction can get through hundreds of cards before stopping. It had been failing on the third card pretty regularly.

Thursday morning I changed the emulator card design to add in the termination resistors and decided to send out for a new card through the same fast turnaround service I used previously. Another $100 and should be about two weeks to receive it.

Tuesday, December 20, 2016

Construction progress on PCBs and cabling, plus more testing


Parts are on order to complete the remaining driver and emulator boards - shipped today but not likely to get here before Xmas due to the heavy volume at all the delivery services. I did put on some of the components I still had - capacitors, IDC connectors and driver chips, for instance - before putting the boards aside.

Testing out the new driver board involved checking a number of items, as listed in yesterday's post. Step 1 was already complete.  
  1. Verify that SelectUnit1 is switchable and not permanently on from the fpga
  2. Verify that the terminator has the pull up and pull down resistors installed for all the signal lines I am driving, but none on the lines returning from the Diablo to my fpga.
  3. Monitor signals incoming to the PCB to validate their correctness
  4. Monitor signals at the terminator when my PCB is driving the Diablo to validate their correctness.
Steps 2 and 3 were relatively straightforward, as there is easy access to the pins needed to monitor those circuits. Step 4 is more challenging and will require hooking on micrograbbers and multiple cycles of removing and installing the terminator. 

I found several of my incoming signals had termination on the sending end, as well as on my board. I clipped off the resistors on those signals. Everything else checked out with termination resistors, allowing me to complete step 2 above.

Step 3 will be done by a simple power up and attaching the scope probes to the tops of the three level shifter mini-boards across the top of my PCB. 

I took some time to put the IDC connector on the ribbon cable that attaches to the Alto disk controller card on its other end. This cable is now attached to the emulator board. 

Monday, December 19, 2016

Powered on new driver PCB and tested its correct functioning, also mostly built emulator role board


Today I worked on the connector from the ATX PC power supply to the driver board of the disk tool, or alternatively a direct wired cable from the power supply to the board. I went with a modification of the cable I had been connecting via wire nuts, but soldered everything to a 4 pin connector. It has an indicator light, the load power resistor and the connector to fit to my PCB.

Power is good, so I proceeded to power on testing. I had mixed results from the first test, other than it passed the magic smoke and hot component tests. I could set up and accomplish seeks, although this worked with or without the SelectUnit1 line activated which is NOT what should happen.

The drive should ignore all signals unless SelectUnit1 is pulled to ground and while it is ignoring incoming signals, it should not drive any return signals to the PCB. Instead, I see status, sector marks, sector numbers and other signals driven regardless of my Select slide switch. 

Seeks worked, but no response to a ReadSector or ReadEntireCartridge transaction. Time to do a few things before I proceed:
  1. Verify that SelectUnit1 is switchable and not permanently on from the fpga
  2. Verify that the terminator has the pull up and pull down resistors installed for all the signal lines I am driving, but none on the lines returning from the Diablo to my fpga.
  3. Monitor signals incoming to the PCB to validate their correctness
  4. Monitor signals at the terminator when my PCB is driving the Diablo to validate their correctness.
Step 1 yielded the insight that I had changed the logic in the fpga to hold SelectUnit1 on, thus the behavior was what should be expected. Eventually, this will be switched off except for the duration of transactions, but during testing it is easier if I can observe ReadData, ReadClock, SectorMark and other signals all the time.

I began looking at the terminator resistors and cross checking it all to the schematics and design of my board. I didn't complete this before my shipment of the emulator board PCB blank arrived.

In addition, my emulator role PCB arrived today and I began to assemble components on it. It is time consuming due to the small size of the parts. The resistors and capacitors are size 0805, one of which is held in tweezer tips in the photo below, on lined writing pad paper with one line visible for scale.

0805 size component on lined paper, in tweezer tips
I ran out of the header strips I need to mount the last two components, each a level shifting mini-board that connects 3.3V and 5V sides for four circuits each. Once these come in, I can test out the wiring of the board prior to firing it up to test the disk emulator role.

As you can see, the disk driver and disk emulator boards are fairly similar in overall appearance, but are quite different in details. They are electronically keyed, using pins 20 and 21 of the fpga, so that when the board is connected, the fpga logic knows whether it is operating in emulator or driver mode by which of the pins is grounded.

disk emulator role board, assembled but for two level shifter miniboards

Sunday, December 18, 2016

Built additional boards and tested them, ready to connect power and run the Diablo


I began to build a second driver board, but after putting on the resistors and preparing to install capacitors, I found the VCC and ground lines were shorted. I couldn't find any spot that seemed to be causing the problem, even after lifting as much solder as I could off the board. I put it aside as a potential bad board.

Taking a third PCB card, I did a full installation of all components, with no signs of shorts or other problems. It still must pass the wiring and correct operation tests, just as the first board I built must. The first board passed all its connectivity tests by mid afternoon, then I went back to the bad board to see what is wrong.

Board 1 installed on the extension board
To ensure that it wasn't a solder bridge under one of the resistors or the power connector, I removed them all and used solder braid to wick off as much solder as possible. The board continued to show that all four pins of the power connector were bridged together, thus shorting +5V, +3.3V and ground. It is unusable.

Board 2 with shorted power connector pins - unusable

I put the third PCB (2nd is bad) on the test bench. All the connectivity and short circuit tests were passed, visual inspection is good too. All that remains is to power it up and check the function of the drive with the fpga logic.

Board 2, in place waiting for power connector hookup
I mounted the driver PCB board on the extension board and hooked that to the fpga, ready to test as soon as I wire up the main power connector. Instead of wire nuts hooked to the existing power supply connector, I will make up a proper cable which bridges between the PC power supply main connector, that would normally be plugged onto a motherboard, and the four pin connector to drive my PCB.

Power supply connector
One wire has to be connected to ground to 'turn on' the supply, and the power resistor you see on the bottom has to provide a load in order for the supply to work, but many of the voltages supplied are unneeded. All it will take to build this is an ATX power supply plug, some heavy wire and a 4 pin Molex plug for the other end. 

Saturday, December 17, 2016

Building new driver board


The new driver board PCBs arrived and I began soldering components on one of them. Surface mount components are sooooo tiny, it is painfully slow going. When I found my soldering tweezers, which are spring loaded closed, I could hold and maneuver the parts.

Once it was apparently all soldered in place, I had to carefully check every circuit for connectivity and shorts, as well as appropriate resistance. This took the remainder of the day, since I want to be sure that this board is correct.

New driver board, assembled and under checkout
The 40 pin connector on the right attaches to the cable from the Diablo disk drive. The four pin connector at the left top is the power connector to the board. Another 40 pin connector is at the left, but mounted on the underside of the board. This underside connector attaches to the FPGA and its extension board. 

Friday, December 16, 2016

Built PLL, integrating it into the driver logic; PCBs shipped


Today, my driver boards are on their way. That would put the boards in my hand sometime on the 20th. The emulator board also shipped, but that will arrive midweek, a bit after the driver boards.

Monday night, at a dinner meeting with a number of technology enthusiasts that began life decades ago as the Motorola 6800 computer club, we discussed the data separation issues and a number of potential solutions. PLL is the consensus approach.

I ran a few reads of sectors with repeatable errors and looked at the digital and analog results carefully to draw some conclusions if possible. Slow going. Knowing exactly what fails and why is important to devising a solution, otherwise I am just using brute force enhancements in the hope I hit the target.

I set up a phase locked loop and am doing some experimenting to see how well it does at data separation as a possible replacement for the one built into the Diablo drive. Tedious work to set up various jittery input patterns and read the simulator output to judge its response. Bashed through the day working on it.

Once I had the PLL doing a decent job syncing to the clock bits, I had to modify it to produce a fixed duration pulse of 100 ns, as the PLL produces 50% duty cycle outputs. Once that is done, I need to sort out the separator logic and the startup state which routes the first transition as a clock. This will be tricky.
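The conversion from a 50% duty cycle output to a fixed-width pulse can be sketched in Python as a one-shot triggered on each rising edge. This is only an illustration of the idea, not the fpga logic; the function name and the sampled-waveform representation are assumptions. With a hypothetical 10 ns sample step, a 100 ns pulse would be `width=10`.

```python
def one_shot(samples, width):
    """Turn each rising edge in a sampled 0/1 waveform into a pulse
    that is exactly `width` samples long (hypothetical helper)."""
    out = [0] * len(samples)
    remaining = 0   # samples left in the pulse currently being emitted
    prev = 0        # previous input sample, used for edge detection
    for i, s in enumerate(samples):
        if s and not prev:      # rising edge restarts the one-shot
            remaining = width
        if remaining:
            out[i] = 1
            remaining -= 1
        prev = s
    return out

# A square wave that stays high for four samples yields a two-sample pulse.
square = [0, 1, 1, 1, 1, 0, 0, 0]
pulse = one_shot(square, width=2)
print(pulse)
```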

Monday, December 12, 2016

Waiting on PCBs, refining retry logic while reading cartridges


Final components are on hand, now watching PCB move through the foundries. The driver PCB is currently being plated, presumably the next steps are creating the sandwich of layers and QA inspection before shipping. They estimate shipping by December 15th, with three day shipping in place, which means the boards likely arrive early next week.
Driver boards in process, arrive early next week
The single emulator board is assigned to a large panel to be manufactured. Once the entire 4 layer panel is built and cut apart to the separate boards, mine can be checked and shipped. This board will likely arrive late next week.

Digilent had an online sale, where I picked up both extension boards and FX2 connectors at a very attractive rate. These will allow me to build out all the boards I expect to make in the coming weeks.
Extension boards

FX2 connector to use in lieu of extension board
I tweaked the ReadEntireCartridge transaction to let me control the number of retries when a checksum error is detected. I can have none, 31 or 62 based on fpga board switch settings. I will run some experiments - seeing the number of error sectors with no retry, then assess the results of 31 or 62.

This afternoon, I ran those tests and see that very few are corrected by retry. Performing a single read across the entire cartridge resulted in 123 sectors with a checksum validation error. Whether retry was done for 31 or 62 times, we had 119 bad sectors. That is, only four sectors had temporary errors that were resolved by rereading. The bad sectors are extremely consistent from run to run, as well.
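The retry policy can be pictured with a short Python sketch. It is not the fpga implementation: `read_sector` is a hypothetical callback returning the sector words and a checksum flag, and the fake reader exists only to show how hard errors and transient errors behave differently under retry.

```python
def read_with_retry(read_sector, sector, max_retries):
    """Read one sector, rereading up to max_retries times on checksum error.
    Returns (words, checksum_ok, attempts_used)."""
    attempts = 0
    while True:
        words, ok = read_sector(sector)
        attempts += 1
        if ok or attempts > max_retries:
            return words, ok, attempts

def count_bad_sectors(read_sector, sectors, max_retries):
    """Count sectors that still fail the checksum after all retries."""
    return sum(1 for s in sectors
               if not read_with_retry(read_sector, s, max_retries)[1])

def make_fake_reader(hard_bad, soft_bad):
    """Hypothetical stand-in for the drive: hard_bad sectors always fail,
    soft_bad sectors fail only on their first read."""
    seen = set()
    def reader(sector):
        if sector in hard_bad:
            return [], False
        if sector in soft_bad and sector not in seen:
            seen.add(sector)
            return [], False
        return [], True
    return reader

no_retry = count_bad_sectors(make_fake_reader({1, 2}, {3}), range(10), 0)
with_retry = count_bad_sectors(make_fake_reader({1, 2}, {3}), range(10), 31)
print(no_retry, with_retry)
```

In this toy run only the transient sector is recovered by rereading, which mirrors the pattern above where most failing sectors fail on every attempt.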

It is time to study failing sectors in more detail, to discover where the fault happens and the characteristics. If there is a pattern that I can detect and if a more intelligent data separator would avoid the fault, then I will build it.

To do separation, I must first OR together ReadClock and ReadData to give me the original combined pulse stream. That is simple to accomplish. Separating, depending on the full list of errors I find, might be a bit challenging but perfect extraction is the goal. 
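As a rough software analogue of that OR step, the sketch below merges two pulse trains into one. It assumes each train is a sorted list of pulse start times in nanoseconds, which is a simplification; the real signals are levels on two wires and the function name is hypothetical.

```python
import heapq

def combine_pulses(read_clock, read_data):
    """Merge two sorted lists of pulse times (ns) into the single
    stream that existed before the drive's separator split it."""
    return list(heapq.merge(read_clock, read_data))

# Clock pulses every 600 ns, one data pulse in the middle of the first cell.
combined = combine_pulses([0, 600, 1200], [300])
print(combined)
```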

Sunday, December 11, 2016

Working on the drive and fpga to further improve reading accuracy


I bought some 9601 timer chips, the same as used in the Diablo board, in order to experiment with the timing and appropriate component values. The 'long' timer might be running a hair short, hard to say for sure based only on the logic analyzer traces. I tried to set up the scope to get a more precise value.

Using the scope and some calculations, I worked out some fine tuning to get the timers closer to the ideal timing for the Alto, reworked the board and validated the timings afterwards. Resistances increased to make the times a bit longer.

With that change in place, I did another ReadEntireCartridge transaction and uploaded the resulting image and checksum status files for processing. Immediately I saw that I had very long stretches where no errors at all occurred.

I did the postprocessing and analysis on my laptop, running the Python programs I derived from Ken's work, and looked at the success rate of reading this cartridge with the improved board. I am seeing differences in 4% of the sectors and 2.1% of the total words, statistically about the same as before, but actually a lot fewer sectors had errors.
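The kind of comparison involved can be sketched as follows. This is not the actual postprocessing program; it assumes each image has already been parsed into a list of sectors, each a list of 16-bit words, and the function name is hypothetical.

```python
def compare_images(reference, candidate):
    """Return (fraction of sectors that differ, fraction of words that differ).
    Each image is a list of sectors; each sector is a list of words."""
    bad_sectors = 0
    bad_words = 0
    total_words = 0
    for ref_sector, cand_sector in zip(reference, candidate):
        diffs = sum(1 for a, b in zip(ref_sector, cand_sector) if a != b)
        total_words += len(ref_sector)
        bad_words += diffs
        if diffs:
            bad_sectors += 1
    return bad_sectors / len(reference), bad_words / total_words

# Two tiny two-sector images that differ in a single word.
sector_rate, word_rate = compare_images([[1, 2], [3, 4]], [[1, 2], [3, 5]])
print(sector_rate, word_rate)
```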

The only checksum validation errors were on cylinders 5-15, 26 and 44, just 12 out of 203 cylinders on the cartridge, with the errors concentrated so that about half the sectors on each cylinder had errors.

I am going to tweak my ReadEntireCartridge logic to support additional retries and make it more robust. At this point, it is possible that there is some corruption on the cartridge on those low cylinders, because I have an exact match between the bitsaver archive image of the cartridge and the contents I read, other than on those 12 low cylinders.

The errors occur sporadically across the cylinders, not bunched together. If it were surface defects or deterioration of the media, I would expect to have errors on all records of the sector and several consecutive sectors, not the pattern I see; there are errors only on the data record and they are well distributed across each head and cylinder.

If I can get the errors down to essentially zero, I will be satisfied. Until then, I want to continue researching and honing the system.

Saturday, December 10, 2016

Still investigating the read error cases, plus working on the new PCBs


I am still pondering the best way to handle the data separation for the Diablo drive. Yesterday I was looking at using a phase locked loop, but am now considering other means as alternatives. The point I am set on is the need to reverse the disk drive's separation and then accomplish this myself in the fpga.
Data Separator circuit inside Diablo drive
The circuit above accomplishes data separation from the signal coming off the disk surface. In other circuitry, not shown here, the flux reversals detected by the disk head are converted into a positive going pulse of fixed duration, somewhere between 50 and 100 ns in duration. Thus, each reversal delivers one pulse to the circuitry above.

We will discuss its behavior proceeding from a point in time just before the arrival of a pulse that represents a clock. The routing flip-flop in the center is initially reset, thus the incoming pulse is routed to the lower right gates to become a ReadClock pulse.

The pulse that is sent as ReadClock will also serve as the clock to the routing flip-flop, which has its D input tied high so when the clock pulse arrives, the flip-flop sets to on. It remains set until the timer circuits on the left side switch off, which causes the routing flip-flop to reset so that new pulses again go out on the ReadClock line.

However, while the routing flip-flop is set and has not yet been reset by the timer expiration, any pulse coming from the heads is routed out the upper gates as ReadData. Thus, from the arrival of a clock pulse until some time duration has elapsed, subsequent pulses are routed as data, but after the time elapses they again are processed as clock pulses.

The timer circuits on the left consist of two different timers, both started by a ReadClock pulse but lasting a short or a long duration. Which of the two timers is used depends on the leftmost flipflop, which is set to on when a ReadData pulse is delivered, but reset when the next ReadClock pulse arrives.

Thus, the default state once the ReadClock pulse arrives is that this is a zero bit, enabling the long timer to be used. This enables the long timer chip to produce a negative going output pulse for the duration of its time, kicked off when ReadClock line goes on.

However, when a data pulse arrives during the time interval, it sets the logic to instead use the short timer chip whose negative going output pulse occurs earlier after the clock. This occurs by setting the 'one bit' flipflop on the left. It is not reset until the next clock pulse ends, thus the timer started is based on the prior value at clock rising edge, before the falling edge resets this flipflop.

Our routing flipflop in the center is switched off by either of the timer chips emitting the negative going pulse when it times out, as well as when a ReadData pulse is emitted. It stops looking once it finds a 1 bit transition pulse, or when the timers run out.

When two transitions are close to each other, they seem to be shifted in time relative to each other, compared to the actual flux reversal written onto the media. Thus, this bit shift or pulse shift phenomenon is compensated for by the left side circuitry. If a 1 bit was detected, the routing resets earlier to look for a clock pulse.

The fault I am seeing is the arrival of the subsequent clock pulse early enough that it is routed as a data bit, not a clock bit. This is a fault of a timer that is too long. It happens sporadically and rarely, but causes a checksum validation error since the clock pulse is lost and the deserialization becomes confused.
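The routing behavior and this fault can be modeled in a few lines of Python. It is a behavioral sketch, not the Diablo circuit: pulses are treated as instantaneous timestamps in ns, and the timer values of 460 ns (long) and 440 ns (short) are assumptions chosen for illustration.

```python
LONG_NS = 460    # assumed timer value after a cell holding a 0
SHORT_NS = 440   # assumed timer value after a cell holding a 1

def separate(pulse_times, long_ns=LONG_NS, short_ns=SHORT_NS):
    """Route each pulse to ReadClock or ReadData the way the
    timer-plus-routing-flip-flop scheme does."""
    clocks, data = [], []
    window_end = None          # when the routing flip-flop will reset
    prev_cell_was_one = False  # state of the 'one bit' flip-flop
    for t in pulse_times:
        if window_end is not None and t < window_end:
            data.append(t)             # routing flip-flop set: pulse is data
            prev_cell_was_one = True
            window_end = None          # a ReadData pulse resets the routing
        else:
            clocks.append(t)           # routing flip-flop reset: pulse is clock
            window_end = t + (short_ns if prev_cell_was_one else long_ns)
            prev_cell_was_one = False  # cleared once the clock pulse ends
    return clocks, data

# Bits 0, 1, 0 with ideal timing separate cleanly.
normal = separate([0, 600, 900, 1200, 1800])
# A clock that arrives inside the still-open window is routed as data.
faulty = separate([0, 600, 1055, 1800])
print(normal, faulty)
```

The second call exaggerates how early the clock arrives, purely to make the misrouting visible.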

If the timers are shortened further, there is the risk that a data pulse will appear as a spurious clock pulse, the inverse error to the one I saw. Since Xerox chose to operate the Diablo disk drives outside of their spec, which intends for a 660 ns bit cell time, timing becomes more critical.

Our big enemy is jitter - not just the pulse shifting that is inherent with magnetic reading, but jitter from other sources. The result is that the arrival timing of clock and data pulses is shifting about often, thus solutions such as fixed timing or a PLL are not perfect.

The rotation rate of the disk can vary as much as 1% per specifications, which in practice means that arrival times can be offset by up to 2% in the worst case where the writing drive is 1% off in one sense and the reading drive varies by 1% in the opposite sense. This can give us a jitter from 0 to 12 ns.

The oscillator used to write the disk is an imperfect circuit, thus its nominal rate of 3.33 MHz can deviate a bit leading to jitter in the pulse timing of a similar amount. This might be as much as 30 ns jitter, although the rate of change is low so the impact won't affect single bit cells.

The clock rate recorded on the surface is totally determined by the writing system, in this case an Alto which wrote to the cartridge many years ago. Variations of the clock frequency are locked into the recorded signal. The receiving system (my fpga) uses the recorded clock not its own, thus it can't contribute any significant jitter.

Still, we have a drive engineered for a 660 ns bit cell operating at 600 ns with jitter that can run to about 45 ns if the worst case of clock rate and drive speed variations were to occur. This gives us a bit cell that could be 555 ns at worst case. Since each pulse produced, for clock or data, is 50 to 150 ns in width, we have a window of approximately 405 to 495 ns to look for the data pulse.
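The jitter budget behind the 555 ns figure can be reproduced with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the 45 ns value is the rounded total used in the text rather than something computed.

```python
NOMINAL_CELL_NS = 600         # bit cell as written by the Alto
ROTATION_TOLERANCE_PCT = 1    # allowed speed variation per drive
OSCILLATOR_JITTER_NS = 30     # upper estimate for write oscillator drift
ROUNDED_JITTER_NS = 45        # rounded worst case used in the text

# Writer and reader each off by 1% in opposite senses: 2% of one cell.
rotation_jitter_ns = NOMINAL_CELL_NS * 2 * ROTATION_TOLERANCE_PCT // 100
summed_jitter_ns = rotation_jitter_ns + OSCILLATOR_JITTER_NS
worst_case_cell_ns = NOMINAL_CELL_NS - ROUNDED_JITTER_NS

print(rotation_jitter_ns, summed_jitter_ns, worst_case_cell_ns)
```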

The data separator in the Diablo uses 440 or 460 ns for long or short, which should just squeak by in the worst case. That worst case is a compressed 555 ns bit cell, with the prior cell containing a data value of 1 so that pulse shifting occurs.

Looking at the pulses separated during the anomalous period, I see a set of ReadClock pulses 600 ns apart, then the next clock pulse follows in just 300 ns. The following pulse, 600 ns later, is routed as a ReadData. 600 ns after that and every 600 following we see ReadClock pulses.

The odd issue with that sequence is that a pulse arriving 300 ns after the last clock would be a data pulse, but if it were then the clock following it is missing because the gap is again 600 ns. Something has interrupted the pulse train, chopping 300 ns out of it. I don't understand how this happened.


I now have on hand most of the components that will be used to build the driver and emulator boards, once the PCBs come in. The remaining components arrive on Monday, whereas the PCBs will take another week or so to show up.

I do have to prepare the cable for the emulator role, inserting the 40 pin connector on the cable half that runs to the Alto driver board. This has the metal ground plane bonded to the cable, just as the disk side of the cable had, and must be carefully peeled back in order to install the connector.

Friday, December 9, 2016

Contemplating replacing Diablo data separator circuit with fpga based phase locked loop


I developed a watchdog timer in the fpga that will emit a signal if the time between clock pulses is 700ns or longer - this will help me flag spots where the Diablo drive is misidentifying a 0 bit with late clock as a 1 bit and missing clock. I can then reliably trigger both logic analyzer and oscilloscope to see the behavior of interest.
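The same check can be run offline on captured data. This Python sketch assumes the clock pulses have been exported as a sorted list of timestamps in ns; the function name is hypothetical and the 700 ns threshold is the one given above.

```python
def find_clock_gaps(clock_times, threshold_ns=700):
    """Return (time of last good clock, gap length) for every pair of
    consecutive clock pulses separated by threshold_ns or more."""
    return [(a, b - a)
            for a, b in zip(clock_times, clock_times[1:])
            if b - a >= threshold_ns]

# One clock pulse is missing between 1200 ns and 2400 ns.
gaps = find_clock_gaps([0, 600, 1200, 2400, 3000])
print(gaps)
```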

Data is recorded on the disk with frequency modulation (FM) encoding, the scheme that later evolved into MFM for the first PC disks. Bit cells of 600 ns are written by reversing the magnetic flux once or twice. At the beginning of each bit cell, the flux is always flipped, which is the clock. In the middle of the cell, if the bit in question is to be a 1, another flux transition occurs, whereas a 0 bit value does not cause a reversal.
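A minimal Python sketch of this encoding rule, producing the ideal times of the flux reversals for a list of bits. The function name is made up for illustration, and real recordings will deviate from these ideal times.

```python
def fm_encode(bits, cell_ns=600):
    """Return ideal flux transition times (ns): one at the start of
    every bit cell, plus one mid-cell when the bit is 1."""
    transitions = []
    for i, bit in enumerate(bits):
        start = i * cell_ns
        transitions.append(start)                  # clock transition
        if bit:
            transitions.append(start + cell_ns // 2)  # data transition
    return transitions

encoded = fm_encode([0, 1, 0])
print(encoded)
```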

This is why disk records start with a long string of zero bits, to help the reading circuitry to synchronize on the clock bit as the start of a bit cell. The clock bits are the only transitions when the data values are all zero.

The start of meaningful data is signalled by a special non-zero pattern, which for the Diablo is a single bit cell with a data value of 1. The reading circuits pass through the flux reversal as a clock pulse and then flip the logic for less than 500 ns so that any reversal in that interval will be passed as a data bit value. This window of time where a flux reversal is a data bit is how the circuit separates clock from data pulses.

I have found spots on the disk where a record gets a checksum error, at least on some read attempts, because the clock pulse is misreported as a data pulse. Specifically, the bit cell is meant to contain a value of 0, so that the flux reversals are the clock at the start and end. In this case, the next reversal which is intended to be the clock is passed along as a data value of 1, because the timer window is still open. The clock pulse arrived a bit early.

Now, when the clock pulse following a 0 value bit cell is misrouted as a 1 data value, and the next bit cell is also a data value of 0, then we will have a longer than usual gap until the next flux reversal. The two 0 bit cells are collapsed together as a single 1 bit.

Out of the data separator circuit, we see the clock pulses skip a beat, with 1200 ns between pulses at this point. The data pulse is also wrong and more sinisterly, the serial train of data bits is compressed with two lost 0 bits reported as only a single 1 bit. This misaligns all the remaining data bits and checksum for the remainder of the record.
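The effect on the bit stream can be shown with a toy deserializer in Python. It assumes clock and data pulses are given as timestamps in ns and treats each clock pulse as the start of a bit cell; the function name and the sample times are illustrative only.

```python
def deserialize(clock_times, data_times):
    """One bit per clock pulse: 1 if a data pulse falls between this
    clock and the next one, otherwise 0."""
    bits = []
    for i, start in enumerate(clock_times):
        end = clock_times[i + 1] if i + 1 < len(clock_times) else float("inf")
        bits.append(1 if any(start < d < end for d in data_times) else 0)
    return bits

# Intended bits 0, 0, 0, 1 with every pulse routed correctly.
good = deserialize([0, 600, 1200, 1800], [2100])
# Same record, but the clock near 1200 ns was misrouted as a data pulse.
bad = deserialize([0, 600, 1800], [1190, 2100])
print(good, bad)
```

Four bits become three, so everything after the fault is shifted by one position.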

It is possible that I could detect and correct for this error, but it would be complex. I would need to establish the bit rate, adjust to stay synchronized with the clock transitions and then look for malformations. One such would be a skipped clock pulse coupled with a 1 data value. I could morph that into a pair of clock pulses representing the two data bits of 0.

I would need to look carefully at all the possible cases - 0 bit cell where this happens followed by a 1 bit cell, as well as other cases where clock and data are misrouted by the data separator circuitry. In essence, I will combine these to produce the flux transitions and separate them myself using more sophistication than a simple timing window.

I can also imagine errors where the pulse is split between data and clock, when the timing window elapses somewhere in the middle of a flux reversal pulse. If I combine the ReadClock and ReadData signals myself to form the original pulse train before the separator does its task, I can bypass the source of these errors and accomplish the detection of bit values correctly.

I need a second order digital phase locked loop to recover the clock properly, then use that to indicate bit cell boundaries and watch for the data 1 pulses. The incoming signal is messy, having timing jitter due to rotational variations on both write and read passes plus bit peak shifting. It may also suffer from minor variations in the media surface and from oscillations of the head height.

Beginning with single density floppy disks and PC era hard drives, the written signal was 'pre-compensated' to minimize the shifting of pulses upon reading, but no such technique was used with the Diablo drives.

Floppy disk data separator chips were available but are intended to support the much lower bit rates of the original SD and DD floppy drives, 250 Kbps to 500 Kbps, whereas the Diablo streams data bits at 1.67 Mbps. The phase locked loops available on the Spartan 3E fpga chips won't operate at so low a frequency, requiring minimum frequencies in the range of 10 MHz.

Thus, I have to work out a different method. Opencores has a DPLL although I am not certain how good it will be at handling disk signals. The goal is to generate a train of clock outputs that is synchronized to the time averaged clock pulses observed from the head, then use that to cleanly separate transitions that are far enough from the clock to be considered a data bit value of 1.

I will implement this in the fpga, digitally mix the ReadData and ReadClock as an input, create the scaffolding for the data separator based on this, then compare the results of this versus the Diablo separator circuit. This is going to be complex design work. 
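As a rough software model of the idea, the sketch below merges clock and data pulses into one train of arrival times and runs a proportional plus integral (second order) tracking loop over it. It is only an illustration under stated assumptions, not the fpga design: the function names, the loop gains `kp` and `ki`, and the 0.75 cell decision threshold are all hypothetical, and it assumes every bit cell begins with a clock pulse.

```python
def make_pulses(bits, cell_ns=600.0, data_shift_ns=0.0):
    """Build a merged clock+data pulse train as pulse arrival times in ns."""
    times, t = [], 0.0
    for b in bits:
        times.append(t)                      # a clock pulse opens every bit cell
        if b:
            # a 1 bit adds a data pulse near mid-cell, optionally peak-shifted
            times.append(t + cell_ns / 2 + data_shift_ns)
        t += cell_ns
    times.append(t)                          # closing clock ends the last cell
    return times


def separate_bits(pulse_times, nominal_ns=600.0, kp=0.25, ki=0.02):
    """Recover bit values with a proportional+integral clock tracking loop.

    kp nudges the tracked clock phase, ki nudges the tracked cell period.
    Both gains are illustrative guesses, not tuned values.
    """
    if not pulse_times:
        return []
    period = nominal_ns
    last_clock = pulse_times[0]              # first pulse is taken as a clock
    saw_data = False
    bits = []
    for t in pulse_times[1:]:
        if t - last_clock < 0.75 * period:
            saw_data = True                  # too early for a clock: data pulse
            continue
        predicted = last_clock + period      # where the loop expected the clock
        err = t - predicted
        period += ki * err                   # integral term: track drive speed
        last_clock = predicted + kp * err    # proportional term: track phase
        bits.append(1 if saw_data else 0)
        saw_data = False
    return bits
```

Feeding it a train generated 2% slow with the data pulses shifted 40 ns late still returns the original bits, because the loop pulls its period estimate toward the observed cell time instead of relying on a fixed window.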

Thursday, December 8, 2016

Emulator board designed and fabbed out


Today I completed the emulator role board design and shipped it off to a different foundry, as they would do a quantity one order. I need to be more confident in the correctness of the board before I order more.

Multilayer view of PCB design for emulator board

The above board is 6" wide by 4" high and has a 2x20 IDC type male shrouded connector on each side. The left one is mounted on the bottom of the board and plugs downward into the female connector on the Digilent FPGA extension board seen below. The right side connector is mounted on the top and accepts the 40 signal ribbon cable that runs to the Alto computer or other system that makes use of a Diablo 31 disk drive.
Extension board, fpga plugs to left and driver/emulator board sits on top
Below is the multilayer view of the driver board, the same size as the emulator board but with different numbers of input and output circuits and connections as befits its role. It also mounts atop a Digilent extension board that will in turn plug into a Nexys3 fpga board.

Multilayer view of PCB design for driver board

Finally, to round out the pictures, here is the Nexys3 board onto which the extension board is connected. A USB connection on the left lower side communicates with a PC in order to load and store disk images.
Nexys3 fpga board hooks to extension board

Wednesday, December 7, 2016

Driver board released to foundry, beginning on emulator board.

Wednesdays as always I spent most of the day at CHM, meeting with the 1401 restoration team members. 


I spent another day iterating with the online free design verification tool associated with the 4PCB foundry and with my design on DesignSpark PCB. I really cleaned up the design substantially, ensured a good silkscreen layer, but had to accept some spurious errors that reflect limitations of the checking software.

Specifically, a metal plated hole through a board is called a 'via' and links a copper trace on one layer to another trace on a different layer. Mostly, these are linking the top and bottom layers of the board, where traces run left to right on the top layer and top to bottom for the bottom layer. Thus, a signal can pass over another using the 'other' layer.

I designed this for a four-layer board, which has two internal layers in addition to the top and bottom. One of the internal layers is ground and the other is the +5V power. If a component is connected to ground or +5V, it has a larger via which is electrically connected to one of the inner layers.

The design review software spots vias when they have signals connected to them on both top and bottom, the typical usage, but if the via is connected internally, the software confuses the plated hole with a hole that a component lead will fill. If a component is attached to the hole, it has to be soldered on.

Any spot where soldering occurs must have an opening on the solder mask layer; all other parts of the circuit board, top and bottom, are insulated with a thick green coating. Since the software falsely thinks the power or ground via is a solder pad for a through-hole component, it is flagging to me that the solder mask is covering this pad. That would render the pad or component hole un-solderable but in this case, no solder is needed.

I placed the order for four boards, to arrive in a bit more than a week, at a net cost of just under $300 or $75 per board. I should have all the components to attach by the time the boards arrive, thus I can assemble them at that time.

It is time to begin designing the complementary board, for the disk emulator role, which will use the tried and true components and circuit elements, only connecting them in a different order to suit the disk signals going in and out of the Alto computer. 

Driver board PCB designed, prepping for manufacturing and components ordered


I did battle most of the day with the DesignSpark PCB tool attempting to get a good quality board designed. This is a reasonably large board, 4 x 6", with fewer than 40 components, built as a four layer board, which shouldn't be challenging. However, I may have to hand route everything.

I finally spotted a problem: the ground and power planes in the four layer board weren't being used for ground and +5V. Assigning them cut down on routing and had the added advantage of handling the fairly beefy current requirements of the peripheral driver chips, which can sink 150 mA per signal line when it has a logical 1 value. The worst case is during a seek command, with multiple track lines active plus the strobe.

The next step was some intelligent placement of the components, knowing how they need to be routed, to minimize crossing paths and conflicts. The pull up and pull down resistors that form the terminator for the input signals were the major factor in congestion, thus they were the major focus of my work.

I also moved the filtering capacitors and the peripheral driver chips to make more room for signal paths on the board. The result was a good clean routing with no manufacturing check conditions. With that complete, I produced the outputs needed for board manufacturing - Gerber and Excellon format files for the various layers and drill holes.

I will order two boards and enough components to build 4 or more boards - connectors, chips, resistors, and capacitors - which will let me complete the boards once everything gets here.

The foundry I chose has a free service to analyze the files for manufacturability issues. I made use of it and found quite a few niggling errors and one significant one that slipped past the design checks of the DesignSpark PCB software. Time to iterate until I get satisfactory results from the foundry's free check software.

Monday, December 5, 2016

Write sector confirmed to work with new driver board, time to convert it to a PCB


I set up the contents of cyl 0 head 0 sector 7 from the archived disk image, loading it into RAM, and then issued a WriteSector transaction. The sector read back in without checksum errors. I ran the tool to ReadEntireCartridge into RAM and dumped both the RAM and validity check vectors out.

Unfortunately, there is a complication with the Digilent utility that does the upload and download of RAM. It advances the RAM address registers Reg1001, Reg1010 and Reg1011 while it loads RAM, but does not reset them. Thus, when I thought I had dumped RAM out, I was beginning in RAM at the wrong point, producing garbage. The validity bit vector contents use their own address registers, Reg10001, Reg10010 and Reg10011, so they worked properly.

I had to fire up the testbed again, ReadEntireCartridge again and then dump this with the RAM address registers at 0. That would allow me to post-process the file and compare the archived disk to the bitsavers image.

I found that the file read from disk was about like the last time and that the sector I had written was retrieved with zero errors, as an exact match. This completes the checkout of the new board, which I can now use in all further activities.

It is time to turn the driver board into a PCB, eliminating any point to point wiring. I will fire up a PCB design tool, draw it out and submit it to a foundry to produce several copies.

Tomorrow it will be time to resume studying the exact conditions under which sectors get a temporary or permanent checksum error - looking for anything that can be mitigated, corrected or anticipated in the pursuit of an even lower error rate. 

Sunday, December 4, 2016

New driver board seems to be working properly; built logic to write an entire cartridge in one transaction


Today I switched back to testing the new driver role board, since the logic analyzer connections and setup are mostly intact. The goal is to verify that the unit with the new board still seeks, reads and writes correctly.

I verified that the SelectUnit1 signal works, in that the drive won't turn on FileReady or ReadyToSRW unless I am selecting it, nor will it respond to commands. I also stepped it through the binary powers of cylinder address to verify that each Trackxx signal is correctly wired. I watched the heads move to the commanded values 1, 2, 4 . . . 128
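The stepping pattern described here is a walking-ones test, which exercises one address line at a time so that a dead or swapped line shows up as a wrong head position. A minimal sketch of generating that sequence follows; the function names and the width of 8 address bits are assumptions inferred from the highest value of 128 mentioned above.

```python
def walking_ones(address_bits=8):
    """Return cylinder values that each activate exactly one address line."""
    return [1 << bit for bit in range(address_bits)]


def active_lines(cylinder, address_bits=8):
    """List the line weights (1, 2, 4, ...) that are active for a cylinder value."""
    return [1 << bit for bit in range(address_bits) if cylinder & (1 << bit)]
```

For example, `active_lines(20)` returns `[4, 16]`, so a seek to cylinder 20 that lands elsewhere points at one of those two lines.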

Next up, I did a read of Cylinder 0, Head 0, Sector 0 which completed reporting no detected checksum errors. I also did a ReadEntireCartridge transaction and watched it walk through all the sectors, retrying on those that have problematic signals with intermittent or permanent checksum errors.

All the above looked good and I monitored the WriteGate to be sure we weren't turning this on inadvertently. It is time to capture the cartridge image as a test of the recovered data images, comparing it to the bitsavers archive version and to previous recorded images. If that passes muster, the final test is to write a sector from RAM and verify that it is read back properly; the board will be fully tested.

The disk image and checksum information were captured and then compared to the archived versions to see whether we seemed to be more or less reading properly. The results were quite good, similar to what I experienced with the original board.

The final test, therefore, will be to load a specific sector's contents into RAM, write it to the disk, and then read it back to verify that this works. I will undertake this tomorrow.

I did want to create a WriteEntireCartridge function, analogous to the ReadEntireCartridge but that writes an archived pack image downloaded into RAM onto a blank or scratch cartridge. We have various images from archives that it would be great to have on a disk, such as the Smalltalk system.

In most cases, we will just use the disk emulator role and not need a physical cartridge, but there is something satisfying about running the Alto from a real disk. This cartridge writing function will be a tool used only a few times, invoked as transaction code 5.  Testing will be difficult, as I need a known blank or sacrificial cartridge.

The logic was completed and synthesized, using the ReadEntireCartridge logic as a template, so that it has decent chances of working correctly on the first try, but I can't try it until I have a sacrificial cartridge installed. 

Saturday, December 3, 2016

Making progress on emulator role of disk tool


I worked my way through debugging the hangup of the state machines that should be continually 'reading' the contents of the current cylinder and head, making bits available to emit whenever ReadGate is on. I went through several rounds instrumenting the fpga until I found where it was hanging.

Looking at the code, my mistake glared out at me. In one step of the state machine, I wait for the SectorMark to go on, indicating we are at the beginning of a sector to be emitted. It was here I was stalling.

Since this logic supports two roles, a driver of a real disk drive and an emulator that substitutes for a disk drive, it has an incoming and an outgoing version of SectorMark. The emulator version is SectorMarkOut, while the inbound driver version is simply SectorMark. As you may suspect by now, I was waiting on the inbound SectorMark that will never arrive while in emulator mode.

After the usual lengthy run to create a new bitstream, I ran again and found the state machines hung in a different place, trying to generate the preamble. I needed to reload the serializer for each word, even though the contents didn't change, but I wasn't doing that during the countdowns.

My linkages to the serializer weren't working properly, which held up the state machine that emits ReadClockOut and ReadDataOut signals. Another round of synthesis and I was back to watching the behavior of the emulator. 

I am now seeing clock and data pulses emitted, but they don't seem to be in the proper relationship to the SectorMark pulse. Time to work out how to hook up the logic analyzer in order to debug this further. 

However, I want to go back to the driver mode, test out the new version of that board I built and make sure the tool is fully ready to read and archive all the cartridges we have on hand, including the personal cartridge of David Boggs which he has loaned us for this purpose. 

Getting all the data read and uploaded is a priority of the project, after which we can play around with some packs to install different software images such as Smalltalk. 

Ethernet board working in Alto, optical mouse working as well


Today we met to work on the Alto again. We had a guest, David Boggs, who is the co-inventor of Ethernet and the designer of the Alto ethernet controller board, accompanying Ron Crane. 

Ethernet efforts

We traced signals and worked on the board. I had brought an Ethernet transceiver, but we don't yet have the cable that connects the transceiver to the controller board. Therefore, we used Ken's testbed for his ethernet bridge, monitoring the packets sent from the system. 

By the end of the day, we had good packets, thus only the transceiver and cable need testing to wrap this up. Of course, we have to test Ken's bridge once it is through development. 

Optical mouse efforts

The optical mouse we have came with the wrong connector on the cable, which was incapable of hooking to the Alto. Al Kossow had a broken mouse with the correct cable and connector for the Alto - he donated the PCB and cable from inside.

The broken mouse he had was the Hawley mechanical type, which direct soldered one end of the cable to the PCB, whereas the Lyons optical mouse has a ten pin connector on the mouse end. I unsoldered the cable and Marc soldered the new connector on. When I reassembled the mouse and tested it, it was working properly.


I finally found my working 500K gate fpga board and began to resynthesize my logic so that I could continue testing with that board later today when I returned home. However, events conspired so that I did not get to work on the debugging at all today.

Friday, December 2, 2016

Prepping for next Alto restoration session, debugging emulator function of disk tool


I made progress on the emulator role, which is now reliably modeling the rotation of the disk, emitting SectorMark signals and the proper SectorNumber values. However, I am getting nothing from the ReadData and ReadClock lines yet.

I saw a few polarity errors, where I was blocking the outgoing signals on the wrong condition of such states as SelUnit1 and FileReady, which I corrected. I still didn't see the read data emitted, which I traced down to a test bed flaw that was commanding continual seeks. This forces the ReadyToSeekReadWrite line off, blocking the read process.


I picked up a few items we need for our next stage of work on the Xerox Alto II we restored. One of the mice we have is the optical type, the Lyons mouse, but the cable on it did not match the connector on the Alto. As well, we are early in working on an Ethernet bridge but don't have ethernet working as yet.

Al Kossow had a broken Hawley (mechanical) mouse and donated the PCB with its cable to us. On the Hawley mouse, the cable is directly soldered onto the PCB, whereas the Lyons mouse uses a ten pin AMP connector to hook up the cable.

I desoldered the cable from the PCB, soldered on pins and inserted them into the AMP housing. Hopefully it will work properly when we hook it onto the Lyons mouse tomorrow.

Al also loaned us an Ethernet transceiver, which we did not have, so that debugging is easier than attempting to drive the controller card with a BeagleBone developed by Ken. Once we know the board is working, we can study all the signal timings to make the BeagleBone unit work right.

Tim will bring over another transceiver too, just to be sure we have one that works properly. Both of them are set up to connect to a 75 ohm coaxial cable, the actual ethernet line used with the early 3 Mb LAN. We need a 75 ohm cable of at least a half wavelength and a terminator. Hopefully we will have a cable on hand. 

Wednesday, November 30, 2016

Switched over to the 1200K fpga board as the other board was unusable


While waiting to find/buy a new USB cable with a better fit to the fpga board, I can't run the disk drive when the fpga is randomly resetting or emitting incorrect signals. I will switch over to debugging of my emulator role in the interim.

Unfortunately, the situation with the connection and the fpga board is bad enough that I couldn't get the board to respond even in emulator mode. I set up a few diagnostic signals and synthesized so that I could look again when I returned from my day with the 1401 restoration team at CHM.

I found another cable and had the same very erratic behavior with the board. Since I have another fpga board, I used that instead but it is the 1200K gate version, not the 500K gate variant. Something is wrong with the USB link on the 500K board.

I had to resynthesize for the 1200K configuration before I could test the new board. This also required some config changes to the UCF file because the 1200K version of the board hooks four of the LEDs to different FPGA signals than with the 500K board.

I began testing the emulator function, slowly working through the logic to get it working. Glad to be moving forward again. I also have a reliable board now which will let me hook up the new driver board and test it out.

Tuesday, November 29, 2016

Board ready but USB cable issue, finished coding emulator role


This morning I installed the cables on the Diablo disk drive and hooked up the fpga. It was still loaded with the prior bitstream, but the changes in this interface require an update. Among other changes, this version of the board has active SelectUnit1 and ReadGate signals, reads the SectorNumber from the drive and does not receive IndexMarker.

First up should be a test with the drive powered but not fully spinning, so that I can verify that key signals such as WriteGate are not asserted.  I did this and did verify those signals, but found other issues.
New board attached to FPGA
I couldn't find my original USB cable that connects the FPGA board to the PC, as it disappeared during the holiday storage, but I substituted another I found. The connection is erratic, leading to random resets, spurious data transfers and likely unrequested transaction initiations. That isn't safe with the drive spinning, so I need to resolve this before doing anything else.

I concluded the coding of the emulator task with the process to write one or more records on the current sector. It was synthesized and set up to begin testing of the emulation function, initially with the stubby monitoring board attached to the fpga board. This will let me inject given inputs and watch the outputs with the scope and logic analyzer.

Stubby board to allow easy access to inputs and outputs

Monday, November 28, 2016

Board checkout completed, ready to fire up test bed


I finally found the intermittent contact in the new disk driver board and repaired it. I then went through the remaining checks to finish all the continuity/correctness/shorts testing. Finally, all the careful checking was done.

Next, I populated the chip sockets, turned on power to the board and validated that injecting 1 and 0 values to either side produces the proper results on the converse connector pin associated with each signal.

The twelve input signals worked properly, swinging the fpga input pin between 3.3 and near zero volts as the pin on the Diablo connector was grounded. The initial test on the output signals, where I supplied 3.3 or 0 volts to the fpga output pins and watched the Diablo connector, didn't produce any voltage swing.

It was only after a few minutes of puzzlement that I remembered these are open collector drivers which depend on the terminator resistor network to pull up when the driver is not grounding. I had to add a pull-up resistor and supply +5V before I could get the outputs to swing as needed.

With the pull-up in place, I verified that all 15 of the output signals would swing up to 5V when the input was pulled to ground, but would drop to zero if the inputs floated up. It all looked good, each circuit performing well.

Tomorrow I connect the cable to the Diablo, plug in the terminator, and hook the fpga board to the driver board. With that, I can bring up the new board and disk drive to test its operation on all the driver functions that had previously tested as working correctly. 

Set up to test again, checking out new board


On the design front, I worked on the logic for the emulator role where it will handle WriteGate switching on. This occurs in two cases - when writing an entire sector of three records from scratch or when updating the later records within an existing sector.

The process that is continually reading sectors should continue to run, but its control of the RAM is blocked whenever WriteGate is on. The reading process lets us know which record we are in, essential for the writing process to properly address RAM locations. We don't, however, need to be absolutely synchronized to when the read thinks it is in preamble, postamble, sync or specific words of a record.

The write process will decode the incoming WriteData&Clock signal to follow but throw away the preamble and sync word, before capturing and writing the header, label or data record words into RAM. It will verify checksums just for completeness although there is nothing appropriate that the emulator can present back to the Alto if the checksum does not match.

The write process will then follow and throw away the postamble of the record it is writing. If there are more records in the sector to write, it will iterate until done with the data record. A bit tricky to set up but will work on it after I set up the test bed again.

I felt it time to switch over and test the second version of the driver role interface board I had built, the one I will be passing along to Al Kossow once we are done with our testing on the restored Alto. I performed one more continuity/shorts/wiring test then populated the chip sockets and began circuit tests.

With power applied but the fpga and disk drive unconnected, I began to inject voltages to each side in order to check that the appropriate results appeared on the other connector. Only when these are judged correct would I cable everything together to test out some reading and writing.

Ended the day chasing some peculiarities, thus no full test yet.

Saturday, November 26, 2016

Emulator role built for reading every sector as it rotates under head


Guests departing, finally back to coding and later set up to test. Have to haul logic analyzers, scopes, drives, and other items back from storage and hook them all up.

I coded the read sector process for the emulator which will continually spit out the clock and data bits as the virtual cartridge rotates under the virtual head. As soon as the FileReady is on and we are not in the midst of a seek, the process will wait for the beginning of the next SectorMark and then generate continuously.

Testing is much easier this way, since the disk tool itself will start the drive, turn on FileReady and emit the clock and data bits. All I need to do for testing is override the ReadGate input with a slide switch so that it allows the ReadClock and ReadData pulses to emerge on the outputs.

The logic for the read sector process and the other new parts are all synthesizing without complaint, thus it is time for me to set up for testing. For that, I need to get all the gear out of storage and onto the workbench. I have a party to attend tonight so testing won't commence until Sunday.

Thursday, November 24, 2016

Finishing read sector process for disk emulator role

Holiday interruption, just when I am returning from vacation interruption! To prepare for the US Thanksgiving holiday and guests, I had to disassemble and store much of my testbed and development area. On the weekend, when guests have departed, I can return my house to a laboratory. 


After I returned home from my vacation, I got back to coding of the disk emulator role for the disk tool. I have a few final bits to add to the process that emits a record - the preamble of words of zero, a sync word, the appropriate count of words for the record and the four postamble words of zero.

This will be triggered by a higher level process for the entire sector. That will set the preamble count, data word count and RAM address for the record in question, then trigger the record emitting process. The higher level process will run whenever the FileReady status is on, constantly generating each sector as it flies under the virtual disk head. It will only be interrupted during seek operations.
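The word stream for one emitted record can be sketched as below. This is an illustration only: `record_words` is a hypothetical helper, the sync word value of `0x0001` is an assumption, and any checksum word is treated here as part of the payload handed in by the caller.

```python
SYNC_WORD = 0x0001        # assumed value: a single 1 bit after the zero preamble
POSTAMBLE_WORDS = 4       # four postamble words of zero, per the description


def record_words(payload, preamble_words):
    """Return the 16-bit words emitted for one record, in order."""
    stream = [0] * preamble_words          # preamble: words of zero
    stream.append(SYNC_WORD)               # sync word marks the start of data
    stream.extend(w & 0xFFFF for w in payload)
    stream.extend([0] * POSTAMBLE_WORDS)   # postamble: words of zero
    return stream
```

The higher level sector process would call this once per record, supplying the preamble count and the payload fetched from the record's RAM address.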

I spent a bit of time blocking out the higher level process which will read an entire sector, executing iteratively as long as the virtual disk is rotating and online. 

Friday, November 18, 2016

Developed logic for seeks and data separation during write to emulated disk drive


Today I built the logic to handle seeks when the Alto system requests it of the disk emulator. It is essentially ready for testing. It should be accurate in timing, appearing to take the same time as a physical Diablo drive.

I set up the input and output signals to be sensitive to the Select Unit 1 signal and also refuse to respond if the virtual cartridge is not loaded. In addition, the ReadData and ReadClock outputs are gated by ReadGate.

A key component needed to handle writing to the emulated disk is a data separator. The computer sends 100 ns width pulses, with the timing between pulses determining whether it is sending a 0 or a 1 bit. That is, a zero bit value is transmitted by delivering a 100 ns pulse followed by 500 ns of delay before the next 'clock' pulse. A one bit value is transmitted by sending the 100 ns clock pulse, a delay of 200 ns, a 100 ns data pulse, and a final delay of 200 ns before the next clock pulse.
I receive bottom stream of pulses, must break out clock and data as above

The data separator sees the continual train of pulses every 600 ns as the clock pulses. It watches in the interval between clock pulses to see if a second pulse is transmitted midway between - that represents a 1 bit value. I built the logic to accomplish this and to drive my deserializer which shifts the incoming bit values into words which will be stored in RAM as the record is written.
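The pulse timing above can be written down as a small encode and decode pair. This is a sketch with ideal timing, not the fpga logic: the function names are hypothetical, and the 450 ns decision threshold (halfway between the nominal data and clock positions) is an assumed value.

```python
CELL_NS = 600        # one clock pulse every 600 ns
PULSE_NS = 100       # every pulse is 100 ns wide
DATA_START_NS = 300  # 100 ns clock pulse + 200 ns delay
THRESHOLD_NS = 450   # assumed: earlier than this after a clock means data


def encode(bits):
    """Return (start_ns, end_ns) pulses for a bit sequence, plus a closing clock."""
    pulses, t = [], 0
    for b in bits:
        pulses.append((t, t + PULSE_NS))                  # clock pulse
        if b:
            start = t + DATA_START_NS
            pulses.append((start, start + PULSE_NS))      # data pulse mid-cell
        t += CELL_NS
    pulses.append((t, t + PULSE_NS))                      # clock ending last cell
    return pulses


def decode(pulses):
    """Recover bits by checking for a pulse between consecutive clock pulses."""
    bits, saw_data = [], False
    last_clock = pulses[0][0]
    for start, _end in pulses[1:]:
        if start - last_clock < THRESHOLD_NS:
            saw_data = True                 # pulse inside the cell: a 1 bit
        else:
            bits.append(1 if saw_data else 0)
            last_clock, saw_data = start, False
    return bits
```

`encode([1, 0])` yields pulses starting at 0, 300, 600 and 1200 ns, which matches the 200 ns gaps around the data pulse and the 500 ns gap in the zero cell.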

Next up is the process that will continually read the sectors rotating under the virtual head, whether the results are transmitted to the CPU or not. This is the central process that will underlie the entire disk emulation. I spent the day building this in more detail, drawing Moore and Mealy diagrams until I felt it was ready but hadn't yet begun to code in VHDL.

Wednesday, November 16, 2016

Outline of the disk emulator role logic


The core of the disk emulator logic is a process that will start up with the first sector marker it sees while the disk is loaded and not currently seeking. This will generate the data and clock signals for every sector as it rotates past the virtual head, also keeping track of whether we are starting the header, label or data record of the sector.

Those data and clock signals will not be delivered over the interface unless the computer raised ReadGate. With ReadGate true, we pass along the signals to the computer as ReadClock and ReadData.

Thus, if the computer decides it is reading a given sector, it will turn on ReadGate and see the appropriate bits stream in. If it is an update operation, it sees the first record(s) using ReadGate and then starts writing by raising WriteGate.

When we see WriteGate go true, the separator watches for alternations of the WriteClockandData line. Each such reversal is seen as a clock or a data bit, depending on the timing from when we see the very first transition.

A transition that occurs less than 500 ns after the prior clock transition is read as a 1 bit. This will be followed by another transition in much less than 600ns to represent the clock pulse ending the bit cell. If the transition occurs after a 500ns delay from the prior clock, it is the succeeding clock bit and there was a 0 data value in the bit cell.

Each bit cell injects the 0 or 1 bit value into the deserializer, shifting in to form 16 bit words. The deserializer signals when a word has been accumulated, so that it can be sampled by other processes.

When the disk is selected, loaded and WriteGate turns on, it begins a process to write one or more records in the sector. The continual read process tells us whether this is going to be the header, label or data record we are writing. We begin tracking the incoming words that the computer writes to us, looking for the words that will be stored into RAM at appropriate locations.

The write process drops zero bits, looking for a 1 bit that represents the sync word which begins a record. This is what actually syncs up the deserializer to begin informing us as words are ready for sampling.
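The sync and word assembly behavior can be modeled in a few lines. This is a hedged sketch: `deserialize` is a hypothetical name, and shifting the most significant bit in first is an assumption that the text does not state.

```python
def deserialize(bits, word_bits=16):
    """Drop leading zero bits, sync on the first 1 bit, then pack words.

    Assumes bits arrive most significant bit first. Trailing bits that do
    not fill a whole word are discarded.
    """
    words = []
    stream = iter(bits)
    for b in stream:
        if b == 1:
            break                 # the sync bit: word framing starts after it
    else:
        return words              # never saw a 1 bit, so sync was never achieved
    acc, count = 0, 0
    for b in stream:
        acc = (acc << 1) | (b & 1)
        count += 1
        if count == word_bits:    # a full word is ready for sampling
            words.append(acc)
            acc, count = 0, 0
    return words
```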

After sync is achieved, we extract 2, 8 or 256 words, storing each in the RAM slot assigned to it for this sector. Meanwhile, we calculate a running checksum using bitwise XOR and a seed value of 0x0151.

Following the last data word of the record, we extract a checksum and verify that it matches the checksum we calculated. There is no way to reflect the error back to the processor, but we can flag it for our awareness.
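The running checksum is simple enough to sketch directly. The function names below are hypothetical; the XOR operation and the `0x0151` seed come from the description above.

```python
CHECKSUM_SEED = 0x0151


def record_checksum(words):
    """XOR all 16-bit words of a record together, starting from the seed."""
    checksum = CHECKSUM_SEED
    for w in words:
        checksum ^= w & 0xFFFF
    return checksum


def checksum_ok(words, stored_checksum):
    """True when the checksum word that followed the record matches."""
    return record_checksum(words) == (stored_checksum & 0xFFFF)
```

A record of all zero words therefore carries a checksum of `0x0151` rather than zero, so a blank or dead data line does not pass verification.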

After a checksum word, there is a stream of up to 4 words of zeroes, which tells us to drop sync and prepare to write the next record of the sector (unless we just finished the data record). The deserializer drops the sync condition, ignores zero bits and waits for the first 1 bit to act as the sync word.

If WriteGate is dropped, we stop and go back to idle state. The read process continues to run at all times the disk is loaded, fetching the newly written words once the virtual platter rotates the sector back under the virtual head.

We always address RAM with the current sector number and the current value of the Head signal coming from the computer. The cylinder value that addresses RAM is initially zero and is updated by any seek process.
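One way such an address could be formed is sketched below. The layout is purely an assumption for illustration: the text does not give the real RAM map, so the 12 sectors per track, the 2 heads, the packing order, and the choice to store only the 2, 8 and 256 payload words of each record are all hypothetical.

```python
SECTORS_PER_TRACK = 12      # assumption for a Diablo 31 pack
HEADS = 2                   # assumption: two surfaces
RECORD_OFFSET = {"header": 0, "label": 2, "data": 10}  # 2 + 8 + 256 words
WORDS_PER_SECTOR = 2 + 8 + 256


def ram_address(cylinder, head, sector, record="header", word=0):
    """Map cylinder/head/sector and a word within a record to a RAM word index."""
    sector_index = (cylinder * HEADS + head) * SECTORS_PER_TRACK + sector
    return sector_index * WORDS_PER_SECTOR + RECORD_OFFSET[record] + word
```

With this packing the cylinder value only changes after a seek, while head and sector select within the current cylinder's block of RAM.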

The seek process sits at idle until the disk is selected, loaded and the Strobe signal is activated. ReadytoSeekReadWrite goes false in 2.5 us and we wait another 27.5 us before signalling either AddressAcknowledge or, if the cylinder requested is >202, LogicalAddressInterlock.

The process calculates the movement distance based on the prior cylinder address and the value presented by the computer when Strobe is activated. We wait 600 us per cylinder plus 14400 us settle time, to model the physical seek duration on a real drive. The process will then respond by returning ReadytoSeekReadWrite to true which signals completion of the seek.
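The timing model can be expressed as a small function. This is a sketch with hypothetical names; the constants are the ones quoted above, and the handling of a zero-distance seek is not specified in the text, so the same formula is simply applied.

```python
READY_DROP_US = 2.5       # ReadytoSeekReadWrite goes false this long after Strobe
ACK_DELAY_US = 27.5       # further wait before the acknowledge signal
PER_CYLINDER_US = 600
SETTLE_US = 14400
MAX_CYLINDER = 202


def seek_response(prior_cylinder, requested_cylinder):
    """Return (signal_name, seek_duration_us) for a requested seek.

    An out-of-range request gets LogicalAddressInterlock and no modeled
    movement time (None), since no arm motion is simulated for it here.
    """
    if requested_cylinder > MAX_CYLINDER:
        return ("LogicalAddressInterlock", None)
    distance = abs(requested_cylinder - prior_cylinder)
    return ("AddressAcknowledge", distance * PER_CYLINDER_US + SETTLE_US)
```

A 10 cylinder move comes out at 20400 us, and the acknowledge itself would be raised `READY_DROP_US + ACK_DELAY_US` (30 us) after Strobe.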

There needs to be a mechanism to load or unload the virtual drive. When loaded, FileReady as well as ReadytoSeekReadWrite are turned on. Unloading waits until all in-flight actions such as seek or write are idle, then resets those two signals.

It is up to the user to load or fetch RAM contents to the PC while the drive is unloaded. The emulator will serve up whatever is in RAM as data or replace selected words during writes. 

Tuesday, November 15, 2016

On vacation in Kaua'i but doing design work from the cabana

My posts will be short and erratic this week as I am on holiday. I left the Diablo drive, the dog, the 1130 and the housesitter to spend some time at the Grand Hyatt Kaua'i lurking in my cabana. 

View from my cabana


I will do the design work for the emulator role, where the tool will attach to an Alto II computer and act as a disk drive, while on vacation. I worked out the broad strokes for all the functionality, but have not begun generating VHDL yet. 

Saturday, November 12, 2016

Worked on Alto II and prepared for vacation


The team got together today to work on various open tasks and to experiment with some of the applications on the system. 

We cleaned one of the mechanical mice loaned to us by Xerox PARC and it works quite well. We cleaned a second similar mouse but a bearing near the top of the ball cavity failed, making it perform erratically. Several of the balls in the bearing fell out - each barely visible to the naked eye - and it does not look like it can be repaired. Finally, we still need to acquire and connect a DE19 plug to the optical mouse that came with the system. 

We serviced the disk drive, replacing the air filter and adding a touch of oil to the wipers on the arm bearings. The drive is working quite well. 

We undertook some scoping and capture of the ethernet connection, to provide final specs for Ken's ethernet adapter project. We weren't seeing what we expected, so the team hooked up the logic analyzer to probe the adapter more fully. 


I am away on holiday for the next week, having to put aside the work while I pack tonight.

Wednesday, November 9, 2016

Preparing to study the circuit failure in the Diablo drive

Spent part of the day at Computer History Museum working on an odd problem with card reader errors that occur only when executing op code 3 - a combined instruction that writes a line on the 1403 printer and then reads a card. 


I am going to stick the Diablo data separator circuitry onto the logic analyzer to understand more deeply what is occurring in those times where the circuits fail to properly handle timeshifted transitions and report a clock pulse as a data bit, for example.

Points to capture and feed to the logic analyzer

To do this, I can rely on three test points but still need to attach to three pins on ICs deep on the board. The board sits in a group of PCBs such that the surface of this board is fractions of an inch away from the back of the adjacent board.

There is no room for traditional grabbers without using a board extender - which I don't have. I tacked solder wires on the outsides of the selected chip pins so that the wires extended out of the congested area.

The lines are hooked up to the logic analyzer and ready for testing to begin tomorrow morning. 

Tuesday, November 8, 2016

Discovered cause of sectors with permanent checksum validation errors, working on possible solutions


The bit error rate is currently around one wrong bit in every 150,500. Since there are about 4,320 bits in the data words, checksum and sync, this comes out at the 2.87% error rate of checksum validation failures.

In order for this error rate to produce a sector which passes the checksum but has corrupted contents, the bit flips must come in even numbers in every one of the 16 bit positions of a word where any flip occurs.  By position, I mean that the high order bit of a word is one position, the least significant bit is another position, etc.

If we have two bit flips, but they are not aligned in the same position, the checksum will catch it. Only when an even number of flips occur in a position will that error slip through. The checksum is a simple XOR of all the words; XOR by definition is independent in its action in each position.
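A small Python sketch makes the point concrete. It follows the description above of the checksum as a plain XOR of all the words; the sample words and the helper name `xor_checksum` are made up for illustration.

```python
from functools import reduce

def xor_checksum(words):
    """XOR all 16-bit words together, as the text describes the checksum."""
    return reduce(lambda a, b: a ^ b, words, 0)

good = [0x1234, 0xABCD, 0x0F0F, 0x8001]   # arbitrary sample data

# Two flips in the SAME bit position of two different words.
aligned = list(good)
aligned[0] ^= 0x0010
aligned[2] ^= 0x0010

# Two flips in DIFFERENT bit positions.
misaligned = list(good)
misaligned[0] ^= 0x0010
misaligned[2] ^= 0x0100

print(xor_checksum(good) == xor_checksum(aligned))      # corruption slips through
print(xor_checksum(good) == xor_checksum(misaligned))   # corruption is caught
```

The aligned pair cancels in the XOR, so the corrupted sector carries the same checksum as the good one, while the misaligned pair changes two checksum bits and is detected.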

To have a read which appears valid but is not, we need to have 2 bit flips in the same position occur in the same record. Crudely, this is 2.87% x 2.87% for two flips and then at least 16 x rarer to have the second flip occur in the same position as the first. That is about 0.005% or 1 in 19,424 sectors. With 4,872 sectors on a cartridge, we would expect about 0.25 such sectors per full read, so the chance of an entire cartridge having a sector with a false positive read is roughly 25% (closer to 22% if worked out exactly).
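The arithmetic can be checked with a few lines of Python. This just reproduces the crude estimate from the figures in this post; the variable names are mine, and the final line adds the exact "at least one" probability under the same assumption of independent random errors.

```python
BITS_PER_ERROR = 150_500      # observed: about one bad bit per this many bits
BITS_PER_SECTOR = 4_320       # data words, checksum and sync
SECTORS_PER_CARTRIDGE = 4_872

# Chance that a sector read contains a flipped bit (checksum failure rate).
p_fail = BITS_PER_SECTOR / BITS_PER_ERROR

# Crude chance of two flips landing in the same one of 16 bit positions.
p_undetected = p_fail * p_fail / 16

expected_bad = SECTORS_PER_CARTRIDGE * p_undetected
p_at_least_one = 1 - (1 - p_undetected) ** SECTORS_PER_CARTRIDGE

print(f"checksum failure rate:    {p_fail:.2%}")
print(f"undetected corruption:    1 in {1 / p_undetected:,.0f} sectors")
print(f"expected per cartridge:   {expected_bad:.2f}")
print(f"P(at least one) per pass: {p_at_least_one:.1%}")
```

The expected count per cartridge is about 0.25, which is where the rough 25% figure comes from; the probability of at least one such sector in a pass works out a little lower, near 22%.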

All this depends on what causes the problems. If there are conditions that always or mostly lead to the error, then I might be able to work around it by mitigating the underlying cause, otherwise it is purely random.

In the random case, running the ReadEntireCartridge several times and using autoretry should produce files that can be merged with a majority vote to clean up any sporadic corrupted sectors that sneak through the checksum test.
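A minimal sketch of such a merge, in Python. It assumes each pass has already been parsed into a dictionary keyed by (cylinder, head, sector), with the sector contents as bytes, or None where the checksum failed even after retries. That data layout and the name `majority_merge` are hypothetical and not the format my tool writes.

```python
from collections import Counter

def majority_merge(passes):
    """Merge several cartridge reads, keeping the contents most passes agree on.

    passes: list of dicts mapping (cyl, head, sector) -> bytes or None.
    Returns one dict; a sector is None if no strict majority of valid reads exists.
    """
    merged = {}
    keys = set().union(*(p.keys() for p in passes))
    for key in keys:
        # Only reads that passed their checksum get a vote.
        votes = Counter(p[key] for p in passes if p.get(key) is not None)
        if not votes:
            merged[key] = None
            continue
        data, count = votes.most_common(1)[0]
        total = sum(votes.values())
        merged[key] = data if count * 2 > total else None
    return merged
```

With three passes, a sector that was silently corrupted in one pass is outvoted by the two good copies. This only helps in the random case; a sector that is corrupted the same way on every pass will still win the vote.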

In the situation where this is conditionally induced, the right combination of conditions in one sector may make it sneak through even with multiple cartridge read passes. That is, the conditions occur in just the right (wrong) points in a given sector so that it will almost always have a pair of flipped bits whenever it is read.

I set up the scope and went after the sectors which incurred checksum failures, to look for conditions that cause the errors. It would be easiest when the conditions are in the shorter records - header or label - but essentially all of them are in the data record, which is the only one long enough to have a real risk of a bit flip if the root cause is randomly occurring.

I looked at Cylinder 1, Head 0, Sector 2 which had a hard checksum failure on the data record. I found a spot in the data record where a clock pulse was clearly missing coming from the drive! Very consistent and of course it would shift all the bits over by 1 after that point.

Missing clock pulse leading to checksum validation error

Looking at the waveforms inside the drive showed that the flux reversals were a bit smaller at that spot, which could be symptomatic of a bad spot on the surface, but that is far from conclusive. Below is the output of the differential amplifier, where we see the reversal is stunted in both positive-going and negative-going limits compared to all the others.

smaller clock flux reversal at time when clock pulse is dropped

When we look at the signal after the clipping amplifier, which should even out such variations, we see the waveform below. All I notice is that the transition is a bit squeezed compared to the others, closer to where it might look like a data pulse transition instead of a clock transition.

Clipping amp output showing slight 'squishing' of transition

The suspicion is that the data separator logic is malfunctioning when presented with this waveform, neither seeing it as a data bit value of 1 nor as a clock pulse. It makes sense that timing shifts like this would be able to block the pulse, if we look at the relevant part of the J10 card schematic.

Circuits at left pick time when pulse is Data versus Clock

Once again, jitter on the disk signal is leading to problems. In this case, the next clock pulse arrives too early while the data separator is still looking for a data value transition, thus blocking the pulse from reaching the ReadClock line.

I need to validate this by looking at other test points on the card. Specifically, I want to confirm that the pulse itself is generated at TP6, after the clip amplifier, since we need a pulse to pass through. Interestingly, there is another electrolytic capacitor at the output of the pulse driver transistor. I decided to replace it with a new known good capacitor.

I ran the same read of Cyl 1, Head 0, Sector 2 with the changed capacitor and watched on the scope. The same thing happened - lost clock pulse and spurious ReadData bit. I will defer the look at TP6 because it appears the pulse is getting emitted, it is just the timing that is far enough off to fool the data separator.

If the pulse at TP6 exists, then the next point to watch is TP5 to see whether the Data Gate is on, blocking the pulse, or not. Finally, I want to look at the clock line at TP3.  TP3 and TP5 are easily reached, but I have to find access to TP6, which is awkwardly buried in the card, making it hard to reach without the card and cable extender (that I don't have).

We can see that the TP 5 Data Gate signal gets scrunched up to left of center, then loses the clock
Lack of data gate pulse leads to failure to pass pulse as Clock here at TP3
TP4 shows the pulse slipping through as Data due to separator error

Once again, jitter in the signal recorded on the platter has led to failure to correctly read the sector. In this case, the jitter has caused the data separator to fail to properly allot the transition as a clock rather than data pulse.

I can see how to spot and compensate for this particular condition - having a watchdog timer to spot a missing clock pulse - but it is unclear how I would handle jitter surrounding bitcells each with a 1 data bit, as it would simply shift the next data bit to come out as a clock pulse.
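To make the watchdog idea concrete, here is a minimal Python sketch that works on a list of clock pulse timestamps rather than live signals. The real thing would be a counter in the FPGA that resets on each ReadClock pulse. The function name, the 50% tolerance and the bit cell time passed in are assumptions for illustration, not measured values.

```python
def find_missing_clocks(clock_times_ns, bit_cell_ns, tolerance=0.5):
    """Flag gaps between clock pulses that are too long to be one bit cell.

    Returns a list of (index, missing) where index is the position of the
    pulse that arrived late and missing is the estimated number of clock
    pulses that were dropped before it.
    """
    limit = bit_cell_ns * (1 + tolerance)   # the watchdog timeout
    dropped = []
    for i in range(1, len(clock_times_ns)):
        gap = clock_times_ns[i] - clock_times_ns[i - 1]
        if gap > limit:
            missing = round(gap / bit_cell_ns) - 1
            dropped.append((i, missing))
    return dropped
```

A gap of about two bit cells means one clock was lost, so the reader could insert the missing clock and treat the stray ReadData pulse as suspect. As noted above, this does nothing for the opposite case where jitter turns a data pulse into an extra clock.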

If only the drive emitted the unseparated pulse train and let me assign them as clock or data signals, I could address the shift more easily in my logic. However, that is not the interface offered by a Diablo drive.

I need to do a lot of thinking. How the Diablo mechanism might be adjusted to handle the timeshifting better. How I might be able to recognize and compensate for the Diablo issues. What scope and logic analyzer traces I might add to the card to understand what happens with the timeshifting situations.