Sunday, October 29, 2017

7906 disk fault corrected, working on 2645A tape drive capstans and debugging some HP 1000 F processor faul


Tape drives in 2645A terminal

The capstan tires I ordered will slip over the metal capstan wheel as a replacement for the rubber coating that liquified with age. This failure of the rubber rollers is quite pervasive across many types of tape mechanisms from audio to video to computer peripherals, affecting almost every mini cartridge drive mechanism in existence. 

Capstan wheel with remnants of liquified rubber still to remove
I cleaned the existing rubber sludge from the metal capstans in preparation for the new tires, but had to set everything aside waiting for the tires to arrive from China in 2 to 4 weeks. Once these are fitted and working, the terminal can be used to read and write data and programs, and even as a device for booting the HP 1000.

7906 disc drive

When I left off, prior to my trip, the drive was displaying a disk fault error. There are several causes of this, differentiated by small LED lights on the controller board. The failure I see is either loss of one of the critical power supply voltages or a fault in a line of microswitches, jumpers, thermal switches and other devices called the interlock chain.

I had tested the output of the power supplies, all of which appeared nominal. Based on that I was convinced that the problem was in the chain. Setting up the voltmeter on a suitable component on the controller board, I powered up but the voltages from the interlock chain seemed correct for a no-fault condition.

It was time to test more comprehensively on the controller board, determining where the fault logic is set and tracing it back to determine what component has failed. Eventually I found that the spindle encoder board connector was slightly loose, breaking the interlock chain. With that repaired, the fault condition went away.

I let the blowers run for quite a while, interspersing spin-up cycles where I let the drive get to nearly full speed before flipping the run switch off. I want to dislodge any specks of dirt or other contaminants that might remain inside, before I let the heads load onto the disk.

In the final run, I flipped off the switch just as the arms began to move out over the disk surface, but before they were actually loaded. I still don't like the sound of the bearings on the spindle motor, although in some cycles it is quieter than in others.

It is possible that the grounding strap on the bottom of the spindle is what is making the noise, rather than bearings, since it was initially quiet then progressively returned to the louder sound. I will wait for a new day to let the heads actually load, hoping that I don't have any crash on either the fixed lower platter or the upper platter in the removable cartridge.

HP 1000 processor

Prior to my trip, I had been running through all the tape-based diagnostics for the system when I began encountering problems. The tape wouldn't boot any more, which points to a new fault in either the processor or the tape drive/controller. I set up to recreate the issue and begin debugging.

I yanked the timebase generator and the second 12996A serial board, tested and found everything good again. May be gremlins, might be one of those boards. In any case, I completed the full FPP, FFP and SIS test with no problems.

I next reinstalled the timebase generator and ran the diagnostics for it. That worked perfectly - it is a long sequence taking almost 20 minutes but just when I was ready to give up, the longest test completed and it raced through to final status.

I wanted to test DMA, memory protect, and some of the IO boards but in all those cases, they required some special test fixture to plug into certain IO cards or some manual operations such as reversing parity by grounding some pin.  It appeared that everything was solid in the processor itself and with the timebase generator, tape controller and my primary serial card for the console.

I plugged in all my boards and connected all the cables - second serial, microcircuit, HP-IB, and disk controller. With everything in place, I powered up to rerun some diagnostics but found that the basic loader was hanging before it read in the entire boot file. This was the same kind of problem I faced before today.

It was time for lunch. Afterward I yanked one card at a time out of the IO board, looking for the one that is causing this problem. Each cycle of powering down, manipulating cards, powering up, loading the tape and attempting a boot took a few minutes, so this took some time.

The boot loader is looping waiting for the next word to be flagged as ready to pick up. After yanking every card except for the tape controller and terminal card, it still failed to read the boot record. I swapped to the other diagnostic tape reel but still have the problem.
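The hang described above is the usual failure mode of a flag-polled transfer: the loader spins until the interface signals that a word is ready, and if that signal never arrives it spins forever. Here is a minimal Python sketch of that handshake. It is a toy model, not the real HP 1000 loader. The names `FakeTapeInterface`, `load_words` and the `max_polls` bound are invented for illustration, and a loader without such a bound would simply keep looping.

```python
from typing import List, Optional


class FakeTapeInterface:
    """Hypothetical stand-in for a tape controller card (not a real HP API)."""

    def __init__(self, words: List[int], stall_after: Optional[int] = None):
        self._words = list(words)
        self._delivered = 0
        # If set, the flag stops being raised after this many words,
        # imitating a controller that quits signalling "word ready".
        self._stall_after = stall_after

    def flag_set(self) -> bool:
        if self._stall_after is not None and self._delivered >= self._stall_after:
            return False
        return self._delivered < len(self._words)

    def read_word(self) -> int:
        word = self._words[self._delivered]
        self._delivered += 1
        return word


def load_words(dev: FakeTapeInterface, count: int,
               max_polls: int = 1000) -> Optional[List[int]]:
    """Poll the flag for each word; return None if the flag never comes up.

    The max_polls bound exists only so that this sketch terminates.
    """
    loaded: List[int] = []
    for _ in range(count):
        polls = 0
        while not dev.flag_set():
            polls += 1
            if polls >= max_polls:
                return None  # without a bound, the loop would spin here forever
        loaded.append(dev.read_word())
    return loaded
```

With a healthy device `load_words` returns every word. With `stall_after=2` it gives up partway through, which is the sketch's equivalent of the loader hanging before the whole boot file is read.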

At this point I have to assume that the recurrence when I plugged in all the cards was coincidental. I either have some flaky connector somewhere or the controller card or the tape drive itself is flaking out. I wiggled all the connectors both inside the tape drive and on the HP 1000 itself, but the tape continues to hang.

The question is why it was hung before I left, why it was okay again when I started this morning and what pushed it back over into failure. It may be some component getting hot after prolonged use, then cooling. I will let everything sit for hours until it is at ambient temp, then try again.

When it had cooled and I gave it another try, it hung in the same way, making the temperature-based failure theory less likely. I yanked the serial card entirely, leaving nothing but the tape controller plugged in. Voilà, I could boot the tape again.

Since I have two cards, I put in the other terminal (serial) card and could again boot up. Using the diagnostic monitor, I selected the floating point instructions diagnostic once again and let it run. Unfortunately, it almost immediately reported an error attempting a FADD operation.

When I tried to run it again from the bootup, the diagnostic monitor halted with an error 010675 which means that the code took a dive to a random location. The first 4K of memory is loaded with this halt as a way of trapping any such failure to execute properly.
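The trap described above can be modeled in a few lines. This is a toy Python sketch, not the diagnostic monitor's actual code. The memory size, the function names and the idea that the fill word is identical to the displayed halt code are all assumptions, and the value 010675 is copied from the display as an opaque marker rather than decoded as a real instruction.

```python
TRAP_CODE = 0o010675          # halt code as reported above; opaque marker only
TRAP_REGION_WORDS = 4 * 1024  # the "first 4K of memory"
MEMORY_WORDS = 32 * 1024      # assumed memory size, for illustration only


def build_memory() -> list:
    """Return a memory image whose low 4K words all hold the trap marker."""
    memory = [0] * MEMORY_WORDS
    for addr in range(TRAP_REGION_WORDS):
        memory[addr] = TRAP_CODE
    return memory


def fetch(memory: list, pc: int) -> str:
    """Pretend to fetch one word and report what the machine would do."""
    word = memory[pc]
    if word == TRAP_CODE:
        return "HALT {:06o} at address {:06o}".format(word, pc)
    return "execute {:06o}".format(word)
```

Any stray jump that lands in the filled region stops at once with a recognizable code, instead of executing whatever happened to be in memory.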

Something is sick in the processor, I conclude. Time to step through the diagnostics from the most basic onward, hoping to find the most basic fault first. It is likely that the failure of some instruction type or facility is causing the branch to a random location, thus I want to find that errant logic and repair it before running through too many more of the diagnostics. 

I stepped through the basic instruction tests:

  • memory reference
  • alter/skip
  • shift/rotate
  • EAU
  • extended indexing
I didn't complete the extended word/byte tests because I need to add another IO board to be used for interrupts. The same is true for my test of floating point, scientific and Fast FORTRAN instructions (FPP/SIS/FFP). I planned out my next set of runs.

When I run the floating point instruction tests, it fails hard on FADD. When I run the FPP/SIS/FFP diagnostic, it fails on the FIXS instructions. These are the first of a set for each diagnostic. I don't know whether it is my extended microcode failing (FEM board) or the floating point processor hardware, but something is sick.

I went out to run full memory diagnostics and to add in an interrupt card to complete the extended word/byte instructions test. Both passed with flying colors. Using the interrupt card with the FPP/SIS/FFP diagnostic, it ran further, but in the midst of the SUB test I saw the S register lights go dark, along with the OFLOW and EXTEND lights.

I waited a long time but nothing further happened. The absence of these lights is highly, highly unusual for a running program and suggests some microcode going awry. My suspicions are leaning slightly toward the FEM board at this point. 
