Rescue 1130: 2014 Pickup of an IBM 1130 System and More: Restoring Apollo Guidance Computer, part I

RESTORATION OF APOLLO GUIDANCE COMPUTER

System oscillator (2MHz)

We provided 14V and 4V to the clock oscillator module (B7), limiting current to protect against any failing capacitors or other shorts. The output was beautiful. We have a pulse. It is running at 2.04801 MHz, amazingly close to spec for a machine that hasn't run in almost 50 years.

Pulse of the Apollo Guidance Computer

Testing the logic modules (contents of tray A)

The AGC is built of two trays that are bolted together, with mostly digital circuits in tray A and mostly analog circuits in tray B. For example, the master oscillator is module B7 because it is analog, but all the further processing of the clock is done digitally, in modules A1 and A2 in tray A.

The logic modules are constructed exclusively of dual NOR gates, each with three inputs. These are Resistor-Transistor Logic (RTL), which was quickly obsoleted in the industry by DTL, TTL, CMOS and other schemes. After a while working with the schematics, one gets proficient at quickly understanding designs using only NOR gates. No flip flops, no other type of boolean operations at all.
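
To illustrate how far a three-input NOR gate can be stretched, here is a minimal Python sketch that builds an inverter, OR, AND and a set/reset latch from nothing else. It models ideal logic only; the function and class names are made up for this example and do not correspond to anything in the AGC schematics.

```python
def nor3(a=0, b=0, c=0):
    """Three-input NOR: output is 1 only when every input is 0."""
    return 0 if (a or b or c) else 1


def inv(a):
    # A NOR gate with its unused inputs held low acts as an inverter.
    return nor3(a)


def or_(a, b):
    # OR is a NOR followed by an inverter.
    return inv(nor3(a, b))


def and_(a, b):
    # De Morgan: a AND b equals NOR(NOT a, NOT b).
    return nor3(inv(a), inv(b))


class NorLatch:
    """Set/reset latch made from two cross-coupled NOR gates."""

    def __init__(self):
        self.q, self.q_bar = 0, 1

    def step(self, set_=0, reset=0):
        # Re-evaluate both gates until the feedback loop settles.
        for _ in range(4):
            q = nor3(reset, self.q_bar)
            q_bar = nor3(set_, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = NorLatch()
after_set = latch.step(set_=1)     # latch is set
held = latch.step()                # inputs released, value is remembered
after_reset = latch.step(reset=1)  # latch is cleared
```

The latch is the point of the exercise: storage falls out of feeding each gate's output back into the other, which is why a design can get by without dedicated flip flop parts.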

The RTL chips operate with a 4V power supply and have a pretty low threshold between the logic 0 and logic 1 conditions. We found voltages as low as 0.8V that were successfully recognized as logic 1 states in the running AGC. Logic 0 inputs were typically under 0.3V in actual operation.

Clock dividers and scaler, other timing signals

The oscillator delivers 2MHz to modules A1 and A2, which first divide the signal down to the 1MHz master clock and then produce the various timing cycles that drive the rest of the computer. Most notably, the computer is designed around core memory cycles which take almost 12 us, using twelve of the 1MHz pulses as stages 1 to 12.
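
The "almost 12 us" figure can be checked with a little arithmetic. This sketch assumes a nominal crystal frequency of 2.048 MHz, which is consistent with the 2.04801 MHz we measured but is not stated in the text above; the constant names are only for illustration.

```python
OSC_HZ = 2_048_000        # assumed nominal oscillator frequency
MASTER_HZ = OSC_HZ / 2    # first divide-by-two gives the ~1MHz master clock
PULSES_PER_CYCLE = 12     # stages 1 to 12 of a memory cycle

# Duration of one memory cycle in microseconds.
cycle_us = PULSES_PER_CYCLE / MASTER_HZ * 1e6

print(f"master clock: {MASTER_HZ / 1e6:.3f} MHz")
print(f"memory cycle: {cycle_us:.2f} us")
```

Because the master clock runs slightly faster than a round 1MHz, twelve pulses come in a bit under 12 us.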

We injected appropriate clock signals into the two modules as they sat on the workbench, out of the AGC trays, and verified that all of the timing signals were being produced at the proper time. We soon discovered a failure!

The AGC uses a variety of timers of different durations, from almost master clock frequency down to one that has a pulse once every 36 hours. This is implemented with a chain of dividers in A1. We found that the time pulses stopped somewhere in the middle of the chain.
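
A toy model shows why one bad stage silences everything after it. This is a generic ripple chain of divide-by-two stages, not the actual A1 circuit (which is built from NOR gates); the stage count and the `stuck_stage` parameter are hypothetical.

```python
def count_stage_pulses(num_stages, input_pulses, stuck_stage=None):
    """Count rising edges seen at the output of each divide-by-two stage.

    Each stage toggles when the stage before it falls from 1 to 0.
    If stuck_stage is given, that stage's output never changes, so no
    edges reach it or anything downstream.
    """
    state = [0] * num_stages
    pulses = [0] * num_stages
    for _ in range(input_pulses):
        for i in range(num_stages):
            if i == stuck_stage:
                break              # frozen output: nothing propagates
            state[i] ^= 1          # this stage toggles
            if state[i] == 1:
                pulses[i] += 1     # rising edge, no carry to next stage
                break
            # falling edge: carry on and toggle the next stage
    return pulses


healthy = count_stage_pulses(4, 16)
faulty = count_stage_pulses(4, 16, stuck_stage=2)
print(healthy)
print(faulty)
```

In the healthy chain each stage produces half as many pulses as the one before it. With one stage frozen, the faster stages are unaffected and every slower one goes quiet, which matches what we saw on the bench.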

All the other timing logic in A1 and A2 worked fine. With the failure of our clock scaler, the machine would not properly detect certain errors (watchdog alarms) that depended on slower pulses, it would never go into Standby mode, and of course mission software that needed longer timers wouldn't work right. Most of the machine would still work even with the one gate failing.

Timing signals (some of the 12 steps of a memory cycle)

Investigating the failing NOR gate

After some probing around and other testing, we concluded that the output of one NOR gate was stuck up at 4V. This means it was shorted to the VCC power supply. Inside the NOR gate there is a pullup resistor between VCC and the output, but this was shorted. It is an odd failure, one that seemed only possible if some bit of conductive material had fallen across the two traces, VCC and output; these traces must be close to each other somewhere for this failure to have occurred.

We shook the module with the package facing down, retested, and found that the problem had cleared. We will perpetually have the conductive junk, probably a solder blob, inside the sealed gate. Problems could recur but we know that some shaking will clear it up again. We will live with this situation.

SQ (instruction op code) register and decoding module

The next module we placed on the bench, A3, provided the storage of the instruction operation code and performed some of the initial decoding. A module has two boards, one per side. Each board has up to 60 flat pack ICs with two NOR gates apiece.

We worked to find combinations of inputs that would put each and every gate through all its states and give us a way to observe its output. That is, we wanted to see that the output was high when all three inputs are low, but that each input when raised would drive the output low.
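
The per-gate check boils down to four test vectors. Here is a small sketch of that idea in Python, with the gate modelled as a plain function; the fault models (`stuck_high`, `dead_input_b`) are hypothetical examples and the real work was done with probes on the bench, not software.

```python
def nor3(a, b, c):
    """Ideal three-input NOR gate."""
    return int(not (a or b or c))


# (inputs, expected output): all low gives high, and each input
# raised on its own must pull the output low.
TEST_VECTORS = [
    ((0, 0, 0), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
]


def gate_passes(gate):
    """Return True if the gate gives the expected output for every vector."""
    return all(gate(*inputs) == expected for inputs, expected in TEST_VECTORS)


def stuck_high(a, b, c):
    # Models an output shorted to VCC.
    return 1


def dead_input_b(a, b, c):
    # Models a gate whose second input has no effect.
    return nor3(a, 0, c)


print(gate_passes(nor3), gate_passes(stuck_high), gate_passes(dead_input_b))
```

Four vectors are enough to catch a stuck output or a single dead input. The hard part on the real boards is not the vectors but steering them to a gate that is buried behind other gates.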

This process was very labor intensive and slow, because of the need to find a way to test every input of every gate, isolating others. Since many of the 120 gates on a board in a module are only connected internally on the multilayer circuit board, this is logically challenging.

Some checking of the logic modules

After ten hours of work we had confirmed that every gate worked properly. One possibility was to continue this way, bench testing modules A4 to A24, but at ten hours each that would be a very long process.

Decision to check for destructive failures only in the remaining logic modules

We decided that our sample of 720 gates with only one intermittent failure was a sign that we could expect a low rate of defects. Thus, it would be reasonable to put all the logic together, power up, and debug using logic analyzers. A small number of defects can be chased down this way, but a system with a high defect rate wouldn't run well enough for the analyzers to capture anything meaningful.
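
The numbers behind that decision are easy to reproduce. This sketch takes the figures quoted in this post (three modules tested, two boards per module, up to 60 ICs per board, two gates per IC, ten hours per module) and treats every board as fully populated, which is an upper bound.

```python
MODULES_TESTED = 3        # A1, A2 and A3
BOARDS_PER_MODULE = 2
ICS_PER_BOARD = 60        # "up to 60", so this is a ceiling
GATES_PER_IC = 2

gates_tested = MODULES_TESTED * BOARDS_PER_MODULE * ICS_PER_BOARD * GATES_PER_IC

REMAINING_MODULES = len(range(4, 25))   # A4 through A24 inclusive
HOURS_PER_MODULE = 10
remaining_hours = REMAINING_MODULES * HOURS_PER_MODULE

print(f"gates in the sample: {gates_tested}")
print(f"bench hours to finish gate by gate: {remaining_hours}")
```

One intermittent fault in a sample of that size, set against a couple of hundred more bench hours, is what tipped the balance toward powering up and debugging in place.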

The only type of failure that would be physically damaging would be short circuits that might produce high current and melt something. We have power supplies that will limit the current to a set level, which means we can power a module and see if the current is too high but not let it go high enough to cause damage.
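
As a rough sketch of the idea, an idealised current-limited supply can be modelled with Ohm's law and a cap. The load resistances and the 0.25 A limit below are made-up illustration values, not settings or measurements from our bench.

```python
def limited_supply(voltage_v, load_ohms, limit_a):
    """Model an ideal current-limited supply.

    Returns (current_a, limiting): the current actually delivered and
    whether the supply had to clamp it at the limit.
    """
    demand_a = voltage_v / load_ohms    # what the load would draw unrestricted
    if demand_a > limit_a:
        return limit_a, True            # clamped: suspect a short
    return demand_a, False


# Hypothetical healthy board: 40 ohms across the 4V rail.
ok_current, ok_limiting = limited_supply(4.0, 40.0, 0.25)

# Hypothetical shorted board: 0.5 ohms across the same rail.
bad_current, bad_limiting = limited_supply(4.0, 0.5, 0.25)

print(ok_current, ok_limiting)
print(bad_current, bad_limiting)
```

The shorted load would want 8 A without the limit. With it, the supply holds at the set level and simply flags the board as one to investigate.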

We proceeded to power each of the remaining logic modules, board by board, while monitoring current drain. They all passed and ran at levels between 80 mA and 110 mA per board. Given this, we could stuff it all into tray A and power up the machine.

Testing the twin power supply modules

Tray A also has two power supply modules. These are identical but look at the connections across certain configuration pins to decide if they will produce 14V or 4V, the two main voltages used across the AGC. We bench tested each module with both configurations, even though in actual operation each module is assigned one of the voltages.

During power supply testing

The power supplies produce two outputs - switched and unswitched - so that the computer can sit in standby mode with very low power consumption since only a small subset of circuits are powered. We checked that both the switched and unswitched outputs were proper and clean.

Testing the analog interface circuits between the rest of the spacecraft and the AGC

There were five remaining modules in tray A and they were not digital logic. These provide the interface circuits that couple spacecraft signals, typically implemented as 28V / 0V logic levels, to the AGC gates that run with 4V / 0V levels.

We set up the bench to power each circuit in each module, inject the appropriate input conditions and verify the correct output. This work only took a half day to wrap up modules A25 to A29, and once done, let us fully populate tray A for testing.

Testing the analog modules

Working on tray B - the memory and alarm functions

Tray B has the master oscillator (B7), which is needed, plus an alarms module that checks for various analog style failures. For example, it verifies that the clock isn't running too fast, that the 28V input power and the internally generated 14 and 4 volt power are within limits, and other checks.

We did not install any of the modules involved in memory access - rope or erasable memory drivers, selection logic or sense amps. Nor did we install the erasable memory module itself. The six core rope module slots on the outside of the AGC are also empty.

We then plugged trays A and B together. Power and control signals must move between the two trays over connectors that only mate when the two trays are together. The module faces of the two trays face each other when mated, with the wire wrapped backplanes facing outward.

Trays mated, tray B on bottom with open slots for core rope modules

Examination of the core memory module (erasable memory)

Mike checked continuity of all the pins on the erasable memory module. Since this is one of the few modules in our AGC that is potted, we couldn't see inside and only had the pins to test. Computers that would fly on missions had every module potted. A material like RTV-11 hardens around the components and holds them against vibration.

Unfortunately, we discovered that one of the pins was an open circuit. Each bit of the memory has a power feed and two inhibit line returns (among other pins used for other purposes), but bit 16 has an open circuit on one of the two inhibit lines.

That will mean that at least half, if not the full 2K, of memory will always have bit 16 set to 1. Not only will this limit what op codes and data we can use, even for limited testing, but it will cause the parity check to fail on average half the time (i.e., any time when bit 16 should be zero but is read as one). We will need to correct this in order to have a workable erasable memory.
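
The "half the time" estimate can be checked by brute force. This sketch assumes a 16-bit word with odd parity, with the parity bit in position 15 and bit 16 treated as an ordinary data bit; that layout and the helper names are assumptions made for illustration rather than something taken from the AGC documentation.

```python
BIT16 = 1 << 15        # bit 16, counting bits from 1
PARITY_BIT = 1 << 14   # bit 15, assumed to hold the parity


def store(bit16, low14):
    """Build a word and set the parity bit so the count of ones is odd."""
    word = (bit16 << 15) | (low14 & 0x3FFF)
    if bin(word).count("1") % 2 == 0:
        word |= PARITY_BIT
    return word


def parity_ok(word):
    """Odd parity check over all 16 bits."""
    return bin(word).count("1") % 2 == 1


def read_with_stuck_bit(word):
    """Model the open inhibit line: bit 16 always reads back as 1."""
    return word | BIT16


# Try every possible data pattern and count parity failures on readback.
total = 2 * (1 << 14)
failures = sum(
    not parity_ok(read_with_stuck_bit(store(bit16, low14)))
    for bit16 in (0, 1)
    for low14 in range(1 << 14)
)

print(f"{failures} of {total} words fail parity")
```

Under these assumptions every word written with bit 16 clear comes back with one extra bit set, so its parity flips and the check fails, while words written with bit 16 already set are unaffected.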

Examination of Core Rope Simulator modules

This AGC came with a rare Core Rope Simulator rather than six actual rope modules. The two boxes we have plug in and act as the six rope modules, but are cabled to an external box that we don't have to complete the functionality.

Typically, the rope simulator has a magnetic tape drive and some RAM. A new version of software is read from the tape, put in RAM and then served to the AGC as the contents of rope. Test and development centers would find a unit like this convenient, instead of waiting for a real rope module to be wired and installed to make any change.

Since we don't have real rope modules and don't have a complete Core Rope Simulator system, Ken is reverse engineering the simulator boxes we have. We need to do this in order to create hardware to drive the boxes. Alternatively we will need to build a different kind of tool or box to respond with the proper data word anytime the AGC requests it.

Ken reverse engineering the core rope simulator boxes

Verification of level of the AGC

Mike went through the backplane wiring and logic modules, looking at documentation of each of the versions of the AGC and comparing what we had to those variants. He discovered that our backplanes were brought up to the latest revisions, although a couple of the modules were not equally updated.

All this meant was that some added functions, while wired on the backplane, aren't instantiated in the modules. Based on the product code and serial number, our AGC was initially built as one of the last of the Block II prototype systems, just before Raytheon switched over to production versions.

The prototypes went to the Kennedy and Johnson Space Centers and to the major contractors such as North American Aviation and Grumman. Ours was used at Johnson Space Center, although it didn't have that name back during the Apollo program.

The reworking of our unit made it functionally identical to the production versions that would fly on the various missions. This was good news since we want to be able to run any mission's software.

First bringup of the computer (without memories)

In the next part of this blog we will bring up the AGC, without memories, to see in what shape the computer is after all these years.

TO BE CONTINUED

Friday, November 9, 2018