Saturday, November 10, 2018

Restoring an Apollo Guidance Computer, part IV


Looking for suspected design flaw/race hazard

Mike Stewart had modeled the exact schematic of the AGC, gate by gate, which he had used both to understand the system using simulation and to create his own instance of an AGC in an FPGA board. One of the discoveries from his simulation was a glitch or transient signal that impairs operation of one of the hardware alarms. The question - does this occur in the real AGC?

Since the Apollo Guidance Computer acts as the digital autopilot for the astronauts, is in between the hand controllers and the attitude jets, and is responsible for navigation, it must be very reliable. A number of hardware checks are built in to detect software bugs by recognizing undesirable behavior.

One of the clever checks added to the hardware is a timer that watches how long it has been since a control transfer instruction (e.g. a branch) was executed. If code is in an infinite loop, this timer would signal an alarm and force a restart of the computer, hopefully restoring function to the spacecraft.

Mike had discovered a transient that occurs relatively frequently and resets the timer the same way a control transfer instruction would. Since this glitch occurs independently of branches, it has a good chance of defeating the check and delaying, perhaps indefinitely, the detection of the infinite loop.

We set up the oscilloscope to watch the condition he had observed in his simulator, and indeed this glitch occurs relatively often on the real hardware. He had developed a test for this problem: a clever two-instruction loop that does not use transfer of control instructions. When he runs it in his simulator, in the software AGC simulator available on the web, and in his FPGA, it shows that the detection of the loop is seriously delayed because of this design flaw.
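
As a rough illustration of why the glitch matters, here is a minimal Python sketch of such a timer. It is not the AGC's actual circuit: the timeout of 100 steps, the glitch period of 60 steps, the opcode names and the function name are all made up for the example, and the glitch is assumed to be perfectly periodic.

```python
from itertools import cycle, islice


def steps_until_alarm(program, timeout, glitch_period=None, max_steps=10_000):
    """Return the step at which the alarm fires, or None if it never
    fires within max_steps.

    program       -- list of opcode names, repeated forever (the loop)
    timeout       -- hypothetical limit on steps without a timer reset
    glitch_period -- if set, a spurious reset happens every N steps
    """
    since_reset = 0
    for step, op in enumerate(islice(cycle(program), max_steps), start=1):
        is_transfer = op == "TC"  # a real transfer of control
        is_glitch = glitch_period is not None and step % glitch_period == 0
        if is_transfer or is_glitch:
            since_reset = 0       # both events reset the timer
        else:
            since_reset += 1
        if since_reset >= timeout:
            return step           # alarm: force a restart
    return None


# Hypothetical two-instruction loop with no transfer of control.
loop = ["OP_A", "OP_B"]

print(steps_until_alarm(loop, timeout=100))                    # 100
print(steps_until_alarm(loop, timeout=100, glitch_period=60))  # None
```

With no glitch the alarm fires as soon as the timeout is reached. With a spurious reset arriving more often than the timeout, the loop is never detected in this model.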

Testing a longer stream of instructions

Mike adjusted his memory tool and achieved much longer run times, for example having the AGC execute a few thousand instructions of the Aurora core rope software, beginning with the restart/start code that the hardware runs at o04000.

All of the instructions examined produced the proper results, although the flow of the real AGC diverged from the flow under the simulator at the 100th instruction. The divergence was not due to a defect in the AGC; it is entirely expected given the current state of the memory tool.

The memory tool is currently wired up to serve only the fixed memory (core rope memory) contents. This means any access to erasable (RAM) memory returns o00000 as a result. As code executes, at some point a conditional branch will go a different way, because the erasable memory would have held a non-zero value but under the tool it reads as zero.

The actual case we monitored was not an issue of branch conditions, but due to a programming technique that was used. The address of a routine to execute was stored in erasable memory and then the programmer attempted a Transfer Control through that address. Under the tool, we branched to location 0 instead.
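
The behavior can be sketched in a few lines of Python. This is only a model of the idea, not of Mike's tool: the addresses, the stored words and the function name below are hypothetical.

```python
def make_reader(fixed_image, erasable=None):
    """Build a read(address) function.

    fixed_image -- dict of address -> word for fixed (rope) memory
    erasable    -- dict for erasable memory, or None to model the
                   current tool, which answers every erasable read
                   with zero
    """
    def read(address):
        if address in fixed_image:
            return fixed_image[address]
        if erasable is not None:
            return erasable.get(address, 0)
        return 0  # the tool has no erasable memory behind it
    return read


# Made-up contents: one rope word, plus an erasable location where the
# program has stored the address of a routine to execute.
fixed_image = {0o4000: 0o12345}
ROUTINE_POINTER = 0o0100
erasable = {ROUTINE_POINTER: 0o4321}

with_erasable = make_reader(fixed_image, erasable)
tool_only = make_reader(fixed_image)

# A transfer of control through the stored address lands wherever the
# read says it should.
print(f"{with_erasable(ROUTINE_POINTER):05o}")  # 04321
print(f"{tool_only(ROUTINE_POINTER):05o}")      # 00000
```

Fixed memory reads behave identically in both cases, which is why the first hundred or so instructions matched the simulator.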


Our team reached the end of the trip and we are flying back home this weekend. It has been a very satisfactory first week of work but more needs to be done to achieve our goal of a demonstration of a fully restored AGC running actual Apollo mission software.

Different memory tool implementation

The temporary memory tool is connected to the AGC by a myriad of wires pushed onto the wirewrap pins of the tray A backplane. This is not a rugged or cosmetically appealing approach for the long term. Instead, Mike is building a different version of the tool that will connect to the AGC using the maintenance connector (A52) that is accessible on the outside of the computer with all covers closed.

A52 was used to connect to Ground Support Equipment (GSE) and test out the AGC in a number of locations. Raytheon used it during manufacture, North American Aviation used it to test while installed in the Command Module, Grumman Aircraft Engineering Corporation used it to test while installed in the Lunar Module, and NASA used it to check out the spacecraft in the Manned Spacecraft Operations Building (MSOB) at Kennedy Space Center, as well as for testing while the vehicle was sitting on the launchpad.

This connector allows the monitoring of key conditions and control of the computer, including the ability to deliver data that overrides the contents of AGC memory. Initially, A52 allowed override of core rope memory contents but not erasable memory. Later changes added the ability to override the erasable as well.

Our AGC, which was one of the last prototypes before the production versions of the computer were shipped, varies only slightly from the final machine. Our chassis is made of aluminum instead of magnesium, as the weight reduction was important to the production machines that would fly. Most of our modules are not potted, which is great for testing and repair. However, the function to allow override of erasable memory was not installed on our box. We intend to add the option.

Adding MAMU pin to A52 maintenance connector

The signal to override erasable memory (MAMU) is a trivial change to implement and simply adds a standard feature of all production machines. We found a spare NOR gate with an open collector, allowing us to tie that to the existing NOR gate which drives the strobe to read erasable memory. Thus, when our gate activates, it will suppress the strobe.

This requires that we add three wires to the backplane. One will run from the output of the spare gate we chose to the output of the existing erasable strobe gate. One will run from the unused input of our spare gate to ground. The third wire will run from the remaining open input of our spare gate over to the MAMU pin on connector A52.
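
The logic of this change can be sketched as a small truth-table model in Python. The signal polarities are assumptions made for the example (a logic 1 on MAMU requests the override, and the strobe is active when the shared node is high); the real AGC polarities may differ, and the function names are made up.

```python
def nor(*inputs):
    """Output is high only when every input is low."""
    return not any(inputs)


def strobe_node(existing_inputs, mamu):
    """Level of the shared node where both gate outputs are tied.

    With open collector outputs wired together, either gate can pull
    the node low, so the node is high only if both outputs are high.
    """
    existing_out = nor(*existing_inputs)  # gate that normally drives the strobe
    spare_out = nor(mamu, False)          # spare gate: one input grounded
    return existing_out and spare_out


# Hypothetical inputs that would normally produce a strobe.
print(strobe_node((False, False), mamu=False))  # True: strobe passes
print(strobe_node((False, False), mamu=True))   # False: strobe suppressed
```

When MAMU is inactive the spare gate leaves the node alone, so the computer behaves exactly as before the modification.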

We will wire wrap these new wires, using a different color wire and documenting what we did, allowing the AGC to be restored to its 'as found' state if desired by the owner sometime in the future. Restorations of rare artifacts typically apply a similar process, saving any removed parts, marking any substitutions, and providing a path to restore the artifact to its prior state.

Core Rope Simulator activation using a team built driver system

Our AGC was run with Raytheon built modules installed in the core rope slots. These modules were part of a core rope simulator that connected via two thick cables to some GSE which would respond to any fixed memory request with the appropriate data word. Unfortunately, we did not receive the thick cables nor the GSE, just the two modules which plug into the AGC.

Ken Shirriff spent much of the week reverse engineering the Raytheon modules. We have no schematics or other documentation on these, but we think it would be possible to connect to the modules with some electronics of our own devising that would drive them. In other words, a partial substitute for the GSE associated with the rope simulator.

He is wrapping up the documentation and study of these modules. With a good definition for the 31 data and control lines running over the thick cable, he can control these modules using simple digital signals over differential wire pairs. We have found an old connector that will mate with the Raytheon modules, except that some keys on the connector block insertion.

We will mill off the keys that don't match so that we can insert our connector into the module, substituting extreme care for mechanical keys to ensure that the connector is oriented properly.

Once we have circuits to control the Raytheon modules, they can respond to fixed memory address requests and serve up the data words from within our unit. This is the preferred way to provide fixed memory to this computer, since the modules are standard Raytheon units that were on the machine and historically accurate.

Rope software is available in many versions, corresponding to most Apollo missions. We will have these in our box and selectable to run the AGC as a CM or LM on whatever flight we desire.

Repair of erasable memory module and checkout in the AGC

The preferred way to supply erasable memory to the AGC is by repairing the actual module from this computer and running with it in tray B. As previously described, the module has a pin with open connectivity where we should have an inhibit line for bit 16. We hope to repair this defect.

The module is fully potted - filled in with a solid material covering all the wires, pins and interior components. We believe this material to be RTV-11 but will do some testing to confirm. If RTV-11, we know of solvents that will slowly dissolve the potting material allowing access.

We don't want to remove all the potting. Instead, by analogy to arthroscopic surgery, we want to make the smallest hole feasible, accomplish the repair, and fill the space back in using new RTV-11. To make this happen, we must know exactly where the break is within the module and be sure it is in a location where it can be repaired.

Apollo era documentation informs us that quite a few of the modules failed during testing or use, but these all were breaks where the wire runs from the external pin down to the memory stack itself, not somewhere inside the memory stack. The stack is folded multiple times to fit in the small space, making access to wires inside the stack impossible.

To spot the location, we are going to make use of a 3D tomographic X-Ray machine to give us an exact picture of the location of the fault. Based on the X-Ray, we will build a plan for the repair, make our opening into the potting and hopefully fix the module. The X-Ray is scheduled for late December, based on availability of the high tech imager.

Conversion of DSKY substitute prototype to a finished and rugged state

My DSKY substitute was a breadboard hodgepodge used to prove out the design before I commit it to printed circuit boards and more permanent construction. We are not able to drive it from the AGC yet, because the software that communicates with it hasn't run and won't until we can provide erasable memory, either with a tool or the repaired module.

I was, however, able to test it with a pattern generator and look at the function of much of the unit. We discovered a few places where the design has to change, such as in the method of flashing the Verb and Noun displays. We also discovered the potential to react too quickly to the AGC output and select the wrong digits, signs or lights.

All these learnings will be fed into the design, an improved prototype will be tested and then I will build a PCB based substitute that can fit into an aluminum case milled to match the real DSKY. We were fortunate enough to receive the milled case from the owner of the AGC and can use it to house the substitute.

Producing realistic spacecraft inputs to the AGC

One final aspiration would be to produce realistic inputs over the main spacecraft connector (A51), such as the pulses that come from the inertial navigation unit, telemetry uplink, and so forth. Similarly we would find ways of displaying important outputs of the AGC.

As an example, we could have the inertial gyros and accelerometers providing input and display the pulsing of the Reaction Control System (attitude jets). We might even implement the hand controllers and watch RCS activity as we request rotation or translation. That would give us a more compelling demo for the 50th anniversary of the Apollo 11 landing in July 2019.
