Friday, November 9, 2018

Restoring Apollo Guidance Computer, part II


Testing the analog alarm module from the B tray

The alarm module has a number of circuits that look for error conditions and raise alarms. An alarm causes the computer to restart by forcing in a branch instruction to a fixed address in core rope memory (octal 4000). The AGC keeps restarting until the alarm goes away, at which point it completes the restart sequence and is back in operation.
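To make the restart behavior concrete, here is a rough Python sketch. It assumes the alarm module can be reduced to a single yes/no signal sampled once per restart attempt; the `alarm_active` callback and the attempt cap are hypothetical and not anything in the AGC itself.

```python
from typing import Callable, List

RESTART_ADDR = 0o4000  # fixed core rope address the forced branch targets

def run_restart(alarm_active: Callable[[int], bool], max_attempts: int = 1000) -> List[int]:
    """Return the forced branch targets issued before the alarm clears.

    alarm_active(attempt) is a hypothetical stand-in for the alarm
    module's output on a given restart attempt.
    """
    forced = []
    for attempt in range(max_attempts):
        if not alarm_active(attempt):
            return forced              # alarm gone: restart sequence completes
        forced.append(RESTART_ADDR)    # alarm present: force a branch to o4000 again
    raise RuntimeError("alarm never cleared")
```

For example, run_restart(lambda n: n < 3) returns three forced branches to o4000 before the machine gets on with its restart sequence.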

We verified all of the circuits except one: the circuit that monitors the 28V input from the spacecraft bus and the 14V and 4V produced by the two power supplies in tray A.

It should trigger an alarm if either of our two generated voltages rises above its maximum or falls below its minimum level, and it should also trigger an alarm if the spacecraft main bus (A or B) falls below a minimum.

When we varied the voltages while watching the alarm signal, it did turn on as we entered the invalid ranges. However, it latched in that state and would not turn off when the power returned to healthy levels. In a real machine this would be a problem, because the only way to recover would be to pull circuit breakers, a cumbersome and impractical action; thus we knew this should not be happening.

However, when we inserted the module in tray B for the first bringup (described below), we found that the alarms reset automatically once power reached good levels. We are not sure why we couldn't recreate that on the bench, but it is worth noting that this one module is at a different revision level than all of the schematics we are using.
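The difference between the bench and the tray can be pictured as a window comparator feeding either a latch or a plain pass-through. This is only a sketch: the threshold numbers below are made-up placeholders chosen for illustration, not the AGC's actual trip points.

```python
from typing import Iterable, List, Tuple

# Hypothetical trip points, for illustration only.
V14_MIN, V14_MAX = 13.0, 15.0
V4_MIN, V4_MAX = 3.7, 4.3
BUS_MIN = 24.0

Sample = Tuple[float, float, float, float]  # (v14, v4, bus_a, bus_b)

def power_fault(v14: float, v4: float, bus_a: float, bus_b: float) -> bool:
    """True if either generated supply leaves its window or either main bus sags."""
    return (not V14_MIN <= v14 <= V14_MAX
            or not V4_MIN <= v4 <= V4_MAX
            or bus_a < BUS_MIN
            or bus_b < BUS_MIN)

def alarm_trace(samples: Iterable[Sample], latching: bool) -> List[bool]:
    """Alarm output per sample: latching (as on the bench) or self-resetting (as in the tray)."""
    alarm = False
    out = []
    for sample in samples:
        fault = power_fault(*sample)
        alarm = (alarm or fault) if latching else fault
        out.append(alarm)
    return out
```

Feeding in a good sample, a bus A sag, and another good sample gives [False, True, True] with latching=True, the behavior we saw on the bench, and [False, True, False] with latching=False, the behavior we saw in the tray.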

First bringup of the computer (without memories)

We powered up the computer with some oscilloscope probes attached and saw that the system appeared to be executing, so it was time to wire up the logic analyzer and capture enough activity to understand what was going on.

The analyzer showed that we stayed in restart until the power alarm cleared, but then the machine raised a different alarm and went back into restart. That alarm was a parity error! Since we have no erasable or fixed memory installed, any read of a location sees all zeros. Because the parity bit then does not produce odd parity, the hardware raises an alarm condition.

Erasable memory module (with open circuit on inhibit line)
This makes us loop: each time we restart, our second instruction is fetched from o4000 and gets a parity error, taking us back to restart ad infinitum.
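The parity arithmetic behind this loop is easy to show directly. A minimal sketch, assuming a word is handled as 15 data bits plus one parity bit and that a good word must contain an odd number of 1s in total:

```python
def parity_ok(data: int, parity_bit: int) -> bool:
    """Odd parity check over 15 data bits plus the parity bit."""
    ones = bin(data & 0o77777).count("1") + (parity_bit & 1)
    return ones % 2 == 1
```

An empty socket reads back data 0 with a parity bit of 0: zero 1s in total, which is even, so parity_ok(0, 0) is False and every fetch raises the alarm. A properly stored all-zero word would instead carry a parity bit of 1.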

Bypassing the parity error alarm to continue testing

A maintenance connector on the AGC provides a wire that we can raise to logic 1 to disable alarm conditions. We connected it to a pullup resistor and confirmed that the processor no longer loops on restart.
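One way to picture the bypass is a sketch that assumes the maintenance wire simply gates the parity alarm. The function and argument names are ours, not signal names from the AGC drawings.

```python
def first_fetch_outcome(alarm_inhibit: bool) -> str:
    """What happens on the fetch from o4000 with no memory installed.

    alarm_inhibit models the maintenance-connector wire held at
    logic 1 by the pullup resistor (hypothetical name).
    """
    data, parity_bit = 0, 0                 # empty memory: every bit reads as 0
    parity_error = (bin(data).count("1") + parity_bit) % 2 == 0  # odd parity expected
    if parity_error and not alarm_inhibit:
        return "restart"                    # alarm forces another restart
    return "execute"                        # alarm suppressed: the fetched word runs
```

With the wire low the fetch ends in "restart" every time; with it pulled high the same bad-parity word goes on to "execute".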

Traces and verification in the absence of memory contents

There is a lot we can validate even with all memory accesses returning o00000, so we ran traces on the logic analyzer and pored over the entries. Everything we have seen so far is correct operation, with no sign of a flaw.

Marc studying the logic analyzer traces
Once the system comes out of restart, it branches to location 0. This happens because the first instruction fetched comes back as o00000, which is interpreted as a transfer control (TC) to location 0. You would normally expect this to loop forever fetching a transfer to zero, but a few other things are going on.
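A simplified decoder shows why an all-zero word means "transfer to location 0". This sketch assumes the basic layout of a 3-bit opcode in the top bits of a 15-bit word with a 12-bit address below it, where opcode 0 is TC. It deliberately ignores extracodes, quarter codes, and the special TC addresses.

```python
from typing import Tuple

def decode(word: int) -> Tuple[int, int]:
    """Split a 15-bit AGC word into (opcode, address), basic instructions only."""
    opcode = (word >> 12) & 0o7   # top 3 of the 15 bits
    address = word & 0o7777       # low 12 bits
    return opcode, address
```

decode(0o00000) gives (0, 0), a TC to location 0, while decode(0o04000) gives (0, 0o4000), which would be a TC to the restart address.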

The AGC supports hardware counters that track spacecraft information. A function in the spacecraft may request that a counter be incremented or decremented by sending pulses to the AGC. The hardware then pauses normal instruction execution and forces in a special instruction such as INKL or DINKL to change the counter.

Counters live at fixed addresses in erasable memory and are updated by cycle-stealing instructions such as INKL. The software doesn't see the increment or decrement happen, but it can read the current counter value at any time by looking at that memory location. With the decrement instruction, an interrupt can be generated when the counter reaches zero.
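The counter idea can be illustrated with a toy model. It assumes plain non-negative values (the real machine uses ones' complement arithmetic) and a dictionary standing in for erasable memory; the counter address used below is hypothetical.

```python
from typing import Dict, List

class CounterBank:
    """Toy model of counters updated by cycle-stealing increment/decrement."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}   # stands in for erasable memory
        self.interrupts: List[int] = []    # counter addresses that requested an interrupt

    def increment(self, addr: int) -> None:
        # Steal a cycle: read, add one, write back. Software never sees it happen.
        self.memory[addr] = self.memory.get(addr, 0) + 1

    def decrement(self, addr: int) -> None:
        # Steal a cycle: count toward zero; reaching zero requests an interrupt.
        value = max(self.memory.get(addr, 0) - 1, 0)
        self.memory[addr] = value
        if value == 0:
            self.interrupts.append(addr)

    def read(self, addr: int) -> int:
        # What software sees whenever it looks at the counter location.
        return self.memory.get(addr, 0)
```

With a hypothetical counter at o40, two increments followed by two decrements leave read(0o40) at 0, and the interrupt request appears only after the second decrement.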

This is what we see happening. After restart, INKL and DINKL instructions execute, because counter requests have top priority and are set up by the hardware logic. Of course, each instruction fetches the current value, which is always zero, adds or subtracts, then tries to store the updated value; with no memory present, the location still reads zero.

The DINKL sees its result, zero, and triggers the interrupt. Once the counter increments and decrements are done, the interrupt is processed by branching to the interrupt handlers up in fixed memory. The instruction fetched there comes back as o00000, a transfer to location zero.
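With no memory modules fitted, the read-modify-write that a counter cycle performs never sticks. A minimal sketch, assuming an "empty" memory where every read returns zero and every write is discarded (the class and function names are ours, for illustration):

```python
class EmptyMemory:
    """No memory modules installed: reads return 0, writes are lost."""

    def read(self, addr: int) -> int:
        return 0

    def write(self, addr: int, value: int) -> None:
        pass  # nothing there to hold the value

def increment_cycle(mem: EmptyMemory, addr: int) -> int:
    """Read, add one, write back; return what the location holds afterward."""
    mem.write(addr, mem.read(addr) + 1)
    return mem.read(addr)

def decrement_cycle(mem: EmptyMemory, addr: int) -> bool:
    """Count toward zero; return True when the result is zero (interrupt request)."""
    value = max(mem.read(addr) - 1, 0)
    mem.write(addr, value)
    return value == 0
```

Every increment leaves the location reading 0, and every decrement ends at zero, so the interrupt is requested every time.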

The other non-intuitive result is that we don't simply sit in a tight loop of transfers to location zero, because the read of location zero returns a non-zero value! What? We have no memory. Well, it turns out we do have a limited memory.

The architecture of the AGC uses the first seven or so locations of memory as registers (the accumulator and others). The registers need to be much faster than a core memory cycle, so they are implemented with flip-flops in the machine. Reading memory at 0 returns the current value of the accumulator, which is then treated as an instruction.

This turns into an instruction referencing memory location o3777, because the accumulator flip-flops start out with all bits turned on. We see that happen in the trace. Once the processor has stepped through the first few register contents, treating them as memory contents, it reaches the first non-register location, gets the transfer control to 0 instruction, and starts the loop over.
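The address split can be sketched as a read function that answers low addresses from flip-flop registers and everything else from the (absent) memory. This assumes seven flip-flop registers, following the "seven or so" above, and treats every non-register read as zero because nothing is installed; the names are hypothetical.

```python
from typing import List

NUM_FLIPFLOP_REGS = 7  # "first seven or so" locations; the exact count is an assumption

def read_word(registers: List[int], addr: int) -> int:
    """Return the word at addr: a flip-flop register if low, else empty memory (0)."""
    if 0 <= addr < len(registers):
        return registers[addr]   # answered by flip-flops, no core cycle needed
    return 0                     # no memory fitted: reads come back as o00000

registers = [0] * NUM_FLIPFLOP_REGS
registers[0] = 0o12345           # location 0 is the accumulator (value arbitrary)
```

A fetch from location 0 returns whatever is in the accumulator (here o12345), and that word is then executed as an instruction. A fetch from any non-register location such as o4000 comes back as 0, which decodes as TC 0 and starts the loop over.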

Next, we need to put some data into the system to try out different instructions. Eventually we will need to provide fixed and erasable memory, plus the software that was coded into the core rope modules.

