Wednesday, May 11, 2022

Power sequencing fixed, traced down the root cause of the machine's bad behavior, majority of it is working properly


After some careful tracing and reconnecting of leads, the power sequence logic is solid and correct. I flipped on machine power and it came up to the state it had been in when the last restorer had to give up. Except when it is in Single Step mode, the Run light comes on and the machine won't stop, even with the Imm Stop or Reset buttons. 

CE functions such as Stor Disp or Stor Load can't be used because the machine has to be stopped for them to work. Further, even in Single Step mode it didn't let me step through the T-clock states and try executing instructions. 


Because I had fully tested the run state card (6213) already, I was confident that the problem would be due to bad input(s) to the card and not a defect inside it. Debugging would be a matter of inspecting signals two at a time on my two channel oscilloscope.

The Run state is turned on by a pair of flipflops, Advance and Delay, which are triggered by events such as the Prog Start button or interrupts. The output of Delay was oscillating, which would indeed keep triggering the Run flipflop even when conditions existed that should rapidly turn it off, such as the Imm Stop key. 

I walked through the signals that might be driving Delay, such as the Prog Start key signals or interrupt, but didn't see anything in the wrong state. Certainly nothing oscillating. 

As a quick aside I have to mention a technique IBM used extensively in their computers from the 1950s and 1960s. Every logic gate, even every transistor, was a relatively high cost item to them and they worked hard to drive out superfluous components. This manifested itself in the use of wired-OR gates.

When several signals were combined through an OR gate in some design, they could be implemented with open collector transistors, all tied together to the same output point, with just one pullup resistor for the output. If all the signals were high, the result was high, but any low signal would pull the whole output down to low. 

Yes, that is the definition of AND logic - output is high only when all inputs are high. IBM also calls these wired-AND gates. Essentially the mirror of that is a NOR gate with all the inputs inverted - any signal low will cause the NOR to output low.
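The equivalence described above can be checked exhaustively with a small truth-table model. This is a minimal sketch that treats logic levels as booleans (True = high) and ignores voltages and resistor values; the function names are illustrative only.

```python
# Minimal model of a dot connection: several open-collector outputs share
# one node with a single pullup resistor. True = high, False = low.
from itertools import product


def dot_node(*outputs: bool) -> bool:
    """The pullup holds the node high unless any output pulls it low."""
    return all(outputs)


def nor(*inputs: bool) -> bool:
    """Ordinary NOR: low if any input is high."""
    return not any(inputs)


if __name__ == "__main__":
    # Compare the dot node with a NOR fed inverted inputs, for all
    # combinations of three signals.
    for signals in product([False, True], repeat=3):
        inverted = [not s for s in signals]
        assert dot_node(*signals) == nor(*inverted)
        print(signals, "->", dot_node(*signals))
```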

The oscillating Delay flipflop signal was in a dot-OR with a signal coming from off the card. This signal, -Prog Load T7 Ph B, should only dip low when the machine is doing a Program Load (boot). Since it also requires T-clock state T7 and clock Phase B, the periodic Phase B clock was producing the oscillation we saw. 

I moved over to the section of the machine that controls Program Load and generates the -Prog Load T7 Ph B signal. There I put the oscilloscope on the three inputs to the NAND gate that generates our signal. +T7 was high and +Phase B was oscillating, but +Program Load was at an invalid voltage of about 0.5 volts, high enough to appear high to the gate, so the gate was sending out the signal.
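The failure mode can be sketched as a tiny simulation. It is a hypothetical model, not the ALD logic: the gate driving -Prog Load T7 Ph B is treated as an ideal 3-input NAND, the open +Program Load input is represented by None and assumed to read as high (as the scope reading suggested), and the dot connection with the Delay output is modeled as a wired-AND node with the Delay output held high.

```python
from typing import List, Optional


def read_input(level: Optional[bool]) -> bool:
    """None models a floating, disconnected input; assume the gate sees it as high."""
    return True if level is None else level


def prog_load_t7_ph_b(program_load: Optional[bool], t7: bool, phase_b: bool) -> bool:
    """Active-low NAND output: low only when every input reads high."""
    return not (read_input(program_load) and t7 and phase_b)


def dot_node(delay_out: bool, prog_load_signal: bool) -> bool:
    """Dot connection: either open-collector driver can pull the node low."""
    return delay_out and prog_load_signal


def simulate(program_load: Optional[bool], ticks: int = 8) -> List[bool]:
    """Node level per tick, with T7 and the Delay output held high and Phase B toggling."""
    levels = []
    for tick in range(ticks):
        phase_b = tick % 2 == 1
        levels.append(dot_node(True, prog_load_t7_ph_b(program_load, True, phase_b)))
    return levels


if __name__ == "__main__":
    print("open circuit:", simulate(None))   # follows Phase B, so it oscillates
    print("proper low:  ", simulate(False))  # stays high, no spurious pulses
```

With a real low on +Program Load the NAND output never drops, so the shared node stays quiet regardless of the clock.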

Looking at the source for +Program Load led me to another card. It was in gate A, compartment C1, whereas the prior cards were in gate B, compartment A1. Here on the source card, +Program Load was at a valid low level near zero, but over in the other gate it was different and invalid.

The way the signal gets from one compartment to another, including to different gates, is over ribbon cables. These plug into the backplane around the edges, in columns A and N as well as rows 1 and 8. The ALD page will describe the location and pin for a signal that is leaving or entering a compartment. 

Ribbon cables plugging into right edge of compartment

Ribbon cables neatly folded and routed

Thus, the signal for +Program Load is generated on a card and output on a pin to the backplane. That signal will travel on a trace or via wire wrap connections over to one of the edge slots where the cable plugs in. The signal travels over the cable to the other compartment where it is plugged into that compartment's backplane. Again, it has to travel by traces or wire wrap over to a pin for the card slot with the destination card that needs this signal. 

The first step was to use the continuity tester to verify that the pin on the source card and the pin on the destination card were electrically connected. Problem found! There was no circuit carrying +Program Load to the NAND gate, which was therefore pumping out spurious signals due to its floating, invalid input. 

Open circuit where connection should exist

I checked the seating of the connectors for the cable, then began testing each link to figure out where connectivity was lost. The destination pin was connected to cable pin D06 in slot A6. The signal path from slot A6 in gate B to slot N4 pin D06 in gate A was good. The link from N4 D06 to the card pin where the signal was generated . . . open circuit. 
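The link-by-link search can be written down as a simple walk along the path. This is a hypothetical sketch: each segment is recorded as a (from, to, continuity) tuple using the locations named above, and the search just reports the first segment where the tester found no connection.

```python
from typing import List, Optional, Tuple

# (from point, to point, continuity found by the tester)
Segment = Tuple[str, str, bool]


def first_open(path: List[Segment]) -> Optional[Segment]:
    """Walk the path in order and return the first segment with no continuity."""
    for segment in path:
        if not segment[2]:
            return segment
    return None


# Results as described in the text, from destination back toward the source.
program_load_path: List[Segment] = [
    ("gate B destination card pin", "gate B slot A6 pin D06", True),
    ("gate B slot A6 pin D06", "gate A slot N4 pin D06", True),
    ("gate A slot N4 pin D06", "gate A source card pin", False),
]

if __name__ == "__main__":
    print("open segment:", first_open(program_load_path))
```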

I inspected visually. There was no sign of wire wrap on either pin, so this was a backplane trace that is somehow disrupted. With no obvious sign of trauma, the reason for the failure is obscure. Sometimes these backplanes are flexed too much by pressure put on cards when seating or removing them, leading to hairline cracks. It can also be a material failure of the bond between a trace and a pin. 


Where one failed trace exists, others may too. I will proceed in debugging for a while but if I encounter additional failures of signals on this backplane, I may have to laboriously verify every single connection.

This compartment is responsible for the device controllers for the Selectric console typewriter, the keyboard, some pushbuttons on the console, the disk drive and some miscellaneous tasks such as Program Load. I may not find these failures until much later in the restoration, once the processor itself is free of errors and peripherals are being tested. 


I put in a jumper to complete the circuit, powered up, and the symptoms were gone. The machine now sits in a stopped state and responds properly to the mode switch, pushbuttons and CE switches. I used the Stor Disp (storage display) switch to make the machine loop through storage, reading every location. 

Jumper completes the circuit

Initially I hit a word with a parity error, but after doing a Stor Load (storage load) to force every word to zeroes, a scan of all memory worked fine. I also loaded all ones just to check that it can write back both 0 and 1 values. Also good. 
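The parity error and its disappearance after a Stor Load can be illustrated with a small model. This is a minimal sketch that ASSUMES each 16-bit word carries two odd-parity bits, one per 8-bit half; the exact 1130 storage layout is not described in this post. Under that assumption, every store regenerates both parity bits, so a word whose stored parity had gone bad reads back cleanly once it has been overwritten.

```python
from typing import List, Tuple

# ASSUMPTION: two odd-parity bits per 16-bit word, one per 8-bit half.
# A stored location is modeled as (word, parity_high, parity_low).
Location = Tuple[int, int, int]


def odd_parity_bit(byte: int) -> int:
    """Bit that makes the count of 1s in the byte plus this bit odd."""
    return 0 if bin(byte & 0xFF).count("1") % 2 == 1 else 1


def store(word: int) -> Location:
    """Write a word, generating fresh parity bits for both halves."""
    word &= 0xFFFF
    return (word, odd_parity_bit(word >> 8), odd_parity_bit(word & 0xFF))


def parity_ok(location: Location) -> bool:
    """Recompute parity on read and compare it with the stored bits."""
    word, p_high, p_low = location
    return (p_high, p_low) == (odd_parity_bit(word >> 8), odd_parity_bit(word & 0xFF))


def scan(memory: List[Location]) -> List[int]:
    """Stor Disp-style pass: return the addresses that fail the parity check."""
    return [addr for addr, loc in enumerate(memory) if not parity_ok(loc)]


if __name__ == "__main__":
    # Hypothetical memory with one word whose parity bits are wrong.
    memory = [store(0x1234), (0x0001, 1, 1), store(0xFFFF)]
    print("errors before Stor Load:", scan(memory))
    memory = [store(0x0000) for _ in memory]  # Stor Load of all zeroes
    print("errors after Stor Load:", scan(memory))
```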

I loaded a couple of instructions into memory - Load the accumulator with location 3, add location 4 to the accumulator, then loop back to repeat the add infinitely many times. Location 3 had zero and location 4 was 0001. 

Single stepping showed me that the machine was able to handle instruction execution - read the word that the Instruction Address Register points at, move the upper bits into the op code and flag registers, calculate the effective address for location 3, load the data from location 3, then shift it from memory into the accumulator (ACC). 

The next instruction triggered an addition, which I watched work successfully. Even more reassuring was the execution of the branch instruction. I used a short form MDX, which has a displacement relative to the current IAR. That is, if the MDX is in location 2, the IAR will be set to 3 when the MDX starts executing. My MDX had a displacement value of xFE, which is -2 in 8-bit two's complement, so it had to add -2 to the IAR value to compute the target location of the branch.
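The branch-target arithmetic can be sketched as follows. This is a minimal model, assuming the 8-bit displacement is sign-extended and added to an IAR that already points at the word after the MDX, as described above; wrapping addresses at 16 bits is an assumption of the sketch, and index-register (tag) forms are ignored.

```python
def sign_extend_8(value: int) -> int:
    """Interpret an 8-bit field as a two's complement number (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def short_branch_target(mdx_location: int, displacement: int) -> int:
    """Target of a short-form MDX with no index register."""
    iar = mdx_location + 1  # IAR already points past the MDX
    return (iar + sign_extend_8(displacement)) & 0xFFFF  # 16-bit wrap assumed


if __name__ == "__main__":
    # MDX in location 2 with displacement xFE branches back to the add in location 1.
    print(sign_extend_8(0xFE), short_branch_target(2, 0xFE))
```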

IBM 1130 addition is an iterative process. The value being added is in the Arithmetic Factor Register (AFR) and it is added to the ACC bitwise. The result of the add is in the ACC but any carry is saved in the AFR. As long as the AFR is non-zero, we have to do the addition of AFR with ACC again. This can take up to 15 cycles in the worst case. The machine has to repeat the T7 clock state until the AFR becomes zero, something I watched happen. 
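The repeated T7 addition can be modeled with the usual XOR/AND carry loop. This is a sketch, not the 1130's gate-level design: it assumes each pass leaves the carry-free sum in the ACC and the carries, shifted one bit left, in the AFR, and it drops any carry out of the top bit (the real machine would set its Carry indicator, which is ignored here).

```python
from typing import Tuple


def iterative_add(acc: int, afr: int) -> Tuple[int, int]:
    """Return (new ACC, number of passes) for a 16-bit ACC + AFR."""
    acc &= 0xFFFF
    afr &= 0xFFFF
    passes = 0
    while True:
        passes += 1
        # One pass: bitwise sum into ACC, carries (shifted left) into AFR.
        acc, afr = acc ^ afr, ((acc & afr) << 1) & 0xFFFF
        if afr == 0:  # no carries left, so the cycle can end
            return acc, passes


if __name__ == "__main__":
    # Mimic the test loop: ACC starts at 0 and 0001 is added repeatedly.
    acc = 0
    for _ in range(3):
        acc, passes = iterative_add(acc, 0x0001)
        print(f"ACC={acc:04X} after {passes} pass(es)")
```

Adding 1 to an ACC of 1, for example, produces a carry on the first pass, so the model repeats once more, just as the machine stays in T7 for another cycle.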

This proved out the logic that extends a cycle by staying in T7 multiple times. It showed the adder circuitry was sound enough to handle the addition. I also saw that the logic for stepping through multiple storage cycles during execution worked, with states such as I1 (Instruction 1) and E1 (Execution 1) correctly set. 

I put the machine into normal mode and hit the Program Start key. It began incrementing the ACC, but then it encountered a Parity Error and stopped. This happened sporadically. My guess is that the core memory circuits need some adjustment to make them more reliable. However, it could instead be flaky signals due to bad backplane traces.

I entered a few other instructions, including a Rotate ACC and EXT, which shifts the contents of the ACC rightward into the EXT and takes the bits falling out of the bottom of the EXT and places them in the vacated top of the ACC. I saw the bits in the ACC shift out, but they never reached the EXT. This did prove that the cycle control logic that counts variable length shifts was working properly.
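For reference, the expected result of the combined rotate can be modeled by treating ACC and EXT as one 32-bit register. This is a minimal sketch of the rotate-right behavior described above; the function name and argument order are illustrative, not taken from IBM documentation.

```python
from typing import Tuple


def rotate_right_acc_ext(acc: int, ext: int, count: int) -> Tuple[int, int]:
    """Rotate the 32-bit ACC:EXT pair right by count bits; return (ACC, EXT)."""
    combined = ((acc & 0xFFFF) << 16) | (ext & 0xFFFF)
    count %= 32
    # Bits leaving the bottom of EXT re-enter at the top of ACC.
    rotated = ((combined >> count) | (combined << (32 - count))) & 0xFFFFFFFF
    return rotated >> 16, rotated & 0xFFFF


if __name__ == "__main__":
    acc, ext = rotate_right_acc_ext(0x0001, 0x0000, 1)
    print(f"ACC={acc:04X} EXT={ext:04X}")
```

On a healthy machine, rotating ACC = 0001, EXT = 0000 right by one should leave the 1 bit at the top of the EXT; the symptom described above is the bit leaving the ACC without arriving there.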

The bottom line is that the machine has a significant majority of its functions working properly, but a few things don't work correctly or as reliably as they should. A very good start to a restoration, indicating that we did not have widespread damage from a catastrophic power event as I had feared earlier. 
