Saturday, August 24, 2024

Debugging failure during Branch and Store IAR instruction execution

THE OP CODE, FORMAT AND INDEX REGISTER LAMPS WEREN'T REGISTERING

I had to debug this first, because if the machine wasn't latching in the instruction type properly, it made no sense to look deeply at any instruction malfunctions. After some tracing of signals, it was clear that the only issue was that the signals weren't reaching the display light panel.

All the signals were on connector A2 of a particular compartment. I pulled the connector out and reseated it, with a satisfying click, and the lights now work properly.

IN THE PAST I SAW FAILURES WITH THE BSI INSTRUCTION, SO THIS WAS NEXT TO DEBUG

The instruction should store the current IAR in the target location, then set the IAR to the target address + 1 to begin executing the instructions that follow. This is used to call subroutines. When the subroutine is done, the address saved in its first word is used as the next instruction to be executed, thereby returning to the caller.

While any instruction is executing, the IAR has been bumped up to point to the next word after the instruction. Thus when the BSI stores the IAR, it is pointing to the instruction immediately after the BSI. Coming back from the subroutine will resume executing at that next instruction. 
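To make the call and return sequence concrete, here is a minimal Python sketch of the BSI behavior described above. It is a simplified model, not the 1130's actual logic: memory is a plain dictionary, the caller passes an IAR that has already been advanced past the BSI, and indirect addressing, index registers and the short/long instruction formats are ignored. The function names and the addresses used below are hypothetical.

```python
MASK16 = 0xFFFF  # IAR values are treated as 16-bit words in this sketch


def bsi(iar: int, target: int, mem: dict[int, int]) -> int:
    """Model BSI: `iar` already points past the BSI. Returns the new IAR."""
    mem[target] = iar & MASK16       # save the return address in the target word
    return (target + 1) & MASK16     # continue executing at target + 1


def subroutine_return(entry: int, mem: dict[int, int]) -> int:
    """Model the return: the address saved in the entry word becomes the next IAR."""
    return mem[entry]


if __name__ == "__main__":
    mem: dict[int, int] = {}
    # Hypothetical one-word BSI at 0x0500, so the IAR already reads 0x0501.
    iar = bsi(0x0501, 0x0600, mem)
    print(hex(mem[0x0600]), hex(iar))               # 0x501 0x601
    print(hex(subroutine_return(0x0600, mem)))      # 0x501
```

The stored word is whatever the IAR held at the moment of the store, which is why a bad parity result on that stored value stops the machine partway through the BSI.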

The error was a parity check in the midst of the BSI instruction, after it had stored the return address in the word at the target location but before execution started at the following word (target+1). Fortunately, unlike the real parity problems I had been dealing with, this was NOT a problem in the memory.

The return address that was stored had the correct parity when read from memory. The parity check stop was a false error, caused by defects in the CPU circuitry rather than the memory. I began capturing signals with the oscilloscope to determine the failing component.

The parity scheme in the IBM 1130 uses odd parity for each halfword. That means that for each halfword, either bits 0 to 7 or bits 8 to 15, the count of 1 bits including that halfword's parity bit must be an odd number. The parity bits are used to ensure that condition is satisfied. Thus, if the number of 1 bits in a halfword is even, the parity bit for that halfword is turned on so that the halfword plus its parity bit has an odd count.
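A short Python sketch of the parity generation rule just described, assuming a 16-bit word held in an integer with IBM bit numbering (bit 0 is the most significant bit, so bits 0 to 7 are the high byte). The function name `parity_bits` is illustrative, not anything from IBM documentation.

```python
def parity_bits(word: int) -> tuple[int, int]:
    """Return (parity bit 1, parity bit 2) for bits 0-7 and bits 8-15 of a 16-bit word."""
    high = (word >> 8) & 0xFF   # bits 0-7 (IBM numbering: bit 0 is the MSB)
    low = word & 0xFF           # bits 8-15
    # A parity bit is turned on only when its halfword has an even number of 1 bits,
    # so the halfword plus its parity bit always holds an odd count.
    p1 = 1 if bin(high).count("1") % 2 == 0 else 0
    p2 = 1 if bin(low).count("1") % 2 == 0 else 0
    return p1, p2


if __name__ == "__main__":
    print(parity_bits(0x1001))  # (0, 0): one 1 bit in each halfword
    print(parity_bits(0x0000))  # (1, 1): zero is an even count in both halfwords
```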

The particular value I was troubleshooting was an IAR value of hex 1001 that would be stored at the target address. Since each halfword has only one bit turned on, each is already at odd parity and no parity bits would be turned on. After the false parity check, the value in memory at the target address was indeed 1001 with both parity bits off.

I had the scope checking the signals that produce the parity check at clock step T6. I saw that the signals for the two halfwords were initially good, indicating that parity was odd, but then the second halfword switched to indicating even parity. Clock step T4 is when the parity bits would have been set, and both halfwords were odd then. At clock step T5 the parity of the second halfword changed to appear even, but this was past the point where the parity bits are changed. With the state remaining even, the parity check latch is set at T6 and the machine stops.

The green trace has -even flipping between T4 and T5

I then put the scope on the sources that generate the green signal (-even 2). That is a tree of Exclusive OR (XOR) gates connected to the eight bits b8 to b15 as well as parity bit 2. I traced down the tree watching the parity state flip between odd and even, until I found the XOR where the inputs didn't justify the output.
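The checking tree can be modeled in a few lines of Python. This is a sketch under assumptions: it arranges eight two-input XORs as a balanced tree (four on adjacent bit pairs, then two, then one, then a final gate for parity bit 2), since nine inputs need exactly eight two-input gates, but the actual wiring order on the card may differ. It returns True for an odd count and does not model the active-low polarity of -even 2. The optional `bad_b14_b15_gate` flag injects a hypothetical fault in the gate fed by bits 14 and 15, flipping its output while its inputs stay constant.

```python
def bits_8_to_15(word: int) -> list[int]:
    """Bits 8..15 of a 16-bit word in IBM numbering, where bit 15 is least significant."""
    return [(word >> (15 - n)) & 1 for n in range(8, 16)]


def halfword2_is_odd(word: int, parity_bit_2: int, bad_b14_b15_gate: bool = False) -> bool:
    """Model a 9-input XOR tree: True if b8..b15 plus parity bit 2 hold an odd count of 1s."""
    b = bits_8_to_15(word)  # b[0] is bit 8, b[7] is bit 15
    # Level 1: four gates on adjacent bit pairs; the last one is the b14/b15 gate.
    level1 = [b[0] ^ b[1], b[2] ^ b[3], b[4] ^ b[5], b[6] ^ b[7]]
    if bad_b14_b15_gate:
        level1[3] ^= 1  # hypothetical fault: output flips although inputs are unchanged
    # Levels 2 and 3: three more gates reduce this to the parity of b8..b15.
    data_parity = (level1[0] ^ level1[1]) ^ (level1[2] ^ level1[3])
    # Eighth gate folds in parity bit 2.
    return (data_parity ^ parity_bit_2) == 1


if __name__ == "__main__":
    print(halfword2_is_odd(0x1001, 0))                         # True: no parity check
    print(halfword2_is_odd(0x1001, 0, bad_b14_b15_gate=True))  # False: looks even
```

A single misbehaving gate anywhere in the tree is enough to flip the final result, which is why tracing down the tree until a gate's output disagrees with its inputs isolates the fault.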


Bits 14 and 15 feed into this XOR gate, and its output has the error shape that matches the -even 2 signal. You can see from the scope output below that neither bit 14 nor bit 15 changes between T4 and T5, but the output of the XOR flips.


The dark blue trace is the -parity check signal, which turns on (goes to 0) at T6. Yellow is the output of the XOR gate, while green and purple below (bits 14 and 15) don't change. Further, they correctly reflect that the low halfword is 01, with only bit 15 at a 1. The slight glitches are a result of imperfect ground connections for the scope probes.

TWO IDENTICAL CARDS USED, ONE PER HALFWORD

The 9-signal XOR gate tree that calculates the parity is implemented on a 3022 SLT card. Slot L7 has the card for the low halfword, while K7 has the same card type, used to check the high halfword. I swapped the two cards to verify that the defect is likely in the 3022 card itself.

The failure of the BSI instruction went away with the swapped cards. However, when I began to run the code segment that checks the keyboard entries, I experienced parity faults in a different spot. I interpret this to mean that the fault moved with the card to the high halfword and signal -even 1, because it is the card itself that is defective.

My plan for tomorrow is to put the 3022 card on the SLT testbench and test its operation. If I can reproduce the fault on the bench, then I can find a way to replace the faulty XOR SLT module. This card has eight 361477 SLT modules, which are XORs, plus one 361451 SLT module, a standard And-Or-Invert gate acting as a NOT gate in this case.
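One way to exercise a 9-input parity card on a bench is to apply every input combination and compare each result against the expected one. The Python sketch below only generates the 512 patterns with their expected odd/even outcome and flags mismatches in a dictionary of recorded results. How patterns are applied and results captured depends on the testbench, so the `observed` dictionary and all function names here are hypothetical.

```python
from itertools import product

# A pattern is (b0, ..., b7, parity_in): eight data inputs plus the parity-bit input.
Pattern = tuple[int, ...]


def expected_odd(pattern: Pattern) -> bool:
    """True when the nine inputs contain an odd number of 1 bits."""
    return sum(pattern) % 2 == 1


def all_vectors() -> list[tuple[Pattern, bool]]:
    """All 2**9 = 512 input combinations paired with their expected parity."""
    return [(p, expected_odd(p)) for p in product((0, 1), repeat=9)]


def find_failures(observed: dict[Pattern, bool]) -> list[Pattern]:
    """Patterns whose recorded result (True = odd) disagrees with the expected parity."""
    return [p for p, exp in all_vectors() if p in observed and observed[p] != exp]


if __name__ == "__main__":
    print(len(all_vectors()))  # 512
    # Hypothetical recording: all-zero inputs wrongly reported as odd.
    print(find_failures({(0,) * 9: True}))
```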

The XOR module has the following circuit:


Pads 1 and 6 are connected together on the card, with the two inputs to the XOR connected to pads 3 and 6. If pad 3 is high while pad 6 is low, transistor T1 conducts and pulls the circuit connected to pad 11 to ground. With pad 11 at ground, transistor T3 will not conduct, so the output at pad 9 is pulled up to +3 because pads 8 and 9 are tied together.

If both inputs are low, neither T1 nor T2 can conduct, so the base of T3 is pulled high, causing it to conduct and pull output pad 9 to ground. If both inputs are high, neither T1 nor T2 turns on, because each emitter is as high as its base. Again, T3 activates and pulls pad 9 to ground.

Thus, the only way to get a logic high output (+3) is when one input is +3 and the other is 0V. The circuit is symmetric so we don't care which one is high as long as they differ. 
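The behavior just described can be captured as a small logic-level model of the module, treating each transistor as simply on or off. It assumes, from the stated symmetry, that T2 conducts when pad 6 is high and pad 3 is low, mirroring T1; voltages, currents and timing are not modeled, and the function name is illustrative.

```python
def xor_module(pad3_high: bool, pad6_high: bool) -> bool:
    """Return True when output pad 9 sits at +3 (logic high)."""
    # T1 conducts when pad 3 is high and pad 6 is low; T2 is assumed to be the mirror case.
    t1_on = pad3_high and not pad6_high
    t2_on = pad6_high and not pad3_high
    # Either conducting transistor pulls the node at pad 11 to ground.
    pad11_grounded = t1_on or t2_on
    # T3 conducts, pulling pad 9 to ground, unless its base is held at ground.
    t3_on = not pad11_grounded
    # With T3 off, the pull-up (pads 8 and 9 tied together) brings pad 9 to +3.
    return not t3_on


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(int(a), int(b), int(xor_module(a, b)))
    # 0 0 0
    # 0 1 1
    # 1 0 1
    # 1 1 0
```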

I do have the card schematics and know which XOR module is bad, assuming I reproduce the failure on the testbench. Then it is just a matter of finding a spare XOR module to replace the faulty one. 

Actual 3022 card schematic

