Monday, September 2, 2024

Wasted a few hours debugging a non-issue

SWITCHED THE TRIGGER SIGNAL FOR THE SCOPE - AN UNFORTUNATE DECISION

Because my scope only displays four signals at a time, having to dedicate one of them to catch the Parity Check flipflop being set allows for only three signals of interest to be displayed. Every time there was a Parity Check during the XIO Read of the keyboard, the signal -I/O to B Sample Pulse was reliably activating to latch the keyboard data into the Storage Buffer Register (SBR).

I therefore decided to move the trigger condition to that signal, which I needed to observe to see the relative timing of the other signals, freeing up the fourth probe for other uses. This seemed to be a good decision, but after more than four hours troubleshooting, diverted away from looking at the actual error conditions, I was ready to grab schematics and bench test a 7347 card which generates a Storage Use signal I believed to be in error.

However, once I got home and began a blog entry, I realized in a blinding flash that I had gone down a rabbit hole because this particular -I/O to B SP signal was not related at all to the XIO Read. The anomalies I was seeing in the signal were because it was being used for a very different purpose. I naively assumed the SP signal was not being used other than to read the keyboard, but I was wrong.

SOMETHING LOOKS WRONG AND I ASSUME IT IS THE CAUSE OF THE FALSE PARITY STOP

Now that I was triggering on the -I/O to B SP signal, I didn't have to wait for a parity error; I could look at any key press, even successful ones. The traces, however, had an odd look.


The SP signal was doubled! On the prior days of debugging I only saw a single pulse, which is what I expect to see, but now there are two. This could be tricking the parity checking circuits, I thought, and looked like the smoking gun. 

The -I/O to B SP pulse is generated when the -Entry Sample signal goes low - at the falling edge. When I put the scope on that signal, I saw two lobes. This triggered two pulses since there were now two falling edges. The generation of that signal, however, involves the dreaded IBM dotted-OR, making the debugging process frustratingly slow.
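The link between lobes and pulses can be sketched in a few lines of Python. This is only an illustration: the sample lists are made-up logic levels, not data captured from the scope, and `falling_edges` is a hypothetical helper name.

```python
def falling_edges(samples):
    """Return the indices where a sampled signal goes from high (1) to low (0)."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] == 1 and samples[i] == 0]

# Hypothetical sampled waveforms (1 = high, 0 = low), not real scope captures.
single_lobe = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
double_lobe = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]

# Each falling edge would fire one sample pulse.
print(len(falling_edges(single_lobe)))  # 1
print(len(falling_edges(double_lobe)))  # 2
```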

DOTTED-OR, THE ACCOUNTANT'S DREAM AND DEBUGGER'S NIGHTMARE

In the early days of computing, every gate and component was relatively expensive. IBM became adept at minimizing parts counts, with each diode or transistor eliminated lowering the costs of a product. The dotted-OR and its sibling the dotted-AND were widely used in the machines to strip out cost.

Modern designers, when building circuits to produce a signal reflecting multiple cases, would connect the logic gates for each case to an OR gate which yields the final signal. When the final signal is true, it is relatively easy to find which of the cases is causing that, since each case has its signal on a different input of the OR gate free from interference by the other gates. 

IBM would skip the final OR gate. They would just wire together the outputs of multiple gates covering the many cases, calling it a dotted-OR. This was possible because of the Diode-Transistor Logic nature of their circuitry, where a single transistor inverts the logic function implemented by diodes. That final transistor is either on, conducting to pull the output down to ground, or it is off. If the circuit has a pullup resistor, the output will jump to 3V when the transistor is off or sit nearly at 0V if the transistor is on.

When multiple gates are wired together, they are like open collector gates connected together. Any one or more of the transistors, when turned on, will pull the wired junction down to ground. If none of the transistors are conducting, then the junction can pop up to 3V via one or more pullup resistors. This is also analogous to shared bus designs with open collector gates. 

The result therefore is the same as if an actual OR gate were used, but the costs are lower. There is one downside, however. If a dotted-OR connects a dozen gates (for a dozen cases that can produce this signal), there is no easy way to find which of the dozen gates is active. The junction drops to 0V and all the gate outputs, wired together, will show 0V.

As a practical matter, then, the debugger faces two possible situations. First, the circuitry of one or more of the gates might be defective, activating when its input signals should not cause that to occur. Second and more likely, one of the twelve gates is working correctly and activates only because its inputs are wrong. The error is then further back, in the circuitry that produces the incorrect input signal.
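The masking effect of a dotted-OR can be shown with a small Python model. It is a rough sketch that treats each gate as an ideal open-collector switch sharing one pullup; the function names and the choice of which gate is active are invented for illustration.

```python
def wired_junction(transistors_on, v_high=3.0, v_low=0.0):
    """Voltage at a junction of open-collector outputs sharing a pullup.

    Any conducting transistor pulls the junction to ground; otherwise the
    pullup raises it to v_high.
    """
    return v_low if any(transistors_on) else v_high

def probe_each_output(transistors_on):
    """Every gate output is the same wire, so each probe reads the junction."""
    v = wired_junction(transistors_on)
    return [v] * len(transistors_on)

# Twelve gates with only one conducting (index 7 is an arbitrary choice).
gates = [False] * 12
gates[7] = True

print(wired_junction(gates))     # 0.0
print(probe_each_output(gates))  # twelve identical 0.0 readings
```

Probing the outputs gives the same readings no matter which gate is conducting, which is why the only way forward is to check the inputs of each gate in turn.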

MADE IT THROUGH ONE DOTTED OR, ONLY TO FIND ANOTHER FURTHER BACK

The -Entry Sample signal is produced by a dotted-OR, thus I had to hook up many permutations of the input signals that could activate any of the gates in the wired junction. Quite a bit of time was spent studying logic diagrams, moving probes and taking samples. 

Finally I found the case that was triggering the double pulse I saw on the scope. A particular gate combined signals +T1, +Non Storage Use and +T Clock Advance SP, and the combination of those inputs exactly matched the output of the dotted-OR.


The yellow trace is the -Entry Sample output of the dotted-OR. The green trace is Non Storage Use, the purple trace is the T Clock Advance SP and the blue trace is the T1 signal. While the T1 signal is active, there are two edges of the T Clock Advance signal and these plus the Non Storage Use produce the double lobed signal at top. 

The anomaly is the Non Storage Use signal, which is high in the trace. That did not make sense to me. It indicates that the core memory should not do a read then write cycle. Since we were successfully fetching and executing instructions from memory, this signal looked wrong.
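A toy Python model shows how this one gate can produce two lobes. Everything here is an assumption made for illustration: the sample values are invented rather than read from the scope, the gate is treated as a plain AND of the three inputs, and the output is modeled as active low.

```python
# Hypothetical logic levels (1 = high, 0 = low), one entry per time slice.
t1              = [0, 1, 1, 1, 1, 1, 1, 0]
non_storage_use = [1, 1, 1, 1, 1, 1, 1, 1]
t_clock_adv_sp  = [0, 0, 1, 0, 0, 1, 0, 0]

def entry_sample_n(t1, nsu, tca):
    """Active-low output: low only while all three inputs are high."""
    return [0 if (a and b and c) else 1 for a, b, c in zip(t1, nsu, tca)]

def count_lobes(signal_n):
    """Count separate low-going lobes in an active-low signal."""
    return sum(1 for i in range(len(signal_n))
               if signal_n[i] == 0 and (i == 0 or signal_n[i - 1] == 1))

out = entry_sample_n(t1, non_storage_use, t_clock_adv_sp)
print(out)               # [1, 1, 0, 1, 1, 0, 1, 1]
print(count_lobes(out))  # 2

# With Non Storage Use held low, the same gate never activates.
quiet = entry_sample_n(t1, [0] * 8, t_clock_adv_sp)
print(count_lobes(quiet))  # 0
```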

That signal, Non Storage Use, is generated by another big dotted-OR. The circuitry for that is contained on one card, a type 7347, which has internal flipflops and other circuits that can't directly be monitored on pins. It is also a card for which I could not find any schematics. 


The logic page is almost totally implemented in the one 7347 card. I tried to work my way through all the conditions, observing with the scope, to find why the +Storage Use output was low, which after passing through an inverter becomes the logic high signal +Non Storage Use.

I didn't spot the conditions with all the signals I captured. I suspected that perhaps the inner flipflop Stg Use was set somehow, but I had no way to observe it directly. I began to draw up a bench testing plan for the card where I could experiment to see if defective circuitry was causing it to generate a low output for +Storage Use. 

It was when I got home that the thunderclap of understanding struck. I had ignored a key cue when looking at the condition producing the -Entry Sample signal, the presence of T1 in the signal mix.

T1 would be nowhere near the point where the XIO Read was latching in data to write back to memory. Every core memory cycle steps through the eight T clock steps T0 to T7, with the read taking place during the first four steps and the write (or writeback if no change is to be made) occurring in steps T4 to T7. 
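The split of a memory cycle into its two halves can be written down directly. This is a minimal sketch of the description above; `memory_phase` is a hypothetical helper, not anything from the machine's documentation.

```python
def memory_phase(t_step):
    """Map a T clock step (0..7) to the half of the core memory cycle it is in."""
    if not 0 <= t_step <= 7:
        raise ValueError("T clock steps run from T0 to T7")
    return "read" if t_step <= 3 else "write"

for t in range(8):
    print(f"T{t}: {memory_phase(t)}")
```

T1 lands in the read half, far from the write half where new data would be latched for writing back to memory.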

The -I/O to B SP pulse that was important for the parity check issue was the one taking place early in the write cycle, at clock step T5. This latched in the new value for memory from the input-output bus fed by the keyboard character encoding. At clock step T6, the parity would have been set properly and when the circuit checked for any Even parity, none should be found. However, we were getting an even parity condition leading to the Parity Check latch being set at T6. 
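The check itself amounts to counting one bits. The sketch below assumes a single odd-parity bit appended to a group of data bits; the real grouping and width of the parity bits in the machine are not modeled, and the bit pattern is made up.

```python
def add_odd_parity(data_bits):
    """Append a parity bit so the total number of 1 bits is odd."""
    parity = 0 if sum(data_bits) % 2 == 1 else 1
    return data_bits + [parity]

def has_even_parity(bits):
    """True when the count of 1 bits is even, the condition that sets Parity Check."""
    return sum(bits) % 2 == 0

# Hypothetical character code, not an actual keyboard encoding.
stored = add_odd_parity([1, 0, 1, 1, 0, 0, 0, 0])
print(has_even_parity(stored))   # False, parity is good

# Flip any single bit and the check trips.
damaged = stored.copy()
damaged[0] ^= 1
print(has_even_parity(damaged))  # True, Parity Check would be set
```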

I realized that the -I/O to B SP pulse was also generated to complete an XIO Sense command, which was executed by the code just before the XIO Read command. This function sets the Stg Use flipflop to inhibit core memory use during the execute cycle of the instruction, when the device controller will have put bits on the input-output bus to indicate device status. At time T1, the values from the bus are latched into the SBR as if they had been read from memory. 

I was looking at an unrelated use of -I/O to B SP, one that occurs before the issues that trigger our false parity check during the subsequent XIO Read instruction. The double lobes don't matter; they just latch the same bits from the controller into the SBR twice in succession. Non Storage Use is high because it should be.

When I looked at Non Storage Use statically as the machine sat in a wait state, Non Storage Use was also high. This is by design. When the machine stops, the storage use signal is turned off, then reactivated when we begin running again. Sadly it reinforced the perception that Non Storage Use was incorrect when instead it was behaving as designed.

I leapt to a bad conclusion when I switched the scope trigger to the -I/O to B SP pulse. I missed the significance of the T1 signal and instead burned time debugging a problem that was not a problem at all. At least when I get back to the shop, I won't need to bench test the 7347 card nor continue to focus on the wrong issues. I can get back to the area that is malfunctioning and find the cause eventually. 
