Thursday, April 2, 2026

Trying to track down the parity stop failure using my 1130 MRAM board while running IBM's core memory diagnostics

RECAP OF THE FAILURE TO BE STUDIED

When running the IBM 3B1 diagnostic (core memory testing with the code residing in low addresses), the machine stops with a parity error at the same location, while retrieving from the same address. The program executes the loop more than a thousand times before the error occurs. At the point of failure, the machine is executing a store instruction in indirect mode, where it puts the data in the location whose address is stored in the memory word, rather than into the memory word itself.
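To make the indirect store concrete, here is a minimal Python sketch of the two memory accesses involved. It is a simplified model and not the 1130's actual microsequence: the function name `store_indirect`, the list used as memory, and the example addresses are all made up for illustration, and index registers and instruction formats are ignored.

```python
def store_indirect(memory, pointer_addr, accumulator):
    """Simplified model of an indirect store (hypothetical helper).

    Step 1: fetch the word at pointer_addr; its contents are the target address.
    Step 2: write the accumulator into that target address.
    """
    target = memory[pointer_addr] & 0xFFFF   # first access: fetch the address
    memory[target] = accumulator & 0xFFFF    # second access: store the data
    return target


# Example with made-up values: word 3 holds the address 10.
memory = [0] * 16
memory[3] = 10
target = store_indirect(memory, 3, 0x1234)
```

In this model, the parity stop described here falls between the two steps: the fetch of the target address has completed, but the write has not started.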

The machine has just fetched the address from the memory word but has not yet begun to store the contents of the accumulator register into that address. The data that was fetched into the Storage Buffer Register (SBR) and the two parity bits don't agree under the odd parity rules of the system. Since my card generates the two parity bits on the fly from the retrieved word, this should not happen.

TRACING STATE LEADING TO THE FAILURE

I set up the scope to trigger on the setting of a parity stop, while monitoring various other signals such as + Storage Write and the actual output of the parity testing logic in the 1130. The goal is to see what seems incorrect or hints at other places to look, with the ultimate aim of seeing the definitive root cause of the failure. 

The eight data bits of each half of the SBR are cascaded through exclusive-OR (XOR) gates so that we know if there is an even number of bits that are set to 1. The parity bit associated with those 8 data bits is used to ensure that the total 1 bits in the 8 data and 1 parity bits come to an odd total. When the machine fails, it is because one of the halfwords and its parity bit have an even count of 1 bits. 
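The odd parity rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a model of the 1130's gates or of the MRAM board's logic. The function names are hypothetical, and the sketch assumes IBM bit numbering, where bit 0 is the most significant bit, so that bits 8 to 15 form the low-order byte.

```python
def xor_cascade(byte):
    """XOR the eight bits together: 1 if the count of 1 bits is odd, else 0."""
    result = 0
    for i in range(8):
        result ^= (byte >> i) & 1
    return result


def split_halves(word):
    """Split a 16-bit word into (left, right) 8-bit halves.

    Assumes bit 0 is the most significant bit, so the left half is
    bits 0 to 7 and the right half is bits 8 to 15.
    """
    return (word >> 8) & 0xFF, word & 0xFF


def generate_parity(word):
    """Return (P1, P2) so each half plus its parity bit has an odd count of 1s."""
    left, right = split_halves(word)
    return xor_cascade(left) ^ 1, xor_cascade(right) ^ 1


def parity_ok(word, p1, p2):
    """True when both halves satisfy odd parity with their parity bits."""
    left, right = split_halves(word)
    return (xor_cascade(left) ^ p1) == 1 and (xor_cascade(right) ^ p2) == 1


# Parity generated from the word itself always passes the check.
p1, p2 = generate_parity(0x0001)
assert parity_ok(0x0001, p1, p2)
```

Because the parity bits are derived from the same word that is checked, a failure means that either a data bit or a parity bit changed between generation and the check, or that the two were sampled at different moments.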

I consistently saw the diagnostic produce an error where the right halfword (bits 8 to 15) had the mismatch with its parity bit P2. When I put the scope leads on P2 and the output of the XOR cascade for that halfword, the mismatch moved to the first (left) halfword and its parity bit P1. Moving the scope leads to that side and rerunning, I found that the error flipped back to the second halfword again.

I doubt this is a quantum phenomenon where an observed condition behaves differently than when it is not watched. However, attaching the high-impedance scope leads to one of the nets must be having some effect. I wonder what will happen if both sides are observed at the same time. Will the machine run without parity stops? Seriously, I do need to observe more points simultaneously, which means I have to move over to a logic analyzer.
