Saturday, July 6, 2024

Slightly frustrated that I can't imagine a fault in core memory that fits the observed symptoms

EARLIER PARITY ERROR HAD A CLEAR PATTERN AND CAUSE

When I first worked on the machine to diagnose its parity errors, I found that any time address bit 9 had a value of 1, the result from memory was a full word of zeroes with both parity bits also incorrectly at 0. This is what you see when core was not actually read. The Y axis lines through the core are selected by bits 9 to 15: bits 9, 10 and 11 select one of eight groups of Y axis lines, and bits 12 to 15 select which of the 16 wires in that group is to be activated.

I checked the card responsible for selecting the four groups of Y where bit 9 was a '1' value, and found that it was not properly inserted into the card socket. Once I pushed it into place, the parity error went away.
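To make the Y selection concrete, here is a minimal sketch of the decoding described above. It assumes bit 0 is the most significant bit of a 16-bit address, and the 0-based group and wire numbers are my own labels for illustration, not the markings on the hardware.

```python
WORD_BITS = 16  # assumption: bit 0 is the most significant bit of a 16-bit address


def field(addr, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive), counting bit 0 as the MSB."""
    width = last_bit - first_bit + 1
    shift = (WORD_BITS - 1) - last_bit
    return (addr >> shift) & ((1 << width) - 1)


def y_select(addr):
    """Return (group, wire): bits 9-11 pick the Y group, bits 12-15 the wire."""
    return field(addr, 9, 11), field(addr, 12, 15)


# Sweep the seven Y bits (addresses 0..127) and collect the groups used
# whenever bit 9 is a 1.
groups_with_bit9 = sorted(
    {y_select(a)[0] for a in range(128) if field(a, 9, 9) == 1}
)
print(groups_with_bit9)  # [4, 5, 6, 7]
```

With this numbering, bit 9 = '1' always lands in the upper four of the eight Y groups, which matches a single card serving four groups.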

NEW ERROR IS AN ENIGMA

Errors that occur at regular address ranges normally point to the specific cards, wires, diodes and other components that could logically cause the failure. The new error, however, both fits and does not fit that profile.

Now, when we are accessing the upper 4K of the 8K core memory, we get parity errors at certain address ranges. The X axis lines through core are selected by bits 3, 4 and 5 accessing one of eight groups of X wires, with bits 6, 7 and 8 picking which of the eight wires in a group is actually used. 
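A rough sketch of the full split of an address into its X and Y selection fields may help here. It assumes bit 0 is the most significant bit of a 16-bit address, so bits 3 to 15 are the 13 bits that address 8K words. The `CoreSelect` and `decode` names and the 0-based numbering are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class CoreSelect(NamedTuple):
    x_group: int  # bits 3-5: one of eight groups of X wires
    x_wire: int   # bits 6-8: one of eight wires in that X group
    y_group: int  # bits 9-11: one of eight groups of Y wires
    y_wire: int   # bits 12-15: one of 16 wires in that Y group


def decode(addr: int) -> CoreSelect:
    """Split an address into drive-line selections (bit 0 = MSB assumed)."""
    return CoreSelect(
        x_group=(addr >> 10) & 0b111,
        x_wire=(addr >> 7) & 0b111,
        y_group=(addr >> 4) & 0b111,
        y_wire=addr & 0b1111,
    )


print(decode(0x0FFF))  # last word of the lower 4K: x_group is 3
print(decode(0x1000))  # first word of the upper 4K: x_group is 4
```

Under these assumptions every address in the upper 4K uses X groups 4 to 7, because bit 3 is the high bit of the X group field.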

Thus, if the flaw were in the cards, diodes or wires used when bit 3 is a '1', it would make the machine fall over at many addresses from 0001 0000 0000 0000 upward, but that is not what happens. The errors occur only when, in addition to bit 3 being on, bit 9 is (again) a '1'.

Bit 9 affects the Y selection for all addresses, both in the upper 4K and the lower 4K, so a flaw in that circuitry would cause parity errors in the lower 4K as well. Bit 3 is part of the X selection and thus should have no interaction at all with Y address components such as the ones for bit 9 = '1'.
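The mismatch can be sketched by listing which addresses each simple hypothesis would implicate. This is only an illustration under the assumption that bit 0 is the most significant bit of a 16-bit address. The "observed" set is every address where both bits are on, which is the candidate region and not a claim that every such address fails.

```python
MEM_WORDS = 8192   # 8K words of core
BIT3 = 1 << 12     # address bit 3, counting bit 0 as the MSB of 16 bits
BIT9 = 1 << 6      # address bit 9


def addresses(predicate):
    """Return every address in the 8K space that satisfies the predicate."""
    return [a for a in range(MEM_WORDS) if predicate(a)]


def lower_4k_count(addrs):
    """Count how many of the given addresses fall in the lower 4K."""
    return sum(1 for a in addrs if a < 4096)


x_fault = addresses(lambda a: bool(a & BIT3))              # X side, bit 3 = 1
y_fault = addresses(lambda a: bool(a & BIT9))              # Y side, bit 9 = 1
observed = addresses(lambda a: bool(a & BIT3 and a & BIT9))  # both bits on

for name, addrs in [("X fault", x_fault), ("Y fault", y_fault), ("observed", observed)]:
    print(f"{name:9s} total={len(addrs):4d} lower4K={lower_4k_count(addrs):4d} "
          f"first={addrs[0]:04X}")
```

A pure X fault covers the whole upper 4K, and a pure Y fault spills into the lower 4K. The observed region is the intersection of the two: 64-word runs such as 1040 to 107F, repeating every 128 words in the upper 4K only.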

Finally, when a parity error is signaled, I see valid bit patterns in the data word and parity bits, so nothing I see is a true parity error. How a flaw in detecting parity would be specific to a certain range of addresses is a mystery to me.

LOOKING FOR AN EXPLANATION FOR THE FAILURE THAT MAKES SENSE TO ME

I really want to understand what type of fault would give the symptoms I am experiencing. I prefer to understand why instead of just finding and changing something by blind luck. I will keep at this. 
