Tuesday, June 21, 2022

Test with the spare 3475 card in the memory module - parity errors shifting to other bits


I pulled out the card that I suspected was bad and put in a spare card of the same type. This position covers bits 4 & 8, whereas its original position handled bits 0 & 6. My original failure was always bit 0 flipping on erroneously. After moving the card up I had bit 8 flipping on in error. This is why I suspected the card and did a replacement. 


The nature of a parity error leaves memory corrupted, although parity is reestablished to make the new pattern have correct parity. Thus, when the location was mis-read with bit 0 as a 1, the count of 1 bits had to be odd but with this extra one, the bits plus the parity value were not odd anymore. 

Since memory is read destructively, all the bits are flipped to zero with sense amplifiers reporting those that had previously been a 1. Those 1 bits are saved in the B register and then in the second half of the memory cycle, the hardware writes back the value in the B register. This means that the process of mis-reading gives us corrupted data that is immediately written back. 

A memory cycle consists of T-Clock steps T0 to T7. The first half, steps T0 to T3, are the destructive read part of the cycle where the value read out is latched into the B register. The second half, steps T4 to T7, does the write of the B register to memory. When the CPU is storing new data in a location, the B register contents are replaced, discarding what was read out of the location, so that the new contents of B are written back. 

The parity checking occurs during the first half of the memory cycle, while proper parity for the word to be written is generated in the second half. If the parity from the read, the number of 1 bits in 8 bits of data plus one of the parity, is not odd then we have a parity error. The latch turns on in step T6, when the B register is written back to memory. 

If it stopped earlier we would have a completely zeroed word and both halves would calculate as even parity. We want valid parity on memory so we have to generate proper parity in the second half of the cycle and then stop after it is written back. 

Thus, words where we have a parity error are written back with good parity but incorrect contents since a flipped bit is what triggered the parity error in the first place. I wanted to restore the multiply-divide routine and its data areas to the correct values, which I did by stripping down the load file to just those locations and letting my Memory Load Tool toggle it in.


The test ran for almost two minutes and finished with a normal completion wait (3003). This validates the hardware for multiplication and division, finishing the checkout of all the instructions. I decided to run it a second time, which I started but it stopped with a Parity Stop!

The bit being flipped on was bit 2 this time. It did this consistently. The card that handles bits 2 & 3 is up a level in B5 rather than B6 where I swapped the card. This is perplexing. Something more subtle is happening than a bad sense amplifier. 


I corrected the value in the core word and reran the test a few times, always getting a bit 2 turned on to trigger the Parity Stop. More interestingly, it was always the same location where this happened. It is always executing an EOR instruction, long format, indirect. The failure occurs in fetching the second word of the instruction, in other words during the I2 cycle. 

I remember that this was the same place where bit 8 was going on before I swapped the card, the second word of the EOR instruction at location 0D36 and 0D37. Very curious. 


In order to investigate this, I need to use the IBM 1130 Simulator to load the CPU Core Test diagnostics, create a load file and have it entered in the core memory of this 1130. That will shake down the memory and give me a better idea of what kind of error lurks there.

If this is an issue with that one word of core, it is a very strange error. Earlier I had experienced the parity stop with a simple loop at an entirely different address, thus I suspect this is not associated with one address. That would be very unusual since the failures happened on different core planes - bits 0, 2 and 8. 

I need to ponder the circuitry of the memory to see if I can find any common factor. There are steering diodes that handle the addressing, the inhibit and the sense operations, so that the same wire can have current flowing in different directions at different times of the cycle. A bad diode could do funky things, but the core tests will help flag this. 

No comments:

Post a Comment