BIZARRE ISSUE WITH MEMORY REVERTING TO OLD VALUE
I was running the memory diagnostic with parity stops disabled (the parity issue is a separate problem I am troubleshooting) and the loop advancing through memory never ended. When I looked closely, I found that the code was adding 1 to an address stored in a memory word, but the stored value would skip back to a prior value.
It began at some point with an address around 0481 and would step up to 048F before returning to 0481. If I stored a higher address in the memory word, for instance 0491, it would advance to 049F and then return to 0491. I hand-stepped the code and it appeared to correctly advance the address to 04A0 and store it, but the value then jumped back to 0491.
I then manually displayed and loaded the word where the address was stored, which is memory location 02E7, and saw very puzzling behavior. I would display the value in 02E7 as 0491, then update it to 04A0. The first time I displayed the content of 02E7, I retrieved 04A0, but when I displayed the same location again, I saw 0491 returned!
Memory is not a push down stack. Writing a value in a location should totally replace the old value. If it reads as the new value once, it should always read back as the new value unless I rewrite it explicitly back to the old. It should have no way to 'remember' the prior value.
WHAT MIGHT CAUSE THIS?
I did some thinking about mechanisms that might lead to this behavior. I stopped when I had a path that could deliver the old value for a read request, not diving any deeper to find a means that it could accomplish the feat, simply looking at a pathway to get that old value back.
A memory cycle in the IBM 1130 is based on the way that core memory works. Reading the content of a word involves destructively setting the word to all zeroes, detecting which bits were previously set to one. Those bits that had stored a one before the destructive read will cause the core memory system to deliver a pulse into the 1130 where the Storage Buffer Register (SBR) has that bit flipped to be one.
The second half of the memory cycle will write whatever is in the SBR into the cores. It actually tries to flip every bit of the word to one, but inhibits the flip if the corresponding bit in the SBR is a zero. Thus core memory erases the contents, transferring its value into the SBR, and then rewrites the word from the SBR. If the 1130 wants to put a new value in a memory location, it first reads the current value into the SBR and then updates the SBR before writing it back to the cores. Every memory cycle has a read followed by a write.
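The read-then-write cycle described above can be sketched as a small Python model. This is only an illustration of the idea, not the 1130's actual logic: the class name CoreMemory, the cycle method, and the 16-bit word mask are names and assumptions I chose for the sketch.

```python
WORD_MASK = 0xFFFF  # assumed 16-bit word for this sketch


class CoreMemory:
    """Toy model of one core memory cycle: destructive read, then rewrite."""

    def __init__(self, size):
        self.cores = [0] * size
        self.sbr = 0  # stands in for the Storage Buffer Register

    def cycle(self, address, new_value=None):
        # Read half: sensing the word clears it; bits that were one set the SBR.
        sensed = self.cores[address]
        self.cores[address] = 0
        self.sbr = sensed

        # A store replaces the SBR contents between the two halves.
        if new_value is not None:
            self.sbr = new_value & WORD_MASK

        # Write half: try to set every bit to one, but inhibit the flip
        # wherever the SBR bit is zero.
        inhibit = ~self.sbr & WORD_MASK
        self.cores[address] = WORD_MASK & ~inhibit
        return sensed


mem = CoreMemory(0x2000)
mem.cycle(0x02E7, new_value=0x0491)
mem.cycle(0x02E7, new_value=0x04A0)
print(f"{mem.cycle(0x02E7):04X}")  # a plain display cycle
print(f"{mem.cycle(0x02E7):04X}")  # a healthy memory returns the same value again
```

In a healthy memory the rewrite restores exactly what was sensed, so repeated displays of the same word cannot return a prior value.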
Thus, the first possible pathway is if somehow the SBR is updated after a successful read, replacing its contents with a different value that is then written back. This would let my first read of 02E7 successfully pull the value 04A0 into the SBR, but somehow during the write half it is changed to 0491 and written that way. The next time it is read back, 02E7 returns 0491 and not the value we thought we wrote into it. I don't yet have a way that the 0491 value would be retained in the 1130 and forced into the SBR for the write cycle, but this is a way that reversion could occur.
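To check that this first pathway would produce exactly the symptom I saw, here is a minimal sketch. The function memory_cycle and its corrupt_writeback_to parameter are hypothetical; the parameter simply stands in for whatever unknown mechanism might force the old value into the SBR before the write half.

```python
def memory_cycle(cores, address, corrupt_writeback_to=None):
    """One display cycle: read into the SBR, then write the SBR back."""
    sbr = cores[address]      # read half delivers the stored value
    cores[address] = 0        # the read is destructive
    displayed = sbr           # this is what the console would show

    # Hypothetical fault: the SBR is overwritten before the write half.
    if corrupt_writeback_to is not None:
        sbr = corrupt_writeback_to

    cores[address] = sbr      # write half restores whatever the SBR now holds
    return displayed


cores = {0x02E7: 0x04A0}      # the store of 04A0 itself succeeded
first = memory_cycle(cores, 0x02E7, corrupt_writeback_to=0x0491)
second = memory_cycle(cores, 0x02E7)
print(f"{first:04X} {second:04X}")  # prints "04A0 0491"
```

The first display shows the new value and the second shows the old one, matching the behavior at 02E7, although this says nothing about where the 0491 would come from.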
Another possibility is that the address lines running into the MRAM chip on my board change during the cycle, so that reads actually fetch values from two different words. Thus, if the address 02E7 is sometimes seen as 06E7, two different values could be retained and returned. I don't understand how the first read of 02E7, after we updated it to 04A0, would address one word and subsequent reads would address the other word location, but this is a way I could pull a previous value from a location.
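A rough sketch of this aliasing idea follows. The 02E7 and 06E7 pair is just my illustrative example; the names AliasingChip and glitch_mask are made up for the sketch, and bit positions are counted from the least significant bit as bit 0, which is not the IBM numbering convention.

```python
ADDRESS_BITS = 13  # the SAR carries 13 address bits


def differing_address_bits(a, b):
    """Return the bit positions (LSB = 0) where two addresses differ."""
    diff = (a ^ b) & ((1 << ADDRESS_BITS) - 1)
    return [bit for bit in range(ADDRESS_BITS) if diff & (1 << bit)]


class AliasingChip:
    """Toy memory chip whose address can be disturbed by a glitch mask."""

    def __init__(self):
        self.words = {}

    def write(self, address, value, glitch_mask=0):
        self.words[address ^ glitch_mask] = value

    def read(self, address, glitch_mask=0):
        return self.words.get(address ^ glitch_mask, 0)


print(differing_address_bits(0x02E7, 0x06E7))  # a single line, value 0x0400

chip = AliasingChip()
chip.write(0x02E7, 0x0491, glitch_mask=0x0400)  # old value lands in 06E7
chip.write(0x02E7, 0x04A0)                      # new value lands in 02E7
print(f"{chip.read(0x02E7):04X}")                      # clean access
print(f"{chip.read(0x02E7, glitch_mask=0x0400):04X}")  # glitched access
```

One flaky address line would be enough for two words to answer to the same address, each holding a different value.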
One final possibility is that the chip itself is defective. It might allow the value in 02E7 to slowly revert to the old value even though it held an updated value for a short time. It might misaddress locations in the chip much like my speculation in the prior paragraph so that two different locations are being retrieved at different times.
I could watch for the first speculation by using the logic analyzer to record the SBR bits, let the machine get into the loop where it reverts, then examine the trace to see if the write cycle is changing the value of the SBR.
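Once a trace is captured, the comparison could be automated along these lines. The trace format here is entirely an assumption: I am imagining that the logic analyzer export can be reduced to one sample per half cycle, tagged with a cycle number and a "read" or "write" phase. The names Sample and find_sbr_changes are hypothetical.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    cycle: int   # memory cycle number (assumed to be derivable from the trace)
    phase: str   # "read" or "write" half of the cycle
    sbr: int     # the 16 SBR bits captured at that point


def find_sbr_changes(samples):
    """Return (cycle, read_value, write_value) where the SBR changed mid-cycle."""
    read_values = {}
    changes = []
    for s in samples:
        if s.phase == "read":
            read_values[s.cycle] = s.sbr
        elif s.phase == "write" and s.cycle in read_values:
            if s.sbr != read_values[s.cycle]:
                changes.append((s.cycle, read_values[s.cycle], s.sbr))
    return changes


trace = [
    Sample(0, "read", 0x04A0), Sample(0, "write", 0x0491),
    Sample(1, "read", 0x0491), Sample(1, "write", 0x0491),
]
for cycle, was, became in find_sbr_changes(trace):
    print(f"cycle {cycle}: SBR {was:04X} -> {became:04X}")
```

Legitimate store cycles also change the SBR between the two halves, so this check is only meaningful on cycles that should be pure fetches or displays.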
I could watch for the second speculation, but only if the address bits change inside the 1130, in the Storage Address Register (SAR). To do that I would trace the 13 address bits of the SAR and record while the machine is exhibiting the reversion behavior. However, if the address bits are flaky on my 1130 MRAM board rather than at the SAR, then I wouldn't catch this issue with the logic analyzer.
I will reflow the address bit soldering points on my board to eliminate the chance that the cause might stem from a poor connection. However, to test for errors inside the MRAM chip itself, I might need to manually do sequences of load and display operations on the 1130 while recording the SBR bits.