Tuesday, July 23, 2024

Still have an issue with bit 7 inhibit

HIGH 4K HAD ISSUES WITH BIT 7 INHIBIT BUT LOW 4K SEEMED GOOD

I had been getting persistent parity errors in the upper 4K as soon as the memory address crossed into the upper section, with bits 7 and 11 the troublesome ones in that area. The lower 4K, however, appeared to be working properly, at least until the flakiness from the loose connector arose.

AFTER CONNECTOR RESEATED, BEGAN SEEING ISSUES IN LOW CORE

I would load some data in memory and relatively quickly see a parity error. Further, the data coming back out of memory did not match what I had attempted to write. For example, I tried to load this sequence of words starting at location zero:

  • 080B
  • 3001
  • 0000
  • 0804
  • 0805
  • 0806

As soon as I got to location 3 or 4, the parity light came on. Where I had tried to load 0804, what came back was 0904: bit 7 was on when it should have been off. The left half of the word showed a parity error because it did not have the P1 bit set. This bit 7 inhibit failure is repeatable at locations 3 and 4 but does not arise at locations 0, 1 or 2.
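To make the arithmetic explicit, here is a small Python sketch that compares the word I wrote with the word that came back. It is only an illustration: the function names are made up for this post, bits are numbered IBM style with bit 0 as the most significant of the 16 data bits, and I am assuming odd parity over each 8-bit half, which is consistent with the P1 behavior described above.

```python
def flipped_bits(written: int, read: int) -> list[int]:
    """Return the IBM-style bit numbers (0 = most significant of 16) that differ."""
    diff = (written ^ read) & 0xFFFF
    return [bit for bit in range(16) if diff & (0x8000 >> bit)]


def expected_parity(half: int) -> int:
    """Parity bit needed for one 8-bit half, assuming odd parity overall."""
    ones = bin(half & 0xFF).count("1")
    # With odd parity, the parity bit is set only when the data has an even count of ones.
    return 0 if ones % 2 == 1 else 1


written, read = 0x0804, 0x0904
print(flipped_bits(written, read))      # which bit positions changed
print(expected_parity(written >> 8))    # P1 that was stored with the intended left half
print(expected_parity(read >> 8))       # P1 that the corrupted left half would need
```

The two parity values differ, so the P1 bit stored for the intended 08 no longer matches a left half that reads back as 09, and the parity check trips.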

Once again I suspect that we have excess write drive current at certain addresses, more than the inhibit current can overcome. I don't see a path that would corrupt the inhibit bit value only in certain address ranges, but that doesn't mean one can't exist.

Some areas that could be failing:

  • Diodes shorted thus combining currents
  • Leakage and coupling between adjacent drivers causing extra current

Bit 7 inhibit is handled by two separate cards, depending on whether the address is above 4K or not. Since each card covers only one half of memory, seeing bit 7 inhibit failures across all 8K rules out the inhibit cards themselves.

Leakage between adjacent driver pins is a known failure mode of these core memory modules, as described in the IBM 1130 Field Engineering Maintenance Manual. Unfortunately, the fix given there is to replace the entire memory module, since the leakage occurs on the SLT backplane that hosts everything in the module.

If the worst happens and there is such leakage, it does not mean that it is impossible to repair or tweak the backplane. It just means that IBM decided it was unreasonable to attempt that in the field when a spare part could be ordered and the factory could rework as they choose after the fact. 

This is going to be a vexing investigation. I am going to have to begin scoping the memory during these operations to try to spot a smoking gun. 

This is not an issue with individual cores or lines of cores. It is not an issue with the read process either, because that is reliably flipping cores back to the zero state and recording the pulse on the sense line when they were previously set. If the read were not resetting cores reliably, we would see bits read as 0 when they should be 1. Instead we are seeing bits read as 1 when we intended them to be 0. That can only happen during the write phase, when a core flips to 1 although it should have been inhibited.
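The direction of each bit error is what separates the two cases, and it can be tabulated mechanically. The following Python sketch is just a way of organizing the observations, not a diagnostic tool from IBM; the function name and the "picked_up" and "dropped" labels are my own, and bits are again numbered IBM style with bit 0 as the most significant.

```python
def classify_errors(written: int, read: int) -> dict[str, list[int]]:
    """Split bit errors in a 16-bit word by direction, using IBM bit numbering."""
    picked_up = ~written & read & 0xFFFF   # intended 0, read back 1: points at write/inhibit
    dropped = written & ~read & 0xFFFF     # intended 1, read back 0: points at read/sense
    def to_bits(mask: int) -> list[int]:
        return [bit for bit in range(16) if mask & (0x8000 >> bit)]
    return {"picked_up": to_bits(picked_up), "dropped": to_bits(dropped)}


print(classify_errors(0x0804, 0x0904))
```

For the 0804 versus 0904 case, the only error lands in the picked-up group at bit 7 and nothing is dropped, which matches the argument that the fault is on the write side.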
