Friday, July 3, 2026

Getting insight into what is happening during failure of 1130 MRAM doing reads - no more quantum effects

CURRENT SITUATION

When I store words of all 1s (0xFFFF) into every location of memory and then set the machine to do continual storage reads, looping through the memory addresses, I get sporadic parity errors where bit 2 fails to be set. Since the 1130 MRAM board calculated parity based on that bit's stored value of 1, the parity check fails and the machine stops.

I had been unable to see what was occurring because any time I put an oscilloscope probe or the logic analyzer on the incoming pin for setting the bit, the machine never failed. I finally spent over $500 to acquire a used active FET probe, which loads the circuit much less.

MORE EXPERIMENTS WITH ACTIVE FET PROBE

I set the active probe to 10X attenuation and AC coupling, with a direct ground lead to the ground pin on the same SLT card as the incoming sense bit pulse. That worked as I wished: the rate of sporadically dropped bits didn't change, so the probe was not loading the circuit being measured.

What I saw when a parity check was triggered was that the pulse which attempted to set B register bit 2 was shaped slightly differently from the others. The pulse is there, but somehow it isn't flipping on the bit, so we get the parity check.


The yellow trace is from the active FET probe, recording a negative going pulse from my 1130 MRAM board which is intended to cause the SLT card to flip on the bit. Thousands of pulses before this one successfully turned on the bit, but this one didn't. I looked more closely at the failed attempt as well as the successful one that came just before it.

Pulse which fails to set the B register bit 2

Example of a pulse successfully setting the bit

The failed set involved a pulse that dropped about 2V, whereas the successful ones show a pattern that comes from the capacitor inside the SLT card discharging and pulling the line down to about 0.25V. The edge detector in SLT is a capacitor that is charged up through a resistor by an enable signal; when the falling edge of the pulse arrives, it discharges the capacitor, which turns on the flipflop.

The transistor on my 1130 MRAM board pulls the signal line down to ground. A pullup resistor on the SLT backplane holds the line at about +3V until the transistor fires and drops it. The pin barely gets below 1V in the failing case but reaches 0.25V when it succeeds.

The transistor has a minimum beta of 25, so with 1.6mA of base drive it should be able to sink at least 40mA, which should be sufficient to activate the IBM edge detector. The successful pulses reach a threshold of about 0.9V, after which we see the capacitor delivering energy as it discharges down to 0.25V. The failed pulse reaches about the same initial level, but we don't see the capacitor delivering energy.
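The sink current figure is just the worst-case beta multiplied by the base drive. A minimal Python sketch of that arithmetic, using only the two numbers quoted in this post:

```python
# Worst-case current the MRAM board's pull-down transistor can sink while
# still saturated: Ic(max) = beta_min * Ib. Both inputs come from the post.

def collector_current_ma(beta_min: float, base_drive_ma: float) -> float:
    """Upper bound on collector current (mA) for a saturated switch."""
    return beta_min * base_drive_ma

BETA_MIN = 25        # minimum transistor beta
BASE_DRIVE_MA = 1.6  # base drive current in mA

ic = collector_current_ma(BETA_MIN, BASE_DRIVE_MA)
print(f"Worst-case sink capability: {ic:.0f} mA")
```

This is a ceiling on what the transistor can sink while staying saturated; the current actually drawn is set by the pullup resistor and the edge detector load.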

I had a long, long chat with an AI about what might be happening. There was a lot of speculation that didn't make sense, but I did 'listen' carefully and think about the phenomena being described. It was a good refresher on EE topics. Suggestions about varying signals from other processing in the 1130 didn't make sense, as this is a pure memory display loop.

As the AI pointed out, the SLT backplanes were designed for slow signals, and the higher frequency content of my board's fast pulse edges can ring across the backplane and cabling. The speculation was that ringing reflections could randomly cancel out my pulse edge if it arrived at just the wrong time. However, the active FET probe is not showing any 'long term' ringing.
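To put a rough number on "higher frequencies", a common rule of thumb says an edge with a 10-90% rise time t_r occupies a bandwidth of roughly 0.35/t_r. The rise times in this sketch are hypothetical placeholders, not measurements of my board or of SLT drivers:

```python
# Rule-of-thumb bandwidth of a digital edge: f ~= 0.35 / t_rise
# (10%-90% rise time, single-pole approximation).
# HYPOTHETICAL rise times, for illustration only.

def knee_frequency_mhz(rise_time_ns: float) -> float:
    """Approximate bandwidth (MHz) of an edge with the given rise time (ns)."""
    return 0.35 / (rise_time_ns * 1e-9) / 1e6

for label, tr_ns in [("fast modern driver (assumed)", 5.0),
                     ("slow SLT-era edge (assumed)", 50.0)]:
    print(f"{label}: t_r = {tr_ns:g} ns -> ~{knee_frequency_mhz(tr_ns):.0f} MHz")
```

A tenfold faster edge pushes ten times more bandwidth into wiring that was laid out with much slower edges in mind.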

When I opened the discussion by describing the sporadic nature of the problem and a failure rate of about 1 in 200,000 memory accesses, the AI asked me to look for a beat frequency of 1.388Hz, but that assumes the failure deterministically occurs exactly once every 200,000 accesses. Other suggestions assumed that the IBM flipflops are clocked, but they are really asynchronous circuits. Still, it did push me to think along many lines. Slight timing drift between my board generating the pulses and the clocks in the 1130 could occasionally line up my pulse edges with bounce and dips on the rails.
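The 1.388Hz figure falls out of treating the failures as strictly periodic. Assuming a 3.6 microsecond storage cycle (an assumption here, chosen because it reproduces that number), 200,000 accesses take 0.72 seconds. The sketch computes that, then contrasts it with a model where each access fails independently with probability 1 in 200,000, in which case the gaps between failures scatter widely and there is no single beat frequency to find:

```python
import math
import random
import statistics

ACCESSES_PER_FAILURE = 200_000  # ~1 failure per 200,000 reads (from the post)
CYCLE_TIME_S = 3.6e-6           # ASSUMED storage cycle time; reproduces 1.388 Hz

def beat_frequency_hz(accesses_per_failure: int, cycle_time_s: float) -> float:
    """Failure repetition rate IF exactly one access in every N failed."""
    return 1.0 / (accesses_per_failure * cycle_time_s)

def random_failure_gaps(p: float, count: int, seed: int = 1) -> list[int]:
    """Gaps (in accesses) between failures when each access fails independently
    with probability p, drawn from a geometric distribution by inverse transform."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(count):
        u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
        gaps.append(math.floor(math.log(u) / math.log1p(-p)) + 1)
    return gaps

print(f"Strictly periodic model: {beat_frequency_hz(ACCESSES_PER_FAILURE, CYCLE_TIME_S):.4f} Hz")

gaps = random_failure_gaps(1 / ACCESSES_PER_FAILURE, 2000)
mean_gap = statistics.mean(gaps)
cv = statistics.pstdev(gaps) / mean_gap
print(f"Random model: mean gap {mean_gap:,.0f} accesses, "
      f"coefficient of variation {cv:.2f} (a strictly periodic process gives 0)")
```

A coefficient of variation near 1 is the signature of a memoryless random process, which would fit sporadic failures better than a fixed period would.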

I do remember seeing ground bounce in earlier versions of the board, and that might still be an issue I need to address. I will add a braided ground strap from my board to the 1130 ground bus and see what effect that has. Another idea was to temporarily add a 0.1uF capacitor between ground and the -3V rail input of the SLT card, and another 0.1uF capacitor between ground and the +6V rail input. These would absorb some of the high frequency bounce that might be caused by my fast pulse edges.
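The reasoning behind a 0.1uF capacitor is that its impedance falls as frequency rises, so it looks like a near short to fast bounce while leaving the DC rail alone. This sketch computes the ideal reactance 1/(2*pi*f*C) at a few illustrative frequencies; a real capacitor's lead inductance and ESR will limit how low it goes at the top end, especially with long clip leads:

```python
import math

def cap_reactance_ohms(c_farads: float, f_hz: float) -> float:
    """Ideal capacitor reactance |Xc| = 1 / (2*pi*f*C); ignores ESR and ESL."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farads)

C_DECOUPLE = 0.1e-6  # the 0.1uF capacitor proposed for the -3V and +6V inputs

# Frequencies chosen for illustration only, not measured bounce frequencies.
for f_hz in (1e3, 1e6, 10e6, 70e6):
    print(f"{f_hz / 1e6:>8.3f} MHz: {cap_reactance_ohms(C_DECOUPLE, f_hz):10.3f} ohms")
```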

The shape of the failed pulse looks to me as if the capacitor is not discharging. It has to be charged by the SLT circuit before my negative going pulse arrives. The AI speculated that the resistors on the SLT cards have drifted high and are barely recharging the capacitor in time, so small random timing differences between memory cycles might occasionally cause a pulse to arrive just too soon to trigger. Bounce on the rail that charges the capacitor might also keep it from charging sufficiently, it mused.
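The drifted resistor theory can be checked against the standard RC charging curve, V(t) = V_final * (1 - exp(-t/RC)). The resistor, capacitor, and recharge window in this sketch are hypothetical placeholders, not the actual SLT edge detector values; the point is only to show how quickly charge margin erodes as R drifts upward:

```python
import math

def charge_fraction(t_s: float, r_ohms: float, c_farads: float) -> float:
    """Fraction of the final voltage reached after charging for t seconds."""
    return 1.0 - math.exp(-t_s / (r_ohms * c_farads))

# HYPOTHETICAL values for illustration only (not taken from SLT documentation).
R_NOMINAL = 10_000   # ohms
C_EDGE = 100e-12     # farads
T_WINDOW = 2.0e-6    # seconds available to recharge between pulses

for drift in (1.0, 1.5, 2.0, 3.0):  # resistor drifted to this multiple of nominal
    frac = charge_fraction(T_WINDOW, R_NOMINAL * drift, C_EDGE)
    print(f"R = {drift:.1f} x nominal -> {frac:6.1%} charged")
```

With these placeholder values, doubling R drops the capacitor from about 86% to about 63% charged within the same window, which is the kind of shrinking margin that would make a trigger sensitive to small timing shifts.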

This problem has moved around the machine between SLT cards. The B register is implemented across eight double width SLT cards, two bits per card. Previously the most common failure was bit 14, but recently it has been bit 2. It is therefore unlikely that several cards have degraded to have exactly the same vulnerability. The only commonality I remember is that the errors always occurred on even numbered bits, which are on row 3 of the SLT backplane. Cracked traces for power or ground could therefore be a factor that only impacts the cards using that row. This may be coincidence, however, and the same issues might be possible on the row 2 connections for the odd bits.
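On the coincidence question, a quick probability check helps. If each distinct failing bit were equally likely to be odd or even, the chance that all k of them land on even (row 3) positions is 0.5^k. Bits 14 and 2 are the two named here; the larger k values are hypothetical:

```python
# Chance that k distinct failing bits are ALL even-numbered (row 3),
# assuming each is equally likely to be odd (row 2) or even (row 3).

def p_all_even(k_distinct_bits: int) -> float:
    """Probability that k independently chosen failing bits are all even."""
    return 0.5 ** k_distinct_bits

# k = 2 matches bits 14 and 2 from the post; larger k values are hypothetical.
for k in (2, 3, 5, 8):
    print(f"{k} distinct failing bits, all even: {p_all_even(k):.3f}")
```

With only two distinct bits, an all-even pattern happens a quarter of the time by pure chance, so it is a weak clue unless more failing bits keep landing on row 3.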

Bottom line, I can now see the failures where before they were masked by the loading of the probes. This should allow me to drill down to figure out what is causing my issues. I am making progress but don't yet have the smoking gun that pins down the exact cause. 

PLAN FOR NEXT OBSERVATIONS

When I next get to the workshop, I will add oscilloscope probes to the +6V and -3V pins of the SLT card and watch them in AC mode for any activity on the rails that coincides with the failures.

