Monday, April 13, 2026

1130 MRAM board effort continues - not quite there yet

SWAPPED CARDS BUT ISSUE REMAINS AT BIT 14

The issue is somewhere between the pins of sockets L2/L3 in gate B compartment B1 of the 1130 and my 1130 MRAM core memory replacement board. It continues to be intermittent, failing on the order of once every few hundred accesses. I also find that bit 10 continues to fail, but only about once for every few dozen stops caused by a bit 14 error. 

I resoldered all the leads and components on my PCB that are involved in handling bit 14 - no change in the results. I examined the backplane where the card for that bit plugs in (L3) and noticed that a wire wrap connection was made from the T4 cable connector on top down to the pin where the sense pulse for bit 14 is connected to the card. That connection should already exist on the backplane, so the added wire seems redundant unless it is fixing a defect in the board. If there was a defect it might have spread a bit. 



Sunday, April 12, 2026

Continued confusion over analog issue with the 1130 MRAM board BUT progress made

SLOWING THE EDGES OF MY SENSE PULSES

I could apply a low pass filter to remove high frequencies from the signal so that the pulse is rounded, to see if that will correct for whatever the heck is going wrong on the Solid Logic Technology (SLT) circuit board that is implementing the Storage Buffer Register (SBR). That card expects a falling edge from my board as a sense pulse and should turn on the SBR bit when it receives the pulse. However, somewhere between once every few thousand and once every few million times, it doesn't quite turn on. 

Hooking a scope probe to the pin seems to mostly tame the beast. Thus if I could apply the same loading to the pins at the backplane, I might be able to achieve consistently reliable operation. A complication is that the signal is routed directly from my board to the pin where it enters the SLT card, so I have no reasonable way to insert a series resistance. This means that a typical RC low pass filter isn't practical. 

I did develop a load that will look similar to the probe - a simplified equivalent circuit ignoring cable inductance, cable capacitance and the complexity of the actual equivalent circuit. I then increased its low pass behavior hoping to slow the edges a bit more. If that does solve the analog issue plaguing the memory substitution project, I will see no parity error stops at all and fully correct readback of memory at all times. 

The load circuit is a resistor and capacitor in series from the pin to ground. I have 18 pins that need the loading applied, if this works, which I will support with a teeny PCB that slides over the pins in place on the SLT backplane. The SBR card is a double width card that implements two bits of the register. 

I put together a couple of the circuits and hooked them to the two bits that seemed to be the most problematic. It had no effect at all. I think this was a false trail. 

SWAPPING CARDS TO SEE IF THE BITS THAT FAIL MOVE

The SBR register is implemented with several SLT cards - IBM type 5804619 - each card implementing a pair of bits. Eight cards are installed in gate B compartment B1, in slots B2/B3, C2/C3, D2/D3, E2/E3, H2/H3, J2/J3, K2/K3 and L2/L3. The most common bit error is bit 10, which resides on the card in J2/J3 but there were also some errors on bit 13 which is K2/K3. I will swap these with the cards in C2/C3 and D2/D3. If the failures move to those bits it will point at the card, but if the failure does not move then the issue is in the cabling, backplane, or my design. 
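The bit locations named above are consistent with the eight cards holding the SBR bits in ascending order, two per card. That ordering is an assumption inferred from the slots mentioned in these notes, not something confirmed from IBM documentation, and the function name is purely illustrative. A small Python sketch of the lookup:

```python
# Assumption: cards hold SBR bits in ascending order, two bits per card.
SLOTS = ["B2/B3", "C2/C3", "D2/D3", "E2/E3", "H2/H3", "J2/J3", "K2/K3", "L2/L3"]


def slot_for_bit(bit: int) -> str:
    """Return the slot assumed to hold the card for the given SBR bit (0-15)."""
    if not 0 <= bit <= 15:
        raise ValueError("SBR bits are numbered 0 through 15")
    return SLOTS[bit // 2]


# Bits called out in the text: 10 (J2/J3), 13 (K2/K3), 14 and 15 (L2/L3).
for bit in (10, 13, 14, 15):
    print(f"bit {bit:2d} -> {slot_for_bit(bit)}")
```

If the ordering assumption holds, this makes it quick to see which card to swap when a new bit starts misbehaving.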

The problem with bit 10 disappeared, and in fact the only sporadic drop is with bit 14 which is one of the cards that I did NOT move around. Perhaps the rodent urine atmosphere layered a bit of corrosion on some contacts between the cards and the backplane, which I wiped off by the removal and insertion. I will try to swap L2/L3, which implements bits 14 and 15, with the card in E2/E3 in the hope that this resolves any added resistance that was plaguing the circuit operation. 

HAND TOGGLED CODE TO VERIFY THAT DATA IN ALL WORDS MATCH THEIR ADDRESS

My loop wrote the address of each word as its data, then the new loop read and compared the contents of each word with its address as a means of catching any mangling of data or addressing defects. The program ran to successful completion several times. 

USING STORAGE DISPLAY HARDWARE FUNCTION TO FIND BIT DROPS

The last work I did today was to let the machine loop continually, reading every word of memory until a dropped bit forces a parity stop. I know I am not dropping pairs of bits because the loop that verified each word's contents against its address completed successfully, but I do get stops where bit 14 is dropped. I can easily tell that is the case because the Storage Address Register (SAR) should always match the SBR due to what I wrote throughout memory. In each stop, bit 14 was missing in the SBR but present in the address in the SAR. 




Thursday, April 9, 2026

Collecting data with logic analyzer on 1130 MRAM parity stops - tantalizing hints

ERROR STOP WE ARE TRYING TO CAPTURE

The IBM 1130 employs odd parity to detect core memory errors, where each 8 bit half of a memory word has an associated parity bit to make the count of bits with a 1 value be odd. If the count is not odd when retrieving a word, the machine stops with a Parity Check error. My replacement for the core memory has a much more reliable memory technology and thus does not bother storing parity bits with each word. It instead generates the correct value for the parity bits as a word of data is read.
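The odd parity rule described above can be sketched in a few lines of Python. This assumes IBM bit numbering, where bit 0 is the most significant bit, so the left halfword (bits 0-7) is the high byte and the right halfword (bits 8-15) is the low byte. The function names are illustrative only and are not taken from the FPGA logic on the board.

```python
def odd_parity_bit(halfword: int) -> int:
    """Parity bit that makes 8 data bits plus parity hold an odd count of 1s."""
    ones = bin(halfword & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_bits(word: int) -> tuple[int, int]:
    """Return (P1, P2) for the left and right halfwords of a 16 bit word."""
    left = (word >> 8) & 0xFF   # bits 0-7 in IBM numbering
    right = word & 0xFF         # bits 8-15 in IBM numbering
    return odd_parity_bit(left), odd_parity_bit(right)


def parity_ok(word: int, p1: int, p2: int) -> bool:
    """True when each halfword plus its parity bit has an odd number of 1s."""
    left_ones = bin((word >> 8) & 0xFF).count("1") + p1
    right_ones = bin(word & 0xFF).count("1") + p2
    return left_ones % 2 == 1 and right_ones % 2 == 1


p1, p2 = parity_bits(0x048F)
print(p1, p2, parity_ok(0x048F, p1, p2))
```

Generating the parity bits from the word as it is read, as in `parity_bits`, means a word and its parity can only disagree if a bit is lost somewhere after generation.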

I can load and display memory several ways using my 1130 MRAM memory replacement board, with no detected errors. However, if I load the IBM high core memory diagnostic program and run it, the program encounters parity stops at predictable, repeatable points in the code. This happens after the code has successfully run through the same point more than a thousand times. 

I could also repeatedly see the error when I store and then display words at certain addresses. Anything with data having a 1 in bits 8, 12, 13 and 14 would come back with bits 12, 13 and 14 as 0 instead of 1. Because an odd number of one bits had been removed, the parity bit would have had to change to ensure odd parity, but it corresponded to the value when 12, 13 and 14 were correctly read as 1 bits, thus triggering the parity error. 

LOGIC ANALYZER SETUP

I used my DSLogic USB logic analyzer because it is very portable and easy to connect, but it is limited to 16 channels of data capture. That does result in iterative changes to use of the probes as I narrow in on a problem. Of course, my initial 16 channels of data need to provide some clue to where to next look. 

I need a trigger signal, which is the +Parity Stop signal when the error is first recognized. I will capture the two parity bit flipflop values and the signals generated when a halfword plus the parity bit has even parity. The two sense signals from my board whose pulses set a 1 value into the parity bits, -Sense 16 and -Sense 17, will be recorded. Rounding this out will be the +Storage Read signal to place the activity in context of a memory cycle. 

The other eight channels will be recording sets of data bits from the SBR or address bits from the Storage Address Register (SAR). Since there are 16 data bits and 13 address bits, it will take at least four runs to record all of them in conjunction with the eight channels from the paragraph above. 
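The run count above is simple ceiling division, sketched here in Python. The signal labels are hypothetical placeholders for the real probe points, and the grouping shown is just one possible plan.

```python
import math

# Hypothetical labels; the real nets have other names on the 1130 logic diagrams.
signals = [f"SBR{n}" for n in range(16)] + [f"SAR{n}" for n in range(13)]
FREE_CHANNELS = 8  # 16 analyzer channels minus the 8 fixed signals

# Split the 29 signals into groups that fit the free channels.
runs = [signals[i:i + FREE_CHANNELS] for i in range(0, len(signals), FREE_CHANNELS)]
expected_runs = math.ceil(len(signals) / FREE_CHANNELS)

for number, group in enumerate(runs, start=1):
    print(f"run {number}: {', '.join(group)}")
print(f"{expected_runs} runs needed")
```

The last run only uses five of the eight free channels, which leaves room to repeat a few suspect bits alongside it.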

OSCILLOSCOPE USED TO LOOK AT SENSE PULSES FOR BITS 12, 13 AND 14

The four channel scope was set up to trigger on the +Parity Stop signal in one channel and display the -Sense Bit 12, -Sense Bit 13 and -Sense Bit 14 signals on the other three channels. I wanted to confirm whether the 1130 MRAM board is emitting pulses to set the bits to a 1 or not. 

Indeed, I could see pulses coming from my board but the SBR bit was not being set in the 1130. I experimented with longer pulses, with no effect. I then tried separating the pulses with 40 ns pulses within each 75 ns state machine step, but the results became worse. 

RECOGNIZING THAT THE SAME OLD UNEXPLAINED ANALOG ISSUE PERSISTS

I rebuilt my board entirely, starting from the prior design, which seemed to have too much ground bounce and thus suffered spurious retriggering that produced sense bit pulses at improper times. I continually strengthened the ground plane, power plane and size of the power connections within my board. I drove the pulses with very fast discrete transistors controlled by the logic chips. I varied timing and spread out the bit setting. Ultimately, none of these changes gave me a memory that was reliably and consistently working. 

There are constraints on how early in a read cycle I can set bits in the SBR; I believe I must wait at least 450 ns so that I am in clock step T1 of the four clock step read cycle (T0 - T3), but also must complete all the bit settings before reaching clock step T2 because the 1130 may begin gating the results in the SBR to other registers in the system within that step. 

The two parity bits do not need to be emitted within that constraint. They only need to be set by the end of the read cycle since they are interrogated midway through the ensuing write cycle at clock step T6. This still requires the 16 data bits to be pulsed in that interval. 

This gave me a tight window of 450 ns in which to set 16 bits, just over 28 ns if each bit were set separately. The Solid Logic Technology (SLT) family used in the IBM 1130 is 30 ns nominally, making this impossible to achieve with a single bit in a step.  

If I pulse two bits in each FPGA step, then I have just over 56 ns for the pulse, which should be long enough. However, since I don't understand what is happening with the analog behavior that leads to this issue, I am not satisfied that such a change is sufficient. 
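The window arithmetic above can be checked with a short Python sketch. It only uses the figures quoted in the text (a 450 ns window, 16 data bits, a 30 ns nominal SLT figure); the function name is illustrative.

```python
import math

WINDOW_NS = 450        # usable window quoted in the text
DATA_BITS = 16
SLT_NOMINAL_NS = 30    # nominal SLT figure quoted in the text


def pulse_slot_ns(bits_per_step: int) -> float:
    """Time available per step when the window is split into equal steps."""
    steps = math.ceil(DATA_BITS / bits_per_step)
    return WINDOW_NS / steps


for bits_per_step in (1, 2, 3):
    slot = pulse_slot_ns(bits_per_step)
    verdict = "longer" if slot > SLT_NOMINAL_NS else "shorter"
    print(f"{bits_per_step} bit(s) per step: {slot:.3f} ns, {verdict} than {SLT_NOMINAL_NS} ns")
```

One bit per step gives 28.125 ns and two bits per step gives 56.25 ns, matching the figures above. Three bits per step needs six steps of 75 ns each.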

MUSINGS

Each pulse sent to the 1130 is in fact a current sink from the IBM 1130 through an open collector transistor on my board. The power rail on my board is not involved in this current flow, just in the minor 1.5 mA drive current to the transistors as well as the minor switching current in the 74LVC08 quad 2 input AND gate that is delivering the 1.5 mA to each transistor. 

The ground plane of my board is joined to the ground bus of the 1130 system with stranded 18 gauge wire, which should keep my ground plane from straying too much from the 1130's. However, my hunch is that this is where the problems arise. When I watched pulses that failed to set the SBR bits, the pulses didn't make it all the way to zero volts on the scope. They seemed to bottom out higher, which could be caused by a voltage differential between the ground planes. 

With an inductance of 800 nanohenries and an effective resistance at 2.2 MHz of approximately 60 milliohms, the resulting impedance is around 20 ohms, giving me a voltage drop of almost one-half volt on the ground conductor for those high frequency signals if they were a continuous train. 
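As a cross-check, here is a Python sketch of the standard series R-L impedance formula using the values quoted above. Evaluated directly, those values give an inductive reactance nearer 11 ohms than 20, so the estimate is best read as order-of-magnitude; frequency content above 2.2 MHz in the pulse edges would push the impedance higher. The 25 mA current is an assumption, chosen only because it is what a half volt across 20 ohms implies.

```python
import math

INDUCTANCE_H = 800e-9      # ground wire inductance quoted in the text
RESISTANCE_OHM = 0.060     # effective resistance quoted in the text
FREQUENCY_HZ = 2.2e6       # frequency quoted in the text
ASSUMED_CURRENT_A = 0.025  # assumption: implied by 0.5 V across 20 ohms

# Inductive reactance X_L = 2 * pi * f * L
reactance = 2 * math.pi * FREQUENCY_HZ * INDUCTANCE_H
# Magnitude of a series R + jX impedance
impedance = math.hypot(RESISTANCE_OHM, reactance)
drop_volts = impedance * ASSUMED_CURRENT_A

print(f"reactance {reactance:.2f} ohms, impedance {impedance:.2f} ohms")
print(f"drop at {ASSUMED_CURRENT_A * 1000:.0f} mA: {drop_volts:.2f} V")
```

Even at the lower figure, the drop under that assumed current is a few tenths of a volt, which is still in the range where it could matter for the diode drops discussed in these notes.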

This is quick and dirty, but it is consistent with the scope observed pulse bottom rising above 0V. The germanium diodes in SLT circuitry have a voltage drop around 0.3V, thus I could easily drive up the pulse bottom so high that it fails to switch the transistor in the register. Depending upon what other pulses were produced close in time to the affected one and what ringing might occur in the ground wire, I could see that it would be possible to get instances where it fails to set the bit. 

I still don't see how specific data patterns cause the failures. These are spread across three SLT cards in the B gate compartment B1 at H2/H3, K2/K3 and L2/L3, across two of the cables between my board and the 1130 and across multiple parts on my board. 

DOING MORE INVESTIGATION

After tightening up the FPGA code even further, I found that the 1130 would run for 5-15 seconds before encountering a parity stop. I noticed that bits 10 and 13 were the most likely to not register in the SBR when they should be 1. 

I then hooked the scope up to the -Sense Bit 10 and -Sense Bit 13 pins on the 1130 to watch the signals when a failure tripped a parity stop. Interestingly, putting the scope probe on one of the pins dramatically reduced the rate of that bit failing to set. Putting probes on both led to the machine running 25 to 30 seconds, looping through memory successfully, prior to hitting a parity stop. The machine executes almost 277,778 reads and writes per second, thus the failure rate was around once per 7 million accesses. However, only words with a susceptible bit set to 1 would cause a parity error, thus the actual error rate is closer to 1 in 3.5 million accesses.

Close, so close, but far from acceptable when the computer may run for many hours to days. However, the fact that putting leads on the backplane pins lowers the error rate is a tantalizing clue. The scope doesn't show ringing on the signal when observed at the pin, but that may be due to the impedance of the scope probe - its capacitance and resistance. The input resistance should be around 10 megohms and the capacitance perhaps 10 picofarads. For 100 MHz signals, the impedance is closer to 100 ohms and at 1 MHz the impedance is still around 10K ohms. 
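The failure rate arithmetic above, as a Python sketch. The access rate is the figure quoted in the text. The 0.5 susceptible fraction is an assumption standing in for "only words with a susceptible bit set to 1", taken as roughly half of all words.

```python
ACCESSES_PER_SECOND = 277_778  # figure quoted in the text


def accesses_per_failure(run_seconds: float, susceptible_fraction: float = 1.0) -> float:
    """Accesses that could have failed during a run that ended in one parity stop."""
    return run_seconds * ACCESSES_PER_SECOND * susceptible_fraction


for seconds in (25, 30):
    total = accesses_per_failure(seconds)
    susceptible = accesses_per_failure(seconds, susceptible_fraction=0.5)  # assumed half
    print(f"{seconds} s run: 1 in {total:,.0f} overall, 1 in {susceptible:,.0f} susceptible")
```

A 25 second run works out to about 6.9 million accesses and a 30 second run to about 8.3 million, which brackets the rounded figures above.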

The effect of the frequency dependent impedance is that the probe absorbs the higher frequencies more than lower, rounding the pulses. It acts to slow the rise and fall times of the pulses, which appears to improve the reliability of the memory. Thus I may need to develop a filter to produce similar but larger rounding of the pulses. 
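The probe loading can be estimated with the standard capacitive reactance formula, sketched below in Python. It assumes the 10 megohm and 10 picofarad figures above and treats the probe as a simple parallel RC, ignoring cable effects.

```python
import math

PROBE_RESISTANCE_OHM = 10e6   # input resistance quoted in the text
PROBE_CAPACITANCE_F = 10e-12  # input capacitance quoted in the text


def capacitive_reactance(frequency_hz: float, capacitance_f: float) -> float:
    """X_C = 1 / (2 * pi * f * C)."""
    return 1.0 / (2 * math.pi * frequency_hz * capacitance_f)


def probe_impedance(frequency_hz: float) -> float:
    """Magnitude of the resistance in parallel with the capacitive reactance."""
    x_c = capacitive_reactance(frequency_hz, PROBE_CAPACITANCE_F)
    r = PROBE_RESISTANCE_OHM
    return (r * x_c) / math.hypot(r, x_c)


for frequency in (1e6, 100e6):
    print(f"{frequency / 1e6:.0f} MHz: about {probe_impedance(frequency):,.0f} ohms")
```

This lands near 16K ohms at 1 MHz and 160 ohms at 100 MHz, the same order of magnitude as the rounded numbers above, with the capacitance dominating at both frequencies.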

Tuesday, April 7, 2026

Finishing the build of the substitute controller card to hook a 1627 plotter to an IBM 1130

BATTLING TO INSTALL THE GOLD CONTACTS THAT WILL FORM THE SLT SOCKETS

This printed circuit card fits into the IBM 1130 backplane in card compartment A-C1, where IBM would have plugged in a double width Solid Logic Technology (SLT) card 5806223 to provide the controller logic to interface the plotter to the IBM system. The backplane has two slots vertically that accept the card, each having twin rows of 12 gold pins that fit into the sockets on the SLT card. 

My card has springy contacts that will make good contact with the gold pins on the SLT backplane. I used gold plated RF shield contacts to build the card, each of which is soldered onto a pad on the card edge. These are very challenging to install, since they are small and hard to hold while soldering them down. I used solder paste, my heat table and hot air rework gun to solder them, having placed them onto the solder paste by hand. 

These move around a bit, not anchoring perfectly onto my PCB pads. Eventually I got one side set up with contacts, although not as neat and orderly as I would like. 

A few of the pads have no connections to circuitry on the card, so I left the contacts off in those spots. Once I checked for shorts and dealt with any excessively objectionable positioning, I moved on to the other side of the PCB since each SLT socket is two sides of 12 pins each. 

My first try ended in the card sliding off the heated table and scattering contacts whose solder was still molten. After a suitable break, I went back and completed the construction of the other side. 


I had 3D printed covers that fit over the contacts and form the shape of an SLT socket on a card. These slide over guide plates as the card is inserted and position the card socket into the backplane over the pins that form the complementary sockets. 

IBM cards have a complex shape to the gold spring connectors that line up with a curved shaped barb on the end of the gold pins from the backplane, forming a mechanical lock to hold the card in place unless it is tugged out. I can't manufacture parts to do the job, so I need a different means to hold the SLT card down in the backplane when it is inserted. The spring contacts on the card will cause it to push back out of the backplane socket otherwise. 

The solution to holding my card down centers on the guide plates that are on the SLT backplane. The card sockets have a notch on them that slides over one side of a guide plate, with the next card's socket fitting on the other side of the plate. 

A guide plate is circled

Pointing at one notch

It is hard to see the relationship based on the way that IBM drew the card socket and the existing pictures from IBM manuals. 

The drawing above shows how the two adjacent cards slide over the guide plate, as seen from the top looking down into the backplane and seeing the edges of the cards. My card is a double width one, thus it would have a guide plate in between the two sockets on the card, much like above, as well as notches on the outsides of the two sockets where other guide plates would fit. 

Adding friction or clamping on the guide plate from my card would seem the easiest solution. I have to make detailed measurements of the guide plates and SLT card sockets in order to pick the point where the card should be locked in place, then work out the mechanism to add to my card that will hold it there, ideally by friction rather than a complex clamp. 

Chasing down 1130 MRAM memory issues

PARITY ERROR INVESTIGATION HINTS AT 'REGRESSION' CAUSE AS WELL

I whipped up a short program to enter into the 1130 that will fill memory from location 0010 to the end of storage with values that match the address of each word. Then I can see if there is any sort of addressing defect because the value displayed at a location should be identical to its address. 

Using the Storage Display switch I had the machine loop reading memory. I did spot one design weakness that I quickly corrected but continue to see parity stops at very repeatable locations. First, I will discuss the weakness and how I corrected it.

RACE HAZARD IN PARITY GENERATION LOGIC WHEN SETTING PARITY BITS

A chain of exclusive OR gates produces the proper parity bit value so that each halfword has an odd number of 1 bits. The logic to turn on the parity bits in the 1130 uses sense pulses, just like the pulses that turn on the 1 bits in the Storage Buffer Register (SBR) for any bit of the word that is a 1. I was updating SBR bit 15 at the same time as I was setting the P1 and P2 parity bits, which led to some wobbling of signals in the 1130 logic before it steadied out. 

Because bit 15 was being set at the same time as I was latching in P1 and P2, the 1130 exclusive OR chain was still changing, and the logic in the 1130 would sometimes consider parity for a halfword to be even (an error) even though the settled halfword plus its parity bit had an odd count. I simply separated the setting of SBR bit 15 from the subsequent setting of P1 and P2 to eliminate this timing sensitivity.

What I noticed was that the parity stops always had a word where bits 12, 13 and 14 should have been one bits, but were read as zero. Look at the recorded stops below. The second row of lights is the Storage Address Register (SAR), which is the address from which we read the word, and the third row is the SBR showing the memory contents. 






I realized that the supposed memory regression I reported in the last blog post was just another case of the missing bits 12, 13 and 14. That is, it would appear that the loop reverted back to a value that had its last four bits of 0001 when it should have been 1111 when the logic was incrementing the address in the diagnostic program. 

LOOKING FOR SOMETHING COMMON THAT MIGHT CAUSE THIS SYMPTOM

This is always a word fetched from an address where the SAR has bits 8, 12, 13 and 14 set to a 1 bit value. Other bits may be set to 1 as well, but these are common to every captured parity stop. The word being read into the SBR drops only bits 12, 13 and 14. Because an odd number of bits are missing compared to what was coming out of the MRAM chip, the parity bit is incorrect which the 1130 detects and flags as a parity error. 
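The symptom can be reproduced on paper with a short Python sketch. It assumes IBM bit numbering (bit 0 is the most significant of 16 bits) and uses 048F, one of the values seen in the diagnostic loop, as the example word. The helper names are illustrative.

```python
def ibm_bit_mask(*bits: int) -> int:
    """Mask for IBM-numbered bits, where bit 0 is the most significant of 16."""
    mask = 0
    for bit in bits:
        mask |= 1 << (15 - bit)
    return mask


def odd_parity_bit(halfword: int) -> int:
    """Parity bit that makes 8 data bits plus parity hold an odd count of 1s."""
    return 0 if bin(halfword & 0xFF).count("1") % 2 == 1 else 1


stored = 0x048F                              # has bits 8, 12, 13 and 14 set
p2 = odd_parity_bit(stored & 0xFF)           # generated from the true word
seen = stored & ~ibm_bit_mask(12, 13, 14) & 0xFFFF   # what lands in the SBR

right_ones = bin(seen & 0xFF).count("1") + p2
print(f"stored {stored:04X}, SBR shows {seen:04X}, P2={p2}")
print("parity stop" if right_ones % 2 == 0 else "parity ok")
```

Dropping three bits turns 048F into 0481 while P2 still reflects the original word, so the right halfword plus P2 has an even count and the 1130 stops.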

Bits 12, 13 and 14 must be coming out of memory correctly and flowing through my exclusive OR chain to produce the proper P2 parity bit, which is why P2 is correct for the case where those bits are set to 1. However, I can tell they are not getting into the SBR since the bits are still 0 after the read cycle ends. 

My FPGA logic in the 1130 MRAM board sets bits 12, 13 and 14 in the same state machine step, although it emits separate signals on discrete traces, so this can't be a solder issue. The step that handles these bits is no different from the other steps that handle bits 0-2, 3-5, 6-8, 9-11, 15, or the two parity bits. Nothing should cause the FPGA to sporadically malfunction only on this step and only for a location and data pattern that has bits 8, 12, 13 and 14 set to 1. 

A single logic chip passes pulses for bits 12, 13, 14 and 15 to the 1130 when the discrete signals for each bit are activated by my FPGA. A bad chip should fail to drive all four, not just bits 12, 13 and 14. Further, this only fails when bit 8 is also on (or when addressing memory where the SAR has bits 8, 12, 13 and 14 turned on). The chip in question has no connection to bit 8, which is handled by another chip. 

In the 1130, bits 8, 12, 13 and 14 of the SAR and SBR are implemented across multiple Solid Logic Technology cards, so an issue common to the SAR or SBR is unlikely to produce these results. 

Yellow circles around parts generating bits 12, 13 and 14

The parts on the PCB don't have common elements that could explain how they would fail, as you can see in the graphic above. More significantly, the logic supporting bit 8 is on the left side of the PCB away from all of these parts. Similarly as seen in the graphic below, the SAR bits for 8, 12, 13 and 14 are separated around the memory chip.

I just don't see anything on the board that would account for issues arising only when bits 8, 12, 13 and 14 are turned on in SAR and the memory word but bits 12, 13 and 14 are not set in the SBR. It is time to go back to the oscilloscope, watching the pulses for SBR bits 12, 13 and 14 while monitoring and triggering on parity stop. Perhaps I can see something suspicious on the signal lines that could explain the results. 

Sunday, April 5, 2026

Odd error detected with memory, possible issue on the 1130 MRAM board

BIZARRE ISSUE WITH MEMORY REVERTING TO OLD VALUE

I was running the memory diagnostic with parity stops disabled - the parity issue is a separate problem I am troubleshooting - and the loop advancing through memory was never ending. When I looked closely I found that the code was adding 1 to a stored address in a memory word but would seem to skip back to a prior value. 

It began at some point with an address around 0481 and would step up to 048F before returning to 0481. If I stored a higher address in the memory word, for instance 0491, it would advance to 049F and then return to 0491. I hand stepped the code and it appeared to be correctly advancing the address to 04A0 and stored it, but then jumped back to 0491. 

I then manually displayed and loaded the word where the address was stored - this in memory location 02E7 - and saw very puzzling behavior. I would display the value in 02E7 as 0491, then update it to 04A0. The first time I displayed the content of 02E7, I retrieved 04A0 but then when I displayed the same location again, I saw 0491 returned!

Memory is not a push down stack. Writing a value in a location should totally replace the old value. If it reads as the new value once, it should always read back as the new value unless I rewrite it explicitly back to the old. It should have no way to 'remember' the prior value.

WHAT MIGHT CAUSE THIS?

I did some thinking about mechanisms that might lead to this behavior. I stopped when I had a path that could deliver the old value for a read request, not diving any deeper to find a means that it could accomplish the feat, simply looking at a pathway to get that old value back.

A memory cycle in the IBM 1130 is based on the way that core memory works. Reading the content of a word involves destructively setting the word to all zeroes, detecting which bits were previously set to one. Those bits that had stored a one before the destructive read will cause the core memory system to deliver a pulse into the 1130 where the Storage Buffer Register (SBR) has that bit flipped to be one. 

The second half of the memory cycle will write whatever is in the SBR into the cores. It actually tries to flip every bit of the word to one, but inhibits the flip if the corresponding bit in the SBR is a zero. Thus core memory erases the contents, transferring its value into the SBR and then rewrites the word from the SBR. If the 1130 wants to put a new value in a memory location, it first reads the current value into the SBR and then updates the SBR before writing it back to memory. Every memory cycle has a read followed by a write. 

Thus, the first possible pathway is if somehow the SBR is updated after a successful read, putting a different value and then writing that different value back. This would let my first read of 02E7 successfully pull the value 04A0 into the SBR but somehow during the write it is changed to 0491 and written that way. The next time it is read back, 02E7 returns 0491 and not the value we thought we wrote into it. I don't yet have a way that the 0491 value would be retained in the 1130 and forced into the SBR for the write cycle, but this is a way that reversion could occur. 

Another possibility is that the address lines running into the MRAM chip on my board change during the cycle, so that reads might actually be fetching values from two different words. Thus, if the address 02E7 is sometimes seen as 06E7, two different values could be retained and returned. I don't understand how the first read of 02E7, after we updated it to 04A0, would address one word and subsequent reads would address the other word location, but this is a way I could pull a previous value from a location.

One final possibility is that the chip itself is defective. It might allow the value in 02E7 to slowly revert to the old value even though it held an updated value for a short time. It might misaddress locations in the chip much like my speculation in the prior paragraph so that two different locations are being retrieved at different times. 

I could watch for the first speculation by using the logic analyzer to record the SBR bits, let the machine get into the loop where it reverts, then examine the trace to see if the write cycle is changing the value of the SBR. 

I could watch for the second speculation but only if the address bits change in the 1130 in the Storage Address Register (SAR). That would happen if I trace the 13 address bits of the SAR and record while the machine is in the reversion behavior. However, if the address bits are flakey on my 1130 MRAM board and not from the SAR, then I wouldn't catch this issue with the logic analyzer. 

I will reflow the address bit soldering points on my board to eliminate the chance that the cause might stem from a poor connection. However, to test for errors inside the MRAM chip itself, I might need to manually do sequences of load and display operations on the 1130 while recording the SBR bits. 

Thursday, April 2, 2026

Trying to track down the parity stop failure using my 1130 MRAM board while running IBM's core memory diagnostics

RECAP OF THE FAILURE TO BE STUDIED

When running the IBM 3B1 diagnostic - core memory testing with the code residing in low addresses - the machine will stop with a parity error at the same location retrieving from the same address. The program had executed the loop more than a thousand times before the error occurs. The machine is executing a store instruction in indirect mode, where it puts the data in the location stored in the memory word, rather than into the memory word itself. 

The machine has just fetched the address from the memory word but hadn't begun to store the contents of the accumulator register into that address. The data that was fetched into the Storage Buffer Register (SBR) and the two parity bits don't agree under the odd parity rules of the system. Since my card generates the two parity bits on the fly from the retrieved word, this should not happen. 

TRACING STATE LEADING TO THE FAILURE

I set up the scope to trigger on the setting of a parity stop, while monitoring various other signals such as +Storage Write and the actual output of the parity testing logic in the 1130. The goal is to see what seems incorrect or hints at other places to look, with the ultimate aim of seeing the definitive root cause of the failure. 

The eight data bits of each half of the SBR are cascaded through exclusive-OR (XOR) gates so that we know if there is an even number of bits that are set to 1. The parity bit associated with those 8 data bits is used to ensure that the total 1 bits in the 8 data and 1 parity bits come to an odd total. When the machine fails, it is because one of the halfwords and its parity bit have an even count of 1 bits. 

I consistently saw the diagnostic produce an error where the right halfword (bits 8 to 15) had the mismatch with its parity bit P2. When I put the scope leads on P2 and the output of the XOR cascade for that halfword, the mismatch is now with the first halfword and its parity bit P1. Moving the scope leads to that side and rerunning, I found that the error flipped back to the second halfword again.

I doubt this is a quantum phenomenon where an observed condition works differently than when it is not watched. However, something must be caused by high impedance scope leads attached to one of the nets. I wonder what will happen if both sides are observed at the same time. Will the machine run without parity stops? Seriously, I do need to observe more points simultaneously, which means I have to move over to a logic analyzer.