Tuesday, December 12, 2017

Detailed diagnostic work on failing 1401 memory, not yet repaired

IBM 1401 MEMORY PROBLEMS AT CHM

I went into the museum today to try to diagnose the memory problem afflicting the Connecticut 1401 system. The first 4000 words (the base core stack) is not working properly. No matter what pattern you attempt to write into that portion of memory, you get back all zeroes which triggers a parity error (Check error in IBM speak).

The memory stack consists of multiple planes of cores. Each plane has 4,000 locations each with a tiny circular core supported by wires through its center. Straight wires running in the X and Y directions are the primary support and are used to address an individual core by sending a current through one X and one Y wire simultaneously.

In addition to the straight X and Y wires, long wires are snaked through every core in the plane. A sense wire will detect if any core in the plane flips from one magnetic orientation to another, as that change of the magnetic field around the core induces a small pulse of current in the sense wire.

Each location in a 1401 consists of a character - binary coded decimal - constructed of the 1, 2, 4, 8, A and B bits. Another bit called the wordmark is used to delineate the end of variable length fields in the machine. Finally, the C or check bit is used to enforce odd parity for the word as a way of detecting errors in memory.

When a current runs down one of the X or Y wires, it is insufficient by itself to change the magnetization of the cores it runs through. However, the combination of the current from both an X and a Y wire is enough to flip the magnetic orientation. Thus, the current running down one X and one Y wire will not affect any core on either wire except for the one that is supported by both.

When the current through the X and Y wires runs in one direction, the core it addresses is flipped so that the magnetic orientation is considered a binary 1 value. If the current through X and Y flows the other way, the core will flip to the opposite orientation which is considered binary 0.

There is no way to detect which way a core is magnetized in this scheme except to deliberately flip it to a known state. The sense wire will see a pulse if the core was previously in the opposite orientation, otherwise not. Therefore core memory uses a destructive readout method. The core is flipped to binary 0 and the sense wire detects if its prior state was 0 or 1.

We don't want a memory that can only be read a given addressed core one time, so the destructive read is followed immediately by a write cycle to restore the state of the core. Sending current through X and Y to set the core to 1 will restore its value, but if the core is supposed to hold a 0, then we don't want to flip it on with our current.

Enter the inhibit wire snaking through all the cores in a plane. If a current is applied to the inhibit wire in the reverse direction of the current flowing through the X and Y wires, then the sum of the currents is reduced below the threshold that will flip the bit to 1. We inhibit any core that we want to be a 0 after the write cycle, otherwise the write cycle flips them to 1.

The heart of the machine is a cycle determined by magnetic core speed. In the 1401, it is a 10.5 microsecond interval, where the cores are all flipped to 0 at the beginning of the cycle, the sense wires are used to set a register with the prior state of the core, then the second half of the interval is used to write the cores to 1 with the register controlling the inhibit wire.

To get the symptoms we are experiencing, all 4000 core locations of all 8 planes must either be written to 0 because the inhibit wire is active or get sensed as 0. This would need to be a single point failure to cause the problem - for example it can't be the X and Y driver cards as there are one per plane, routed through a switch that determines which X or Y wire the current passes through.

Similarly it can't be a sense amplifier failure or bad inhibit drivers as there are one per plane for these also. What is common to all planes that could cause the entire stack to fail? We had to search for the cause.

We checked that all power supply levels were correct. We checked activation signals that cause the drive current to flow for a particular stack, 2000 character half of a stack, X and Y address. We tested the output of sense amplifiers; they were indeed 0. We did determine that the inhibit drivers were off for the bit that should be written to 1.

We tested one side of the driving current - the switched 8 x 10 array to select one of 80 parallel lines in each plane. We did see the current flowing in general but didn't look at the individual wire to see that it was different from the other 79.

We still need to check the other side - a switched 5 x 10 array that selects one of 50 parallel lines in each plane, these at right angles to the original 80 lines. Together these select the individual core in each plane which is the coincidence of one of the 80 lines and one of the 50 lines. We also need to verify that the direction is correct for X and Y lines.

The testing ran for many hours Monday, including a side by side comparison of the same signals on the correctly working German and the failing Connecticut 1401s. No differences were found so far and no anomalies. We will continue this on Wednesday.

No comments:

Post a Comment