Wednesday, December 27, 2017

Working on reader stop problem on German 1401 at CHM

CHM RESTORATION OF IBM 1401

Reader stop during Read/Print operations

The German machine has long exhibited an intermittent problem in which the 1402 suffers a reader stop while programs are repeatedly reading and printing.

This was first discovered with a program that used operation code 3, which combines the 1 (read a card) and 2 (print a line) operations. It prints a line from the print area at 201 to 332, then reads a card into the card buffer at 1 to 80.

A program can use the 3 op code to read a card, move the data from 1-80 to 201-280 and then go back to issue another 3 to do it again. This tight loop would list a small number of cards and then experience a reader stop.

We discovered that it is not only the 3 op code that fails, because a tight loop of a 1 (read), a move from 1-80 to 201-280, then a 2 (print) will also fail after some number of cards.

Stripped to its essence, we found that a program of 1, 2, 1, 2, 1, 2, 1, Halt will fail on the last read most times it is executed. We use that code to try to debug the fault.

The logic to read a card has a section that requests a card feed but will temporarily block that request if any of several conditions exists. One of these is a signal that the printer is actively using core memory to transfer the contents of 201-332 to a dedicated core memory called a print buffer.

While the printer is moving data from main core memory to its dedicated buffer, it is triggering reads of locations 201 to 332. This could interfere with the card reader if a card is moving through the machine, because the card reader requires the 1401 to read locations 1 to 80 once for each row of the card.

The reading of locations 1 to 80 while a card is being read is called a scan, and each scan of 80 columns is triggered by the card moving past a row. When this happens is determined by the physical movement of the card or, more precisely, by the timing wheel inside the reader.

If the core memory is busy doing printer transfers, we lose the chance to scan the 80 columns and thus any holes in that row of the card are lost. The correct action if that happens is to cause a check condition - reader stop - because data integrity was lost. 

In the code 1, 2, 1, 2, 1, 2, 1, Halt you might naively assume that each instruction completes before the next one is executed, but a print instruction appears to end well before the actual printing has completed. Thus, the next 1 (read) will be executing while the printer is still moving.

We suspect that the reader stop occurs due to failure of timing interlocks that should block the 1402 from clutching to move a card until the printer is done transferring. 

We looked at the path from the flipflop that indicates the printer is actively accessing core, through the gating logic that blocks clutching the card reader until the transfer flipflop turns off.

Drawing of the signal path to hold off card reading during printer transfers

We could see the signal from the flipflop as an input to a +C0 gate, one that translates the +U level signal from the flipflop into a -T level signal named -T PR INTLK RD.

The -T PR INTLK RD signal flows through an OR gate (the odd triangle symbol), which handles the multiple conditions that should block reading a card, then into the AND gate, which gates a Feed request to produce the Read Clutch Magnet activation.
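The gating described above can be sketched as plain Boolean logic. This is a minimal model, assuming active-high signals and ignoring the real -T/+U voltage levels and the NOR implementation on the actual cards. The function and parameter names are invented for illustration and are not taken from the ALD.

```python
def read_clutch_magnet(feed_request, pr_intlk_rd, other_blocks=()):
    """Return True when the read clutch magnet would be energized.

    feed_request: the program has asked for a card feed.
    pr_intlk_rd:  printer is transferring from core (models -T PR INTLK RD).
    other_blocks: any other hold-off conditions feeding the same OR.
    All names are hypothetical; polarities are simplified to active-high.
    """
    blocked = pr_intlk_rd or any(other_blocks)  # the OR of hold-off conditions
    return feed_request and not blocked         # the AND gating the Feed request


# Healthy interlock: the feed waits while the printer transfer is active.
print(read_clutch_magnet(True, True))    # False
# Suspected fault: the interlock never asserts, so the clutch fires at once.
print(read_clutch_magnet(True, False))   # True
```

In this simplified view, an interlock input that never asserts is indistinguishable from a printer that is idle, which matches the suspicion that the Feed request is not being held off.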

The signal on the output of the +C0 gate, which should flow through the OR, did not seem to pulse upward in spite of the input signal pulsing downward. Admittedly, we may have had difficulty capturing it on the scope, but it did not appear to jump up from -6 to +6V (T level logic).

If so, it would fail to hold off the Feed request, starting a card read too early, so that the first row of the card would trigger a scan while the printer logic is still busy reading from 201-332.
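The suspected sequence of events can be illustrated with a toy timeline of the 1, 2, 1, 2, 1, 2, 1, Halt test program. Every duration below is an invented placeholder, not a measured 1401 or 1402 timing, and the model is deterministic, whereas the real fault is intermittent. It only shows the mechanism: with a working interlock the feed waits for the printer transfer, and without it the first row scan can land inside the transfer.

```python
def run(program, interlock_works,
        print_release_ms=2.0,    # hypothetical: CPU moves on this soon after a 2
        transfer_ms=10.0,        # hypothetical: printer keeps using core this long
        first_row_ms=5.0,        # hypothetical: feed start to first row scan
        read_ms=75.0):           # hypothetical: total time for one card read
    """Return the index of the instruction that hits a reader stop, or None."""
    now = 0.0
    printer_busy_until = 0.0
    for i, op in enumerate(program):
        if op == "1":                                  # read a card
            start = now
            if interlock_works:
                start = max(now, printer_busy_until)   # feed is held off
            if start + first_row_ms < printer_busy_until:
                return i                               # scan collides with transfer
            now = start + read_ms
        elif op == "2":                                # print a line
            printer_busy_until = now + transfer_ms     # transfer outlives the op
            now += print_release_ms
        elif op == "H":                                # halt
            break
    return None


program = ["1", "2", "1", "2", "1", "2", "1", "H"]
print(run(program, interlock_works=True))    # None
print(run(program, interlock_works=False))   # 2
```

With these placeholder numbers the model fails on the first read that follows a print. The real machine usually fails only on the last read, which suggests the actual timing margins are much tighter than this sketch assumes.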

We swapped the +C0 card but the problem persisted. We swapped the OR gate (actually it was a -A0 which is a NOR) but the problem persisted. We then turned our attention to the wired-OR at the output of our +C0 gate which comes from the Overlap logic in section 74 of the ALD. 

Looking to the overlap logic page, we found two more +C0 gates that are tied to the original +C0 from the printer. 

The first anomaly we noticed is that one of the two +C0 gates on the overlap page has a pullup resistor, but so does the original +C0 on the printer logic page. A wired-OR net should NEVER have more than one pullup active, but we have found a few cases already where IBM violated this design principle, including here.
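One generic reason for the single-pullup rule is that pullups on the same net act in parallel, so each extra one lowers the effective resistance and raises the current that a gate must sink to pull the net the other way. The arithmetic is sketched below. The 1000 ohm value is purely a placeholder, since the actual resistor values on these cards are not given here, and the 12 V figure is simply the -6 to +6V T level swing. This is not a claim about what is wrong on this particular net.

```python
def parallel_resistance(resistors_ohms):
    """Effective resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohms)


def sink_current_ma(swing_volts, resistors_ohms):
    """Current (mA) a driver must sink to hold the net against the pullups."""
    return 1000.0 * swing_volts / parallel_resistance(resistors_ohms)


R_PULLUP = 1000.0   # hypothetical value, for illustration only
SWING = 12.0        # -6 V to +6 V, the T level swing

one_pullup = sink_current_ma(SWING, [R_PULLUP])
two_pullups = sink_current_ma(SWING, [R_PULLUP, R_PULLUP])
print(f"one pullup:  {one_pullup:.1f} mA")
print(f"two pullups: {two_pullups:.1f} mA")
```

Two equal pullups halve the resistance and so double the load on whichever gate is driving the net.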

We isolated the two +C0 gate outputs from the overlap page, using a card extender, so that neither was tied to the original +C0. The behavior got much better - our test program of 1, 2, 1, 2, 1, 2, 1, Halt would work correctly much more often. 

Unfortunately, we did still encounter some reader stops. Pulling those cards improved the situation but did not correct it fully. Also, the signal on the output of the +C0 was still not pulsing upward properly.

At this point, it was time to hand the machine over to the demo team. We have to dig further into this problem of the +C0 output from the printer logic.

In addition, we should work backwards through the logic triggering the reader stop to be sure this is caused by what we suspect. If not, we may zero in on the fault from that direction.
