Sunday, June 25, 2017

Refactoring and ruggedizing my FPGA logic for the disk tool

IBM 1130 LIGHT PANEL UPGRADE

My shipment of 200 miniature incandescent bulbs from China arrived yesterday and is waiting for me to begin soldering the bulbs onto 2-pin headers for use in the 1130. I think it best to make a jig that will hold the header and the bulb in place for quick soldering of the wire leads onto the pins. It will give me consistent results and some speed in preparing the 160 bulbs needed to populate the light panel.

ALTO DISK TOOL

My disk tool design is suffering from glitches driving state machines into undefined states. The state machine (FSM) then will stall in that invalid state. The state of an FSM is defined in a register using some encoding (examples are binary, one-hot and gray code).

In the simple case of a binary encoding, if the number of states in the FSM doesn't match the number of values the register can hold, some values in the register are invalid. For example, a three bit register encodes 8 possible binary values, but if the state machine has five valid states, coded 0 to 4, then the FSM will stall if the register becomes 5, 6 or 7.

One-hot encoding uses a string of bits as long as the number of states, with one and only one bit set at any time. Thus, for the five state machine discussed above, the register is five bits long and has valid values 00001, 00010, 00100, 01000, and 10000. If the register ever has more than one active bit, or none active, the FSM will stall.
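
To make the two encodings concrete, here is a minimal Python sketch (not the FPGA logic itself) that lists which register values are valid for the five state example. The function names are made up for illustration.

```python
def is_valid_binary_state(value: int, num_states: int) -> bool:
    """Binary encoding: the valid codes are 0 .. num_states - 1."""
    return 0 <= value < num_states


def is_valid_one_hot(value: int, width: int) -> bool:
    """One-hot encoding: exactly one of the low `width` bits is set."""
    in_range = 0 < value < (1 << width)
    # A nonzero value with a single bit set has no bits in common with value - 1.
    return in_range and (value & (value - 1)) == 0


# Five states in a three bit binary register: three codes are left over.
invalid_binary = [v for v in range(8) if not is_valid_binary_state(v, 5)]

# Five states in a five bit one-hot register: only 5 of the 32 values are valid.
valid_one_hot = [format(v, "05b") for v in range(32) if is_valid_one_hot(v, 5)]

print(invalid_binary)  # [5, 6, 7]
print(valid_one_hot)   # ['00001', '00010', '00100', '01000', '10000']
```

The one-hot register leaves 27 of its 32 values invalid, so it has many more ways to go wrong, but each of them is easy to detect.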

The next value of the state register is determined in the FSM as a combination of the current state value and some input signals. If an input signal changes very close to the clock edge, it might produce a glitch where the next state register attains one of the invalid values and stalls.

Developing reliable operation of the FSMs requires careful attention to details. Particularly with externally generated asynchronous signals, such as the Sector Mark value, the value may be changing too close to the clock edge and lead to stalling.

One solution is to have every input signal that generates the next state value be itself registered so that it cannot change near the clock edge. Other techniques include methods to detect invalid states and force the FSM back to some valid state. For example, a parity test of one-hot values might detect an error.
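
As a rough illustration of detection and forced recovery, the Python sketch below compares a parity test with a full "exactly one bit" test for a one-hot register. The IDLE value and function names are hypothetical, chosen only for this example.

```python
IDLE = 0b00001  # hypothetical idle state that the FSM is forced back to


def parity_ok(value: int) -> bool:
    """A valid one-hot value has a single 1 bit, so its parity must be odd."""
    return bin(value).count("1") % 2 == 1


def recover(value: int) -> int:
    """Keep a valid one-hot state, otherwise force the machine back to IDLE."""
    return value if bin(value).count("1") == 1 else IDLE


for state in (0b00100, 0b00000, 0b00011, 0b00111):
    print(format(state, "05b"), parity_ok(state), format(recover(state), "05b"))
# 00100 True 00100   valid state, left alone
# 00000 False 00001  no bits set, caught by parity
# 00011 False 00001  two bits set, caught by parity
# 00111 True 00001   three bits set, parity misses it but the full count catches it
```

Parity is cheap but only flags an even number of set bits, so a stricter check is needed if three bits can ever be set at once.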

Even with auto correction to recover an FSM from a stall, the result may be an FSM at idle when it was supposed to generate some triggering output. Another FSM may be waiting for that missing output, producing a deadlock. The result is still a stalled system. This can get quite tricky.

My first task is to ensure that I force all the external signals to become aligned to the clock boundary, passing each through a chain of a few registers in a row. This is the classic cure for preventing meta-stable states but also forces signal states to only change on the clock edge.
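
The behavior of such a register chain can be modeled in a few lines of Python. This is only a cycle-level sketch with a made-up class name; it shows the two clock delay and the fact that the output changes only on clock edges, but it does not model metastability itself.

```python
class TwoStageSynchronizer:
    """Cycle-level model of two D flip-flops in series on one clock."""

    def __init__(self, initial: int = 0) -> None:
        self.stage1 = initial
        self.stage2 = initial

    def clock(self, async_input: int) -> int:
        """Advance one clock edge and return the synchronized output."""
        # Both flip-flops capture on the same edge, so stage2 takes the OLD stage1.
        self.stage2, self.stage1 = self.stage1, async_input
        return self.stage2


sync = TwoStageSynchronizer()
samples = [0, 1, 1, 0, 0]            # value of the outside signal at each clock edge
outputs = [sync.clock(s) for s in samples]
print(outputs)  # [0, 0, 1, 1, 0]
```

The output is the input delayed by one extra clock, which any logic depending on the timing of the outside signal has to allow for.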

I completed a set of two-stage D flipflops to pass each outside signal through, getting them aligned to clock boundaries. I used the schematic approach, placing 54 D FF symbols and routing the signals graphically. Previously, I inverted the signals using combinatorial logic from the input pin, but now I invert combinatorially then pass through the two stage D flipflops to get it properly aligned with the clock.

I will then look to see if any of my input signals to important FSMs are passing through combinatorial logic where they might be in transition around a clock edge. As much as possible I will remove such logic leaving all inputs clean, and if I still need the logic then the outputs will be registered so they only change on clock boundaries.

While doing this, I will strip out some logic that had been in place for functions which I will eventually implement, but am not dependent on for the near term archiving and cartridge writing roles. This includes sector update transactions, disk drive emulation and some display functionality.

As part of stripping this down, I am also reviewing several hundred warning messages from the toolchain to look for any that are truly relevant and act upon them. In most cases, these will be leftover signal declarations that are not being used any more, which I can strip out.

I spent about one day in total doing all of this, then set it up for initial testing at home with a wrap-around board of some signals. Full testing will require the disk drive and cartridges. Done for today.

Saturday, June 24, 2017

Disk tool wrote cartridge images that boot on Alto; work on CHM 1401 and Digibarn Alto continues

ALTO DISK TOOL

I spent a week chasing inexplicable behavior in the tool, such as the stalling of the WriteEntireCartridge process after the first sectors. I kept exposing signals to external pins and LEDs to see what was happening, until I found the state machine for matching sectors was hung up.

It has six states, two of which are single cycle advances to provide a short pause. I saw that the machine was not in any of the four longer term states. I then added the two short term states to the LEDs. That would either show it stalling in those one-cycle states or more likely getting wedged into an invalid state value that was none of the above.

However, with the six states displayed on the LEDs, it worked properly. I was able to write disk images to our scratch disk cartridge, first booting up and verifying the games.dsk image from Bitsavers.org and then the xmsmall.dsk image.

We took some time to play with Smalltalk, using the latter image, hoping to end up with a compelling brief demo to use on a video and at our upcoming VCF West appearance. We did find that this disk had very little free room, causing some out-of-space conditions as we played.

After we were finally sated on that, I wanted to test out the ReadEntireCartridge function using the Smalltalk cartridge which I could compare to the xmsmall.dsk image from which it was written.

The file I uploaded did not match well at all, which indicates problems in the reading function. My suspicion was that my timing was now off and I was encountering checksum errors as evidence of that bit shifting. To see this, I exposed the header, label and data checksum state bits on three LEDs and resynthesized.

Aaarrrgh. Once again the unit was stalling after reading one sector. When I pushed button 2 to command a seek back to cylinder zero, it triggered the ReadEntireCartridge state machine to start advancing through cylinders while I was simultaneously triggering seeks.

At this point, I will have to look over my design and find ways to 'bullet-proof' the state machines. There are techniques, for example, to manually specify the encoding of the various states in the state register for a machine, so that I can ensure the register won't glitch into an undefined state. This is my homework assignment for the week.

CHM 1401 RESTORATION

The 1406 memory cabinet power supply remains out of commission. It is a 30V, 7A supply that uses a regulator card to adjust the output to a setpoint from a potentiometer, but the circuit is not regulating. We have swapped all of the transistors and checked just about everything we could think of.

I have a few more tests to attempt on Monday or Wednesday, then we may be forced to pull a power supply from a spare 1406 that we have in the warehouse.

DIGIBARN XEROX ALTO RESTORATION

We put in another full day working on the remaining bad power supply for the Digibarn Alto. It is the unit that supplies 5V for the main logic. Actually, this is a dual 5V supply, one providing high current and another providing a low current second 5V source. Neither supply inside the box is working.

Previously we had identified four bad large electrolytic capacitors and replaced them, found and replaced two silicon rectifier diodes that were blowing the PS fuse, then realized that the 18V and 5V internal power to the power supply circuitry was missing.

Today, we removed and checked the 18V voltage regulator (LM7818 chip) which was good. We then chased down a short to a section of the machine, applied controlled current to that line until we found smoke coming out of a tantalum capacitor.

Replacing that capacitor allowed the power supply to develop its internal 18V and 5V power, but it was still not delivering any output from the unit on either primary or secondary sides. We found a smaller electrolytic inside that was also bad and replaced that by 5PM. Still no output.

Marc continued that evening after Ken and I left. He hunted for and found the 23KHz chopper signal on one side, which was being blocked from driving the rest of the circuitry by a NAND gate. Note that there are no schematics for these supplies available, so that figuring out what is going on is notably harder than a normal diagnosis process.

Marc lifted the regulation line, which is what kept it cut off, seeing some unregulated voltage develop on the primary side. Secondary side is still dead. While working on this, something else failed, likely another tantalum somewhere but alternatively a failing diode or other semiconductor. The power supply is back to blowing 12A fuses.

At this point, the effort required (16 hours so far) is getting out of proportion to the value of continuing to restore this particular machine. Many of its power supplies were bad, as were most of the spare supplies that Bruce had available. The Alto itself is missing one logic board and one cable, at least.

The monitor was not compatible with the Alto at all, so the two had never worked together as a system. The disk drive had a label inside - "smoked" - and is thus of suspect condition. The spare logic boards he was given with the donation have missing chips, burned sections and other signs that they are non-working items.

This machine is a great artifact and static exhibit, but is appearing to be a collection of broken parts pulled from working systems and donated to Digibarn. That would imply extraordinary time will be needed to find and repair, if possible, all the nonworking parts that were assembled to make up this artifact.

We are therefore coming to the conclusion that this is a poor candidate for restoration and a bad use of our time, compared to restoring other machines such as the exhibit in the Xerox PARC lobby that were clearly working at shutdown or units at CHM not already restored by Al Kossow. We will finalize a decision within a week. 

Wednesday, June 21, 2017

Slogging along working on Digibarn Alto power supplies

DIGIBARN XEROX ALTO RESTORATION

We restored three power supplies to operation, replacing bad capacitors and other failed parts. The fourth supply is the big one, delivering +5V for most of the logic, and it was blowing its 12A fuse immediately on powerup. 

We discovered that all the electrolytic capacitors were bad and replaced them - fairly expensive parts to match the mounting method and available space. We then found two of the rectifier diodes were blown and replaced them. Now the fuse does not blow, but no power is produced yet.

On the board, the logic for the power supply is powered by an 18V regulator chip for operational amplifiers and in turn a 5V zener diode, that feeds off the 18V line, for TTL logic. The output of the LM7818 was zero, which can either be due to a failed regulator or a short circuit downstream. We ran out of time to work further on this unit, having spent the day restoring the other three and doing the capacitor and diode replacements. 


Thursday, June 15, 2017

IBM 1130 light panel upgrade boards complete, working on Alto disk tool debugging

IBM 1130 LIGHT PANEL UPGRADE

I investigated the three bad SCR positions I had previously uncovered and discovered that in two of the cases, the issue was that the large flat contact surface (anode) didn't flow onto the PCB pad beneath, leaving the circuit open. Resoldering brought them into full operation.

Working through my big PCB, I began to check the continuity to the anode. Fortunately, the SCR type has a stub lead sticking out of the side between the cathode and gate, which I can reach with ohmmeter probes and check to see if the anode is connected to the lamp pin. I repaired several positions that had such faults and each worked perfectly after the fix.

I definitely had to repair the original SCR that is wrong, the one I tested with in my first attempt, since it won't fire until the voltage gets to an unacceptably high level on the input pin. I used my hot air rework gun to strip off the failed part and solder in a replacement. Voila, the board is now fully functional. 

I am now bottlenecked waiting for the light bulbs to arrive from China. The boards are complete but I don't have enough bulbs to load onto the boards and install them into the 1130. When I solder each lamp on the header pins, I plan to encapsulate it in silicone which will prevent the wires from ever shorting together, as that would destroy an SCR. 

I installed quite a few bulbs onto one of the boards and did a test fit to see how easily they could be fit into the space without bending or damaging the lamps. The results were excellent, which implies that once I have all the bulbs on their headers and potted with silicone, placing the boards against the honeycomb will be easy.


Bulbs on headers plugged into sockets on PCB, after test fitting into honeycomb

ALTO DISK TOOL

I worked on the testbed to check out my write cartridge code, since something went wrong when I attempted to write an entire cartridge on the real disk drive. The logic stopped after the first sector was written and a flag bit indicated an overrun, where the writing FSM is still active when the next sector mark arrives.

I can't see any place that will write the overrun flag, so that must have been a false indication during my testing, something I misread. I concentrated on the logic that steps my transaction through writing the entire cartridge.

I don't see anything that should block the write from continuing sector by sector, so I set up for some testing, simulating the sector mark and disk status to allow my logic to run. I set up the scope to track key signals, the first of which is the WriteGate signal which defines the range of the write to an individual sector. If that is active long enough to run into the next sector mark (3.3 ms) then I may be experiencing overrun conditions.

I see the sector begin writing at the sector mark and the last word of zeroes is written at 3.168ms. The sector has over 160 us of free time before we reach the next sector mark. This reinforces my belief that we are not experiencing overruns when writing a sector.
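
The margin can be checked with a little arithmetic. The Python sketch below assumes a 40 ms rotation and 12 sectors per revolution, which is where the roughly 3.3 ms sector spacing would come from; both figures are assumptions for this example, while the 3.168 ms value is the scope measurement above.

```python
ROTATION_MS = 40.0      # assumed: one revolution of the disk
SECTORS_PER_REV = 12    # assumed: sector marks per revolution

sector_period_ms = ROTATION_MS / SECTORS_PER_REV  # time between sector marks
last_word_ms = 3.168                              # measured end of the write
margin_us = (sector_period_ms - last_word_ms) * 1000

print(round(sector_period_ms, 3))  # 3.333
print(round(margin_us))            # 165
```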

I wondered whether I might start the next sector (sector 1) while in the midst of a sector mark interval (i.e. it was already logically 1 when I started looking for a sector match), but with the SM only 5 us long, it can't force us into an overrun situation.

I kept looking, focusing on what has to happen for the WriteEntireCartridge state machine to step through the entire cartridge sector by sector. I changed the diagnostic outputs to give me the data that will help pinpoint the cause of any problem.

IBM 1401 RESTORATION WORK

We have a document of uncertain quality that was built by field engineering specialists back in the 7094 and 1401 era, listing a number of general market transistor types that are said to match an IBM transistor number. We were missing spare 028 and 036 transistors, but the chart gave us 'equivalents' as 2N1038 and 2N456. Those in turn are in equivalence tables to the NTE numbering scheme as NTE176 and NTE104.

Using the NTE numbers, we bought a few of each type to use in repairing the voltage regulator card for the extended memory frame of the Connecticut 1401 system. This card is a differential amplifier that compares the voltage being produced by the power supply to a reference value set on a small potentiometer. The difference signal is amplified by a chain of two transistors (028 and 036) and that drives the base current of the parallel 108 transistors that deliver up to 7A of the regulated voltage.

Normally we can test transistors for signs of death using a DMM, either looking at resistance across the various junctions or using the diode tester function. There should be a one-way path from emitter to base, and a one-way path from base to collector, of the appropriate polarity, but no path from emitter to collector.

That is true for silicon transistors, but germanium ones exhibit enough leakage current that they will slightly bias themselves on, passing current in one direction from emitter to collector even with no current supplied to the base. Further, if a transistor becomes weak, having too low an amplification factor, it will still test good on the DMM but fail to deliver enough current in the real circuit.

We suspect that is what has happened in the voltage regulator card (and in a second known failed card, which keeps the voltage lower than the first card but still is not able to drive it down to the set target voltage). Our bad card will allow the voltage to exceed 40 volts, while this second card keeps it down to 33V. The target is 30V, which neither can maintain. Weak amplification would explain this.

I have a Peak Atlas DCA Pro DCA75 tester that will measure amplification, leakage and other factors. I will use that when checking out any suspect transistors. 

Wednesday, June 14, 2017

All boards working on IBM 1130 upgrade for light panel

IBM 1130 LIGHT PANEL UPGRADE

All boards are built and I began live testing. Lamp test works properly but for some reason I was not getting the bulb to light with the signal pin at an acceptable voltage. A single instance of the circuit built off board works, so this is a matter of interaction among circuits that I have to address. 

In the 1130, the lamp test line is hooked to all SCR gates with 6.8K resistors, while the individual signal inputs are hooked to the gate directly. The 1130 wiring has 6.2K resistors in series with all signals, thus it appears that the SCR gate is hooked to a voltage divider between the +3V logic signal and the ground level that lamp test is normally holding. 

About 1.5V goes into the SCR gate which should conduct. When lamp test is pulled up from ground to +3V level, the SCR gate conducts. The fact that I don't see the lamp lighting with the input signal is troubling. It only works when the voltage is boosted to 3.26V with lamp test floating or 3.9V when lamp test is at ground.

To hook this into the 1130, I have to accept the constraints of that system. Signal inputs are somewhat less than 3V to fire the lamp, AC supply to the SCR is 7.25V and the serial resistance with the signal inputs is 6.2K. Thus, it seemed the current boards wouldn't work. 

I moved the bulb over one position and the results were completely different! I seem to have a rogue SCR or a flaw in that one circuit. I will now populate bulbs in as much of the board as I can, set the input signal voltage to less than 2V and spot any positions that don't work as intended. Moving bulbs around will let me check out the entirety of each board.


I completed both small board checkouts. Three positions didn't work properly and need to have components replaced, but the remaining 59 worked properly with both 1.41V signal and 1.41V lamp test voltages applied. I am happy that the circuit is sound and these should work well when installed into the 1130 panel. I am off to the CHM to work further on the power supply regulator card.

Tonight I only had time to test a portion of the big board, since it will require about 20 setup and test operations to move 5 bulbs carefully through all 96 circuits. I may need to float the lamp test line when not active, rather than grounding it as the 1130 currently does, since the grounding will drive the signal voltage to about 55% of its value at the SCR gate. Floating will provide all 100% of the signal voltage to operate the thyristor.
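
The divider effect is easy to estimate. This Python sketch treats the 6.2K series resistor and the 6.8K lamp test resistor as a simple unloaded divider; it ignores the current drawn by the SCR gate itself, so it is only an approximation of the real circuit.

```python
def gate_fraction(series_ohms: float, lamp_test_ohms: float) -> float:
    """Fraction of the signal voltage left at the gate with lamp test grounded."""
    return lamp_test_ohms / (series_ohms + lamp_test_ohms)


fraction = gate_fraction(6200, 6800)   # 1130 series resistor, lamp test resistor
gate_volts = 3.0 * fraction            # for a nominal +3V logic signal

print(round(fraction, 3))    # 0.523
print(round(gate_volts, 2))  # 1.57
```

The unloaded estimate comes out a little over half of the signal voltage, in the same range as the rough figure quoted above, and matches the 1.5V or so seen at the gate for a 3V signal.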

IBM 1401 RESTORATION WORK

We worked on the voltage regulator card but were ultimately stymied by the lack of any 028 and 036 transistors on hand. We have sourced them and can continue with the repair next week. 

Monday, June 12, 2017

Finished all 1130 light panel boards, worked on 1401 system at CHM

IBM 1130 LIGHT PANEL UPGRADE

I continued building the final large board today, completing all 96 triacs and resistors before breakfast. I continued with inserting all the lamp sockets and half of the signal pins before it was time to head over to the CHM to work on the 1401 systems. 

I did some testout of the resistors and signal pin wiring, confirming that the first 48 pins and their associated lamp test resistors were installed properly. After I returned from the work at CHM and evening with the 6800 club at Holders, I finished up the board.

Big board completed

All boards check out, but the power on testing with the limiting resistor will be needed tomorrow.

Three boards in approximate relative position as they will sit inside 1130

IBM 1401 RESTORATION WORK

One of the 1401 systems (Connecticut machine) was down since smoke poured out of the 1406 memory extension box last Wednesday. We removed the power supply and found the part that emitted the smoke.

The power supply has two SMS cards installed, one regulating the output voltage and the other protecting against overvoltage. If the output of the power supply goes too high, a silicon controlled rectifier is clamped across the output. This technique, called a crowbar, will cause a circuit breaker to pop.

In our case, the breakers did trip but took far too long, since the twin 3 ohm load resistors limiting current in the crowbar carried 10 amps each for enough time that they scorched the board underneath and burned insulation off of nearby wires.

Trace side of crowbar card

Component side of crowbar card

The cause of the crowbar activation was the failure of the regulator card, allowing the voltage to soar up to more than 40V, instead of the nominal 30V expected from the supply. We don't have any spares for this card type, thus will have to diagnose and repair the card before we can restore the 1401 system to operation. We have replaced one transistor so far but the card is not yet working properly.

Sunday, June 11, 2017

Building the IBM 1130 light panel upgrade

IBM 1130 LIGHT PANEL UPGRADE

My PCBs arrived along with the remaining components, while I was on my trip to NY. I found that I hadn't specified the right size hole to mount my turret connectors directly on the board, but I thought I had a workaround that would retain the turret connectors. It did not pan out, so I will be soldering the power wires directly to the board. 


Small board (one of two)

Large board (only one required)

I will begin to build one of the boards to test it on the 1130. First step is to solder down the surface mount resistors, as they are the smallest and closest to the board. Second step is to solder the surface mount triacs in place. Third is to mount the lamp sockets on the bottom side. Fourth is to mount the signal pins on the top side. Fifth is to mount the turret connectors.

I am concerned about shorts in my soldered lamp holders, since the bulbs have bare wire leads. This vulnerability affected the original IBM boards and will affect mine too, destroying the Triac immediately. I have two ways to address this. 

First, I will work out an insulation scheme that protects the bulb leads and prevents possibility of a short. Second, I will put in a current limiting resistor to the AC line while I am checking out the light circuits one by one, so that I will only have one of three cases for any light circuit:
  1. The lamp lights correctly and all is good
  2. The lamp does not light due to a bad bulb or open circuit, replace and repeat test
  3. The lamp does not light because holder is shorted but the resistor protects the Triac from catastrophe

By Sunday evening I had the small panel for the far right side completed and a number of lamp holders guaranteed to be short free. First, I fitted the board into place to confirm how it sits inside the 1130 pedestal box on the face of the honeycomb. That was a perfect fit.

Trial fit of one small board against honeycomb

As you can see from the board above, not every position is used on the small boards. The first board above only has 27 bulbs out of the 48 possible positions, thus I only installed components on those 27 spots. The middle board, also a small one, has 33 lamp positions utilized. The final, large board has every position implemented, a total of 96 lamps.

I began construction of the second small board, installing all the resistors, triacs and lamp sockets by dinnertime. All that was left were the 33 signal pins and the three turret connectors. Soon those were installed as well and I could move on to the final large board. A very long process, soldering 387 components, so didn't finish this evening.

Tomorrow, I will hook them up to test power with the limiter resistor and check each light circuit. Since the hot resistance of the bulb is around 50 ohms, my limiting resistor to protect against shorts can't be more than about 10 ohms if I hope to see the filaments light.
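
A quick Python sketch of that tradeoff follows. It treats the bulb and limiter as plain series resistors and assumes a supply of about 7.25V, the AC figure for the 1130 lamp supply; the function names are invented for the example.

```python
SUPPLY_VOLTS = 7.25   # assumed lamp supply voltage
BULB_HOT_OHMS = 50.0  # approximate hot resistance of one bulb


def bulb_voltage_fraction(limit_ohms: float) -> float:
    """Share of the supply that still reaches the bulb through the limiter."""
    return BULB_HOT_OHMS / (BULB_HOT_OHMS + limit_ohms)


def short_circuit_amps(limit_ohms: float) -> float:
    """Current through a shorted lamp holder, set only by the limiter."""
    return SUPPLY_VOLTS / limit_ohms


print(round(bulb_voltage_fraction(10.0), 3))  # 0.833
print(round(short_circuit_amps(10.0), 3))     # 0.725
```

With 10 ohms the bulb still sees about five sixths of the supply, while a dead short is held to well under an amp.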

I am still waiting for my 200 light bulbs coming from China, which I will then have to solder onto the holders to plug into the sockets on my boards.

Major progress on 1401, 1311, 729 and 1402 restorations

RESTORATION WORK AT TECHWORKS BINGHAMTON

1401 System

Our team arrived at our hotel late on June 6th, worked on the 7th, 8th and 9th, with travel home on the 10th. There were tours, picnics, interviews and other events that took time, but we did get a decent amount of time working on their equipment.

The 1401 system had been previously powered up by the local team, but it was not able to do arithmetic correctly. When we arrived we started to work on that problem. Other problems arose that had to be dealt with, such as when we lost the ability to store the A bit in any position in memory. 

The A bit problem manifested itself as a C bit (check bit, i.e. parity) error, which we began tracing through the C bit logic until we realized that the machine was also not holding the A bit, whose absence made the C bit value incorrect.

We found a total of three cards that were malfunctioning, replaced them and had data storing properly again. We went back to work on the addition failure. The machine could correctly add 1 + 2, for example, but not 2 + 2. 

We quickly realized that we had a 'hot' 1 bit, where any arithmetic result would have the 1 bit turned on regardless of its proper value. Thus, 1 + 2 produced 3, but 2 + 2 produced a 5 since the 1 bit was erroneously set. 

We were tracing this from the adder logic itself out to the memory. The way that arithmetic works in a 1401 is that the result character of an addition (or other arithmetic operation) is stored in memory without going into the B or A register. Thus, along with the wrong value, if the 1 bit was not intended to be on, the parity would also fail. The 2 + 2 case stored a 5 (1 and 4 bit) without the C bit since parity should be odd, flagging an error due to an even parity.
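
The parity failure can be reproduced with a small Python sketch. It considers only the digit bits and the C bit, leaving out the zone bits and word mark for simplicity, and the function names are made up for illustration.

```python
def check_bit(char_bits: int) -> int:
    """C bit that makes the total number of 1 bits odd."""
    ones = bin(char_bits).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_error(char_bits: int, c_bit: int) -> bool:
    """True when the stored character plus its C bit has even parity."""
    return (bin(char_bits).count("1") + c_bit) % 2 == 0


HOT_ONE = 0b0001  # the 1 bit that was stuck on

for intended in (0b0011, 0b0100):        # 1 + 2 = 3 and 2 + 2 = 4
    c_bit = check_bit(intended)          # C bit computed for the correct result
    stored = intended | HOT_ONE          # what actually lands in memory
    print(intended, stored, c_bit, parity_error(stored, c_bit))
# 3 3 1 False   the 1 bit was supposed to be on anyway
# 4 5 0 True    the extra 1 bit makes the parity even
```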

The 1401 uses wired-OR logic, where multiple gates have their outputs shorted together to form an OR of the conditions of the contributing gates. This means when you have the extra 1 bit set, it could come from any of several gates that are shorted together. 

We did lots of oscilloscope work probing the state of various signals in the path from the adder to where it stores in memory. For quite a while, we saw that no set of inputs should produce a 1 output yet it was there. 

To do the scoping, we set up a short loop to set up fields for an addition, perform it and loop perpetually. We had the most success triggering the scope by a signal that is activated when the adder is ready to store its result in memory. 

The 1401 system encodes numbers as binary coded decimal (BCD) characters, but the arithmetic hardware itself uses a system called qui-binary by IBM. Thus, the input digits are converted from BCD to qui-binary, arithmetic occurs and the output digit is converted back to BCD. 

Qui-binary has a five value and a two value section, the quinary (base 5) and binary (base 2) portions. Thus, we had to find the circuitry that assembled the BCD bits from the quinary and binary states. We looked at the first gate generating the 1 bit and found that the adder was giving the proper value. 2 + 2 had only the 4 bit set, not the 1 bit. 
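
As a rough illustration of the idea, the Python sketch below splits a decimal digit into a quinary part (0, 2, 4, 6 or 8) and a binary part (0 or 1) and adds two digits that way. It shows the number representation only and is not a model of the actual 1401 adder circuitry; the function names are hypothetical.

```python
def to_qui_binary(digit: int) -> tuple[int, int]:
    """Split a decimal digit into (quinary, binary) parts."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0..9")
    return (digit // 2) * 2, digit % 2


def from_qui_binary(quinary: int, binary: int) -> int:
    """Recombine the two parts into a decimal digit."""
    return quinary + binary


def add_digits(a: int, b: int) -> tuple[int, int]:
    """Add two digits in qui-binary form, returning (result digit, carry)."""
    qa, ba = to_qui_binary(a)
    qb, bb = to_qui_binary(b)
    b_sum = ba + bb                                # 0, 1 or 2
    q_sum = qa + qb + (2 if b_sum == 2 else 0)     # two binary 1s carry into quinary
    carry = 1 if q_sum >= 10 else 0
    return from_qui_binary(q_sum % 10, b_sum % 2), carry


print(to_qui_binary(5))   # (4, 1)
print(add_digits(2, 2))   # (4, 0)
print(add_digits(1, 2))   # (3, 0)
print(add_digits(7, 8))   # (5, 1)
```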

The 1 bit value then transitioned through a small number of gates until it reached a double negative AND gate whose two sections were ORed together and also wire ORed to several other gate outputs. This wire OR output is the drive for whether a 1 or 0 is written in the 1 bit during the current memory cycle.

The top of our double AND gate had the 1 bit value from the adder and the overall signal to write an arithmetic result to memory. The bottom had the value of the 1 toggle switch on the console and the overall toggle switch to manually enter data into memory. Thus, this double gate drives a 1 either because of manual entry or arithmetic results. 

The inputs to the manual entry section don't change unless the toggle switches are moved. The inputs to the arithmetic result section were 1 for the 1 bit value and a pulse to store. Since this is a negative AND gate, it only passes a result if both inputs are negative. It therefore should NOT write a 1 into memory.

The wired OR output of this and the other gates showed a positive pulse, writing a 1, at exactly the timing and shape of the enabling pulse for arithmetic result storing. Inputs don't meet the conditions of an AND but the output pulses. 

Swapped the card but no change. Examined inputs to all the other gates wired into this output, but none had conditions that would fire. Swapped each of the other cards just in case, but no change. Looked at the wiring on the backplane near the card. Tested the signals on the card itself, with an extender, to see if there is a socket problem. 

After half an hour of increasingly fanciful hypotheses and tests, looking for some analog issue or hidden path to drive the erroneous 1 output, the problem went away. It was the end of a workday and inexplicably the addition was no longer producing a hot 1 bit in the result. 

We could tell instantly because my looping program encounters the parity error when the hot 1 overrides the intended 0 value for that bit. This shows up as a red light in the storage block on the console panel. When that stopped lighting we checked the stored field and found that 2 + 2 was now 4, not 5. 

We came back the next morning, and extended my program to add multidigit fields, rather than a single digit for each operand. The red light flashed again while the program looped. A look at the result field showed that our problem had simply changed from a hot 1 bit to a dead 1 bit - always a value of 0. 

Thus, 2 + 2 properly produced 4 but 1 + 2 produced only 2, not 3, because the 1 bit was permanently set to 0. The scope went back on and we began tracing signals again. At this point, I noticed that the input to our double AND gate, arithmetic results section, was at ground potential. Since this is a T level logic signal, the only valid values are -6V and +6V.

I looked at the ALD page and saw that our input to the double AND comes from another logic compartment. The signal moved over our backplane to a paddle card that would route the signal to the other compartment. I checked continuity with a meter to the paddle card. 

Since continuity was good on the original compartment (01A3) we moved to the arithmetic unit compartment (01B3) and verified continuity over the cabling between compartments. In fact, we traced it all the way to the output pin of the card that produces the arithmetic 1 bit value. 

The output of the card was at ground (invalid level) but the input to the gate was valid and correct - either a 1 or a 0 depending on the arithmetic result. We swapped that card with a spare and resolved the problem. Apparently this card was producing the hot 1 bit through some weird failure mode and then suddenly got worse, yielding the permanent 0 value for bit 1.

We proceeded to check out many variants of arithmetic - different length fields, carries, and subtraction for example. After this proved arithmetic is good, we went on to check other instructions. Among the instructions tested successfully were:

  • Move
  • Compare
  • Branch
  • Branch when Equal
  • Add
  • Subtract
  • Set Word Mark
  • Clear Word Mark
  • Move zone
  • Move digit
  • Zero and Add
  • Read a card

As far as we can tell without running the complete and comprehensive diagnostic tape, the 1401 is fully operational. 

1311 Disk Drive

The 1440 system came with a 1311 disk drive that so far was only able to spin the platters. The arm could be manually pushed out over the disk surface but the heads never loaded (lowered to fly on the surface). Iggy worked on this, beginning with a careful inspection and full cleaning of the disk heads and disk pack.

He discovered a misadjusted microswitch, several missing logic cards and a few other things over the course of the three days. After one day, the drive would sequence up to the point that it moved the arms all the way to the inner cylinder, but was not jumping back to the outer cylinder and loading.

By the time we left, the drive completed its sequence, loaded the heads and was fully operational as far as we could tell with the limited testing we completed.

729 Tape Drive

Iggy pulled out one of the tape drives to work on. He found a failed microswitch that kept the vacuum pump from operating and a few other problems, and then found that the motor that lowers the head onto the tape would not spin. He determined that the motor itself works but the relay that controls it is not operating properly. Since he didn't have documentation for the drive he couldn't finish getting it working.

1402 Card Reader/Punch

The local team were concerned because they had found fragments of rubber belts in the bottom of the machine, but had no spares to install. Frank examined it carefully and found that only two belts were missing, both on the punch side. One is critical, as it drives contact breakers in time with the feeding process, but the other is only needed to move the stacker rollers for punch output. As long as one can accept that all punched cards will fall in one stacker, it isn't needed.

We were able to trigger a read reliably by issuing the appropriate 1401 instruction (op code 1) although the data may not be scanned in properly due to a premature reader stop. One cause of this is that the alignment pins to hold down the first reader block were sticking, thus not holding the brushes fully in place.

Frank was able to rebuild the alignment pin mechanism. The brushes in the 1402 are kind of scraggly, so we will send some spare brushes to this museum after we return home. Another problem was that doing a non-process runout (NPRO) operation didn't reliably trigger the read clutch, which we attribute to a problem with the relay logic that drives the 1402.

The machine has many relays which sequence through operations such as reading, NPRO and punching, and which handle conditions like the hopper emptying. The contacts tend to oxidize over time if not used. We couldn't look at the suspect relays because we didn't have the documentation to tell us which relays were involved. We will send the museum a relay tester that has helped us find and fix bad relays for our 1402s.

Wednesday, June 7, 2017

Working on 1401 and other gear at museum in Binghamton, NY

ON EAST COAST AND HELPING AT TECHWORKS MUSEUM BINGHAMTON

I have been out of touch for a few days while traveling and helping the teams at TechWorks! in Binghamton with their 1401, 1440 and other IBM gear. It has been a two-way exchange of advice and ideas, plus a chance to see some historic IBM and other gear. One non-computer example is part of the lunar module simulator used by NASA to train the Apollo astronauts for their landings on the moon.

We are currently chasing a problem in the 1401 system here which causes Add instructions to inject a 1 bit into any result. If the result of the added characters already had the 1 bit set (was odd) then no harm, no foul. However, if the result was even, the added bit causes a parity error in the work going into memory.
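To see why only even results trip the parity check, here is a small Python sketch. It assumes odd parity with a single check bit, and it assumes the check bit is generated from the intended result before the stray 1 bit is forced on. It treats the digit as a plain binary value and ignores the rest of the character, so take it as an illustration, not a model of the real circuits.

```python
def ones(bits):
    """Count the 1 bits in a small integer."""
    return bin(bits).count("1")


def check_bit(bits):
    """Check bit that makes the total number of 1 bits odd."""
    return 0 if ones(bits) % 2 == 1 else 1


def parity_ok(bits, c):
    """True when data bits plus check bit hold an odd number of 1s."""
    return (ones(bits) + c) % 2 == 1


def store_with_hot_one(result):
    """Force the 1 bit on after the check bit was computed."""
    c = check_bit(result)        # generated from the intended result
    stored = result | 0b0001     # the fault injects the 1 bit
    return stored, parity_ok(stored, c)


for result in (3, 4):
    stored, ok = store_with_hot_one(result)
    print(f"intended={result} stored={stored} parity_ok={ok}")
```

An odd result already has the 1 bit on, so the fault changes nothing. An even result gains one extra bit and the count of 1 bits no longer matches the check bit.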

The signal only passes through a few gates from the point where the adder converts the qui-binary result into BCD, with a signal called Arith 1, to the point where it drives the inhibit line on the 1 bit core planes. The way core works, inhibit lines must be active to keep a core from flipping on, leaving it at the zero state created during the readout of its previous contents. If the inhibit line is not active, the core is set to 1 at the end of the memory cycle.
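The path from adder to core can be sketched in a few lines of Python. This is a simplified model with names of my own choosing: a qui-binary digit is a quinary part (0, 2, 4, 6 or 8) plus a binary part (0 or 1), the inhibit line for a bit plane is active when that bit should be 0, and a core ends up at 1 unless it is inhibited. It uses plain binary digit values and ignores any special-case encodings the real machine uses.

```python
def qui_binary_to_digit(quinary, binary):
    """Combine the quinary part (0,2,4,6,8) and binary part (0,1)."""
    if quinary not in (0, 2, 4, 6, 8) or binary not in (0, 1):
        raise ValueError("not a valid qui-binary digit")
    return quinary + binary


def inhibit_for_bit(digit, bit_mask):
    """Inhibit must be active when the bit to be written is 0."""
    return (digit & bit_mask) == 0


def core_after_write(inhibit_active):
    """The core was cleared by readout; it is set to 1 unless inhibited."""
    return 0 if inhibit_active else 1


digit = qui_binary_to_digit(4, 0)          # the result of 2 + 2
wanted = inhibit_for_bit(digit, 0b0001)    # True: the 1 bit should stay 0
print(core_after_write(wanted))            # healthy inhibit driver
print(core_after_write(False))             # inhibit never activates: hot 1 bit
```

In this model the 1 bit of the digit is just the binary part of the qui-binary pair, which is why a fault on that one line shows up only in the 1 bit plane.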

Our tools were inadequate to watch the signals and find the spot where it is failing, but we will have access to a more modern storage scope tomorrow when we hope to find and eliminate the problem.

We will also help inspect and clean some 1311 disk drives, as well as one of the 729 tape drives for the system. We have already done some inspection of their 1402 reader/punch and are finding the part number for one missing rubber belt.

Thursday, June 1, 2017

Making good progress on disk cartridge writing but not there yet

IBM 1130 PANEL UPGRADE

I continued to solder bulbs onto headers to use with my new PCBs, including one of the original size bulbs used by IBM in the 1130. It is possible that with great care I could use original size bulbs and get them to fit into the honeycomb blocks, but that won't be necessary if my newly ordered mini bulbs arrive from China.


Two mini bulbs and one original sized, installed on headers

ALTO DISK TOOL

I made some adjustments to the logic related to when the new parallel word is prepared for access by the serializer, as it will be important to having the checksum written out properly. With these changes it appears I am writing the sector correctly.

The timing is still off, with the crystal on my board running 9+% slow. I bought some LVCMOS oscillator modules and stuck one on the board, but the clock did not run with it in place. A quick look at the documentation gave me the correct pin for that clock signal - U9 - and I reran the tool chain and tested again.

The error persisted, so I dug deeper and found a spare cycle in my clock FSM, which I fixed. Measuring again, I found the inter-record time (from sync bit to next sync bit) to be 144, close enough to the 134.4 theoretical timing. I will put the clock module on the other two fpga boards so that they all run at the (same) intended speed.

Next up was a validation that the checksum is working properly. To address this, I set up a spreadsheet with the eight words of the label record and calculated the XOR of those with the seed value 0x0151. It came to 0xA32C.
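The same spreadsheet calculation can be sketched in Python. The seed 0x0151 is the one mentioned above, but the eight label words here are made-up placeholders, not the contents of the real label record, so the printed value will not be 0xA32C.

```python
SEED = 0x0151  # seed value used for the checksum


def xor_checksum(words, seed=SEED):
    """XOR every 16-bit word of a record into the seed."""
    result = seed
    for word in words:
        result ^= word & 0xFFFF   # keep each word to 16 bits
    return result


# Hypothetical label record: eight placeholder words, for illustration only
label_words = [0x0001, 0x0002, 0x0003, 0x0004,
               0x0005, 0x0006, 0x0007, 0x0008]

print(f"0x{xor_checksum(label_words):04X}")
```

Because XOR is its own inverse, running the same function over the record words followed by the stored checksum should give zero, which is a handy way to check a sector after reading it back.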

The value being emitted by the disk tool is 0xA32C so this is working well. I ran over to Marc's to give it one live trial before I leave tomorrow for the visit to the TechWorks museum in Binghamton, NY.  The attempt to write an entire cartridge failed with an overrun error, where the write is still active when a sector mark arrives.