Friday, April 4, 2025

Running IBM disk diagnostic 309 against the Virtual 2315 Cartridge Facility - part 4

SIMULATION USED TO CHECK OUT THE NEW LOGIC FOR SEEKS

I carefully simulated seek behavior with the significantly changed code for the seek function in the Virtual 2315 Cartridge Facility (V2315CF). This runs under the Vivado design suite, although the actual generation of the code for the FPGA is done with the IceCube2 suite. 

It seemed to work well, but then again the prior version of the code seemed to work properly. In the real world, the signals come from the 1130 disk controller to the Virtual 2315 Cartridge Facility (V2315CF) and are passed out to the 2310 disk drive inside the IBM 1130. The controller and the drive appear to be in sync at all times, but the V2315CF, which is just trying to shadow or observe the seeks, ends up with the wrong cylinder number quite often. 

FIXES APPLIED TO V2315CF

My new version of the FPGA logic was loaded into the V2315CF. Up came the 1130 system and the virtual cartridge in real mode. I used my manual instructions once again to see if it does better in maintaining solid synchronization with the 2310 disk drive. 

Alas, I saw similarly bad behavior this time, in spite of an entirely different approach to shadowing the disk arm movements. For example, I performed a forward seek to cylinder 202, by setting the seek value to xC9 which is 201 decimal. When I did a read of the sector, I did see the proper relative sector number in the first word of the sector. 

However, when I tried to back up by 200, which would put me at cylinder 1, the Home indicator came on. I was either not fully at 202 with the forward seek or moved further than 200 in reverse. I tried some small seek values and saw similar mistracking.

It has been my belief that the physical 2310 and the 1130 disk controller were in sync, but I need to test that directly. If I take the top cover off of the 2310 disk drive, I can see the marked cylinder locations on the side of the disk arm actuator. It will be better if I can be completely certain that the drive and the disk controller are correct before I keep mucking with the logic in the V2315CF. 

Next I need to make the PICO debugging output list the cylinder number, allowing me to quickly see where the V2315CF believes we are and compare that to the arm actuator scale and to the intent of the instructions I execute. 

Running IBM disk diagnostic 309 against the Virtual 2315 Cartridge Facility - part 3

DEBUGGING TO FIND THE ISSUE

I looked for asymmetry - places where the logic in the IBM 1130 disk controller or Virtual 2315 Cartridge Facility (V2315CF) does things differently for seeks in the reverse versus forward direction, since the symptom appears to be that a reverse seek doesn't go as far as was requested while the forward ones 'appear' to work correctly based on very limited data points.

The disk controller in the 1130 acts symmetrically. When a program issues an XIO (eXecute Input Output) instruction pointing at an Input Output Control Command (IOCC) of type control, it is requesting a seek. The first word of the IOCC has a count of the number of cylinders to move from the current point. The second word has the code for XIO Control, the device address of the internal disk, and bit 13 which is reverse direction if 1, forward direction if 0. 

The disk controller logic picks up the count from the first word of the IOCC and stores its twos complement in a count register. It then sets the direction of arm movement based on word 2 bit 13 of the IOCC and begins repeatedly triggering the Access Go signal to the disk drive. Each time it issues Access Go, it increments the count register. When the register gets to zero, the seek function ends. 

The only fine point to modify the above is that the disk drive can move either 10 mils or 20 mils at a time, a step of 1 or 2 cylinders. The controller logic uses the 20 mil setting for every Access Go except the first one if the count of cylinders to move is an odd number. The increment of the count register is 1 or 2 depending on the step size being used. 
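
The count register behavior described above can be sketched in a few lines of Python. This is only a model of my reading of the controller logic, not the actual circuitry; the function name plan_seek and the 16 bit register width are assumptions made for illustration.

```python
def plan_seek(count):
    """Return the step sizes (in cylinders) issued for a seek of `count` cylinders."""
    MASK = 0xFFFF                      # assumed 16 bit register width
    register = (-count) & MASK         # twos complement of the requested count
    steps = []
    first = True
    while register != 0:
        # Only the first Access Go can be a 1 cylinder (10 mil) step,
        # and only when the requested count is odd. All others are 20 mil.
        step = 1 if (first and count % 2 == 1) else 2
        steps.append(step)
        register = (register + step) & MASK   # count up toward zero
        first = False
    return steps


if __name__ == "__main__":
    moves = plan_seek(199)
    print(len(moves), "Access Go pulses, total cylinders", sum(moves))
```

Nothing in the sketch depends on direction, which is the point: the same sequence of steps is issued whether bit 13 asks for forward or reverse.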

Nothing differs based on direction in the logic for selecting the step size, triggering each move with Access Go, incrementing the count register and ending the seek. The direction signal is asserted based on bit 13 of the second IOCC word and shouldn't change during the seek operation. Unless there is some weird flaky component in the disk controller, therefore, it shouldn't cause the symptoms seen.

I reproduced the anomalous reverse seek stopping point by issuing XIO Control directly to the drive and saw similar behavior. This seems to exclude the disk function diagnostic 309 code. 

My logic in the V2315CF is minimally different for reverse versus forward seeks - the code that stops the arm from moving past its cylinder extremes of 0 and 202 has to check different limits. I don't see how the code could malfunction but it is one of the open areas.

The V2315CF looks for the leading edge of the Access Go signal to perform a seek. I have a chain of four flipflops to deal with possible metastability but don't do any explicit debouncing. The same signal that comes to the V2315CF is passed along to the 2310 disk drive, which also looks for the rising edge to start a movement of the arm. 
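
To show why the lack of debouncing matters, here is a rough Python model of a flipflop chain followed by a rising edge detector. It is not the FPGA code itself; the function name and the sample sequences are invented for illustration, and only the four stage chain comes from my design.

```python
def count_rising_edges(samples, stages=4):
    """Clock `samples` (0 or 1 per FPGA clock) through a flipflop chain
    and count rising edges seen at the output of the last stage."""
    chain = [0] * stages               # chain[0] is nearest the input pin
    edges = 0
    for level in samples:
        previous_out = chain[-1]
        chain = [level] + chain[:-1]   # each flipflop takes its neighbor's old value
        if chain[-1] == 1 and previous_out == 0:
            edges += 1
    return edges


if __name__ == "__main__":
    clean = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    glitchy = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    print(count_rising_edges(clean), count_rising_edges(glitchy))
```

The chain protects against metastability but passes a one clock glitch straight through, so the glitchy input is counted as two seek requests where the slower drive electronics might act on only one.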

It appears from my scrutiny of the drive controller logic that the Access Go signal is blocked when the microswitch is activated by the arm sitting at cylinder 0 (Home). Thus the number of actual movements in the reverse direction will be less than the count from word 1 of the IOCC, if the arm reaches Home before completion of the count. 

This shouldn't impact the V2315CF because we get the Access Go signals exactly the same as the 2310 drive does, from the disk controller logic. A possible source of error is if the disk drive reports Home to stop the movement but the V2315CF believes it is at a higher cylinder number. It will then use the residual higher cylinder number to access RAM when an XIO Read is done. 

I did plan to make a change to the V2315CF to look at the Home signal from the disk drive. Any time it goes to Home, the cylinder in the V2315CF will be reset to 0. This will sync the V2315CF and the 2310 every time we go to the Home cylinder. However, if the two are not tracking perfectly, the emulation will still not work correctly since the data read and written from the virtual disk cartridge will not match the intended cylinder address except around the times it goes to Home. 
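
A minimal sketch of the shadow cylinder update, combining the limit checks at 0 and 202 with the planned reset on Home, might look like this in Python. The function and parameter names are hypothetical, and I am assuming the Home signal is sampled after the movement completes.

```python
MAX_CYLINDER = 202   # last cylinder the arm can reach


def track(cylinder, step, reverse, home_signal):
    """Return the shadow cylinder after one 1 or 2 cylinder movement."""
    if home_signal:
        # The drive says the arm is at Home, so trust it over our own count.
        return 0
    moved = cylinder - step if reverse else cylinder + step
    # Never let the shadow position run past the physical extremes.
    return max(0, min(MAX_CYLINDER, moved))


if __name__ == "__main__":
    print(track(16, 2, True, True))     # drive at Home forces a resync
    print(track(201, 2, False, False))  # forward move is held at the limit
```

The reset only repairs the count at cylinder 0. Any movement miscounted elsewhere still leaves the shadow value wrong until the next trip to Home.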

To collect more data on the (mis)behavior, I set up the machine to manually issue some seeks and reads. I will move the arm around, in both directions, doing a read of head 0 sector 0 from the resulting location to observe the relative sector number which is written in the first word of the sector. I will do a mix of short, medium and long seeks, including some that attempt to move past Home or cylinder 202. 

The most important thing I detected was that this was not an asymmetric failure, occurring only with reverse seeks. For example, starting with the arm at the Home position and the V2315CF in sync with the 2310 disk, I requested a seek of 32 cylinders forward, then did a read. The first word of the sector told me we had reached cylinder 37, four too far as we should have been at 3... Following this with a reverse seek of 32 cylinders, I found the arm back at the Home cylinder according to the disk drive as reflected in the device status word. A read of the sector confirmed this.

I started at Home, performed a forward seek of 201 cylinders so that we would reach the last cylinder on the disk. A read confirmed we were at 202. If the V2315CF saw excess seeks it would still have stopped at 202 so this didn't tell me we performed exactly the correct number. I then executed a reverse seek of 201 cylinders, resulting in the 2310 disk recording our arm back at Home. However, reading the sector showed me at cylinder 16 as far as the V2315CF was concerned. This matched the asymmetric examples I saw earlier.

Now that the disk drive reported the Home cylinder, it was not possible to get the V2315CF to back up any further than cylinder 16. That is because the 1130 disk controller logic, seeing the Home cylinder, does not perform the reverse seeks I requested. 

For some reason, the V2315CF is not tracking the seeks being performed correctly. We are either dropping or adding movements compared to what the disk drive sees and performs. Given the relatively slow logic in the IBM 1130 and the disk drive, what IBM terms the 30ns medium Solid Logic Technology (SLT) family, short glitches from the controller to the disk might not result in any disk movement but be picked up by my logic. Alternatively, I may be picking up noise from adjacent signal lines that triggers the V2315CF. 

CORRECTION OF THE FLAW

I determined that I can make use of the interlocked behavior of the 2310 disk drive to more faithfully track disk seeks. Each time the disk drive is commanded to move one or two cylinders, it toggles the Access Ready signal. More specifically, the sequence of signal actions is:
  • Access Go is raised to request a disk seek
  • several milliseconds pass
  • Access Ready is dropped
  • Access Go is dropped by the disk controller
  • perhaps 10 milliseconds elapse
  • Access Ready is raised
Notice the interlocked behavior, with Access Ready dropping to indicate receipt of the seek, Access Go dropped once the drive confirmed receipt, then Access Ready raised to confirm completion of the movement. I can make use of this in the V2315CF logic so that bouncing signals won't trigger multiple seek simulations. 
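
The interlock described above can be sketched as a small state machine. This Python model is only an illustration of the idea, not the logic I will put in the FPGA; the state names and the sample sequences are invented, and treating an Access Go that disappears before Access Ready drops as a glitch is my own assumption.

```python
def count_movements(samples):
    """Count completed arm movements from (access_go, access_ready) samples."""
    state = "idle"
    moves = 0
    for access_go, access_ready in samples:
        if state == "idle":
            if access_go and access_ready:
                state = "requested"    # controller is asking for a movement
        elif state == "requested":
            if not access_ready:
                state = "moving"       # drive acknowledged by dropping Access Ready
            elif not access_go:
                state = "idle"         # request vanished unacknowledged: a glitch
        else:  # moving
            if access_ready:
                moves += 1             # drive raised Access Ready: movement done
                state = "idle"
    return moves


if __name__ == "__main__":
    bouncy = [(0, 1), (1, 1), (0, 1), (1, 1), (1, 0),
              (0, 0), (1, 0), (0, 0), (0, 1)]
    print(count_movements(bouncy))
```

A movement is counted only when the drive itself confirms it, so extra edges on Access Go, before or during the movement, no longer add to the shadow cylinder.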

Tuesday, April 1, 2025

Running IBM disk diagnostic 309 against the Virtual 2315 Cartridge Facility - part 2

RESULT OF DEBUGGING SESSION ON SEEK ERRORS

My first test was performed by setting up manual XIO commands to seek forward, seek in reverse, read 1 word of a sector and to sense the device status word. I used them to manually command the Virtual 2315 Cartridge Facility (V2315CF) and the attached internal 13SD (2310) disk drive in the IBM 1130.

I had set up the two seeks to move x00C8 cylinders, one forward and the other in reverse. That is a move of 200 cylinders, thus stopping on cylinder 200. The home bit (cylinder 0) was on in the device status before the seek and off afterwards. 

I then did a seek in reverse of 200 cylinders which should have returned us to cylinder 0. The home bit should be on in the DSW and when I read a sector, the first word should be 0x0000 to 0x0007. 

When it finished, the home bit was NOT on. The read displayed a relative sector number of 0x0020 (32) which corresponds to cylinder 4, not cylinder 0. I did another reverse seek of x00C8 but the value read from the sector showed we had only moved to cylinder 1, not to zero. It took a third reverse seek to light up the home bit and to read 0000 as the relative sector. 

This proved that the diagnostic seek to 0 should have been successful, but somehow it read sectors 112 to 119 instead. Similarly to the experiment with the manual XIO commands I set up, the reverse movement is not moving far enough. 
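
The conversion between the relative sector number in the first word of a sector and the cylinder is simple integer arithmetic, since each cylinder holds eight sectors. A short Python sketch, with function names of my own choosing:

```python
SECTORS_PER_CYLINDER = 8   # relative sector numbers 0 to 7 are on cylinder 0


def cylinder_from_sector_word(relative_sector):
    """Cylinder that holds the given relative sector number."""
    return relative_sector // SECTORS_PER_CYLINDER


def sector_words_for_cylinder(cylinder):
    """Relative sector numbers expected when reading the given cylinder."""
    first = cylinder * SECTORS_PER_CYLINDER
    return range(first, first + SECTORS_PER_CYLINDER)


if __name__ == "__main__":
    print(cylinder_from_sector_word(0x0020))
    print(list(sector_words_for_cylinder(14)))
```

This is how a first word of 0x0020 identifies cylinder 4, and how sectors 112 to 119 identify cylinder 14.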

This rules out the diagnostic as the source of the errors. It may be the IBM 1130 disk controller logic, the V2315CF, the 2310 disk drive or some kind of signal integrity problem on the cabling between 1130, V2315CF and 2310. 

The home cylinder status is passed directly from the 2310 to the 1130, thus we are really not backing up as far as the command requests. The V2315CF decision on what cylinder we reached begins with 0 at startup of the machine and then tracks purely by the seek commands from the 1130 controller logic to the 2310 disk. 

I did update the logic so that when the home condition turns on from the 2310 disk drive, we immediately force the V2315CF to set cylinder to zero. This will keep them synchronized whenever the drive goes to cylinder 0. I don't believe this is the cause of the error, but it was something that made sense as I thought about it. 

The data returned by a Read command is based on the cylinder that the V2315CF believes we are at. Thus if the seek logic in V2315CF is not working correctly, it might get out of sync with the cylinder that the arm of the 2310 is flying over. 

I am busy tomorrow but should be back at the workshop on Thursday where I can load the new bitstream to the FPGA and test again. 

Feeling a bit more confident in the IBM 1130 core memory

CE STORAGE DISPLAY FUNCTION OF THE MACHINE

The machine has a few switches intended to be used by the Customer Engineer who services the system. One of them is Storage Display. When turned on, pressing Prog Start has the machine loop continuously through memory reading each word. If there were a parity error it would stop the scan, otherwise it just runs forever until you press Imm Stop to turn off the run flag. 

SHOOK MACHINE AND THUMPED CORE MEMORY COMPARTMENT WHILE SCANNING

While running the Storage Display, I jerked the machine around on its casters and even thumped atop the gate and compartment holding the core memory. No parity errors were detected which comforts me. I had feared there were intermittent connections, similar to the few I have already resolved, that would show up as new parity issues. 

Monday, March 31, 2025

Running IBM disk diagnostic 309 against the Virtual 2315 Cartridge Facility - part 1

IBM DISK FUNCTION DIAGNOSTIC 309 LOADED INTO CORE

IBM wrote a diagnostic to shake down a 13SD disk drive (the drive that is inside the IBM 1130 as well as separately mounted in an enclosure as the 2310). 

The diagnostic identified two flaws in the software simulation of disk drives in the IBM 1130 simulators that run on my laptop. These are the last two tests of the diagnostic, checking for unusual cases either writing a single word sector or writing too many words until the write fails. This will be a great test of my emulation logic in the FPGA. 

Both of these behaviors are produced by the disk controller logic inside the IBM 1130, not by the disk drive. Thus, I have every reason to believe that my logic is going to breeze through those two tests. However, it was worth testing just to be certain that I don't do anything to confuse the disk controller. 

LOADED THE DIMAL CARTRIDGE FILE IN THE V2315CF

I inserted the microSD card into the V2315CF main box, switched the V2315CF interface board to virtual mode, then powered up the 1130 system. Flipping the Load/Unload switch to load resulted in this cartridge being loaded into the virtual disk drive and after 90 seconds becoming ready for testing. 

I used my core memory loader feature to load memory in the 1130 with the contents of the 309 Disk Functional Test diagnostic program. It restarts when you push Prog Start with the machine reset, then setting the console entry switches to 0181 and pushing the interrupt request button on the keyboard restarts and pauses test 309. 

Issuing two more interrupt requests with the console entry switches set to 8000 and then 01C0 fired up the diagnostic, which typed out each test as it completed as well as a summary at the end. 

DIAGNOSTIC BEGINS AND MOVES INTO ROUTINE 1 - GETTING ERRORS

The diagnostic first seeks backwards and confirms the Home cylinder is detected. It verifies that it receives the four sector numbers 0, 1, 2 and 3 in the correct sequence. It times the speed of the machine and prints out the 1130 cycle time. Lastly it seeks to cylinder 199 and reads sector 3 which contains a specific data word if the cartridge was initialized as a CE testing pack. After this initialization testing is complete, it begins running the test routines starting with routine 1. 

I began to see messages printing out reporting seek errors. Routine 1 starts by seeking to cylinder 0 (Home) and verifying by reading all eight sectors. It then moves around in various patterns, reading to check the achieved location. For each seek error, it flagged the movement (from location X to target Y) along with the cylinder that was actually read. 

My first error was from 199 to 000 but the data read in was from cylinder 14. The second error shows that the software tried to move from cylinder 14 to zero but now found itself at cylinder 12. This continued until I stopped running the diagnostic.

THE V2315CF TRACKS THE CURRENT CYLINDER AND RETURNS DATA FROM THERE

My logic is supposed to monitor every seek command and simulate the achieved position of the disk arm. That also determines which data is returned for a read operation. What I can't tell is what cylinder the 13SD disk arm reached. All we know is that the diagnostics code, the 1130 disk controller logic and the V2315CF were not in sync. 

The diagnostic subroutine that does a seek is given a target cylinder but is tracking the previous location, so that it can request a seek of a given number of cylinders from the 1130 disk controller logic. The controller logic converts that into a number of two cylinder seeks with a final one cylinder seek if the count is an odd number. The 13SD disk drive and the V2315CF should respond to those one and two cylinder seeks by moving the arm that amount. 

The source of the problem could lie in various places. The diagnostic code could have miscalculated the seek distance to request, thus moving both the 13SD arm and the V2315CF simulated arm to the wrong position. The 1130 disk controller logic could have failed to convert the seek count into the proper number of one and two cylinder movements. Finally, the V2315CF might have missed movement requests or spuriously detected additional requests. 

The failure appears to be moving smaller distances than was intended - so I don't believe there are spurious movement detections in the V2315CF. It is possible that we are not correctly seeing all of the movements from the 1130 disk controller. 

However, I also have to consider that the 1130 disk controller might have bad logic, as it has not been fully shaken out prior to running this diagnostic. The diagnostic code might have been corrupted on the card decks I used to load it into core memory, thus miscalculating the target movement. 

NEXT STEPS DEBUGGING THIS

I have a few ways I can identify the failing component - diagnostic, disk controller, or V2315CF - in order to dive deeper into debugging. The debug output of the V2315CF should record every seek command detected by my logic - if that shows that, for example, the seek from 14 to 0 really only did a seek of 2 cylinders, then the flaw is more likely to be in the diagnostics or disk controller logic. 

I can try to manually recreate the seek pattern and read a sector with hand entered code. That is, I can first seek to cylinder 199, then back 199 positions and do a read. If I am at cylinder 0 after this, I will become suspicious of corruption to the diagnostic. If I am at cylinder 14, this confirms the issue is in disk controller or V2315CF. Comparing what my seek attempted to the seeks detected on the debug output of the V2315CF will shed more light on the failing.

Signal integrity issues might lead the V2315CF to see a two cylinder movement as a single cylinder movement, or do a few forward seeks when they should be in reverse. That would be immediately obvious from the debug output. 

Ultimately I may need to use a logic analyzer to capture the activity of the disk controller performing a seek of 199 cylinders in reverse, identifying whether it counted correctly and issued 99 two cylinder and a single one cylinder seek in reverse. 

Final parts arrived so I could finish up the Power Distribution Board and second 2310 Interface Board

2310 INTERFACE BOARD WAS WAITING FOR ONE SCREW TERMINAL PART

When that arrived, the second board was completed and is ready to install in another 1130 system. This board has a relay that routes the Unlock lamp signal depending on whether the Virtual 2315 Cartridge Facility (V2315CF) is set to real or virtual mode. When in real mode, the 13SD disk drive in the IBM 1130 controls the Unlock signal, connected directly through to the 1130 console Unlock lamp. In virtual mode, however, the logic in the V2315CF determines when Unlock should light or be extinguished. 

OLD POWER DISTRIBUTION BOARD PCB MODIFIED AND COMPLETED

The Power Distribution Board lacked a 10 screw terminal connector, but also had the defect that shorted the battery power due to a KiCAD quirk. Finally, I didn't like the trace width for the power traces. These two issues are corrected in the new version of the PCB that is currently being fabricated by JLCPCB.com

I cut the grounding copper traces on the power signal that was incorrectly grounded, restoring the intended operation. I also used discrete wire on the bottom to bolster the current carrying capacity of the power traces. This gave me a usable board but I will swap the parts over to the new PCB once it arrives. 


Repair of IBM 1130 core memory with failure of bit 13 inhibit in lower 4K

ROLE OF INHIBIT WIRES

Core memory consists of a memory plane for each bit in a word. X and Y address lines select one core in a 2D grid on that plane. A third wire passes through each core, which IBM sometimes refers to as a Z wire. It is a wire that passes through all the cores on that plane, however, whereas one X and one Y wire only cross in a single core.

Core memory is read by sending current in one direction (we will call this the Read direction) in one X and one Y wire. That current will try to flip the magnetic field of the core to an orientation we will call 0. The same Z wire that is used for inhibit is also used to sense the value of the core; if the core being addressed was previously magnetized in the 1 orientation, the flip of the magnetic field will produce a pulse in the sense/inhibit wire. Thus, we find out what is in a word by setting its bits to 0. This is called destructive read. 

To write into core memory, we reverse the current direction (now we call this the Write direction) in one X and one Y wire. Where they cross, the core involved will be flipped to the 1 orientation. If we don't want to set the bit to 1, but instead want it to be zero, then we have to inhibit the flip of the core. 

A magnetic core has a threshold current that must be reached for its orientation to flip (to either 0 or 1 orientation). The current in an X or a Y line is less than the threshold. Thus, all the cores along the X or Y wire are not flipped except for the one that receives current from BOTH X and Y. 

If we send a current in the Read direction through the sense/inhibit wire, the net current in the core is now below the threshold and it doesn't flip. A Read direction current in the sense/inhibit line while we have a Write direction current in X and Y results in the core being set to (actually remaining) the 0 orientation. No inhibit, the core is set to 1 orientation. 
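
The threshold arithmetic for a write cycle can be modeled with a toy Python function. The current values are made up purely to show the principle: each line carries less than the threshold, X plus Y together exceed it, and the inhibit current cancels enough to drop back below. They are not measurements from the 1130.

```python
THRESHOLD = 1.0      # current needed to flip a core (arbitrary units)
LINE_CURRENT = 0.6   # assumed current in one X, Y or inhibit line


def write_core(stored, x_selected, y_selected, inhibit):
    """Return the state of one core after a write cycle."""
    # Write direction currents add; the inhibit current opposes them.
    net = LINE_CURRENT * (int(x_selected) + int(y_selected))
    if inhibit:
        net -= LINE_CURRENT
    # The core flips to 1 only if the net current reaches the threshold.
    return 1 if net >= THRESHOLD else stored


if __name__ == "__main__":
    print(write_core(0, True, True, False))  # no inhibit: core is set to 1
    print(write_core(0, True, True, True))   # inhibit: core stays at 0
```

If the inhibit current cannot flow, the second case behaves like the first and the bit is always written as 1.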

For engineering reasons, the core stack in an IBM 1130 is divided into four sections of 2K bits each. The upper and lower 4K have their own circuitry. Within a 4K section, however, the current loss due to resistance in the sense/inhibit wire during inhibit would be too great if it passed through all 4096 cores. Therefore, one wire runs through 2048 cores and a second wire runs through the other 2048 cores. One end of each of the two wires is connected to a shared point (called the common). 

During a read, the sense/inhibit wire ends are connected to a differential detector. When a core flips during a read, the pulse is detected across the two ends, even if the common connection to ground was not high quality.

The use of three wires for sense/inhibit is important to understand because the common point is wired to ground, so that the inhibit current flows through the two wires to ground. The driver for inhibit switches on a transistor to conduct current through a wire to the ground (common), while a mirror driver conducts to send current through the other wire to ground. 

Since both ends see +6V conducting through the drive transistors, if there is no ground then there will be no current flow. It is only with a good ground at the common point that a current flows through the sense/inhibit wires and can block the flipping of the core. 

The connection point for the sense/inhibit wire triplet is on the top or bottom of the core stack, as shown in this diagram. We are concerned about inhibit for bit 13 in the lower 4K of the core stack. The connections are circled in green. The three wires from bit plane 14 come out on the A side (top of the stack when installed in the machine). These are on the right as viewed from the front of the core stack. 

Along the edges of the PCB at the base of the core stack, the portion closest to the SLT board upon which the stack is mounted, there are small S clips where the sense/inhibit wires are soldered. Counting from the right edge, looking down on the top of the core stack, clips 16, 17 and 18 carry the bit 13 signals from the two sense wires and the common wire, respectively. The wire triplets are white, blue and black - black for the common point. 

THE FLAW WE ARE FIXING

The common connection for bit 13 has much too high a resistance. The fault, as with all the others, is in the PCB sandwich at the base of the core stack, where it routes the signals to connector blocks that project pins through the SLT board for connection to the electronics. This lack of a low resistance path to ground causes the sense/inhibit wires to have essentially no current during a write cycle, allowing the bit to always flip to a 1 state. 

THE FIX

Since the common wire is ultimately connected to ground in the electronics, I can just cut the black wire from the edge S clip, add additional wire length to it, and bring it up to solder onto a ground pin on the SLT board. This will restore the path for current to flow when we inhibit a bit during a write. 

However, the wire broke off the common terminals at the edge of the core plane. I had to solder a new wire onto it, which I then routed to a ground point. The memory is starting to accrete many bodge connections to compensate for failing traces inside the core stack - although not within core planes. 

TESTING CORE SHOWS THE ISSUE IS RESOLVED

I wrote all zeroes across memory and read them back successfully. I also wrote all ones and that was correctly retrieved. This whacks the latest mole, so that until the next one pops up, I can get back to the disk drive functionality diagnostic testing. I did run some code for a while, with one random parity check popping up. This leads me to suspect that there are more flaky connections in the stack.