Tuesday, April 23, 2024

Spent a few days working on a new memory loader for 1130 systems where Arduino uses DMA to core

EXISTING CONSOLE LOADER IS JUST TOO SLOW

The console loader device that I created and have installed on several 1130 systems simply automates the process of flipping the switches and pushing the buttons to enter each word into memory. Due to the speed limitations of the pushbutton debouncers in the 1130, the loader can take almost an hour to enter 4K words of memory. 

Since most 1130 systems are 8K or larger, some as big as 32K in size, loading core with this device is painfully slow. Yet, museums that have an 1130 without all the peripherals needed to run DMS2 have to use the loader in order to run demonstrations. 

1130 CYCLE STEAL IS ANOTHER NAME FOR DMA - DIRECT MEMORY ACCESS

The 1130 system has a capability that IBM calls cycle stealing, which is used by faster peripherals to access memory directly. It is very similar in concept to direct memory access (DMA) on microprocessor-based systems. Instead of requiring the program in the CPU to read or write each word explicitly, the device is given a start address and an amount of data to transfer, and it then performs the transfer without further CPU involvement. 

The 1130 is built around core memory accesses, each of which takes 8 clock steps to complete. The first four steps read the contents of the memory location, which erases it, and the last four steps rewrite (or change) the value in that location. Each instruction requires one or more memory access cycles to be fetched from memory and have its effective address computed, and then up to three more memory access cycles to execute. 

Cycle stealing will pause the CPU between memory accesses, even in the middle of fetching or executing an instruction, so that the peripheral device can do a memory access. CPU memory accesses use the T clock, stepping through stages T0 to T7 for each memory access. Cycle steal holds the T clock at T7 and instead steps the X clock through its stages X0 to X7. The program has no way of detecting that it has been delayed by these interspersed memory access cycles, other than an elongation of the time to run the program compared to running with no cycle steal occurring. 
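To make the interleaving concrete, here is a small Python sketch of the clock stages as described above. It is only a simplified model under my own assumptions: the function name and the idea of inserting exactly one cycle steal after a chosen CPU access are illustrative, not taken from the 1130 logic diagrams.

```python
def clock_trace(cpu_accesses, steal_after):
    """List the clock stages seen over a run of CPU memory accesses.

    steal_after is a set of access indexes after which one cycle steal
    is inserted. Simplified model, not the real 1130 timing logic.
    """
    stages = []
    for i in range(cpu_accesses):
        # A normal CPU memory access steps the T clock through T0..T7.
        stages.extend(f"T{n}" for n in range(8))
        if i in steal_after:
            # The T clock is held at T7 while the X clock runs X0..X7.
            stages.extend(f"X{n}" for n in range(8))
    return stages


# Two CPU accesses with one cycle steal slipped in between them.
trace = clock_trace(cpu_accesses=2, steal_after={0})
print(" ".join(trace))
```

The CPU's second access simply starts eight clock steps later than it otherwise would, which is the only effect the program could ever observe.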

The disk drive is an example of a peripheral that is fast enough to use cycle steal, so that an entire sector of 321 words can be read by issuing a single (XIO) instruction. The disk controller then requests cycle stealing as each word is read from the disk or written to the disk, until the entire sector is complete. 

IDENTIFIED SIGNALS I SHOULD BE ABLE TO USE TO TRIGGER A CYCLE STEAL

I looked over the 1130 logic diagrams and found a method that I can use to trigger a cycle steal. To simplify the design, I won't do this while instructions are executing, instead requiring that the 1130 be stopped before the cycle steal will write a word to memory. 

A cycle steal occurs in 3.6 microseconds (or 2.2 on the faster models of the 1130), so that the entire 8K of a typical machine could be accessed in under 0.03 seconds. Even with the overhead of transmitting each word from a terminal over a 9600 baud serial link, this will complete in a satisfyingly short time. 
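As a rough check on those numbers, here is a short Python sketch of the arithmetic. The cycle times come from the paragraph above. The serial figures are assumptions for illustration only: I assume 10 bits on the wire per character and 5 characters per word (four hex digits plus a terminator), which may not match the actual loader file format.

```python
WORDS_8K = 8 * 1024  # words of core in a typical 8K machine


def cycle_steal_seconds(words, cycle_us):
    """Time spent in the memory cycles alone, in seconds."""
    return words * cycle_us / 1_000_000


def serial_seconds(words, chars_per_word=5, baud=9600, bits_per_char=10):
    """Time to send the words over the serial link.

    chars_per_word and bits_per_char are assumed values, not measured ones.
    """
    return words * chars_per_word * bits_per_char / baud


slow = cycle_steal_seconds(WORDS_8K, 3.6)  # 3.6 microsecond machines
fast = cycle_steal_seconds(WORDS_8K, 2.2)  # 2.2 microsecond machines
link = serial_seconds(WORDS_8K)            # serial transfer, assumed encoding

print(f"cycle steal, 3.6 us: {slow:.4f} s")
print(f"cycle steal, 2.2 us: {fast:.4f} s")
print(f"serial link at 9600 baud: {link:.1f} s")
```

Under these assumptions the serial link, not the memory access, dominates the load time, yet the total stays far below the hour or more the console loader needs.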

BUILT STATE MACHINE TO INTERACT WITH ARDUINO AND THE 1130 SYSTEM

I devised a state machine that will step the 1130 through a cycle steal as well as interact with an Arduino that is accepting the data file over the serial link from a terminal. I chose to match the current file format used by the very slow console loader, as that still offers many benefits such as interoperability with 1130 simulator environments. 

Taking a command line from the terminal, the Arduino emits the address and data value to be stored in memory, then raises a request line. If the CPU is in the stopped state, I raise the request for a cycle steal on the highest priority channel, zero. I pass the address and data values to the 1130. When the cycle steal takes place and we reach stage X6 of the access, the request is dropped as well as the address and data lines. The state machine emits a flag that our access is done, at which point we wait for the Arduino to drop the request line to finish the process. This is then repeated for each word to be transferred to core memory. 
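The handshake described above can be sketched as a three-state machine. This Python model is only an illustration of the sequence in the text: the state names, the signal names (arduino_req, cpu_stopped, at_x6) and the grouping of outputs are my own labels for this sketch, not the names used in the actual design.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # waiting for the Arduino to raise its request line
    STEALING = auto()  # cycle steal requested, address and data driven
    DONE = auto()      # access finished, waiting for the Arduino to drop request


def step(state, arduino_req, cpu_stopped, at_x6):
    """Return the next state for one evaluation of the inputs."""
    if state is State.IDLE:
        # Only start a cycle steal when the CPU is stopped.
        if arduino_req and cpu_stopped:
            return State.STEALING
        return State.IDLE
    if state is State.STEALING:
        # Reaching X6 drops the request and the address and data lines.
        return State.DONE if at_x6 else State.STEALING
    # State.DONE: hold the done flag until the Arduino drops its request.
    return State.DONE if arduino_req else State.IDLE


def outputs(state):
    """Outputs as (cycle_steal_request, drive_address_and_data, done_flag)."""
    stealing = state is State.STEALING
    return (stealing, stealing, state is State.DONE)


def transfer_trace():
    """Walk the machine through the transfer of a single word."""
    inputs = [
        (True, True, False),   # Arduino raises request, CPU is stopped
        (True, True, False),   # cycle steal in progress, not yet at X6
        (True, True, True),    # X6 reached
        (True, True, False),   # Arduino has not yet dropped its request
        (False, True, False),  # Arduino drops request
    ]
    state = State.IDLE
    trace = [state]
    for arduino_req, cpu_stopped, at_x6 in inputs:
        state = step(state, arduino_req, cpu_stopped, at_x6)
        trace.append(state)
    return trace


print([s.name for s in transfer_trace()])
```

Each word written to core repeats this same walk from IDLE back to IDLE.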

CURRENT DESIGN USES SURFACE MOUNT ICS ON AN ARDUINO MEGA 2560 SHIELD

I have the design ready in KiCad, but before I purchase the components and send the files out to have the PCB manufactured, I will do some testing to confirm that this will work as expected. It will be about a week or two from when I kick off the board manufacturing until I have the final unit assembled. 

The Arduino with its shield will be mounted inside the 1130 near the B logic gate. I will use wire wrap to hook it into the signal lines in the 1130. A few mechanical mounting details remain to design, but I do know all the pins I need to connect with in order to make the process work. 

SIMULATING THE LOGIC FIRST USING CIRCUITLAB

I have set up the logic for the device in CircuitLab, an online simulator site that I use. I am verifying that the state machine and the output logic work as I intend. Once it appears solid in the simulation, I will move on to testing with real chips and the 1130. 

NEXT STEP WILL BE BREADBOARD TESTING BEFORE PRODUCING THE PCB

I can use my breadboard and related tools to build the circuit with full size DIP chips, first putting it through its paces with the tools and monitoring it with logic analyzers and a scope. When that is satisfactory, I will temporarily connect it to an 1130 and test whether it does write to memory properly. 
