Wednesday, July 13, 2016

Seek and read of virtual 2310 working, small issue with seek op complete, plus other progress

SAC INTERFACE FOR ADDING PERIPHERALS TO THE 1130

Restructuring the GUI

-- virtual 2310 disk drive --

First thing this morning, I went out, entered a hand loop to exercise my disk drive and tested to see how it worked, and found it properly reading a sector into storage and producing valid sense afterwards. I also checked a seek program to shuttle out and in the max distance and saw the carriage home bit go on and off, indicating that it moved away from cyl 0 and back.

My display status showed cylinder 0, even when I ran the code that seeked out and back. That is understandable because my Python code does not see seek instructions, only IR and IW commands, so it won't know that the cylinder has changed.

When I poll for the IR or IW after a seek, one of the fields returned is the cylinder and sector, which is then used to pick the right part of the virtual cartridge file for reading/writing. I know this worked okay for at least the sector value, because the DCIP utility begins with a test IO to cylinder 0 sector 1 and I saw the sector number come up.
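As a rough illustration of how a cylinder and sector pair can select part of a virtual cartridge file, here is a minimal Python sketch. The geometry (203 cylinders, 8 sectors per cylinder, 321 words per sector) and the flat, little-endian, sector-after-sector file layout are assumptions for the example, not details confirmed by this post, and the names `sector_offset` and `read_sector` are hypothetical.

```python
import struct

# Assumed 2310 cartridge geometry and file layout (not confirmed by this post)
CYLINDERS = 203            # assumed: 200 usable cylinders plus 3 spares
SECTORS_PER_CYLINDER = 8   # assumed: 4 sectors per track on each of 2 surfaces
WORDS_PER_SECTOR = 321     # assumed: 320 data words plus 1 sector address word
BYTES_PER_WORD = 2         # 16-bit words
SECTOR_BYTES = WORDS_PER_SECTOR * BYTES_PER_WORD


def sector_offset(cylinder, sector):
    """Return the byte offset of a sector in a flat cartridge image (hypothetical helper)."""
    if not 0 <= cylinder < CYLINDERS:
        raise ValueError("cylinder out of range: %d" % cylinder)
    if not 0 <= sector < SECTORS_PER_CYLINDER:
        raise ValueError("sector out of range: %d" % sector)
    return (cylinder * SECTORS_PER_CYLINDER + sector) * SECTOR_BYTES


def read_sector(cartridge, cylinder, sector):
    """Read one sector as a list of 16-bit words from an open binary file object."""
    cartridge.seek(sector_offset(cylinder, sector))
    raw = cartridge.read(SECTOR_BYTES)
    if len(raw) != SECTOR_BYTES:
        raise EOFError("cartridge image is too short for this sector")
    # Little-endian word order is an assumption about the image file
    return list(struct.unpack("<%dH" % WORDS_PER_SECTOR, raw))
```

With this layout, the DCIP test IO to cylinder 0 sector 1 would land at the second sector of the file, and a wrong cylinder number would silently read the right sector position within the wrong cylinder.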

I don't, however, believe the cylinder number is showing up properly. It was working when I began restructuring my code, so I expected this was a relatively small problem to fix. Indeed, I saw that when I converted from multiple transactions for getting device status to a single poll transaction, I had failed to carry over the cylinder number.

It appears that my problem is some flaw in my seek logic, because the fpga shows the device remains busy and no operation complete interrupt is requested. I need to look into this more deeply, which will require some new instrumentation.

The first thing I looked at was whether the seek amount was not picked up properly. Next, I planned to see whether the disk seek modeling ever completes. However, when I single-stepped through the execution, I discovered that my XIO processing in the 1131 itself was not working properly.

The XIO Control I issued should pick up the low half of the IOCC in the E1 execution cycle, determine it is an XIO Control for device 20 (my virtual disk drive), and then pick up the data value for the seek in cycle E2.

Unfortunately, it advanced to an execution cycle E3, which should only take place for XIO Read or XIO Write instructions. Something is wrong in the execution control logic. Time to run some general CPU diagnostics and see if this is the only flaw or if other instructions are misbehaving.
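The expected behavior can be written down as a small Python sketch. It assumes the usual 1130 IOCC layout for the low word (5-bit area code, 3-bit function, 8-bit modifier) and the standard function codes. The E-cycle rule is taken from the description above, that only XIO Read and XIO Write go on to E3. The names `decode_iocc` and `expected_cycles` are hypothetical, and this is a model for checking expectations, not the machine's actual control logic.

```python
# Assumed 1130 XIO function codes in the low word of the IOCC
FUNCTIONS = {
    1: "Write",
    2: "Read",
    3: "Sense Interrupt",
    4: "Control",
    5: "Initiate Write",
    6: "Initiate Read",
    7: "Sense Device",
}


def decode_iocc(word):
    """Split the low IOCC word into (area, function name, modifier)."""
    area = (word >> 11) & 0x1F       # top 5 bits: device area code
    function = (word >> 8) & 0x07    # next 3 bits: function code
    modifier = word & 0xFF           # low 8 bits: modifier
    return area, FUNCTIONS.get(function, "Invalid"), modifier


def expected_cycles(word):
    """Return the E cycles expected for this XIO, per the description above."""
    _, function, _ = decode_iocc(word)
    cycles = ["E1", "E2"]
    if function in ("Read", "Write"):
        cycles.append("E3")          # E3 is reserved for XIO Read and XIO Write
    return cycles


# An XIO Control should stop after E2; an XIO Read to area 7 should reach E3
control_word = (4 << 11) | (4 << 8)  # hypothetical area 4, function Control
read_word = (7 << 11) | (2 << 8)     # area 7, function Read
```

A table like this makes the failure easy to state: for the Control word the machine reached E3 when the expected list ends at E2.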

I picked out some points to observe during operation of an XIO instruction, to verify that the machine is decoding the type of XIO instruction properly. There is a signal (XIO E R/W) that is generated in a card on page DN201 and drives the setting of the E3 flipflop on page KD111. It should not be on unless the instruction was a read or write. The decoding of the type of XIO happens on page DU101.

I listed the pins to sample with a scope to check what DU101, DN201 and KD111 think is happening. DN201 and DU101 use the U register bits to make these decisions - U is a saved copy of the A register, often used to free the A register for arithmetic of various types during certain clock cycles of an instruction.

When I powered up the machine, it was no longer going into E3 for XIO Control - I think the fact that my air conditioning had brought the garage down to datacenter-like temperatures, compared to the somewhat hot temps when I started, made the difference.

I now found that my seek amount is being properly captured by the XIO Control to the disk device, but it is not triggering the interrupt nor coming out of busy. Time to debug the seek simulation logic.

I saw the seek timing simulation FSM walk through its states and finish on 'done', but its trigger for the op20state FSM didn't fire it off. I recognized that I had once again used poor design practices for triggers synchronizing various FSMs, which happens when my brain thinks of the code as programming, not hardware.

The dseek FSM steps through various countdowns to model the time delay of a seek, finishing on state dseekdone for one cycle before moving to the idle state. During the dseekdone cycle, I emitted a trigger dseektrigger that should start the op20state FSM moving forward.

What actually happens is that at the clock edge when the FSM moves into the dseekdone state, it begins emitting the signal dseektrigger, and it stops sending it at the next clock edge when it moves into the idle state. The op20state FSM is watching for dseektrigger in order to switch from its idle state to its 'on' state. However, dseektrigger is changing, either going on or going off, at the clock edge, which is also when the op20state FSM is deciding whether to move to 'on' or stay in 'idle'.

Having signals changing too close to the clock is a classic error in hardware design. You want to have a setup time before the clock edge to let the signal become stable, and a hold time after the clock edge to keep things stable around the time that op20state is deciding on its next state.

I need to turn on these triggers and leave them on across a clock edge - either by coding two sequentially reached states that both emit the trigger, or by using separate combinatorial logic to introduce a delay after the clock edge, or by interlocking. An interlock would hold the dseek FSM at the dseekdone state until the op20state FSM has reached the 'on' state. I chose the interlock method.
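The interlock can be illustrated with a small cycle-level model in Python. This is only a sketch of the handshake, not the FPGA code: the 'counting' state, the single countdown, and the way op20state returns to idle are assumptions made for the example. An idealized model like this has no setup or hold times, so it cannot reproduce the original failure; it only shows that dseektrigger stays asserted until op20state has visibly reached 'on'.

```python
class SeekInterlockModel:
    """Two FSMs joined by an interlock (partly hypothetical state names)."""

    def __init__(self):
        self.dseek = "idle"        # idle -> counting -> dseekdone -> idle
        self.op20state = "idle"    # idle -> on -> idle
        self.countdown = 0

    def start_seek(self, delay_cycles):
        """Begin modeling a seek that takes delay_cycles clock ticks."""
        self.dseek = "counting"
        self.countdown = delay_cycles

    @property
    def dseektrigger(self):
        # The trigger is asserted for as long as dseek sits in dseekdone
        return self.dseek == "dseekdone"

    def clock(self):
        """Advance one clock edge: compute both next states, then commit."""
        next_dseek, next_op20 = self.dseek, self.op20state
        if self.dseek == "counting":
            self.countdown -= 1
            if self.countdown <= 0:
                next_dseek = "dseekdone"
        elif self.dseek == "dseekdone":
            # Interlock: hold here until op20state has reached 'on'
            if self.op20state == "on":
                next_dseek = "idle"
        if self.op20state == "idle" and self.dseektrigger:
            next_op20 = "on"
        elif self.op20state == "on" and not self.dseektrigger:
            next_op20 = "idle"     # assumed: op20state finishes in one cycle
        self.dseek, self.op20state = next_dseek, next_op20


model = SeekInterlockModel()
model.start_seek(3)
trace = []
for _ in range(6):
    model.clock()
    trace.append((model.dseek, model.op20state))
```

In the resulting trace, dseek stays in dseekdone for two consecutive edges, the second of which is the one where op20state is already 'on', so the trigger is stable across the edge at which op20state makes its decision.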

These kinds of problems are pernicious because they can work sometimes, but when a new load restructures the placement of the logic, they fail. It all depends on routing delays and placement, which are out of your hands.

My testing shows that the seek is occurring properly and not hanging up. The one flaw - it should be triggering an operation complete interrupt on level 2, but nothing happens. I restructured the code in a way that should be more solid but for some reason that I must be too tired to see tonight, I just can't trigger the interrupt request for seeks. The seek works - my new cylinder tracking value is correct, and the busy condition goes away, but my operation complete is just not switching on.

-- mirror console entry switches --

My mirror console entry function does not see the proper bits. Worse, it exits its polling loop after it sees the first read. The latter problem is in the GUI, but I suspect the first is FPGA based. I fixed the loop termination, but I am not sure yet whether I did anything for the problem capturing the data word.

The data is still not being captured by the mirror device adapter, although the bit switches complete their read into memory. This will take some more diagnostic instrumentation, but since it isn't a high priority, it will go on the back burner for now. It may in fact be a problem with the XIO instruction processing, detected during the debugging of the virtual disk drive.

With the machine working well, I still didn't capture the proper values for the console bit switches while executing an XIO Read to area 7. I will do some quick checks to see what should be happening and cross check with my mirror read logic, while I am testing with new diagnostic LEDs to troubleshoot the disk seek hangup.

My mirror read module was latching the contents of the B register at the T4 clock of XIO E3 stage, which could be a trifle early if the device gating its data into memory is a bit slow. I changed the module to latch on T6 of E3, when the value should be stable.

Still nothing showing up from the mirror read module. Not sure what is going wrong, since my mirror read code used to work. One possibility is that the high priced data recovery service may have found old stale files on the hard disk and not current versions that work properly. If so, I have to start from scratch working on that function.

SOURCE OF SPORADIC POWER ON ISSUES

About every ten or fifteen times I power on the 1130 system, it remains in Reset state and has to be power cycled to restore operation. I did some tracing of the way that power up reset is applied and removed, determining that the agent which drives the reset is relay 3 in the power sequencing logic in the power supplies. It should pick as a result of the +48V supply coming up, which drops the reset.

I will have to trace the state of relay 3 when I next experience the permanent reset - it is either sticky contacts on the relay or a failure of the 48V supply to come online. I had been worried that it might be intermittent SLT logic, but this will be easily fixable if it becomes more persistent.
