Saturday, February 13, 2016

SAC Interface fan-out strategy change, more progress on 1442 function


As I power up and down to test my SAC Interface functions, I have had a few cases where the 1130 is misbehaving until it is power sequenced. One time it stayed in Reset continually. Another time the machine operated as if it were in SMC (single memory cycle) mode even when the main switch was set to Run or SI. I hope this isn't an SLT card getting ready to fail hard. I suppose there could be other explanations such as oxidation on the rotary mode switch contacts, but I worry.


Debugging 1442 reader/punch function

There are a few problems I observed with the test last night, including having the main feed cycle FSM skip ahead before the 80 columns were read with XIO Read instructions. That may be due to an error in the read FSM jumping ahead, since the read FSM signals the main feed one to cause it to advance. Time for diagnostic indicators to track down the failure circumstances.

I discovered the flaw that allowed the feed FSM to jump forward, and corrected it, but am dissatisfied with the read cycle FSM as it is introducing the wrong timed delays between columns of the card. I reworked it a bit until it seemed suitable. Armed with the changed code and new diagnostics, I went out at lunchtime to try this out.

I found that annoying problem where one of my output chips on the interface board needs to be moved slightly to make it work - I think it is intermittent +5V to pin 14 - which jams in bits in the middle of each word. However, my logic did manage to loop for all 80 columns and provide the opportunity for the XIO Read on each one. The contents changed and I saw the index that points at the column in the buffer advancing.

I can't consider this a complete validation of the read cycle as the data pattern couldn't be matched exactly due to the jammed bits. The read cycle completed and my main feed FSM moved on to inform the Python program of the false XIO IR that asks it to fetch the punch buffer contents. At this point, I had the timeout on the USB link. Time to debug the fetch logic and its FSM.

I found the cause for the stall of the routine that fetches the pre-punch buffer contents back up to the Python program. I incidentally found why the DSW had some random bits indicated and cleaned up the cause. I had some time in late afternoon where I could get back out for further testing.

While preparing to test, I took some time to tack solder a jumper on the +5V pin of the errant chip and bring it over to a known good power source. I don't want to be experiencing the jammed bits any more.

Having completed the jumper, the jammed bits went away! I was able to examine the contents of the memory range in core where my program reads in the card image - words 0x0040 to 0x008F - and see that while it is mostly correct, somehow the card data from card column 1 is placed in words 0x0040 and 0x0041, with column 2 at 0x0042 and so forth. I will put in diagnostics to determine which of two places causes the off-by-one problem:
  • I could be storing the data in the pre-read buffer this way
  • I could be handling the XIO Read offset by one
My logic will latch up and show the value of a known bit for column 2 as it is stored in the buffer. Another place will latch up and show the value of that bit when the buffer is read during an XIO Read. Finally a third process will latch up the known bit at the time we write it to core during the XIO processing.
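The three latch points work as a bisection: the first point at which the known bit disagrees with the card marks the stage at fault. A minimal sketch of that reasoning follows. The stage labels and the function name are hypothetical, and the real taps are latches in the fpga, not Python.

```python
# Hypothetical labels for the three diagnostic latch points described above.
STAGES = (
    "stored in pre-read buffer",
    "read from buffer during XIO Read",
    "written to core",
)

def first_bad_stage(expected_bit, latched_bits):
    """Return the first stage whose latched bit differs from the expected
    value for card column 2, or None if all three agree with the card."""
    for stage, bit in zip(STAGES, latched_bits):
        if bit != expected_bit:
            return stage
    return None

# Illustrative values only: buffer write is right, buffer read is already wrong.
print(first_bad_stage(1, (1, 0, 0)))
```

If the first tap is already wrong, the fault is in how the buffer is filled. If only the later taps are wrong, it is in the XIO Read handling.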

Something is going awry in the machine when I run at full speed, as it is jumping past a wait instruction and running amok until it hits a parity error fetching some corrupted data word. I think I know how to bypass the problem with a tight loop rather than a wait, but this may be a symptom of the erratic failures I am seeing with the 1130 at other times.

After some more study of the logic and testing, I refined the method for reading and writing the buffer memories for pre-read and pre-punch stations. I went out to test in the morning and got all the way through the read to the operation complete interrupt on IL4. The card image was stored in core at locations 0x0040 to 0x008F but still offset by one - meaning column 1 of the card is in locations 0x0040 and 0x0041 then columns 2 to 79 are stored but not column 80.

I successfully saw the message from my Python code that the punch file was not open so we didn't try to write what we fetched from the card buffer. However, one anomaly remained - I received the fake XIO IR twice, which means I have a flaw in my FSM that doesn't recognize the end of a read cycle fast enough to stop posting a second XIO IR function code. This is a case where I am off by one FPGA clock cycle in how I set up the XIO IR function code, but it should be easy to fix.

I worked on some diagnostic LEDs and other information to track down my off-by-one error with the card contents, plus looked over my fpga code for the cause of this and for the timing flaw that emits double XIO IR codes.

The double XIO IR problem should be fixed, as I could see where this could occur and design to block it, but the cause of the off-by-one on card reading was more obscure. I found an unrelated flaw and corrected it, but need to keep on varying diagnostic outputs until I see the problem.

To resolve a design flaw in the Xilinx ISE toolchain, I had to put it in Windows 7 compatibility mode. It slowed to 10% of its previous speed, requiring quite a few minutes to do each round of synthesis to bitstream generation. Painfully slow.

The results of testing showed I had successfully blocked the double XIO IR emission, and that the contents of the second card column in the pre-read buffer were correct, not the off-by-one value I am seeing in core.

All this tells me that the problem I am having involves the logic to process the XIO Read instruction, which latches the value of the data from the pre-read buffer, for use with the XIO Read logic, then bumps the card index. I must be bumping the card index too early, before the XIO Read has completed.

I spotted a possible race hazard - the XIO Read logic turns off the busy condition as soon as the processor enters T6 state. The core write is not yet complete at that point, so it is possible that changing addresses on the read buffer in the next fpga cycle (20ns) could corrupt what is going into core.

I changed the FSM to wait for 1.15ms (time for a card to move 1 column over in the physical 1442) before I increment the address. I am not sure how this would result in the off-by-one failure I am seeing, which is in the opposite direction of what could happen with an early bump of the buffer index. If this were the problem, I should be seeing card column 2 stored in word 0x0040 and spurious data in word 0x008F, rather than seeing column 1 doubled.
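To put the 1.15ms wait in terms of the fpga clock, here is a minimal sketch of the arithmetic, assuming the 20ns cycle mentioned earlier is the clock that drives the delay counter. The constant names are made up for illustration.

```python
FPGA_CLOCK_NS = 20        # fpga cycle time quoted in the post
COLUMN_TIME_MS = 1.15     # time for a card to move one column in the 1442
COLUMNS_PER_CARD = 80

# Number of fpga cycles the FSM must count before bumping the buffer address.
column_time_ns = round(COLUMN_TIME_MS * 1_000_000)
cycles_per_column = column_time_ns // FPGA_CLOCK_NS

# Width of a counter able to hold that count.
counter_bits = cycles_per_column.bit_length()

# Total time for all 80 columns at this pace.
card_time_ms = round(COLUMN_TIME_MS * COLUMNS_PER_CARD, 1)

print(cycles_per_column)  # 57500
print(counter_bits)       # 16
print(card_time_ms)       # 92.0
```

The wait is therefore several orders of magnitude longer than the single fpga cycle in which the suspected corruption could occur.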

I did see a spot where I might have triggered a second IL0 too early, which might be causing the program to do two reads and bumping the address in the IOCC but the fpga is still working on column 1. This kind of race hazard could produce the symptoms I am seeing. I fixed this as well.
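The two hypotheses leave different fingerprints in core, which a toy model can make concrete. The sketch below is not the fpga logic. It assumes the program stores exactly 80 words starting at 0x0040, labels each column by its number, and uses None for spurious data. All names are hypothetical.

```python
COLUMNS = 80
CORE_BASE = 0x0040

def read_card(card, early_bump=False, extra_first_interrupt=False):
    """Return the 80 values the program would store in core, in address order."""
    core = []
    idx = 0  # fpga index into the pre-read buffer

    def fetch(i):
        # Reading past the end of the buffer yields spurious data.
        return card[i] if i < len(card) else None

    if extra_first_interrupt:
        # A premature IL0 makes the program read while the fpga is still on column 1.
        core.append(fetch(idx))
    while len(core) < COLUMNS:
        # With an early bump, the index has already advanced when the word is stored.
        core.append(fetch(idx + 1) if early_bump else fetch(idx))
        idx += 1
    return core

card = list(range(1, COLUMNS + 1))
cases = [
    ("correct", {}),
    ("early index bump", {"early_bump": True}),
    ("extra early IL0", {"extra_first_interrupt": True}),
]
for name, kwargs in cases:
    core = read_card(card, **kwargs)
    print(f"{name}: 0x{CORE_BASE:04X}={core[0]} "
          f"0x{CORE_BASE + 1:04X}={core[1]} "
          f"0x{CORE_BASE + COLUMNS - 1:04X}={core[-1]}")
```

In this model only the extra-interrupt case puts column 1 in both 0x0040 and 0x0041 and drops column 80, which matches the observed symptom. The early bump shifts the data the other way.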

My pre-dinner testing was impacted by the flaky 1130 behavior. The machine would only execute a single memory cycle (one phase of an instruction at a time), but it did show me that at this speed, the data is arriving in the correct memory words, not off by one. If the problem was caused by a race hazard, I wouldn't see the issue in SMC mode but it would occur at full speed Run mode.

The 1130 powered up into perpetual reset earlier today, but a power cycle fixed it. Then later tonight I hit the SMC mode issue above. I fear that I have an intermittently failing card that is going to become a hard failure soon.

I decided to try one last time tonight, hoping the 1130 will come up normally. It did, but I discovered that the off-by-one was back. Further, my logic was no longer triggering the Python program, probably because it wasn't finishing the read cycle at all. Time to finish up for the night.

Change in strategy for medium and high speed links to additional boards for connectivity

I decided to drop the idea of medium speed links to Arduino or equivalent systems and concentrate solely on the high speed error correcting links to other fpga boards which can then fan-out or communicate as they wish.

I will use the first link to accomplish connections to the plotter, paper tape and some other physical devices. Everything should be in place on both ends for the link which just mirrors over a hundred signals in each direction, which means I just have to assign pins on the remote fpga board and do some modest coding to hook this through the link to the 1627 adapter logic.

Implementing physical plotter (1627 equivalent)

I updated the logic to pass the 1627 output signals over the high speed link to my Digilent Nexys2 board, which will handle the fanout to the 1627 interface electronics. I still have to work on the Digilent side to pick up these signals and route them to chosen output pins on the board.

Implementing physical paper tape reader/punch (1134 and 1055 equivalents)

Similarly, I updated the logic to pass the eight output punch channels (for the 1055 punch) out on the link, plus the two drive signals to move the motors on the 1134 and the 1055 units. I look for the eight channels from the 1134 paper tape reader on the input side of the fast link. As above, I still have to change the Digilent side and assign pins to the signals.

One additional task is needed for the paper tape units. I have to design and build the driver and sense electronics to interface the physical boxes to the Digilent board, accommodating the 3.3V logic levels and low current limits of the board.
