Sunday, February 7, 2016

Fought space limit in FPGA and almost back to testing 1442

SAC INTERFACE FOR ADDING PERIPHERALS TO THE 1130

The new FPGA board is due to arrive Monday afternoon. That is pretty good for an item I just ordered, shipped from Germany by a small operator, Ztex, that does not have a big warehouse and shipping department.

I began commenting out large chunks of my logic (the 1053, 1132 and 1403 code), but the LUT count stays at about the same level. Something seems to be wrong. I might have hit another Xilinx toolchain failure that falsely reports an out-of-space condition. I will hack out more (in effect, comment out everything) and see whether ISE is causing the problem.

I commented out the two SPI links and all the CRC checking, and I am still above 100% of LUTs. Something is wrong, but I don't yet know which of three possibilities is the cause. It could be some code I added to the 1442 support that somehow implies a huge number of LUTs; it could be that my logic legitimately needs more than 10,000 LUTs and had stayed under the limit only because unfinished portions were being trimmed away; or it could be another of the ISE bugs that cause bizarre behavior.

If the problem is an ISE defect, it will go away, since I have to move to Xilinx Vivado to use the Artix chip. If I legitimately need that many LUTs, the much larger chip on the new board will give me the breathing room I need. The only bad case is the one where I coded something that implies a huge LUT count and have not yet spotted my error.

I think the only way to get to the bottom of this is to comment out the 1442 code and see what happens. If there is not a huge drop in LUT count, then I will comment out some of the latest changes to the transactional engine. Hopefully something will drive a major drop in LUT count.

When I commented out one process, which selects a word from the card buffer based on the index of the column currently being processed, LUT usage plummeted to 18%. I uncommented the process and ran synthesis again to confirm that this was the problem. Indeed, that simple-looking code drove usage from 18% up to 109% of LUT capacity.

It appears this is due entirely to the logic that multiplexes 80 columns of 16 bits each down to the single 16-bit word needed for the column read. Since I am using little of the distributed and block RAM, I will convert the buffers into memories to free up LUTs. The challenge is that the code to load the pre-read buffer and dump the pre-punch buffer, which had been written to transfer 10 columns in a single cycle, will now take 20-30 cycles instead of one. I may need to adjust the short packet timeout to accommodate this extra delay.
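A small Python model may make the trade-off concrete. It is a sketch under stated assumptions, not the author's VHDL: `register_buffer_read` and `SyncReadRam` are hypothetical names, and the block RAM is assumed to have a registered read port with one cycle of latency (writes are modeled as immediate for brevity). The register version returns the word in the same cycle the column index is presented, which in hardware means 16 separate 80-to-1 multiplexers; the RAM version needs a clock edge between address and data.

```python
COLUMNS = 80         # card columns per buffer (from the post)
WORD_MASK = 0xFFFF   # 16 bits per column (from the post)


def register_buffer_read(buffer, column):
    """Register-based buffer: the word is selected combinationally.

    In hardware this is 16 independent 80-to-1 multiplexers (one per
    output bit) steered by the column index; the word is available in
    the same cycle the index is presented.
    """
    return buffer[column] & WORD_MASK


class SyncReadRam:
    """Block-RAM style buffer with a registered read port (assumed).

    The address is sampled on a clock edge and the word appears on
    `dout` after that edge (one cycle of latency). Writes are modeled
    as immediate to keep the sketch short.
    """

    def __init__(self, depth=COLUMNS):
        self.mem = [0] * depth
        self.dout = 0
        self.addr = 0

    def write(self, addr, value):
        self.mem[addr] = value & WORD_MASK

    def set_address(self, addr):
        self.addr = addr  # address presented, not yet sampled

    def clock(self):
        self.dout = self.mem[self.addr]  # rising edge: read the sampled address


if __name__ == "__main__":
    words = [col * 3 for col in range(COLUMNS)]
    print(register_buffer_read(words, 42))  # same cycle: 126

    ram = SyncReadRam()
    for col, word in enumerate(words):
        ram.write(col, word)
    ram.set_address(42)
    print(ram.dout)  # still 0: no clock edge yet
    ram.clock()
    print(ram.dout)  # 126 after the edge
```

The functional result is identical; what changes is where the selection happens (LUT fabric versus the RAM's own address decoder) and when the data becomes available.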

With this change, I can uncomment ALL the sections I had commented out and get back to testing the 1442 function. I was able to get all the way to bit generation with no space issue, but there was a problem with how I used the block RAM that was only flagged during bit generation. The software inferred a 9K block RAM component for each buffer, but it has a flaw: it fails to initialize the contents. I don't need any initial contents, yet this stops the creation of my bit file. Most significantly, the design now requires only 37% of the LUTs and 3% of the block RAM, so the change rebalanced chip usage nicely.

[Image: Xilinx flaw with 9K RAM]
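A quick capacity check shows why each buffer fits comfortably in one 9K block RAM. This sketch assumes the 9K primitive is configured as 512 words of 16 data bits (8192 data bits, ignoring parity); the constants and the `buffer_report` helper are illustrative, not taken from the author's design files.

```python
import math

COLUMNS = 80           # card columns per buffer (from the post)
BITS_PER_COLUMN = 16   # word width per column (from the post)
BUFFERS = 2            # pre-read and pre-punch buffers

# Assumption: a 9K block RAM configured as 512 words x 16 data bits
# (8192 data bits; the extra parity bits are ignored here).
BRAM_DEPTH = 512
BRAM_WIDTH = 16


def address_bits(depth):
    """Minimum number of address bits needed to index `depth` locations."""
    return max(1, math.ceil(math.log2(depth)))


def buffer_report():
    bits_per_buffer = COLUMNS * BITS_PER_COLUMN
    fits = COLUMNS <= BRAM_DEPTH and BITS_PER_COLUMN <= BRAM_WIDTH
    fill = bits_per_buffer / (BRAM_DEPTH * BRAM_WIDTH)
    return {
        "bits_per_buffer": bits_per_buffer,
        "total_bits": bits_per_buffer * BUFFERS,
        "address_bits": address_bits(COLUMNS),
        "fits_in_one_9k": fits,
        "fill_fraction": fill,
    }


if __name__ == "__main__":
    print(buffer_report())
```

Under these assumptions each buffer holds 1280 bits, needs a 7-bit column address, and fills 0.15625 of one block, so most of each inferred RAM sits unused. The saving comes from moving the column selection out of the LUT fabric, not from packing memory tightly.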

Implementing and testing 1442 reader/punch

I converted the two 80-column card buffers from LUTs to block RAM, since I had plenty of the latter but was short on the former. Loading and dumping the buffers becomes a bit more complicated: I had to watch where the various control signals are driven and add a process that steps through ten sequential locations in the buffer.
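Stepping through ten sequential locations can be modeled cycle by cycle. This is a hedged Python sketch, not the actual VHDL: `dump_packet` and `SyncReadRam` are hypothetical names, packet n is assumed to cover columns 10*n through 10*n + 9, and the RAM is assumed to have a one-cycle registered read. Because each word arrives one edge after its address, ten words take eleven cycles.

```python
COLUMNS = 80
COLUMNS_PER_PACKET = 10  # ten columns moved per dump, per the post


class SyncReadRam:
    """Minimal block-RAM model with a registered read port (1-cycle latency)."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.dout = 0  # output register, updated only on a clock edge

    def clock(self, addr):
        # Rising edge: the RAM samples `addr` and drives mem[addr] on dout.
        self.dout = self.mem[addr]


def dump_packet(ram, packet):
    """Step through ten sequential buffer locations for one packet.

    Hypothetical mapping: packet n covers columns 10*n .. 10*n + 9.
    Each loop iteration is one clock cycle: first capture the word the
    RAM produced on the previous edge, then issue the next address.
    Returns (words, cycles_used).
    """
    base = packet * COLUMNS_PER_PACKET
    end = base + COLUMNS_PER_PACKET
    addr = base
    pending = False  # True while dout holds a word not yet captured
    words = []
    cycles = 0
    while len(words) < COLUMNS_PER_PACKET:
        if pending:
            words.append(ram.dout)  # capture register latches the old dout
            pending = False
        if addr < end:
            ram.clock(addr)  # same edge: RAM latches the next address
            addr += 1
            pending = True
        cycles += 1
    return words, cycles


if __name__ == "__main__":
    buffer = [0x1000 + col for col in range(COLUMNS)]
    words, cycles = dump_packet(SyncReadRam(buffer), packet=3)
    print([hex(w) for w in words], cycles)  # columns 30..39, 11 cycles
```

The eleven-cycle count covers only the read pipeline; any control or handshake overhead in the real design adds to it. Either way, the cost is paid per ten-column transfer, where the register version needed a single cycle.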

Once I resolve the Xilinx ISE issue with 9K block RAM initialization, I should have a good bitstream to load into the FPGA. I think the solution is to use the block RAM generator rather than letting ISE recognize my VHDL and infer the component.
