Thursday, December 8, 2022

Can't trust the DRAM memory interface - considering radical restructuring to use on chip static ram

STILL DON'T HAVE CAUSE OF ERRATIC BEHAVIOR NAILED DOWN, BUT . . .

First, I can instrument internal logic analyzer cores for only a relatively small number of signals at a time, and each core can record only 8K cycles. Second, each clock domain requires a separate analyzer core, and the cores aren't easy to trigger 'simultaneously'. Third, signals in the SPI link domain can't be traced by an analyzer core because the core needs a constant rather than an intermittent clock signal.

Thus each time I have a new suspicion I have to resynthesize, set up the testbed and then capture only a small sample in time. The failure is so likely that I can't get through a single unload of 321 words, but it is not deterministic, so it fails on different words and perhaps in different ways on each test. If it always failed on a given word of the sector I could set better triggering for the analyzer cores.

Broadly, however, the state machines seem to stall, and the link then returns only the last good value for all the subsequent transactions. My current suspicion is that the stall is triggered by a refresh cycle of the DRAM at exactly the worst moment.

The delay could be 32 clock cycles, which when added to the delays transiting through FIFOs across clock domains, can add up to the major SPI link state machine having moved beyond the point where it needed the RAM data. I don't have proof that this is happening, although I will likely continue to construct tests where I might observe the smoking gun.

What I do know, however, is that were I to have a memory with a known and consistent access time that fits inside the state machine steps for the SPI link, I could have a reliable upload. Thus, if I can't find and fix the cause of erratic behavior, I might shift to a deterministic and reliable method to avoid said erratic conditions.

STATIC RAM ON FPGA CHIP CAN OFFER DETERMINISTIC READ AND WRITE

FPGA chips have static RAM available onboard. It comes as both block RAM and distributed RAM. Block RAM consists of sections of SRAM that are embedded in the chip and available to the designer. The look up tables and flip flops that are usually employed to create logic circuits can also be configured as SRAM, and these are distributed among the LUTs of the FPGA.

Block RAM use has essentially zero impact on the amount of logic that can be instantiated on the FPGA chip, since it occupies distinct areas of the chip that are not involved in generalized logic. Each chip has a fixed capacity of block RAM - in the case of the board I am using, 1,658,880 bits, organized in words of up to 18 bits wide.

Distributed RAM, on the other hand, takes up LUTs that otherwise would be available to form logic circuits. The more memory you instantiate, the less logic you can create. There are only 3,650 LUTs, the basic building block of an FPGA, on my chip. Each LUT used as distributed RAM instantiates 16 bits. An entire cartridge would require 521,304 LUTs, and even a single cylinder would consume a large fraction of the available LUT capacity.

SIZE CHALLENGE AS ENTIRE CARTRIDGE IMAGE CAN'T FIT ON THIS FPGA CHIP

One cylinder of the 2315 disk has eight sectors of 321 words, each 16 bits, thus it takes only 41,088 bits to hold that cylinder. The problem is when you look at the entire cartridge, all 203 cylinders of it, which would take five times the capacity of the block RAM to hold in its entirety. Distributed RAM provides little additional capacity. 
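The sizing above can be checked with a few lines of arithmetic. This is only a sketch in Python; the constant names are mine, and the numbers are the ones quoted in this post.

```python
# Geometry and capacity figures quoted in the post; the names are illustrative.
WORDS_PER_SECTOR = 321
SECTORS_PER_CYLINDER = 8
BITS_PER_WORD = 16
CYLINDERS = 203
BLOCK_RAM_BITS = 1_658_880
BITS_PER_LUT = 16        # one LUT configured as distributed RAM
LUTS_AVAILABLE = 3_650

cylinder_bits = WORDS_PER_SECTOR * SECTORS_PER_CYLINDER * BITS_PER_WORD
cartridge_bits = cylinder_bits * CYLINDERS
cartridge_luts = cartridge_bits // BITS_PER_LUT
cylinder_luts = cylinder_bits // BITS_PER_LUT

print(cylinder_bits)                                 # 41088
print(cartridge_bits)                                # 8340864
print(round(cartridge_bits / BLOCK_RAM_BITS, 2))     # 5.03
print(cartridge_luts)                                # 521304
print(cylinder_luts, round(cylinder_luts / LUTS_AVAILABLE, 2))  # 2568 0.7
```

A single cylinder is therefore a small slice of the block RAM but roughly 70% of the LUTs if held as distributed RAM.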

The erratic DDR3 DRAM, on the other hand, is 256 MB, far more than is needed for a cartridge. This is the reason I selected the DRAM initially to hold the cartridge image while the virtual drive was operating.

CONSIDERING USING BLOCK RAM FOR ONE CYLINDER AT A TIME, DRAM FOR REST

If I have an entire cylinder in the block RAM, then the Unload transaction up to the Arduino will be deterministic and reliable. The disk drive controller reading and writing through the head electronics would also be satisfied easily and reliably from this cylinder buffer. 

When moving to a different cylinder, the current contents (potentially updated if writes from the CPU have taken place) would be written to the DRAM and then the contents of the new cylinder would be read from DRAM and written to the block RAM.
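The swap described above can be modeled in software before committing it to logic. The following is a behavioral sketch in Python, not the FPGA design: the class and method names are hypothetical, and the dirty flag that skips the write-back when nothing changed is my own assumption rather than part of the plan stated here.

```python
class CylinderBuffer:
    """Behavioral model: one cylinder held in fast RAM, the rest in a backing store."""

    WORDS_PER_CYLINDER = 8 * 321  # eight sectors of 321 words

    def __init__(self, cylinders=203):
        # The backing store stands in for the DRAM image of the whole cartridge.
        self.dram = [[0] * self.WORDS_PER_CYLINDER for _ in range(cylinders)]
        self.current = 0
        self.bram = list(self.dram[0])   # stands in for the block RAM buffer
        self.dirty = False               # assumption: track whether a write occurred

    def read(self, offset):
        return self.bram[offset]

    def write(self, offset, value):
        self.bram[offset] = value & 0xFFFF   # 16-bit words
        self.dirty = True

    def seek(self, cylinder):
        if cylinder == self.current:
            return
        if self.dirty:
            # Dump the updated cylinder back to the backing store first.
            self.dram[self.current] = list(self.bram)
            self.dirty = False
        # Then load the new cylinder into the buffer.
        self.bram = list(self.dram[cylinder])
        self.current = cylinder


buf = CylinderBuffer()
buf.write(5, 0x1234)
buf.seek(1)            # cylinder 0 is written back, cylinder 1 loaded
buf.seek(0)            # cylinder 0 comes back from the backing store
print(hex(buf.read(5)))  # 0x1234
```

All reads and writes between seeks touch only the buffer, which is the property that makes the access time deterministic.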

SOME CHALLENGES TO CONSIDER WITH THE DUAL MEMORY APPROACH

The time it would take to dump a cylinder (eight sectors of 321 words) from block RAM to DRAM, then load new block RAM contents from DRAM, may be longer than the time a real disk drive would take to perform a single cylinder seek. The minimum seek time is 15 milliseconds, a relative eternity to the FPGA operating with 10 or 20 ns cycles. At 20 ns that provides about 2,336 cycles per word for a single 321-word sector, or roughly 292 cycles per word for the full 2,568-word cylinder, to do both a read and a write.
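The cycle budget can be worked out for both clock periods and for both a single sector and a whole cylinder. This is a small Python sketch; the function name is mine, and it assumes the whole 15 ms minimum seek is available for the swap.

```python
SEEK_NS = 15_000_000  # 15 ms minimum seek time, in nanoseconds


def cycles_per_word(clock_ns, words, seek_ns=SEEK_NS):
    """Whole clock cycles available per word if the swap must fit in one seek."""
    return (seek_ns // clock_ns) // words


for clock_ns in (10, 20):
    for words in (321, 8 * 321):
        print(clock_ns, words, cycles_per_word(clock_ns, words))
# 10 321 4672
# 10 2568 584
# 20 321 2336
# 20 2568 292
```

Each of those cycles-per-word figures has to cover one write of the old word to DRAM and one read of the new word from DRAM.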

The consequence of not meeting that timing would be that my virtual 2315 will no longer be timing accurate on short seeks. Even worse, the drive controller signals that the access is complete via a single shot timer, not some signal from the drive, thus the CPU will be justified to begin reading or writing before our slower dump/restore has completed. 

Another issue results from the current SPI link protocol, where the Arduino specifies the particular sector (including cylinder) that it wants to load or unload as part of each transaction. Thus, the FPGA might be commanded to seek to a new cylinder by the first two words of the transaction, but be expected to deliver words almost instantly on word 3, which is far too quick for the swap to occur.

It is conceivable that I could implement a reverse feedback signal to the Arduino that would hold it in mid-word of a transaction until the swap completed. This is the major problem, because the seek timing issues raised at the start of this section don't really turn out to be a constraint.
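One way to picture such a hold-off is a toy model in which a busy flag stalls the link after the address words until the swap has finished. This Python sketch is purely illustrative: the function, the two-word header and the cycle-by-cycle stall are assumptions for the model, not the actual SPI link protocol or a design I have settled on.

```python
def unload(data, swap_cycles, header_words=2):
    """Toy model of an Unload with a hypothetical busy signal.

    The swap starts once the last header word (the address) has arrived.
    Data words are held back, one cycle at a time, until busy drops.
    The time taken by the header words themselves is ignored.
    Returns (delivered_words, stalled_cycles).
    """
    delivered = []
    stalled = 0
    busy = 0
    for slot in range(header_words + len(data)):
        if slot == header_words - 1:
            busy = swap_cycles          # address known: swap begins, busy asserted
        elif slot >= header_words:
            while busy > 0:             # the Arduino is held until busy drops
                busy -= 1
                stalled += 1
            delivered.append(data[slot - header_words])
    return delivered, stalled


print(unload([0o1234, 0o5670, 0o0042], swap_cycles=5))  # ([668, 3000, 34], 5)
```

The model shows the intended property: the stall is paid once, before the first data word, and every word after that is delivered without further delay.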

THIS CONCEPT IS BEING EXPANDED

It appears I can keep up with the disk drive seeks rather easily, so my only issue is in holding off the Arduino Unload or Load transactions. I am looking at various ways to handle this elegantly.
