Thursday, January 16, 2025

Uploader function of Diablo Archiver fully debugged

FINISHED CAREFUL SCRUTINY AND CLEANUP OF UPLOAD MODULE

I made extensive runs with the integrated logic analyzer (ILA), verifying that all the logic behaved as I intended. Based on the results, I made a few optimizations and changes to the design until it looked as good as I wished.

I verified that it was correctly stepping through all 321 words of the sector, fetching the dummy 322nd word which provides feedback on any sync or data word ECC errors encountered, and that it advanced properly through every sector, head, and cylinder to the end of the cartridge.
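
As a sketch of that nested advance, with invented signal names, and with geometry constants that are my assumptions for illustration rather than values lifted from the design:

    -- geometry constants; sector and cylinder counts are assumptions
    constant WORDS_PER_SECTOR  : integer := 322;  -- 321 words plus the dummy status word
    constant SECTORS_PER_TRACK : integer := 4;    -- assumption
    constant HEADS             : integer := 2;
    constant CYLINDERS         : integer := 203;  -- assumption

    -- one nested advance per uploaded word, inside the clocked process
    if word_num = WORDS_PER_SECTOR - 1 then
        word_num <= 0;
        if sector_num = SECTORS_PER_TRACK - 1 then
            sector_num <= 0;
            if head_num = HEADS - 1 then
                head_num <= 0;
                if cyl_num = CYLINDERS - 1 then
                    upload_done <= '1';        -- reached end of cartridge
                else
                    cyl_num <= cyl_num + 1;
                end if;
            else
                head_num <= head_num + 1;
            end if;
        else
            sector_num <= sector_num + 1;
        end if;
    else
        word_num <= word_num + 1;
    end if;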

Using a log from the PuTTY terminal program, I captured the output and counted characters as well as lines to cross-check that I received all the sectors and status lines.

POSSIBLE FUTURE VERIFICATION

One step I could take to further verify that all the addressing and RAM operations were correct would be to populate RAM with contents prior to the start of the upload, with each word having its cyl, head, sector and word number as its data value. That way I could examine the output captured by PuTTY and ensure that we are storing and reading RAM correctly.
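
A minimal sketch of that idea, with invented names and assumed field widths (a 16 bit RAM word would force truncating or splitting the fields, but the principle is the same):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    package ram_tag_pkg is
        -- pack the cylinder, head, sector and word numbers into one value
        function tag_word (
            cyl    : unsigned(7 downto 0);   -- cylinder number
            head   : std_logic;              -- head 0 or 1
            sector : unsigned(2 downto 0);   -- sector number (width assumed)
            word   : unsigned(8 downto 0))   -- word 0..321 within the sector
            return std_logic_vector;
    end package ram_tag_pkg;

    package body ram_tag_pkg is
        function tag_word (
            cyl    : unsigned(7 downto 0);
            head   : std_logic;
            sector : unsigned(2 downto 0);
            word   : unsigned(8 downto 0))
            return std_logic_vector is
        begin
            -- 21 bits total; every uploaded word then announces its own address
            return std_logic_vector(cyl) & head & std_logic_vector(sector)
                   & std_logic_vector(word);
        end function tag_word;
    end package body ram_tag_pkg;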

Wednesday, January 15, 2025

Working with integrated logic analyzer to debug Diablo Archiver; found the issue blocking memory controller

MINOR BLIP WITH TOOLCHAIN AGAIN

I had generated an integrated logic analyzer (ILA) with 32 signals to watch. After a first run, I assigned a few more of the 32 and regenerated, but whenever I launched the ILA on the hardware, it showed only the original set of signals. This is yet another of the well known misbehaviors of the toolchain. Fortunately there was a way to force the update.

SPOTTED WHY MY MEMORY INTERFACE WAS SHUTTING DOWN

Since the example design runs well but my logic does not, even though both use the exact same memory interface and clock wizard settings, I began to look very carefully for any tiny difference. All at once, I spotted the reason my memory was stalling, both in simulation and on real hardware.

The user can ask the memory interface to accomplish three special tasks - refresh, ZQ calibration and self-refresh. The designer can request a calibration if they believe that errors are occurring which could be reduced by another round of calibration. The designer can also control the memory refresh timing.

Normally the memory interface will generate refresh cycles to the DRAM chips to keep the cells charged, accessing each row regularly enough that the capacitors that form the cells don't discharge. A designer who wants complete control over this can turn off the automatic refresh and instead ask the memory interface to do a refresh at specific times. This can position the refresh activity at a time when it won't delay time-critical read or write requests.

These are not normally used by designers, thus I thought I had those requests turned off. However, I thought these were active low requests, meaning a 0 value would request them. As such, I instantiated the memory interface with constants of binary 1 on these three request lines. I was wrong. Arrrgh. They are active high and needed to be set to binary 0.
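
The fix itself is tiny. Assuming the standard MIG 7-series user interface port names and whatever instance name the IP generator produced, the relevant slice of the instantiation now looks like this (everything else omitted):

    -- excerpt of the memory interface instantiation; clocks, resets, DDR3
    -- pins and the rest of the user interface are left out here.
    -- these requests are active HIGH, so the old constants of '1' were
    -- continuously demanding refresh, ZQ calibration and self-refresh.
    u_mig : mig_7series_0
        port map (
            -- ...
            app_ref_req => '0',   -- leave periodic refresh automatic
            app_zq_req  => '0',   -- no extra ZQ calibration requests
            app_sr_req  => '0'    -- never ask for self-refresh
            -- ...
        );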

I reran a simulation now that the requests are set to 0, and the problem with the memory interface disappeared. I can now debug my logic that calls the memory interface, since it can be trusted to provide memory access. 

WHY UPLOADER NO LONGER RUNS TO COMPLETION

The logic for uploading the disk cartridge contents from RAM writes out the first header, for cylinder 0, head 0, sector 0, then tries to read word 0 of the sector. Nothing is printed and the word address sits at 1 after that point. It apparently ran to completion of the cartridge when running at 921,600 baud, but after my conversion of the output to 115,200 baud, it stopped.

I moved the ILA from the dram controller module to the uploader module, which is where the failure appears to occur. After wiring up all the relevant signals, I generated the bitstream and started up the board. The problem was indeed one of relative timing. My dram controller module raises the signal dataready, but only for one cycle. If my state machine in the uploader is still busy pumping out the header message, by the time it looks for the dataready flag, it has gone back to 0.

The solution was to interlock the request for data from the dram controller. The caller, the uploader module, keeps echoword high until it sees dataready come back. The dram controller won't drop dataready until it sees the echoword request turn off.
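
A sketch of the two sides of that interlock, with made-up state and signal names (my real modules have more going on, and declarations are omitted):

    -- requester (uploader) side: hold the request up until acknowledged,
    -- then wait for the acknowledge to clear before asking again
    process (clk)
    begin
        if rising_edge(clk) then
            case req_state is
                when idle =>
                    if need_word = '1' then
                        echoword  <= '1';          -- raise the request
                        req_state <= waiting;
                    end if;
                when waiting =>
                    if dataready = '1' then
                        word_reg  <= ram_word;     -- data is stable while dataready holds
                        echoword  <= '0';          -- tell the controller we have it
                        req_state <= releasing;
                    end if;
                when releasing =>
                    if dataready = '0' then        -- handshake fully complete
                        req_state <= idle;
                    end if;
            end case;
        end if;
    end process;

    -- responder (dram controller) side: hold dataready up until the
    -- request goes away, so a slow caller can never miss it
    process (clk)
    begin
        if rising_edge(clk) then
            case ack_state is
                when idle =>
                    if echoword = '1' and fetch_done = '1' then
                        dataready <= '1';
                        ack_state <= holding;
                    end if;
                when holding =>
                    if echoword = '0' then
                        dataready <= '0';
                        ack_state <= idle;
                    end if;
            end case;
        end if;
    end process;

If the two modules run on different clocks, each flag also needs a synchronizer, but holding the levels until the other side has seen them is what makes that safe.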

I have a bit of cleanup and optimization to do in the uploader module now that it is basically working. After this, I imagine I will need a working Diablo drive to debug further. 

Tuesday, January 14, 2025

Working with real hardware to test DRAM controller operation

DUE TO SIMULATION ACCURACY DOUBTS, WORKING WITH REAL HARDWARE

I decided to avoid simulation for a while and concentrate on determining if the memory access works on the actual Arty A7 board. This is cumbersome due to the time it takes to generate and load the FPGA bitstream, as well as the contortions required to route internal signals upward from interior modules to the top level.

DEBUGGING THE UPLOADER WHICH APPEARS TO STALL TRYING TO READ FIRST SECTOR

The Uploader function is transmitting properly at 115,200 baud now, but it stalls after writing the first header message. I routed a couple of signals to the LEDs to see which step of the state machine it has stopped in. Since the uploader appeared to work properly when it was running with 921,600 baud serial communications, it is possible that something related to the logic writing over the link is the cause, but it could equally be a failure of the DRAM controller due to the same issue I am seeing in simulation.

My first run showed that the issue is with the DRAM controller, as the uploader function raised the echoword signal to request a word from RAM but then stalled waiting for the confirmation. This is consistent with the simulation, which shows the memory interface shutting down about 900 µs after it initializes.

I made a few tweaks and tried again, this time displaying LEDs to show if it was the first or second read that had the stall. In the simulation, one read is reported back as complete and the second never gets the data valid signal from the memory interface. This run showed me that we didn't get past the first request. 

GOING BACK TO RUNNING THE EXAMPLE PRODUCED BY THE IP GENERATOR

When I first set up the memory interface, it produced an example that ran a traffic generator into and out of the memory, validating its behavior. I installed that on the board and ran it, believing that it worked properly because of the visual indications. The example used the four LEDs to show:

  • lock of the clock generator onto the right frequency
  • completion of initial calibration by the memory interface
  • a flash per second as a cue that it was operating
  • an indication that data didn't match when read back
I saw the first two light, the third blink steadily and the fourth stay dark, which I took as evidence that it was performing its writes and reads with no errors. HOWEVER, now that I see the memory interface shutting down while leaving init_calibration_complete up, I have my doubts.

I installed an integrated logic analyzer into the example program and ran it again. It would show me the status of various signals between the traffic generator and the memory interface, which should be actively changing if it is indeed operational. 

As the capture shows, it was alternating reads and writes without any errors - it always came back with the data that had been written. Each time I took a snapshot with the ILA, I saw a different pattern of clear activity from the memory.

Since this was generated with the memory interface details and clock speeds that I chose for my project, it satisfied me that I should have a usable memory. Now to shift back to my own design, but with an ILA included so that I can debug better. 

Continuing to test logic of the Diablo 2315 Archiver - RAM initializing but still battling with Vivado toolchain

INSTANTIATED IN HARDWARE IN ORDER TO SEE CALIBRATION LED LIGHT

As evidence that I have everything working on the DDR3 memory, even though I haven't simulated successfully, I instantiated my entire project and loaded it onto the Arty A7 board. It is set up to light one of the four LEDs when the init_calibration_done signal is asserted, which will validate that all the clocking and reset and other basics are correct. 

The light went on! I have working RAM now. The logic won't move past that initialization because the design depends on reasonable inputs from an actual Diablo disk drive, which is not yet connected, but I did gain a bit of confidence from this simple test. When I set the switch to the upload direction and pushed the button, I saw the USB activity LED flashing.

I connected to it via a terminal program to see what comes out when I push the button. The link is almost reliable, but every once in a while the output looks a bit wonky. I will drop the data rate down from the 921,600 baud of the module I borrowed to 460,800, 230,400, or 115,200, which increases the upload time to as much as 15 1/3 minutes at the slowest speed.

CHANGED TIMING VALUES FOR THE UART BUT STILL COMING OUT AT 921,600 BAUD

I changed the VHDL for the modules involved but the data continued to stream out at the same old rate. This ultimately was due to the tendency of Vivado to work with cached versions of things. After forcing the tools to rebuild, I was able to get it working at 115,200, which allowed me to move forward to check out the upload results.
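
For reference, the timing value in question is just a clock divisor; a sketch with an assumed 100 MHz logic clock and invented constant names:

    -- bit period for the transmitter, derived from the logic clock
    constant CLOCK_HZ     : integer := 100_000_000;
    constant BAUD_RATE    : integer := 115_200;              -- was 921_600
    constant CLKS_PER_BIT : integer := CLOCK_HZ / BAUD_RATE; -- 868 (was 108)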

BACK TO SIMULATION TO WORK ON DRAM CONTROLLER LOGIC

Grrrr. The FIFO simulations are completely absurd. I have two FIFOs, one with a data count on the write clock side and one with a data count on the read clock side. The write side count goes up when I have pulsed only the other FIFO and not this one. The second time I pulse the r-FIFO, its read count does go up, but the first pulse was lost amid the spurious w-FIFO write count increment. Most absurd of all, a single pulse should never make a FIFO count jump from 0 to 2 words.

Instead of two types of FIFOs, one with read count and one with write count, I created a common type that had both types of counts. My logic only looks at the read count from w-FIFO and the write count from r-FIFO, but both are there and under different signal names. 

It still behaves bizarrely. I dug through the documentation for the FIFO and discovered that for the "accurate count" option I selected, if the FIFO is empty when the first word is pushed in, the counts won't be accurate. Other conditions produce inaccurate counts as well. 

Since my FIFOs are fall-through, where the word pushed in becomes visible to the reading side before a read is issued, I have a different way to check. The valid flag should rise when a valid word is present, due to fall-through. That is what I will use to drive my logic. 
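
A sketch of the read side under that scheme, with my own invented signal names, assuming the FIFO generator's first-word-fall-through mode with the valid flag enabled:

    -- in first-word-fall-through mode, fifo_dout already holds the next
    -- word whenever fifo_valid = '1'; asserting rd_en pops that word
    rd_en <= fifo_valid and ready_for_word;

    process (clk)
    begin
        if rising_edge(clk) then
            if fifo_valid = '1' and ready_for_word = '1' then
                captured_word <= fifo_dout;   -- usable the same cycle it appears
            end if;
        end if;
    end process;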

The simulation of FIFOs built on distributed RAM failed to work properly, but when I switched to the native FIFO hardware everything did work as expected. After the simulation, I had all my read and write signals optimized. 

Synthesizing for real hardware showed me that the damned toolkit insisted on using cached versions of the FIFOs even though I had changed parameters when I created them with the IP tool. I had to keep changing names to stop this since no commands would actually force the idiot software to do an actual generation again. Even that was foiled as the toolchain saw the similarity. Eventually I had to delete the cache file manually and restart the toolchain. 

MEMORY INTERFACE SHUTTING DOWN RAM CHIPS DURING SIMULATION

One interesting thing I observed with the earlier simulations was that at some point, the memory interface turned off the RAM chips using signal ddr3_cke, which I believed was a response to something being wrong with the memory configuration. However, when testing the write logic I noticed that if I did multiple writes without a long gap between them, the ddr3_cke line seemed to stay high. I tested it with more activity and it still stops about 900 µs after the initial calibration completes.

To see if this occurs in real hardware, I set up my test on the physical Arty A7 board, wanting to use the ddr3_cke signal to drive one of the LEDs. If it winks out while I am reading data, then there is an issue I have to resolve; otherwise it is just a limitation of the simulation. Unfortunately, this is an output buffer deep inside the memory interface IP and thus I can't read or sense it from within my logic. Nothing in the defined user interface shows me whether this signal went low due to some error state. In fact, there is no signal to indicate an error state at all.

I dug through the smallest nits in the messages produced during IP creation and searched through the paucity of examples of user interface access. Most people who use the MIG pair the memory with a soft processor that they instantiate, thus they go with the more cumbersome and complex AXI interface that the MicroBlaze processor requires. I made changes, adjusting clocks as well as toggling the power-saving option that was designed to throttle down power usage while the RAM wasn't being actively used.

From the detailed simulation capture, the last activity I see before the memory interface shuts down is an attempted refresh of the DRAM. The charge on the capacitors in DRAM decays rapidly, thus the cells must be rewritten in the background to retain memory contents. The memory interface is responsible for this, interleaving actual user accesses to memory with these refreshes.

However, it is the handling of the first read request that appears defective. Internally the memory is 2 Gb, organized as 16-bit groups accessed by a row and a column address. Further, the interface is designed to grab eight consecutive groups in each access (128 bits), thus we should see eight groups read out per access. Below is what I see for the first read access:

Following that is a subsequent read access which receives only four groups, not consistent with the "Burst Length 8" design of the interface. Note that one of the signals shows invalid values, meaning it is either driven by more than one source or not driven at all, not even high impedance (Z).

Something is screwy with the behavior right from the first access, leading to the interface shutting down. This is a lovely world for logic designers using these toolchains, where you have to wade through miles of debugging of other people's work before you can debug an inch of your own.

Saturday, January 11, 2025

Finally have the memory controller IP simulating in Vivado - trying to verify my dram controller again

WANDERED TWISTY PASSAGES FOR DOZENS OF HOURS UNTIL SIMULATION WORKED

I could find nothing in the documentation for the memory interface generator IP, nor online, that told me what was needed to have simulation support generated into the logic of the IP. It gave me the simulation modules for the DDR3 chips and the overall simulation top level, but as long as the generated IP itself was built without simulation support it would not work.

The secret was finally discovered through experimentation. When you use the IP Catalog to generate the IP you want, for example the memory interface logic for DDR3 memory, it does not present any option for simulation. It generates the various simulation modules I mentioned above but has no way to set the IP itself to make use of simulation.

I discovered a Verilog file, one level down from the top Verilog file generated for the MIG IP, which had the parameters for simulation and for fast calibration support. Since these were all produced when I created the IP, I had assumed they would be rebuilt with the same choices but eventually I decided to manually update that second level file.

Next up was the search for a way to regenerate the IP logic without also recreating the second level file. I eventually discovered that resetting the generated products for the IP and then generating anew would retain my modified second level file. Eureka. The IP logic now simulated, toggling the signal lines to the DDR3 chips and the simulation modules appropriately toggling back responses. 

STARTUP TIME ADDS PAIN TO EACH SIMULATION RUN

The memory controller logic interacts with the DDR3 RAM chips performing a calibration, involving many accesses, until it asserts the Init_Calibration_Done signal and begins outputting the 83 MHz user interface clock that is central to my interface between the memory controller IP and my design logic.

This requires simulation for 123 microseconds even with the "Fast" calibration setting for the IP. In wall clock time, it requires more than two minutes before the simulation reaches that point. The complexity of the memory controller IP and other logic in my design imposes about a two minute startup delay before the first femtosecond of simulation time. In all, it is more than four minutes from a click until I see my part of the design begin executing. 

Iterative debugging, while I vary inputs to the design to test corner cases as well as its core functioning, imposes that four minute burden on each cycle. 

I do still have an error in the simulation - the toggling of some of the DDR3 lines by the memory controller IP is running 1000x faster than it should. Not sure whether this impacts the DDR3 simulation module, but I can look at the code to determine whether I need to find the source of this defect and correct it. 

The unexpected good news is that the clock wizard IP was also working properly, not requiring my simulated version, and was properly producing the 200, 167 and 100 MHz clocks. I don't know why this changed but I welcome it.

CLEANED UP RESET TIMING OF VARIOUS PARTS OF MY DESIGN

The memory interface logic was dropping the user interface reset (ui_reset) well before init_calibration_done was raised. I needed to hold all my logic in reset until the memory was ready for use, so I combined the two signals: reset for my design lasts while the user interface reset is active and until calibration is done.
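
The combination itself is a one-liner; a sketch using the names as I refer to them above, with an invented name for my design's reset and active-high resets assumed:

    -- hold my own logic in reset while the memory interface user reset
    -- is asserted or calibration has not yet finished
    design_reset <= ui_reset or (not init_calibration_done);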

ADJUSTED THE PARAMETERS OF THE MEMORY INTERFACE 

I have the memory interface logic operating with a clock of around 325 MHz, the user interface to the memory interface operating at 81.2 MHz (the 4:1 ratio of the interface), and my general logic operating at 100 MHz. This produced the proper timing of signals to the DDR3 chips as well as proper behavior when I drove the interface.

OBSERVING MY LOGIC READING RAM AND WRITING RAM 

To drive this new simulation, I used the testbench I had set up when I simulated my dram controller module against my own simulated version of the memory interface. It sets up addresses for cylinder, head, sector and word before requesting either to store away a word we had extracted from the Diablo drive or to fetch a previously written word from RAM when we are uploading the archived cartridge at the end of a run.

The logic simulated well using my homebrew versions of the memory interface but I needed to see this work properly with the actual IP. First up was watching a request for a memory read, which I scrutinized on the simulation output. Second was a write request, which was subjected to the same level of diligence. 

Initially I saw my design completing only one read from RAM. Digging in I found that the memory interface was dropping the clock enable to the memory chips, disabling them from accepting my following read requests. I never did figure out why the simulation of the memory controller was complaining and shutting down during the first read process. 

Another issue became apparent. My design depends on the far side of a FIFO seeing the empty flag turn off, indicating that data has arrived from the near side. During the simulation, I saw the write enable turn on to push something into the near side but the empty flag never turned off. Upon closer inspection of the FIFO wizard, the empty flag seems to stay on with up to four words pushed in, which makes my design fail. 

To fix this, my logic looks at the count of data in the FIFO instead of the empty flag. When it is non-zero, I kick off the rd_en signal to pull out the word from the FIFO. Once the FIFO interaction with my main logic is working, according to the simulation, I can move forward.


Sunday, January 5, 2025

Setting up new display stations for Cape Canaveral Space Force Museum

CCSFM DISPLAYS HAD GRADUALLY FAILED

I volunteer with the Cape Canaveral Space Force Museum, which has multiple sites inside the secure area of the Space Force Station and one outside the perimeter - the Sands History Center. Videos play at spots around the walls of the history center detailing activities at particular launch complexes, but the player devices gradually degraded as cumulative power failures corrupted their data.

DEVELOPED PLAN FOR NEW DISPLAY STATIONS

We came up with a design for a new display station that would employ Raspberry Pi computers, HDMI based monitors, the open source Pi Presents software, and modifications of my own to make these bulletproof. We have a range of volunteers with varying skills, plus random power outages, requiring that these new systems work automatically and be impervious to handling errors.

I developed a printed circuit board that plugs onto the top of the Raspberry Pi (RPi) - termed a 'hat' when attached to an RPi. It provides a battery, a real time clock, and a terminal strip to attach up to five optional switches. Pushbuttons or other sensors can be used in the future to allow visitors to choose content or to activate a show when motion is detected nearby.

The Linux system for the RPi can be set up to use an overlay file system, making the microSD card (that holds the operating system and application software) read-only. No more corruption if the RPi is not shut down properly. Pi Presents is configured to run automatically at boot up. The software runs presentations based on scheduled times we configure, thus the need for the battery protected clock. The RPi has wifi allowing us to connect to the various displays remotely to kick off special events. 

The content is stored on a USB thumb drive, tailored to each station. I set up the RPi software to detect when the drive is plugged in or removed. Pi Presents is restarted for those events and we run a default show that indicates the lack of content if the thumb drive is removed. 

I used an earlier prototype of my board to test out the display station, one that had additional circuitry that is no longer needed. Before I decided to use the overlay file system, I was going to use supercapacitors to provide power for an automatic graceful shutdown of RPi Linux when input power was lost. That prototype had the charging circuits as well as the power fail detection and triggers into the RPi to run the shutdown command. The new PCBs are here and the remaining components for all the stations will arrive later this week. I expect we can have them all operational within a week.

Monday, December 30, 2024

Dram Controller module verified by simulation. Memory controller not working on real hardware

MANY ASPECTS HAD TO BE REVIEWED IN SIMULATION

The user interface of the DDR3 memory controller can be busy with activities such as refreshing memory locations, thus it has some ready signals that the caller must obey. I had to validate the correct behavior of my logic with varying lengths of delay as well as no delay in the ready signals. 

Writing to RAM involves two separate actions - loading the data word into the write data FIFO is one, presenting the address and write command is another. Each of those actions has its own ready signal - app_wdf_rdy for the data and app_rdy for the command - with the need to build in permutations of either one being delayed or not.
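
A sketch of the write sequence that has to tolerate either ready signal stalling, using the standard MIG user interface names but my own invented states and signals:

    -- inside a process clocked by the user interface clock; data is
    -- pushed into the write data FIFO first, then the command is issued,
    -- each side waiting on its own ready signal
    case wr_state is

        when push_data =>                     -- app_wdf_rdy gates this channel
            app_wdf_data <= padded_word;      -- word placed in the 128-bit UI beat
            app_wdf_wren <= '1';
            app_wdf_end  <= '1';              -- one UI beat per burst at 4:1
            if app_wdf_wren = '1' and app_wdf_rdy = '1' then
                app_wdf_wren <= '0';
                app_wdf_end  <= '0';
                wr_state     <= issue_cmd;
            end if;

        when issue_cmd =>                     -- app_rdy gates this channel
            app_addr <= ram_address;
            app_cmd  <= "000";                -- 000 = write on the MIG UI
            app_en   <= '1';
            if app_en = '1' and app_rdy = '1' then
                app_en   <= '0';
                wr_state <= done;
            end if;

        when done =>
            null;                             -- hand control back to the caller

    end case;

As I read the MIG documentation, write data may precede its command but should not trail it by more than a couple of cycles, which is why this sketch pushes the data first.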

Even the address used for the RAM write has a complication. It is built from the current cylinder address, current head, current sector address and current word address within the sector. However, the sector address as used by the IBM 1130 disk controller and as modeled in my design shows the address of the sector coming up. Thus, once a sector begins (due to a Sector pulse), the sector address will advance to indicate the upcoming next sector. 

While we are reading the sectors into RAM, we have to use the sector appropriate to the data words, not the next sector to arrive. Thus, when the design sees a read request, it latches in the sector address as the Sector pulse arrives and maintains that regardless of the changing sector address afterwards. This saved sector address is what is used to write the data into RAM.

The uploader module that reads the RAM back and writes it out the serial port generates its own sector address, but this is the true one for the sector we want to read and upload. Thus, the dram controller module selects whether to use the saved or generated sector address based on write versus read activity. 
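
A sketch of that latch and selection, with invented names; the exact moment to capture relative to the Sector pulse has to match the drive's timing:

    -- capture the sector number at the Sector pulse and hold it for the
    -- whole sector, ignoring the live counter advancing underneath us
    process (clk)
    begin
        if rising_edge(clk) then
            if sector_pulse = '1' then
                saved_sector <= live_sector;
            end if;
        end if;
    end process;

    -- writes into RAM (capturing from the drive) use the frozen value,
    -- reads from RAM (uploading) use the uploader's generated value
    ram_sector <= saved_sector when capturing = '1' else upload_sector;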

I was able to watch every signal flowing to the two FIFOs, to and from the user interface of the memory generator, and the behavior of the dram controller as I set up various cylinder/head/sector/word addresses and requested writes or reads. Once everything worked as it should, I concluded that this module is finished, subject only to actual hardware level debugging when we are connected to the Diablo disk drive. 

POSSIBLE ALL UP SIMULATION BUT SHOWSTOPPERS OF MEMORY AND CLOCK IP 

I could build a testbench to simulate the Diablo drive and respond appropriately. It would take a good deal of work to produce decent output, using external text files for the data words needed to simulate the bit stream coming from the disk head. If the memory controller and clock modules simulated correctly, I would have attempted this. However, in their current nonworking state my design couldn't even start, since it would wait forever for the initial calibration of memory to complete.

SIMPLE HARDWARE FIRST LEVEL TEST

I will instead build a version of this with one LED indicating successful initial calibration. On real hardware it should at least come up with that LED lit before waiting for me to push a button to start the archiving. Once the button is pushed, the design will stall as it waits for the Diablo drive to respond.

HARDWARE STARTS BUT MEMORY DOES NOT COMPLETE INITIAL CALIBRATION

My design comes up on the board, as I can see from a signal that reflects the state of a slide switch on led2. However, led1, which shows the state of the calibration, remains dark, just as it does in simulation. The overall reset signal for the design is released only once calibration completes, so we have a mostly inert board.

I began routing different signals to the LEDs in the hope that I could at least narrow down where things are going wrong, although the memory controller is a prime suspect. In addition, it was time to pore over the detailed messages from Vivado to see if I could spot any issue.