Friday, December 9, 2022

Found the source of the erratic behavior and improved it

CAREFULLY INSTRUMENTED ANALYZER SETUP

Armed with an error latch that is set when a byte transfer from the Arduino ends before the read from the memory interface has completed, plus a shadow of that latch carried properly across clock domains into the domain of the memory interface signals (clocked by the generated ramclock), I set up the analyzer to capture all the signals of the user interface to the memory module. 

FOUND THE CASE THAT PRODUCED THE STALL

When the error latch went on and the shadow latch appeared a couple of cycles later, I immediately saw the condition that would clearly cause a stall. The user interface requires that the app_en signal remain asserted while app_rdy is false; otherwise the request is lost. Enable must stay asserted until the clock cycle in which both the enable signal and the ready signal are true. At that point the request is accepted and we can drop the enable. 

I saw that the app_rdy signal dropped to false, during a refresh cycle of the DRAM, in the same cycle in which my logic asserted app_en. My logic then dropped app_en even though the ready signal was still false. If the ready signal was already false a cycle before I asserted enable, my logic handled it correctly, and of course if the interface was ready when I asserted enable, it worked properly. The failure occurs only when ready happens to drop just as I assert enable. 

This is very timing dependent and nondeterministic from the point of view of my logic, as the times when the memory interface performs refresh cycles are buried inaccessibly in the interface IP. Things had to align just right (or just wrong) to fail, which happened often enough to fail during a 321 word read transaction but not so often that every read would fail. This is exactly the kind of erratic situation I knew was behind the bad behavior. 

DEFECT IN MY STATE MACHINE HANDLING OF MEMORY INTERFACE BUSY

My state machine for reading and writing memory would check the ready signal and, if it were true, advance to a next state which asserted the enable signal for one cycle. This worked fine if ready were true during the whole process, and worked properly if ready was false when tested, but if ready dropped to false at the start of the next state, just as I was raising enable, things went wrong. 

This is due to the change times of the various signals. Inputs to the state machine, such as app_rdy, which determine the next state to enter at a clock edge, can change at that same clock edge just as we move to the next state. 

I should have tested the ready signal in the same cycle in which I was asserting the enable signal. Testing a condition in one state and acting on it in the next was a stylistic choice, and it led to the error in handling the case where ready drops just as enable is asserted. 
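
To make the race concrete, here is a cycle-level model in Python (not the actual VHDL) of the original style: app_rdy is sampled in one state and app_en is asserted for exactly one cycle in the next. The function and state names are hypothetical; the only rule taken from the memory interface is that a request is accepted on a cycle where app_en and app_rdy are both true.

```python
def original_issue(rdy_trace):
    """Model of the flawed FSM: sample app_rdy in IDLE, then assert app_en for
    exactly one cycle in ISSUE regardless of app_rdy in that cycle.

    rdy_trace: app_rdy value for each successive clock cycle.
    Returns the cycle on which the request was accepted, or None if it was lost
    (or the trace ended before anything happened).
    """
    state = "IDLE"
    for cycle, rdy in enumerate(rdy_trace):
        if state == "IDLE":
            # Ready is tested here, but enable is only raised on the NEXT cycle.
            if rdy:
                state = "ISSUE"
        elif state == "ISSUE":
            app_en = True
            # The interface accepts the request only if enable and ready coincide.
            if app_en and rdy:
                return cycle
            # Enable is dropped after one cycle even though ready was false,
            # so the request is silently lost.
            return None
    return None


print(original_issue([True, True, True]))          # -> 1, ready stayed high
print(original_issue([False, False, True, True]))  # -> 3, ready was low before enable
print(original_issue([True, False, False, True]))  # -> None, ready dropped as enable rose
```

Only the third trace, where ready falls in the very cycle enable rises, loses the request, which matches the failure seen on the analyzer.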

CORRECTION WAS EASY

I changed the state machine so that it raises app_en (or app_wdf_en for writing) in a state that also checks app_rdy. If ready is false, it keeps the enable high and stays in that state; once ready is true, it drops the enable and moves on to the further states that complete the read or write operation. 
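
The correction can be modeled the same way. In this hypothetical Python sketch, app_en stays asserted in the ISSUE state and app_rdy is tested in that same cycle, so a refresh that drops ready at the moment enable rises only delays acceptance instead of losing the request.

```python
def corrected_issue(rdy_trace):
    """Model of the corrected FSM: enable is held high in ISSUE and ready is
    checked in the same cycle; leave ISSUE only when both are true.

    Returns the cycle on which the request was accepted, or None if the trace
    ends while still waiting (the request is still pending, not lost).
    """
    state = "IDLE"
    for cycle, rdy in enumerate(rdy_trace):
        if state == "IDLE":
            # No pre-check of ready: just go raise app_en.
            state = "ISSUE"
        elif state == "ISSUE":
            app_en = True  # enable remains asserted every cycle spent in ISSUE
            if app_en and rdy:
                return cycle  # accepted; enable may drop on the next cycle
            # ready is false: stay in ISSUE with enable still high
    return None


print(corrected_issue([True, False, False, True]))  # -> 3, request survives the refresh
```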

QUICK TEST SHOWS THE STALL CONDITION HAS DISAPPEARED

I reran my testing and the error latch never went on. Further, the state machine didn't stall and returned to idle once the unload transaction was complete. Looking at the data returned, however, shows that this is still not operating correctly.

Thursday, December 8, 2022

Can't trust the DRAM memory interface - considering radical restructuring to use on chip static ram

STILL DON'T HAVE CAUSE OF ERRATIC BEHAVIOR NAILED DOWN, BUT . . .

First, I can instrument internal logic analyzer cores for only a relatively small number of signals at a time, and each core can record only 8K cycles. Second, each clock domain requires a separate analyzer core, and they aren't easy to trigger 'simultaneously'. Third, signals in the SPI link domain can't be traced by an analyzer core because a core needs a constant rather than intermittent clock signal.

Thus each time I have a new suspicion, I have to resynthesize, set up the testbed, and then capture only a small sample in time. The failure is likely enough that I can't get through a single unload of 321 words, but it is not deterministic, so it fails on different words and perhaps in different ways on each test. If it always failed on a given word of the sector, I could set better triggering for the analyzer cores.

Broadly, however, I seem to have stalling of the state machines, which then only return the last good value for all the subsequent transactions. My current suspicion is that this is triggered by a refresh cycle of the DRAM at exactly the worst moment. 

The delay could be 32 clock cycles, which, when added to the delays of passing through FIFOs across clock domains, could mean the main SPI link state machine has already moved beyond the point where it needed the RAM data. I don't have proof that this is happening, although I will likely continue to construct tests where I might observe the smoking gun.

What I do know, however, is that were I to have a memory with a known and consistent access time that fits inside the state machine steps for the SPI link, I could have a reliable upload. Thus, if I can't find and fix the cause of erratic behavior, I might shift to a deterministic and reliable method to avoid said erratic conditions.

STATIC RAM ON FPGA CHIP CAN OFFER DETERMINISTIC READ AND WRITE

FPGA chips have static RAM available onboard, in the form of both block RAM and distributed RAM. Block RAM consists of dedicated sections of SRAM embedded in the chip and available to the designer. The lookup tables (LUTs) that are usually employed to create logic circuits can also be configured as SRAM, and this distributed RAM is spread among the LUTs of the FPGA. 

Block RAM use has essentially zero impact on the amount of logic that can be instantiated on the FPGA chip, since it occupies distinct areas of the chip that are not involved in generalized logic. Each chip has a fixed capacity of block RAM: in the case of the board I am using, 1,658,880 bits, organized in words of up to 18 bits wide. 

Distributed RAM, on the other hand, takes up LUTs that otherwise would be available to form logic circuits. The more memory you instantiate, the less logic you can create. There are only 3,650 LUTs, the basic building block of an FPGA, on my chip. Each LUT used as distributed RAM instantiates 16 bits. An entire cartridge would require 521,304 LUTs, and even a single cylinder would consume a large fraction of the available LUT capacity. 

SIZE CHALLENGE AS ENTIRE CARTRIDGE IMAGE CAN'T FIT ON THIS FPGA CHIP

One cylinder of the 2315 disk has eight sectors of 321 words, each 16 bits, thus it takes only 41,088 bits to hold that cylinder. The problem is when you look at the entire cartridge, all 203 cylinders of it, which would take five times the capacity of the block RAM to hold in its entirety. Distributed RAM provides little additional capacity. 
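
The sizing arithmetic can be checked with a few lines of Python. The constants are the figures quoted in this entry (321 words per sector, eight sectors per cylinder, 203 cylinders, 16 bit words, 1,658,880 bits of block RAM, and 16 bits per LUT used as distributed RAM); nothing here is read from the actual FPGA.

```python
WORDS_PER_SECTOR = 321
SECTORS_PER_CYLINDER = 8
CYLINDERS = 203
BITS_PER_WORD = 16
BLOCK_RAM_BITS = 1_658_880        # block RAM figure quoted for this board
BITS_PER_DIST_RAM_LUT = 16        # distributed RAM bits per LUT, as used in the text

cylinder_bits = WORDS_PER_SECTOR * SECTORS_PER_CYLINDER * BITS_PER_WORD
cartridge_bits = cylinder_bits * CYLINDERS

print(cylinder_bits)                              # 41088 bits per cylinder
print(cartridge_bits)                             # 8340864 bits per cartridge
print(round(cartridge_bits / BLOCK_RAM_BITS, 2))  # 5.03 times the block RAM
print(cartridge_bits // BITS_PER_DIST_RAM_LUT)    # 521304 LUTs as distributed RAM
print(cylinder_bits // BITS_PER_DIST_RAM_LUT)     # 2568 LUTs for one cylinder
print(cartridge_bits // 8)                        # 1042608 bytes, about 1 MB
```

A whole cartridge image is only about 1 MB, which is tiny next to the 256 MB DDR3 but about five times the block RAM.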

The erratic DDR3 DRAM, on the other hand, is 256 MB, far more than is needed for a cartridge. This is why I initially selected the DRAM to hold the cartridge image while the virtual drive is operating. 

CONSIDERING USING BLOCK RAM FOR ONE CYLINDER AT A TIME, DRAM FOR REST

If I have an entire cylinder in the block RAM, then the Unload transaction up to the Arduino will be deterministic and reliable. The disk drive controller reading and writing through the head electronics would also be satisfied easily and reliably from this cylinder buffer. 

When moving to a different cylinder, the current contents (potentially updated if writes from the CPU have taken place) would be written to the DRAM, and then the contents of the new cylinder would be read from DRAM and written to the block RAM.
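
A minimal sketch of that swap policy in Python, treating the block RAM as a one-cylinder cache in front of the DRAM. The class and method names are hypothetical, DRAM is modeled as a dictionary, and the dirty flag (skipping the write-back when the CPU has not written anything) is an optimization assumed here rather than one described above.

```python
WORDS_PER_SECTOR = 321
WORDS_PER_CYLINDER = 8 * WORDS_PER_SECTOR


class CylinderBuffer:
    """One cylinder held in fast memory (standing in for block RAM) in front of
    a larger backing store (standing in for DRAM, modeled as a dict keyed by
    (cylinder, word_index))."""

    def __init__(self, dram):
        self.dram = dram
        self.cylinder = None
        self.bram = [0] * WORDS_PER_CYLINDER
        self.dirty = False

    def seek(self, cylinder):
        if cylinder == self.cylinder:
            return
        # Write back the current cylinder only if the CPU changed it (assumed optimization).
        if self.cylinder is not None and self.dirty:
            for i, word in enumerate(self.bram):
                self.dram[(self.cylinder, i)] = word
        # Load the new cylinder; words never written read back as 0 here.
        self.bram = [self.dram.get((cylinder, i), 0) for i in range(WORDS_PER_CYLINDER)]
        self.cylinder = cylinder
        self.dirty = False

    def read(self, sector, word):
        return self.bram[sector * WORDS_PER_SECTOR + word]

    def write(self, sector, word, value):
        self.bram[sector * WORDS_PER_SECTOR + word] = value & 0xFFFF
        self.dirty = True


dram = {}
buf = CylinderBuffer(dram)
buf.seek(5)
buf.write(2, 10, 0x1234)
buf.seek(6)                  # cylinder 5 is written back to "DRAM" here
buf.seek(5)
print(hex(buf.read(2, 10)))  # 0x1234
```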

SOME CHALLENGES TO CONSIDER WITH THE DUAL MEMORY APPROACH

The time it would take to dump 321 words from block RAM to DRAM, then load new block RAM contents from DRAM, might exceed the time a real disk drive takes to perform a single cylinder seek. The minimum seek time is 15 milliseconds, a relative eternity to an FPGA operating with 10 or 20 ns cycles: at 20 ns, 15 ms is 750,000 cycles, which provides about 2,336 cycles per word to do both a read and a write. 
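
The seek-time budget can be worked out with integer arithmetic. This Python sketch uses the 15 ms minimum seek and the 10 or 20 ns cycle times from the text; the full-cylinder figure assumes the whole 2,568 word cylinder (eight sectors of 321 words) must be written back and reloaded during the seek.

```python
MIN_SEEK_NS = 15_000_000                   # 15 ms minimum seek
WORDS_PER_SECTOR = 321
WORDS_PER_CYLINDER = 8 * WORDS_PER_SECTOR  # 2568


def cycles_per_word(cycle_ns, words):
    """Clock cycles available per word if `words` words must be moved
    (each needing a read and a write) within one minimum seek."""
    total_cycles = MIN_SEEK_NS // cycle_ns
    return total_cycles // words


for cycle_ns in (10, 20):
    print(cycle_ns, cycles_per_word(cycle_ns, WORDS_PER_SECTOR),
          cycles_per_word(cycle_ns, WORDS_PER_CYLINDER))
# 10 4672 584
# 20 2336 292
```

Even the worst case, a full cylinder at 20 ns cycles, leaves roughly 292 cycles per word for the write and the read.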

The consequence of not meeting that timing would be that my virtual 2315 will no longer be timing accurate on short seeks. Even worse, the drive controller signals that the access is complete via a single shot timer, not a signal from the drive, so the CPU would be justified in beginning to read or write before our slower dump/restore has completed. 

Another issue results from the current SPI link protocol, where the Arduino specifies the particular sector (including cylinder) it wants to load or unload as part of each transaction. Thus, the FPGA might be commanded to seek to a new cylinder in the first two words of the transaction, yet be expected to deliver data almost instantly on word 3, which is far too quick for the swap to occur.

It is conceivable that I could implement a reverse feedback signal to the Arduino that would hold it in mid word of a transaction until the swap completed. This holdoff is the major problem, because the timing issues raised at the start of this section don't really constrain me.

THIS CONCEPT IS BEING EXPANDED

It appears I can keep up with the disk drive seeks rather easily, so my only issue is in holding off the Arduino Unload or Load transactions. I am looking at various ways to handle this elegantly.
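
One possible shape for the holdoff, sketched in Python: the FPGA raises a busy line while it swaps cylinders, and the Arduino polls that line before exchanging each word. Everything here is hypothetical, including the class, the busy line, and the choice to hold between words rather than in mid word.

```python
class FpgaModel:
    """Hypothetical FPGA side: after the two command words it starts a
    cylinder swap and holds a 'busy' line until the swap is done."""

    def __init__(self, swap_polls):
        self.swap_polls = swap_polls  # how many busy polls the swap lasts
        self.words_seen = 0
        self.remaining = 0

    def busy(self):
        if self.remaining > 0:
            self.remaining -= 1       # each poll stands in for swap progress
            return True
        return False

    def exchange(self, word):
        self.words_seen += 1
        if self.words_seen == 2:      # command word and its inverse received
            self.remaining = self.swap_polls


def arduino_transfer(fpga, words):
    """Hypothetical Arduino loop: before each word, wait while busy is high.
    Returns the number of polls spent waiting."""
    waits = 0
    for word in words:
        while fpga.busy():
            waits += 1
        fpga.exchange(word)
    return waits


fpga = FpgaModel(swap_polls=5)
print(arduino_transfer(fpga, range(10)), fpga.words_seen)  # 5 10
```

The Arduino stalls only before word 3, for as long as the swap takes, and the rest of the transaction proceeds at full speed.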

Sunday, November 20, 2022

Update on debugging the SPI link between Virtual 2315 FPGA and Arduino sides

EVIDENCE INDICATES THIS IS ERRATIC AND TIMING DEPENDENT

I can run the test transactions from the Arduino to the FPGA multiple times and I see it failing at different points. I load a fixed pattern where each word of the sector has its word number as the content - 1, 2, 3 etc. I then fetch the sector content up to the Arduino and report where the returned value does not match the word number. 

I find two broad cases. In one case, after the FPGA hits some unknown state, it returns gibberish that is constant for every word and every transfer over the SPI link. The second and more meaningful case is where it begins with agreement for some number of words and then returns a fixed value for all subsequent words. 

The interesting observation for the second case is that the word number where it stops sending the proper value will change from test run to test run. It might be on the third word, it might be on the 50th word, but it will occur for certain sometime during the 320 words of an unload transaction. 
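
The check on the Arduino side, and the two failure patterns, can be sketched in Python. This is a hypothetical classifier, not the actual test code: word n of the sector is expected to hold n, and the result separates a constant response from a run that agrees for a while and then repeats its last good value.

```python
def classify_unload(returned):
    """Classify the words returned by an unload, where word n (1-based)
    should contain the value n.

    Returns ("ok", None), ("constant", value), ("stuck_after", first_bad_word),
    or ("other", first_bad_word).
    """
    mismatches = [i for i, value in enumerate(returned) if value != i + 1]
    if not mismatches:
        return ("ok", None)
    first = mismatches[0]
    if first == 0 and len(set(returned)) == 1:
        # Case 1: the same value for every word, nothing matches.
        return ("constant", returned[0])
    if first > 0 and all(v == returned[first - 1] for v in returned[first:]):
        # Case 2: correct for a while, then the last good value repeats.
        return ("stuck_after", first + 1)
    return ("other", first + 1)


print(classify_unload([1, 2, 3, 3, 3]))  # ('stuck_after', 4)
```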

This tells me that I don't have a rare situation like a metastable signal or cross clock domain problem; it is a large window that is certain to be hit sometime during a single transaction. This is good, in that it should be easier to find than a very infrequent issue. However, the cause has not been obvious to me so far. 

MANY FIXES IN ATTEMPT TO TIGHTEN UP RESISTANCE TO TIMING VARIATIONS

Because this was clearly a timing issue that varied from run to run, I focused on timing between state machines and in all signals crossing clock domains. I had put synchronizers on all external signals coming into the FPGA. I even put on a synchronizer plus debouncer/hysteresis for the key signal that bracketed each two byte word of the SPI transaction. 
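
For reference, here is a cycle-level Python model of a two flip-flop synchronizer followed by one common form of debounce filter, a counter that only accepts a new level after it has been stable for a set number of clocks. The hold count of 3 and the idle-high (active-low) SlaveSelect are assumptions for the example, not values from the actual design, and a model like this shows only the added latency, not metastability itself.

```python
class Synchronizer:
    """Two flip-flop synchronizer: an input sampled on one clock edge
    appears at the output on the following edge."""

    def __init__(self, initial=1):
        self.ff1 = initial
        self.ff2 = initial

    def clock(self, async_in):
        # Both flops update on the same edge: ff2 takes ff1's old value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


class Debouncer:
    """Output changes only after the input has held a new level for `hold`
    consecutive clocks, so single-cycle glitches are ignored."""

    def __init__(self, hold=3, initial=1):
        self.hold = hold
        self.out = initial
        self.count = 0

    def clock(self, level):
        if level == self.out:
            self.count = 0
        else:
            self.count += 1
            if self.count >= self.hold:
                self.out = level
                self.count = 0
        return self.out


def filter_slave_select(samples):
    """Run raw SlaveSelect samples through the synchronizer, then the debouncer."""
    sync, deb = Synchronizer(), Debouncer()
    return [deb.clock(sync.clock(s)) for s in samples]


print(filter_slave_select([1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]: the one-cycle glitch never reaches the output
```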

In my refactoring I put in a tightly interlocked set of signals to keep state machines in sync. One raises a trigger for the other but won't drop that trigger until the response signal is seen. The driven state machine will raise a response signal when it sees the trigger and won't drop the response until it sees the trigger go away. 
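
The interlock described here is essentially a four-phase (return-to-zero) handshake. A small Python model of it, with hypothetical state names, shows the trigger/response pair stepping through 10, 11, 01, 00 with no phase skipped.

```python
def four_phase(cycles):
    """Cycle model of the trigger/response interlock between two state machines.
    Returns the (trigger, response) levels after each clock."""
    trigger = response = False
    req_state = "RAISE"  # hypothetical requester states
    trace = []
    for _ in range(cycles):
        next_trigger, next_response = trigger, response
        # Requester: raise trigger, hold it until response is seen, then drop it
        # and wait for the response to go away before considering itself done.
        if req_state == "RAISE":
            next_trigger, req_state = True, "WAIT_RESPONSE"
        elif req_state == "WAIT_RESPONSE" and response:
            next_trigger, req_state = False, "WAIT_RELEASE"
        elif req_state == "WAIT_RELEASE" and not response:
            req_state = "DONE"
        # Responder: raise response on seeing trigger, hold it until trigger drops.
        if trigger and not response:
            next_response = True
        elif not trigger and response:
            next_response = False
        # Both machines update on the same clock edge.
        trigger, response = next_trigger, next_response
        trace.append((int(trigger), int(response)))
    return trace


print(four_phase(5))  # [(1, 0), (1, 1), (0, 1), (0, 0), (0, 0)]
```

Because neither side releases its signal until it sees the other side's change, the exchange cannot slip a phase even when the two machines run at different speeds; crossing clock domains would still need synchronizers on both signals.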

CURRENTLY LOOKING AT THE RAM STATE MACHINE AS IT IS LOCKING UP

I have recently found that the central memory access state machine, the one that drives the memory interface IP that in turn controls the DDR3 DRAM, ends up stuck in some state other than its rest or idle state. That aligns with the symptoms: when it locks up, it either stops responding with incrementing word values or never returns even the first one, producing the mismatched values I saw on the Arduino. 

When the first error case occurs, with no meaningful match for any word, the value being returned is consistently the first value that was received to declare this as an unload transaction. That is xF809, which combines the code for unload (B11111) with the targeted sector number for my test (B00000001001). The outbound link remains frozen, returning that first value to the Arduino. 

Normally, we send the value x0000 as we are receiving the first word defining the command, then we send the command value back in the second word of the transaction as we are receiving the inverse of the command. Error checking verifies that x07F6 is the valid inverse of the command word xF809, and we proceed to read RAM and send up the contents for the next 321 words. Being stuck, we see xF809 coming back instead. 
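
The command word layout implied by these values, a 5 bit operation code in the top bits and an 11 bit sector number below it, followed by the bitwise inverse as a check word, can be sketched in Python. The function names are hypothetical; the bit layout is inferred from the xF809/x07F6 example.

```python
UNLOAD = 0b11111


def encode_command(op, sector):
    """Pack a 5 bit operation code into bits 15..11 and an 11 bit sector
    number into bits 10..0."""
    if not (0 <= op < 32 and 0 <= sector < 2048):
        raise ValueError("field out of range")
    return (op << 11) | sector


def inverse(word):
    """Bitwise inverse of a 16 bit word, sent as the check word."""
    return ~word & 0xFFFF


def decode_command(word, check):
    """Verify the check word, then split the command into (op, sector)."""
    if check != inverse(word):
        raise ValueError("check word is not the inverse of the command")
    return word >> 11, word & 0x7FF


cmd = encode_command(UNLOAD, 0b00000001001)
print(hex(cmd), hex(inverse(cmd)))  # 0xf809 0x7f6
print(decode_command(cmd, 0x07F6))  # (31, 9)
```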

In the second case, we do fetch the RAM locations properly for a while, sending each value up the link, but then we freeze, so the upbound link keeps sending the last properly fetched value all the way to the end. 

Saturday, November 12, 2022

Hurricane Nicole now in the rear view mirror; tweaking simulation and nailing the reset and startup sequences with refactored design

HURRICANE NICOLE VISITED UNEXPECTEDLY

A rare November hurricane formed with little warning and was upon us midweek. It reached Category 1 intensity, with wind speeds around 70 mph, and made landfall approximately 40 miles south of me. The experience was similar to Ian, which, although it was more intense when it first made landfall on the Gulf side of the state, was down to Cat 1 by the time it passed over us last month. 

Zero water or damage to the workshop and its computers, zero damage to my condo. Water flying at the windows with gusts to 85 mph finds its way through even the best sealing, such that I had maybe two quarts of water puddling on the tile along the ocean side windows. A few towels soaked that all up and all was well. 

The beach in front of my building did get pounded. Lots of erosion. My building has a very high seawall so that even with a full moon, high tide and six foot storm surge, no water made it up to the ground level or the garages. The sand down on the beach was vacuumed away, however. There is a plan to dredge up sand and reestablish the beaches all along the coast here, although not instantly. 

Also the crashing waves, coming sideways due to the rotating hurricane winds, did smash up most of the wood stairways that lead down to the beach. All 12 of the public access walkways in our town, for example, were damaged and impassable. My building used to have a stairway, too, but all we have now is a 'diving platform' looking down to the sand below. 

IMPROVED MY SIMULATION WITH RANDOM TIME DELAYS

Since the major issue to validate is how my logic performs with the SPI link which is driven by the Arduino completely unrelated to any of my clocks or logic, I made use of a random number generator for the testbench which varies the timing of the bytes which I present to the FPGA simulating the Arduino. 
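
The idea can be illustrated in Python with a seeded random number generator, so that a run that exposes a problem can be replayed exactly. The function name and gap limits are hypothetical; in a VHDL testbench the same effect is usually built on the UNIFORM procedure from ieee.math_real.

```python
import random


def byte_start_times(n_bytes, min_gap_ns, max_gap_ns, seed):
    """Start time (ns) of each simulated Arduino byte, with a uniformly
    random gap before each one. A fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    t = 0
    times = []
    for _ in range(n_bytes):
        t += rng.randint(min_gap_ns, max_gap_ns)  # randint includes both ends
        times.append(t)
    return times


print(byte_start_times(4, 1_000, 5_000, seed=42))
```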

NAILED DOWN STARTUP/RESET TIMING WITH SIMULATION

Using the post-implementation simulation, modeling the actual structure of logic cells and routing from my design, I was able to spot and repair some weaknesses in the relative timing of starting various state machines and the initialization of the memory interface and FIFO IP that I am using. 

Monday, November 7, 2022

Completely refactoring the SPI link logic

STATE MACHINES DEPEND ON TIMING OF INCOMING WORDS TO ADVANCE

Several of the state machines driving the SPI link depend on the SlaveSelect line, which is active for each two byte word being transmitted. Others used an SPIbusy signal, which in turn was driven by SlaveSelect. In both cases, the state machine first sits waiting while the transmission/reception of a word is underway, then advances to snag the output when SlaveSelect turns off. 

I suspect that there are times when I have SlaveSelect already active but I am first waiting for it to turn off, or vice versa, because of the relative timing of the Arduino driven SPI signals and what I am doing inside the logic in the FPGA. That certainly aligns with the symptoms I see, where the SPI machine is out of sync with the words being sent by the Arduino or one of the state machines stalls. 
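
One way this kind of desynchronization can arise, and a common alternative, can be shown with a small Python model. Both functions are hypothetical: the first waits on SlaveSelect levels and spends some cycles handling each word before looking again; the second uses an edge detector that records each end-of-word event in a pending count, so a word that finishes while the state machine is busy is not lost. SlaveSelect is assumed active low and already synchronized.

```python
def level_based(ss, busy):
    """Waits for SlaveSelect low, then high, then spends `busy` cycles (>= 1)
    handling the word before it looks at SlaveSelect again."""
    state, words, timer = "WAIT_LOW", 0, 0
    for s in ss:
        if state == "WAIT_LOW" and s == 0:
            state = "WAIT_HIGH"
        elif state == "WAIT_HIGH" and s == 1:
            words += 1
            state, timer = "BUSY", busy
        elif state == "BUSY":
            timer -= 1
            if timer == 0:
                state = "WAIT_LOW"
    return words


def edge_based(ss, busy):
    """An edge detector counts each low-to-high transition (end of a word) into
    `pending`; the handler takes pending words one at a time, `busy` cycles each."""
    prev, pending, words, timer = 1, 0, 0, 0
    for s in ss:
        if prev == 0 and s == 1:
            pending += 1  # end of word latched even while busy
        prev = s
        if timer > 0:
            timer -= 1
        elif pending:
            pending -= 1
            words += 1
            timer = busy
    return words


# Two idle cycles, four words (3 cycles selected, 2 deselected), then idle.
ss = [1, 1] + [0, 0, 0, 1, 1] * 4 + [1] * 20
print(level_based(ss, busy=2), edge_based(ss, busy=2))  # 4 4
print(level_based(ss, busy=8), edge_based(ss, busy=8))  # 2 4
```

The busy time of 8 cycles is deliberately exaggerated; the point is only that a level-waiting machine silently skips words whenever its own work overlaps a SlaveSelect transition.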

REFACTORING IS MY SOLUTION WHEN I AM ENCOUNTERING FLAKY BEHAVIOR

If I spend enough time fighting with erratic behavior, it is time to look at the problem again and refactor the design. I try to come at the required behavior in a different way, focusing especially on interlocking or other means of ensuring that various state machines work together as intended.

SPI LINK LOGIC BEING REDESIGNED

It is now time to refactor all the state machine gear. I have evolved it several times, in some cases because the way the Arduino worked was different from what I expected and in some cases due to defects or poor approaches I found. The longer you layer fixes atop some code, the worse it tends to get. Refactoring lets me redesign with the benefit of all the correct information about the Arduino and all the experience I gained working on the logic.

Sunday, October 30, 2022

Erratic results cast suspicion towards testbed itself or subtle issue

ERRATIC RESULTS UNDER TESTING

The last few test runs have produced puzzling results. I saw mostly correct results going up the SPI link, but the first value sent was incorrect, then we were off by one for the next 285 words of the 321 word sector, then it repeated the 285th value over and over until the end. The SPI state machine did not complete nor reset.

With one run, I saw garbage values again and the integrated logic analyzer showed that the state machines for driving the SPI link froze after the first word, acting as if the SlaveSelect line was never asserted again by the Arduino master. The way that my state machines are set up, as soon as SlaveSelect is asserted we start over pumping out the first byte of the word to the SPI link module, but that wasn't happening. 

STALL IMPLIES THAT SLAVE SELECT FAILS TO ACTIVATE

SlaveSelect is a signal that I set and reset from my Arduino code for every word that is exchanged over the SPI link. That is, I assert select, exchange two bytes, then drop the selection line, thus delineating every word on the link. An overall signal, SPItransaction, is asserted to start a multiword transaction and dropped at the end of the 325 word exchange. 

A signal from the Arduino Mega 2560, with 5V logic levels, is converted by my level shifter MOS transistors to the 3.3V levels of the FPGA board. My Arduino itself produces both 5V and 3.3V to power the two sides of the level shifter. Previous oscilloscope probes showed very good swings of logic levels, so that when the +5 dropped to near 0 on the Arduino side I would have the +3.3V level on the FPGA side drop near zero. 

WIRING OR LEVELS OR METASTABLE OR SOMETHING ELSE

I don't have the scope here where I am testing, but that is the next step in investigating this weird behavior. A number of possibilities exist:

  • The Arduino output may not drive low enough to produce an asserted low level at the FPGA
  • The level shifter may be misbehaving
  • Resistance in my makeshift wiring and connections may be producing invalid logic levels at the FPGA input
  • The FPGA may reach a metastable invalid state if SlaveSelect changes near a clock edge
  • Timing issues in the routing on the FPGA chip may produce state machine errors or other logical 'farts'
I have synchronizers on every exterior signal, including SlaveSelect, which should have reduced the chance of a metastable state to extremely low odds, certainly too low to occur as often as this failure seems to.
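
For context, the usual first-order estimate of synchronizer reliability is the formula below, where t_r is the time allowed for a metastable flop to resolve (roughly one clock period minus flop overheads in a two flip-flop synchronizer), tau and T_W are device constants, and f_c and f_d are the clock and data transition rates. No values for this FPGA are assumed here; the point is only that the mean time between failures grows exponentially with t_r, which is why a failure in nearly every transaction is hard to blame on metastability.

```latex
\mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_W \, f_c \, f_d}
```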

I can use an oscilloscope to validate the voltages appearing at the FPGA input pins, ruling out the first few potential causes or directing me to corrective action. 

If the issue is timing, I will have an extra frustrating road ahead. The timing report shows continual failure to meet timing, driven by error messages about the inability to place the clock buffer and clock generating resources in the same portion of the FPGA chip, which forces me to override those conditions. I am loath to allow this, but working at the microscopic level of detail needed to resolve it is daunting, particularly as it involves Intellectual Property (the memory interface) that I didn't write and which is in Verilog, a language I don't know. 


Saturday, October 29, 2022

RAM retrieving data but not yet transferred up SPI link - good progress

LOGIC ANALYZER CORES MONITOR READ AND WRITE OF MEMORY INTERFACE

My two integrated logic analyzer cores, one operating at the 4:1 speed of the memory interface used to clock in requests and grab data from memory, the other operating at my general logic frequency, were useful in spotting the state of signals as my logic dealt with SPI link requests to load and unload data from a target sector of the virtual cartridge image as held in the DDR3 RAM on the FPGA board.

I was able to see that the data was properly written into the memory interface and that information came out later when reading the same addresses. I will need this facility both to feed the SPI link during unload operations for virtual cartridges and to feed the signals into the disk drive controller when we simulate the head signals as if a real cartridge were spinning on the drive. 

CONTENTS OF READ DATA BUS FROM INTERFACE IS FLEETING

From the analyzer I could see that we only had valid data from the memory interface for the two clock cycles when the app_rd_data_valid signal is asserted, telling us we have good data. The bus then reverts to invalid data. Thus the timing of when we latch in the app_rd_data bus is critical to successfully getting memory contents out to the functions that need them. 

ADJUSTING TIMING TO CAPTURE THE DATA IN ORDER TO FEED IT TO SPI LINK

The fix seems pretty straightforward, so I will implement it and enter a new round of testing. Ideally, we will grab and hold the memory contents, pass it properly to the SPI link state machine, which will properly load it into the SPI slave link module itself where it will be properly clocked up to the Arduino. 
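
The capture fix amounts to a holding register loaded only on the valid strobe. Here is a Python model of that behavior with hypothetical names; the per-cycle tuples stand in for app_rd_data_valid and app_rd_data on the ramclk side.

```python
class ReadCapture:
    """Holding register that latches app_rd_data only while
    app_rd_data_valid is asserted and keeps it afterwards."""

    def __init__(self):
        self.held = None
        self.captured = []

    def clock(self, app_rd_data_valid, app_rd_data):
        if app_rd_data_valid:
            self.held = app_rd_data
            self.captured.append(app_rd_data)
        return self.held


cap = ReadCapture()
for valid, data in [(0, 0xDEAD), (1, 0x0006), (0, 0xBEEF), (0, 0x1111)]:
    cap.clock(valid, data)
print(hex(cap.held))  # 0x6: the bus went back to junk, the register did not
```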

Tuesday, October 25, 2022

Bizarre clock domain discovered by logic analyzer core - digging deep

CROSS CLOCK DOMAIN AND RESET SIGNAL DESIGN COMPLICATIONS

My design has multiple clock domains, which brings with it the challenge of synchronizing signals going across domains. The board has a 100MHz hardware clock (also 12MHz, but I didn't use that) which generates various clocks for the RAM controller and another (50MHz) used for my general logic. The controller gets 100MHz and 200MHz, then generates a ramclk of 25MHz for driving my memory interface logic. Finally, the Serial Peripheral Interface (SPI) link has its own 4MHz clock, which runs intermittently. 

We therefore have five active clock domains driving logic, plus two hardware clock domains, one of which generates most of the others. Even if two of the domains are at the same frequency, they are not in phase, nor do they have aligned clock edges. 

Any external signal, such as the SPI data lines but also all the signals from the 1130 disk drive, should be synchronized, as it might otherwise change right near a logic clock edge, leading to metastable state errors. As such, I had a pool of synchronizers to make sure every signal in a particular clock domain changes only at clock edges. 

Too, I needed to reset various state machines and elements in a proper sequence, so reset signals are generated in steps: the original reset, a FIFO clearing state, and a reset for the logic running under the RAM clocks. In that ballet of startup steps, I had an issue which resulted in my main RAM handling state machine stalling. This didn't occur in the regular simulation, but when I ran a functional simulation of the post-synthesis design, I was able to dig out the issue. The fix was easy even if finding it was not. This was several rounds of testing ago, but it is interesting to understand the wrinkles involved in this sort of project.

INTEGRATED LOGIC ANALYZERS ASSOCIATE SIGNALS WITH CLOCK DOMAINS

When I select signals to watch with the integrated logic analyzer cores, the Vivado tool chain will determine the clock domain and build an analyzer core for each clock domain which has signals to monitor. I decided to watch the SPIbyteout bus signal which is the eight bits that are sent to the SPI protocol module to shift out to the Arduino. This was the last point outside the SPI module logic and thus I could monitor to see whether I was passing the RAM values properly.

STRANGE ASSOCIATION OF SIGNAL WITH CLOCK DOMAIN

The toolchain built a third logic core for the 100 MHz clock domain. That is only passed into the memory interface module and not involved in any of my logic. There is no way that the bus value I want to monitor should be tied to that clock domain. This suggests some subtle error which is the root cause of my difficulties but it is a very opaque sign. 

The logic that is generating SPIbyteout is clocked by the 50MHz clock domain but also tests an unsynchronized input from the SPI clock domain (SlaveSelect). That is a flaw that I have to correct, altering the logic to remove any reference to SlaveSelect or at least synchronizing it properly. In spite of the issue I see, I cannot imagine how that produces the errant clock domain assignment by Vivado. 

Monday, October 24, 2022

Data returned from the response FIFO correctly, not getting up to Arduino properly

NEXT SET OF RUNS SHOW THAT MY RESPONSE FIFO IS WORKING PROPERLY

I set the internal logic analyzer core to detect when I was reading the sixth word of the sector, by triggering on the return pattern 0006 and then watching all the related signals such as the SPI state machine.

The value 0006 is clearly detected and the state machine moves forward, thus my problem lies somewhere past the FIFO. It may be in the timing of passing the data word to the SPI outbound routine, in the way that the message is encoded on the SPI link itself, or on how the Arduino routine is detecting the results. 

I will reimplement with new watched signals to view the handoff from FIFO to the SPI out routine and inspect the word passed to the SPI logic. It would be wonderful if I could directly monitor the SPI link from a logic analyzer core, but the SPI clock is not continuous thus I can't start a logic core that is driven by the SPI clock. All signals in that clock domain, MISO, MOSI and SCLK itself, are thus inaccessible by the internal analyzer cores. 

RAM writing and reading good, data not getting back to SPI link state machine correctly

INTEGRATED LOGIC ANALYZER CAPTURED RAM OUTPUT

My load transaction stored the ascending integers in the word addresses of the sector, e.g. 0001 for word 1 and 0002 for word 2. The unload transaction read the same sector back and shipped the value back on the SPI link.

While the upstream data appeared to be gibberish, the values coming from the memory interface were indeed the same ascending integers I had written, and they were properly set up in the data in field of the FIFO that transports that result back to the normal FPGA clock domain for use by my SPI link. 

THE ISSUE IS SOMEWHERE IN THE RETURN OF THE DATA

The data is set up properly in the FIFO that transports results. I will now focus on watching the FIFO operate and judge the correctness of the data returned on the regular clock side of that FIFO. If that is good, I could still have a problem capturing that into the SPI link state machine for the outgoing byte that is sent up to the Arduino. 

I reconfigured the logic analyzers to focus on the data I need to watch for the response FIFO and my SPI link state machine. This does cross clock domains, which means I can't watch both the data entering the FIFO and the data leaving the FIFO in the same logic analyzer core. I can, however, trigger on the FIFO empty flag to at least watch each result being presented in my FPGA logic clock domain. 

QUICK RESCUE OF SOME HARDWARE

I learned of a mainframe and some related hardware that was at risk of being scrapped as a person nearby had to clear out a storage unit. He was no longer interested in restoring the system and rents for the storage space had doubled suddenly. 

Today I had a small moving business I worked with previously meet me at the storage unit and haul those boxes to my workshop before the end of October. It is an IBM Z9 BC mainframe, almost 1,700 pounds which was a beast to get rolling up and down ramps for transport, plus a 3490E tape system and some ancillary parts. The tape unit was no walk in the park either, just easy in comparison to the behemoth. 

IBM Z9 Business Class mainframe

IBM 3490E 36 track tape cartridge drives


Thursday, October 20, 2022

Stripped down ram verify logic is working! Now to back port the changes to my main design

MY TESTBED, STRIPPED DOWN

I set up the memory interface and clock modules, two FIFOs just as were used in my full design, but stripping essentially everything else away. I will drive it with the four pushbuttons on the Digilent Arty S7 board, receiving feedback from the four monochrome LEDs and two tricolor LEDs on the board. 

I set up the left two buttons to write to two different addresses, each with its own target data pattern. The primary location was set to x1234 repeated to fill 128 bits, while the secondary location was set to x5A0F repeated to fill 128 bits. The right two buttons triggered a read of the two selected addresses, then, when the memory access was complete, compared the output from the read with the target data values.

If the primary location read did not come back with x1234, the left colored LED turned red. If the result matched, the color became green. The right colored LED would turn green if the secondary location returned x5A0F else it would turn red.

When the board initialized, all four monochrome LEDs were lit. Pushing the four pushbuttons turned on just the LED associated with that button, as evidence it was detected. Further, the left tricolor LED would be turned blue if we wrote the x1234 to the primary location and the right tricolor LED would be lit blue if we had written x5A0F to the secondary address. 

If the read was begun to either primary or secondary, but the memory didn't complete returning a value, the tricolor LED would be off and the relevant monochrome LED would be on indicating an incomplete read.
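
The pass/fail logic of this testbed is compact enough to restate as a Python sketch. The function name and the action labels are hypothetical; the 128 bit patterns are x1234 and x5A0F each repeated eight times, as described.

```python
PRIMARY_PATTERN = int("1234" * 8, 16)    # x1234 repeated to fill 128 bits
SECONDARY_PATTERN = int("5A0F" * 8, 16)  # x5A0F repeated to fill 128 bits


def tricolor(last_action, read_done=False, data=None, expected=None):
    """Color of a tricolor LED after the last button action on its location."""
    if last_action == "write":
        return "blue"                    # target pattern written
    if last_action == "read":
        if not read_done:
            return "off"                 # read started but never completed
        return "green" if data == expected else "red"
    return "off"                         # nothing done yet


print(tricolor("read", True, PRIMARY_PATTERN, PRIMARY_PATTERN))  # green
```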

RESULTS OF THE FINAL INCARNATION

After starting up the FPGA board, I observed four monochrome LEDs lit and both tricolor LEDs dark. I began by attempting a read of the locations where I had not yet written data values. The third and fourth buttons did a read of the primary and secondary address. In each case, the related tricolor LED was bright red. 

This told me that I got something back, the read mechanism completed, but the returned value was not the expected x1234 or x5A0F. That was expected at this point.

I then pushed the left button which gave me blue on the left tricolor LED. I pushed the second button and the right tricolor turned blue. This was the indication that I had written the target data to my primary and secondary addresses. Again, as expected.

The final step was to push the third and fourth buttons. In each case, their tricolor blazed a glorious green as the returned value matched our expectations. Any number of reads would result in green status and I could interleave as many repeat writes with the left buttons without causing a red status light.

BUSY COLLECTING CHANGES

The changes are for the most part in the intellectual property (IP) modules I used, not in my own logic, but without discovering the combination of settings for all of them that would produce implementable and correctly operating logic, my own efforts would go nowhere.

The clocking scheme, the clock MMCM module and the way it was organized, was a key part of driving the memory interface at the right clock rates. This also included a change to the constraints file to override an error that would otherwise block implementation from completing; that change was discovered in Xilinx tech notes after exhaustive Google searches.

The memory interface too required its particular set of parameters. Since the number of parameters for the memory interface is large, in addition to the substantial number of clocking alternatives, I was not going to get anywhere with a random walk of changes. Sadly, there was no clear example in VHDL to follow either. Boo to both Xilinx and to Digilent for laziness. 

The change also converted the memory interface from the 2:1 mode I originally used to the 4:1 mode, which necessitated changing a bit of my own logic. When reading DDR3 RAM, the memory outputs a burst of eight 16 bit words for a given read request. Correspondingly, a write has to send eight 16 bit words to RAM for 8 contiguous word addresses. 

In the 2:1 mode, the RAM clock generated by the memory interface for my logic operates at 1/2 the rate of the memory. I am therefore presented with half of the memory output in one of those cycles, 64 bits or four words, and the other half of the eight-word burst comes in a second cycle. My logic had to write the first half, bump the memory address by four words, then write the second half of the full burst across two cycles.

In 4:1 mode, all eight words are delivered or sent in one cycle, because the RAM clock operates at 1/4 the speed of the DDR3 chip's internal operation. The entire eight-word burst is accomplished in that one cycle of my logic. This was actually a simplification, as I only had to supply the starting address of the eight words and send all 128 bits at once to write; a read gave me all eight words at once. 
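A small Python model makes the 2:1 versus 4:1 difference concrete. It assumes, hypothetically, that the lower-addressed words of the burst sit in the low-order bits of the user data bus and that addresses count 16-bit words; the function name user_cycles is made up for this sketch and is not part of the memory interface IP.

```python
BURST_WORDS = 8   # one DDR3 burst: eight 16-bit words
WORD_BITS = 16


def user_cycles(burst: int, start_addr: int, ratio: int):
    """Split one 128-bit burst into (word_address, data) pairs, one per user clock.

    ratio is the memory-to-user clock ratio. DDR moves two words per memory
    clock, so each user-clock cycle carries 2 * ratio words.
    """
    if ratio not in (2, 4):
        raise ValueError("this sketch only models 2:1 and 4:1 modes")
    words_per_cycle = 2 * ratio
    bits = words_per_cycle * WORD_BITS
    out = []
    for i in range(BURST_WORDS // words_per_cycle):
        # Assumed layout: lower-addressed words in the low-order bits.
        chunk = (burst >> (i * bits)) & ((1 << bits) - 1)
        out.append((start_addr + i * words_per_cycle, chunk))
    return out


# Words 0..7 each holding their own index.
burst = sum(word << (WORD_BITS * word) for word in range(BURST_WORDS))
for addr, data in user_cycles(burst, 0x100, ratio=2):
    print(hex(addr), hex(data))   # two cycles, address bumped by four words
print(len(user_cycles(burst, 0x100, ratio=4)))  # a single cycle
```

In 2:1 mode the logic has to track the half and bump the address; in 4:1 mode there is only ever one address and one 128-bit value per request.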

If I haven't missed anything else, I should be able to simplify my RAM access logic in my real project, convert the clock and memory interface IP to the magic parameters, then move ahead debugging with working RAM assured. 

Wednesday, October 19, 2022

Battle on two fronts but still bashing away at memory controller on FPGA board

TWO FRONT WAR

In addition to the long term war trying to sort out how to get the memory interface system of the Vivado toolchain to successfully drive the DDR3 RAM on the Digilent Arty S7 board, I am fending off the COVID-19 virus.

Eleven days ago I got my latest booster shot plus the yearly influenza vaccine, but I didn't get it early enough. Monday both my wife and I began to feel upper respiratory congestion, which at first we ascribed to seasonal allergies, but by the evening it was clearly more than allergies. That had been our first thought since we both had the symptoms essentially simultaneously, which is typical of allergies.

Because the antiviral Tamiflu must be taken early in the illness to be effective, we decided to go to a clinic on Tuesday morning to get tested, expecting it was either a cold or the flu. The swabs were tested for both flu and Covid. We were floored to learn that we both tested positive for Covid. Evidently we were exposed at the same time, perhaps at a doctor's visit she had the week prior for a checkup on her eye surgery. No way of knowing, of course.

There are antivirals for Covid, akin to Tamiflu. We were prescribed one and began taking it immediately. Hopefully we started it rather quickly in the course of the illness and it, plus the partially activated vaccinations we recently had, will lessen the severity and shorten the duration. 

This will keep me out of the shop (and cooped up in quarantine) for a few more days before I can venture out with masks on. 

MEMORY CONTROLLER BATTLES CONTINUE

I stripped down my logic to a bare minimum that simply writes to and reads from the memory interface based on pushbuttons on the FPGA board. It lights LEDs to show me that the writes and reads were started and uses color LEDs to tell me whether the returned value matches what was written. 

Saturday, October 15, 2022

Wading through error messages trying to resolve memory interface issue

INTERESTING ERROR MESSAGES ONE HAS TO INTERPRET

[Place 30-172] Sub-optimal placement for a clock-capable IO pin and PLL pair. If this sub optimal condition is acceptable for this design, you may use the CLOCK_DEDICATED_ROUTE constraint in the .xdc file to demote this message to a WARNING. However, the use of this override is highly discouraged. These examples can be used directly in the .xdc file to override this clock rule.

< set_property CLOCK_DEDICATED_ROUTE BACKBONE [get_nets ddr3clock/inst/clk_in1_clk_wiz_ddr] >


ddr3clock/inst/clkin1_ibufg (IBUF.O) is provisionally placed by clockplacer on IOB_X1Y26

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/plle2_i (PLLE2_ADV.CLKIN1) is locked to PLLE2_ADV_X1Y0

ddr3clock/inst/plle2_adv_inst (PLLE2_ADV.CLKIN1) is provisionally placed by clockplacer on PLLE2_ADV_X0Y0


The above error could possibly be related to other connected instances. Following is a list of all the related clock rules and their respective instances.


Clock Rule: rule_pll_bufg

Status: PASS 

Rule Description: A PLL driving a BUFG must be placed on the same half side (top/bottom) of the device

ddr3clock/inst/plle2_adv_inst (PLLE2_ADV.CLKFBOUT) is provisionally placed by clockplacer on PLLE2_ADV_X0Y0

ddr3clock/inst/clkf_buf (BUFG.I) is provisionally placed by clockplacer on BUFGCTRL_X0Y5


Clock Rule: rule_pll_bufhce

Status: PASS 

Rule Description: A PLL driving a BUFH must both be in the same horizontal row (clockregion-wise)

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/plle2_i (PLLE2_ADV.CLKOUT3) is locked to PLLE2_ADV_X1Y0

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/u_bufh_pll_clk3 (BUFH.I) is provisionally placed by clockplacer on BUFHCE_X1Y7


Clock Rule: rule_bufh_bufr_ramb

Status: PASS 

Rule Description: Reginal buffers in the same clock region must drive a total number of brams less than the capacity of the region

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/u_bufh_pll_clk3 (BUFH.O) is provisionally placed by clockplacer on BUFHCE_X1Y7


Clock Rule: rule_bufhce_mmcm

Status: PASS 

Rule Description: A BUFH driving an MMCM must both be in the same clock region

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/u_bufh_pll_clk3 (BUFH.O) is provisionally placed by clockplacer on BUFHCE_X1Y7

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/gen_mmcm.mmcm_i (MMCME2_ADV.CLKIN1) is locked to MMCME2_ADV_X1Y0


Clock Rule: rule_mmcm_bufg

Status: PASS 

Rule Description: An MMCM driving a BUFG must be placed on the same half side (top/bottom) of the device

mymemory/u_cartmemory_mig/u_ddr3_infrastructure/gen_mmcm.mmcm_i (MMCME2_ADV.CLKFBOUT) is locked to MMCME2_ADV_X1Y0

and mymemory/u_cartmemory_mig/u_ddr3_infrastructure/u_bufg_clkdiv0 (BUFG.I) is provisionally placed by clockplacer on BUFGCTRL_X0Y0


Nowhere, and I mean absolutely nowhere, does the IP for the memory interface offer any place where I can specify where these elements go; the toolchain places them itself and then throws a fit about its own placements not following the rules. Tonight I will drink heavily; tomorrow I will try to dig into the lowest level internal details of the FPGA chip to understand what MMCME2_ADV_X1Y0, BUFGCTRL_X0Y0 and BUFHCE_X1Y7 are. 
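The site names in the log encode a grid position, and a rough model suggests why the placer complained. This sketch assumes 7-series conventions as I understand them: each clock region row spans 50 I/O sites, and each clock region has one clock management tile whose MMCM and PLL sites are numbered by region. Under those assumptions IOB_X1Y26 sits in region X1Y0, the same region as PLLE2_ADV_X1Y0, which the log shows is already locked by the memory interface (as is MMCME2_ADV_X1Y0), so the clock wizard's PLL lands in X0Y0, away from its input pin. This is a simplified illustration, not Vivado's actual placement rule.

```python
import re

# Assumptions (simplified 7-series model, verify against the clocking user guide):
IOBS_PER_REGION_ROW = 50                 # I/O sites per clock region row
CMT_KINDS = {"MMCME2_ADV", "PLLE2_ADV"}  # one of each per clock region
SITE_RE = re.compile(r"^(?P<kind>\w+?)_X(?P<x>\d+)Y(?P<y>\d+)$")


def clock_region(site: str) -> tuple:
    """Return the (column, row) clock region that a site falls in."""
    m = SITE_RE.match(site)
    if m is None:
        raise ValueError(f"unrecognized site name: {site}")
    kind, x, y = m["kind"], int(m["x"]), int(m["y"])
    if kind == "IOB":
        return (x, y // IOBS_PER_REGION_ROW)
    if kind in CMT_KINDS:
        return (x, y)
    raise ValueError(f"site kind not modeled in this sketch: {kind}")


def same_region(a: str, b: str) -> bool:
    return clock_region(a) == clock_region(b)


print(clock_region("IOB_X1Y26"))                     # the clock input pin
print(same_region("IOB_X1Y26", "PLLE2_ADV_X0Y0"))    # clock wizard PLL
print(same_region("IOB_X1Y26", "PLLE2_ADV_X1Y0"))    # memory interface PLL
```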

FPGA issues are in the clocking setup for the DDR3 RAM memory interface - once again not in my logic

MADDENINGLY EVERY SIMULATION WORKS BUT REAL TESTS FAIL

I continue to monitor the signals I am presenting to the memory interface, a bit of intellectual property provided with Vivado that manages the DDR3 memory device on the Digilent Arty S7 FPGA board I am using. This memory interface is provided two clocks and produces a third one which is the driver for most of my logic. 

The memory interface has to perform some calibration of the DDR3 memory which takes so long that it can't be simulated, thus I created a mock memory interface module to link in when simulating. My mock interface behaves according to the documentation, but only as well as I comprehend the specs. 

Everything works perfectly under simulation, even when I do it with post-implementation nets, but I am just not getting the memory to operate properly in real life. I have battled the internal logic analyzer capability until I could watch the signals directly, and everything I am producing matches the signal timing diagrams from the documentation, but the results from the memory interface don't make sense.
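A cycle-based model of the handshake a mock interface has to honor may help. It assumes only the documented rule that a request is accepted on the clock edge where app_en and app_rdy are both true, so app_en must be held until that cycle. The class and function names, and modeling refresh as a set of busy cycles, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MockMemoryInterface:
    """Hypothetical model: app_rdy is low on the cycles listed in busy_cycles."""
    busy_cycles: set
    accepted: list = field(default_factory=list)

    def app_rdy(self, cycle: int) -> bool:
        return cycle not in self.busy_cycles

    def clock(self, cycle: int, app_en: bool, app_addr: int) -> bool:
        """Sample inputs at the rising edge; True means the request was taken."""
        if app_en and self.app_rdy(cycle):
            self.accepted.append((cycle, app_addr))
            return True
        return False


def issue_request(mem: MockMemoryInterface, start: int, addr: int,
                  max_wait: int = 100) -> int:
    """Hold app_en high from cycle `start` until accepted; return that cycle."""
    for cycle in range(start, start + max_wait):
        if mem.clock(cycle, app_en=True, app_addr=addr):
            return cycle  # app_en may be dropped after this cycle
    raise TimeoutError("request never accepted")


# app_rdy drops for cycles 3-5 (say, a refresh) just as the request starts.
mem = MockMemoryInterface(busy_cycles={3, 4, 5})
print(issue_request(mem, start=3, addr=0x100))  # accepted on cycle 6

# Contrast: a one-cycle pulse on a busy cycle is simply lost.
lost = MockMemoryInterface(busy_cycles={3})
print(lost.clock(3, app_en=True, app_addr=0x100), lost.accepted)
```

A mock like this only exercises the cases the testbench creates; if app_rdy never drops at an awkward moment in simulation, a flaw in holding app_en will not show up.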

CONCLUSION - SOMETHING IS WRONG WITH MEMORY INTERFACE IMPLEMENTATION

I now suspect that the memory interface is not set up correctly. Various web searches have flagged bulletins and notes from others pointing to issues with the clock setup. It is a deep rabbit hole to dive into, far down into the gritty details of clock resources, signal routing on the FPGA chip, and the clear-as-mud documentation for the memory interface IP. 

Digilent provided a sample set of files for the memory interface to use with the Arty S7 board. I have located the actual implementation control files (.prj and .ucf) that were produced in Vivado when I generated my memory interface. Comparing the two sets of files flags differences, and of course those are in the clock parameters. 

STEP 1 - DEEPLY STUDY WHAT IS NEEDED FOR CLOCKING THE MEMORY INTERFACE

Here is the helpful high level diagram for what is needed to clock the memory interface.

High level clocking

Next I have to study the very long list of rules, the first of which are shown below:

Some of the rules for the memory interface setup

A number of the rules govern which FPGA pins the DDR3 lines are wired to, but those decisions were made by Digilent when they built the Arty board. Other choices, such as clock frequencies, will be constrained since I only have two real clocks on the Arty, 100MHz and 12MHz; thus I need clock logic added to convert those to the frequencies the memory interface requires.
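Converting the 100MHz board clock into what the memory interface needs comes down to finding integer multiply and divide settings. The sketch below assumes the usual PLL relationships, VCO = fin x M / D and fout = VCO / O. The VCO window and the M, D and O ranges are parameters whose defaults are assumptions to be checked against the Spartan-7 data sheet for the actual speed grade, and other limits such as the phase detector frequency are not checked. The Clocking Wizard does this search for real; this only shows the arithmetic.

```python
from fractions import Fraction


def find_pll_settings(f_in_mhz, f_out_mhz, vco_min=800, vco_max=1600,
                      m_range=range(2, 65), d_range=range(1, 57),
                      o_range=range(1, 129)):
    """List (M, D, O, f_vco) combinations giving exactly f_out from f_in.

    Assumed relations: f_vco = f_in * M / D and f_out = f_vco / O.
    Default limits are assumptions; check the data sheet for the real part.
    """
    f_in = Fraction(f_in_mhz)
    target = Fraction(f_out_mhz)
    hits = []
    for d in d_range:
        for m in m_range:
            f_vco = f_in * m / d
            if not vco_min <= f_vco <= vco_max:
                continue
            o = f_vco / target
            if o.denominator == 1 and o.numerator in o_range:
                hits.append((m, d, o.numerator, f_vco))
    return hits


# Example: the 100MHz board oscillator to a 200MHz clock.
for m, d, o, vco in find_pll_settings(100, 200)[:3]:
    print(f"M={m} D={d} O={o} VCO={vco}MHz")
```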

STEP 2 - CORRECT THE SETUP AND RESUME TESTING

Once I know what is wrong, I can bash along until I am able to get the proper setup configured and the FPGA implementation to match. At that point I can resume testing to see if things work any better. 

Thursday, October 13, 2022

Spent time in the shop working on 1053 typewriter and Virtual 2315 build

TYPEWRITER MAIN SHAFT CLEANED UP AND REINSTALLED

The main shaft in the Selectric 1 mechanism holds the rear of the carrier, which slides left and right along the shaft to the various columns of the page. The shaft is keyed and rotates to energize the typeball mechanism rotation, tilt and strike onto the ribbon. 

The shaft on this 1053 had corrosion and pitting along its length, inhibiting the free movement of the carrier for spacing, backspacing, tabulation and carrier return operations. I had been concerned that the pitting might be so deep that I would need a replacement shaft. Fortunately, sanding the shaft smoothed it out enough for free sliding, particularly once I grease the shaft. The remaining pits are small and don't have raised edges.

The shaft is installed in the typewriter, allowing me to move on to adjusting and repairing the portion of the carrier that moves one column for spacing and backspacing and that allows free movement during tabulation until it reaches a set tab position. 

THE VIRTUAL CARTRIDGE HARDWARE

I prepared the connector and wire harness to connect to the Arduino inside the project box. A ribbon cable will plug into this connector and carry the signals over to the connector on my interface board. Another cable runs from the interface board to an adapter with small wire-wrap lines that I can connect to the appropriate signal points of the IBM 1130 internal disk drive electronics backplane. 

Ahead I will connect the wire harness to the Arduino connector block for each relevant signal. The interface board will need to be mounted in place and the adapter for the wire-wrap connections must be secured before I can accomplish the actual connection to the backplane pins. 

Monday, October 3, 2022

Many hours spent fighting with debug cores

SCORE A WIN FOR OBSCURE VIVADO ERRORS IN THE LONG RUNNING WAR WITH CARL

I read as much as I could about the integrated logic analyzers and carefully went over the process in the hopes of actually resuming debugging of my own logic instead of the arcana of Vivado. I removed all previous debug probes, generated a version without the logic analyzers, then started over.

I selected signals I thought I should monitor during execution, adding in the input and output from the SPI link itself just to capture the correctness of the raw data transfer. This resulted in three debug cores, one for each of the three clock domains where I was watching signals. The main FPGA logic clock was one, the DDR3 memory interface clock was the second, and the SPI link SCLK clock was the third.

I got clean implementation and bit files, but every time I tried to program the FPGA board I received an error and had the three debug cores deleted from the programming. The message mentioned configuration options to check or the need to have free running clocks for the debug core.

That should have been my hint, if I thought about it, because the third debug core was tied to the SCLK clock incoming from the Arduino and therefore was not steadily running at the time of programming. There may be a way to set up a debug core to use an external clock that is not continuous, but I didn't bother to struggle through the mountains of documents to discover it. 

After I deleted the one signal which is connected to that clock domain, I was able to regenerate the bitstream, now with just two debug cores. Now on to testing.
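How the debug cores come out, one per clock domain, and why the SCLK domain was a problem can be modeled simply. This sketch assumes, following the error message Vivado gave about needing free running clocks, that a domain whose clock only toggles during transfers cannot host a usable debug core. app_en, app_rdy and ui_clk are memory interface names; clk50, xfer_state, sclk and spi_rx_byte are hypothetical stand-ins.

```python
from collections import defaultdict


def plan_debug_cores(probes, free_running_clocks):
    """Group (signal, clock) probes into one ILA per clock domain.

    Probes whose clock is not free-running are returned separately, since
    their debug core could not be brought up reliably.
    """
    cores = defaultdict(list)
    rejected = []
    for signal, clock in probes:
        if clock in free_running_clocks:
            cores[clock].append(signal)
        else:
            rejected.append((signal, clock))
    return dict(cores), rejected


# clk50: general logic clock; ui_clk: memory interface clock;
# sclk: the Arduino's SPI clock, idle between transfers.
probes = [("xfer_state", "clk50"), ("app_en", "ui_clk"),
          ("app_rdy", "ui_clk"), ("spi_rx_byte", "sclk")]
cores, rejected = plan_debug_cores(probes, free_running_clocks={"clk50", "ui_clk"})
print(cores)
print(rejected)
```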

Saturday, October 1, 2022

Finally back into shop for testing; fighting with the internal logic analyzer process

VISITING FAMILY IN NORTHEAST FOR TWO WEEKS, FRIEND VISITS, HURRICANE VISITS

I have been away from the shop for an extended period due to a succession of events which needed my full attention. First, my wife attended her 52nd high school reunion and we visited friends and my daughter. 

We had no sooner arrived back in Florida when an old friend of ours stopped by to visit. He is considering buying a large boat to live on, anchoring it in this area. We jointly investigated marinas and various boats until he left.

Of course, by that time Hurricane Ian had formed in the Atlantic and was headed our way. I scrambled to seal up outer fixtures and vents on my home, lay in supplies for a potential long duration without power or water and then raise everything in my shop so that any minor water running on the floor wouldn't damage anything.

I am relieved to say that both the shop and my home survived the passage of the hurricane's center only 10 miles north of us without any damage. Both the shop and my home have impact windows that should stand up to impacts from up to Category 5 winds. Ian was nearly Cat 5 as it arrived on the Gulf Coast of Florida but had weakened to barely Category 1 after crossing the state to reach me on the Atlantic coast. 

FIRST DEBUGGING RUNS UPON RETURN

I have just begun to test again, attempting to use the internal logic analyzer functionality of the Xilinx chip to debug the SPI link transactions reading and writing data to the FPGA board RAM from the Arduino. Spent time banging my head against the table figuratively as I attempted to get the logic I created, the logic analyzer and the memory configuration file for the onboard flash memory to be consistent. 

What would happen is that I would see the activity going on but the logic analyzer insisted none of the signals were varying. I believe that I had them out of sync, thus the analyzer was looking at FPGA lookup tables or other resources which were not used in the latest implementation, given that the software makes those assignments fairly dynamically on each run of the toolchain. 

Friday, September 9, 2022

Triumphed over toolchain

DISCOVERED REMNANTS OF DEBUG CORE DATA TACKED ONTO XDC FILE

I found that the process of setting up debug wrote data onto the end of the main constraints file (XDC) which conflicted with attempts to set up a new set of ILA cores with the signal moved to the correct clock domain core. 

If I erased those entries, I could run the synthesis, set up the debug and then run through bit file generation without issue. I haven't been to the shop to test it but I expect that I have resolved everything and can get back to testing.
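Erasing the leftover debug entries from the constraints file can be scripted rather than done by hand. This is a hypothetical helper: it assumes the appended entries are the Tcl commands that mention debug cores or ports (create_debug_core, create_debug_port, connect_debug_port, get_debug_cores, get_debug_ports), and it leaves MARK_DEBUG lines alone unless asked, since those only tag nets for debugging. Keep a backup of the XDC before trying it.

```python
import re
import sys
from pathlib import Path

# Tcl commands Vivado uses when it records debug cores in an XDC file.
DEBUG_CMD = re.compile(r"\b(?:create|connect|get)_debug_(?:core|port)s?\b")
MARK_DEBUG = re.compile(r"\bMARK_DEBUG\b")


def strip_debug_constraints(xdc_text: str, keep_mark_debug: bool = True) -> str:
    """Return the XDC text without lines that create or reference debug cores/ports."""
    kept = []
    for line in xdc_text.splitlines():
        if DEBUG_CMD.search(line):
            continue  # e.g. create_debug_core, connect_debug_port, [get_debug_cores ...]
        if not keep_mark_debug and MARK_DEBUG.search(line):
            continue  # optionally drop the net tags as well
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_debug.py original.xdc cleaned.xdc
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_text(strip_debug_constraints(src.read_text()))
```

Writing to a new file and diffing it against the original is a cheap way to confirm only the debug entries went away.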

REASSIGNED THE SPI LINK CLOCK TO A BETTER FPGA PIN

The SPI link clock is supplied externally, generated by the Arduino, so this clock is not related to any clock inside the FPGA. The SPI slave module I leveraged takes care of crossing between the different clock domains, so I don't have to deal with those signals. However, that portion of the logic in the slave module treats SCLK as a clock, which means Vivado wants to use clock buffers and the special clock lines inside the FPGA chip. 

I had been receiving warning messages about the inadequacy of the clock line and buffer connections to this input pin, so I dug into the more intimate details of the chip and its clock resources to fix this. I found that only a subset of the input-output pins, called clock-capable or CCIO pins, have a good interface to the clock resources. 

I found the list of CCIO pins and identified a few of the IO connector pins on the FPGA board which matched those. By shuffling two signals, I had the SPI SCLK signal hooked to a CCIO pin which resolved all the clock inadequacy messages. 

Wednesday, September 7, 2022

Dopey toolchain design blocks any progress today

VIVADO WON'T FULLY REBUILD DEBUG CORES, DIES ON ERRORS

I took my corrected design, the one that put the signals in the correct clock domains, fired it up and tried to debug, but noticed that the app_wdf_data signal that had been in the general logic domain was still tied to the debug core for that clock rather than moved to the memory clock domain as it should have been. 

I removed the debug cores and implemented just fine. I then redid the debug setup but when implementing it throws errors about missing debug cores. No matter what I do, it won't properly build debug cores. 

Vivado is notorious for this sort of misbehavior. It remembers things and has no method to force a full clean build. Workarounds require manual deletion of files, dangerous at best. I wasted the entire session I had available today fighting with the tool rather than debugging my logic.

INSTALLED ANOTHER SPRING AND TOOK OUT THE PRINT SHAFT OF THE 1053

The small spring I attempted to connect last time beckoned, while Vivado was busy failing to implement the logic. I was able to conquer the spring using many different tools and angles of approach. 

The print shaft is the main support that the carrier slides across from left to right, and it has a notch to transfer rotation of the shaft to throw the typeball forward when printing a character. The shaft had corrosion or hard crusted junk on it that was rubbing against the rubber sleeves and restricting carrier movement.

While I reserve the option to replace it with another shaft, I did use emery paper which is removing quite a bit of the coating and making the surface smoother. It won't remove every pit but if the carrier can move freely then I can keep the original shaft on the printer. 

Tuesday, September 6, 2022

Short debugging run highlights design flaws - signals changed in the wrong clock domain; some Selectric work done

ERRATIC RESULTS ARE USUALLY DUE TO TIMING ISSUES

Simulation is not a perfect solution to testing for these issues, particularly since my testbench sets the timing of all the signals driving the logic. In the real design, we have an external clock coming from the Arduino SPI link as well as generated clock modules in the FPGA that determine the relative phase and timing of signals. 

That is the value of having oscilloscopes to look for relative timing of signals as well as logic analyzers to show what is actually occurring inside my mechanism. I was able to dash over to the shop for a couple of hours during which I looked for clues that would explain the erratic behavior I am seeing.

DATA TO WRITE INTO RAM IS SET UP UNDER THE GENERAL LOGIC CLOCK

The internal logic analyzer cores placed in my design by Vivado are segregated by the clock domain in which the signals operate. For the signals I am watching, those are the general logic clock at 50MHz and the memory controller RAM clock at 100MHz. There is also the SPI clock domain which I am not currently monitoring as well as an internal 200MHz clock in the memory controller. 

While I was looking at the RAM access signals, which are in the RAM 100MHz clock domain, I noticed that I wasn't seeing the data being input to the memory controller to write into storage. I then spotted that signal in the other ILA, the one under the general logic 50MHz clock. Aha! It is exactly this sort of design flaw that can produce erratic results, where most of the words have the expected value but some do not. 

RESTRUCTURING THIS PART OF MY DESIGN

I will be carefully planning this portion of my logic, to ensure that any data going into or out of RAM is handled under the general clock at the input to the request FIFO or the output of the response FIFO, while all the memory controller inputs and outputs come from the request FIFO output and response FIFO input since those operate under the RAM clock. 

The data being read and written in RAM comes from one of two broad areas - the disk drive modeling logic or the Arduino SPI link logic. Where I was selecting between the sources and destinations is where the flaws exist. 
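Dual-clock FIFOs are the usual way to move data between the 50MHz and 100MHz domains. A standard technique inside them is to pass the read and write pointers across the boundary in Gray code, where consecutive values differ in only one bit, so a pointer sampled mid-change is off by at most one. This Python sketch illustrates that pointer arithmetic under an assumed 16-entry depth; it is not the internals of any particular FIFO IP.

```python
ADDR_BITS = 4                          # hypothetical FIFO depth of 16 entries
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1  # extra pointer bit tells full from empty


def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def fifo_empty(wr_gray: int, rd_gray: int) -> bool:
    """Empty when the (synchronized) pointers are equal."""
    return wr_gray == rd_gray


def fifo_full(wr_gray: int, rd_gray: int) -> bool:
    """Full when the writer is exactly one lap (2**ADDR_BITS entries) ahead."""
    used = (gray_to_bin(wr_gray) - gray_to_bin(rd_gray)) & PTR_MASK
    return used == (1 << ADDR_BITS)


# Every increment, including the wrap from 31 back to 0, flips exactly one bit.
for i in range(PTR_MASK + 1):
    nxt = (i + 1) & PTR_MASK
    assert bin(bin_to_gray(i) ^ bin_to_gray(nxt)).count("1") == 1
```

The single-bit change is what lets each pointer pass through an ordinary two-flop synchronizer into the other clock domain safely; the data itself stays in the FIFO memory and is only read once the pointers say it is there.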

SELECTRIC WORK ACCOMPLISHED WHILE REBUILDING FPGA BIT FILES

I decided to monitor two more signals in the internal logic analyzers which mandated synthesizing, implementing and generating bit files again. As that takes almost a half an hour, I went over to the Selectric typewriters, the 1053 Console Printers of the two 1130 systems, to make some progress.

I was able to free up a critical area of the machine that controls spacing, backspacing and tab operations. These levers on the rear of the carrier need a lot of lubrication and exercising to free them up from the years-old sludgy oil and grease in the machines. I had it moving freely, then observed that a key spring was missing, one that shifts a sliding bar involved in tab operations.

I located a suitable spring and installed it. The carrier is still not moving well, which is due to the corrosion on the round shaft that it slides across. That shaft is keyed and also imparts rotation to swing the typeball up to strike the ribbon and paper. I am skeptical that I can clean up this bar adequately to allow good clean movement of the carrier, so much so that I am looking to find a donor Selectric typewriter whose bar I can salvage. 

Space and tab operations operate for one cycle, with the mechanism pushing the activation lever forward at the end of the cycle. It should latch into place, based on a smaller spring activated lever. One of the activation levers was missing the small spring, thus it would not relatch, causing the machine to take multiple tab or space operations. I found and installed an appropriate spring. 

The space operation depends on a mechanism tilting a bar behind the carrier, but there was a spring that had snapped there. I tried to get it back into position but have not yet been successful. In theory it is easy to attach springs. Tools called spring hooks can grab each end, the spring is maneuvered into place and the ends coaxed into their mounting holes or tabs. 

In reality, most locations do NOT have room to bring in a spring hook on one side. The sides that can be reached often must be approached at an angle that is not ideal. This spring for the space mechanism is in a challenging area, thus I will have to keep at it until I can get both ends secured. The original spring had corroded and broken in the middle. 

Monday, September 5, 2022

RAM access working, but slightly erratic operation of the link still needs some debugging

SHORT SESSION SQUEEZED IN USING NEW DIAGNOSTIC TOOLS

I was able to spot an issue and correct it, with only a 30 minute turnaround for a new bitfile for the FPGA. I began to see that most of the returned words on the Unload transaction were what I had written previously with the Load transaction. This shows I am indeed writing to and reading from RAM addresses. 

I did see that the results were mostly correct - the value returned should be the word number which is what I wrote, but some came back incorrect. Worse, I would see the checksum and flag coming back from the FPGA while the Arduino still thought it hadn't finished all 321 words of data. Things are getting out of sync. 

Since it will be a day before I can get back to the shop, I will look over the logic and see where I can tighten up the operation of the state machines to keep everything locked properly together. 

The other thing I haven't yet validated is that the address I am sending is going to the correct RAM locations. I need to know that a read or write to a particular cylinder, head and sector will take place against the correct part of the disk cartridge image. 
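One way to make that address check concrete is to write down the mapping and test its corners. The sketch assumes the 2315 geometry of 203 cylinders, 2 heads and 4 sectors per track with 321 words per sector. It pads every sector to a hypothetical 512-word stride so each sector starts on an eight-word boundary matching a DDR3 burst; the actual layout in the design may differ.

```python
CYLINDERS = 203
HEADS = 2
SECTORS_PER_TRACK = 4
WORDS_PER_SECTOR = 321
SECTOR_STRIDE = 512  # hypothetical padding: each sector starts on a 512-word boundary


def ram_word_address(cylinder: int, head: int, sector: int, word: int = 0) -> int:
    """Map a disk location to a 16-bit word address in RAM (hypothetical layout)."""
    if not (0 <= cylinder < CYLINDERS and 0 <= head < HEADS
            and 0 <= sector < SECTORS_PER_TRACK):
        raise ValueError("cylinder/head/sector out of range")
    if not 0 <= word < WORDS_PER_SECTOR:
        raise ValueError("word offset out of range")
    linear_sector = (cylinder * HEADS + head) * SECTORS_PER_TRACK + sector
    return linear_sector * SECTOR_STRIDE + word


# Corner checks worth making against the real logic:
print(ram_word_address(0, 0, 1))      # next sector, same track: 512
print(ram_word_address(0, 1, 0))      # other head, same cylinder: 2048
print(ram_word_address(202, 1, 3))    # last sector on the cartridge: 830976
```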

Sunday, September 4, 2022

Debugging tools working well - just need time to work with them

LED ARRAY PANELS

Somehow I didn't expect the lights to alternate directions, but that makes sense with the most direct wiring path on the panel. Looking at the panel, the top left pixel is number 0. Running down the left edge we have pixels 0 to 15, then it turns around and runs upwards one column over, with 16 at the bottom and 31 at the top. Reversing again, the third column runs from 32 at the top to 47 at the bottom. 

It took a moment to determine what number each array position is associated with, but then it was easy to see the condition of the logic with all those pixels active. Very easy when I have the Arduino send each word of a transaction with a human sized pause - the values going up and down the SPI link are shown clearly during the pause. I can also see the transaction type latched as the related LED lights up. 
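The alternating numbering observed on the panel reduces to a short formula. This sketch assumes row 0 is the top and column 0 the left edge of a 16x16 panel, with even columns counting down and odd columns counting up, as described above; the function name is made up for illustration.

```python
ROWS = 16
COLS = 16


def pixel_index(row: int, col: int) -> int:
    """Chain position of the pixel at (row, col); row 0 is top, col 0 is left.

    Even columns run top to bottom, odd columns bottom to top.
    """
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("pixel outside the panel")
    if col % 2 == 0:
        return col * ROWS + row
    return col * ROWS + (ROWS - 1 - row)


print(pixel_index(15, 1), pixel_index(0, 1), pixel_index(0, 2))  # 16 31 32
```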

INTEGRATED LOGIC ANALYZER WORKING AS WELL

It only took a few minutes poking around until I figured out how to link up to the ILA cores and begin watching the signals. The only complication is that finite state machine names are converted to a hex number, thus I need to look these up before attempting to trigger the logic analyzer on a specific state. 

In just five minutes, before I had to head back to my home, I found a defect and have already corrected the logic (VHDL code) in preparation for my next time at the shop.
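Building a lookup from state name to the value the analyzer displays can be scripted. Which encoding applies depends on what synthesis chose, and the synthesis log is the place to confirm it; the sketch shows two common choices, sequential binary and one-hot, for a hypothetical list of state names.

```python
# Hypothetical state names; substitute the real enumeration from the VHDL.
STATES = ["IDLE", "WAITRDY", "ISSUE", "WAITDATA"]


def sequential_encoding(states):
    """Binary encoding in declaration order (state i -> i)."""
    return {name: i for i, name in enumerate(states)}


def one_hot_encoding(states):
    """One-hot encoding (state i -> only bit i set)."""
    return {name: 1 << i for i, name in enumerate(states)}


def trigger_hex(encoding, state):
    """Hex string to enter as the analyzer trigger value for a state."""
    return format(encoding[state], "X")


print(trigger_hex(sequential_encoding(STATES), "WAITDATA"),
      trigger_hex(one_hot_encoding(STATES), "WAITDATA"))
```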

Saturday, September 3, 2022

Invested time in learning the integrated logic analyzer and other debug capability of Vivado toolchain

PROGRAMMERS ATTITUDE

It is common for programmers to spend time writing code to automate activities that they will otherwise have to accomplish more manually and tediously, if those activities will occur multiple times. The up front investment in writing macros, tools or other programs is recouped when the activities are accomplished many times with far less effort than otherwise would be required.

My original approach to debugging was to choose a limited set of signals to bring out to the external I/O pins of the FPGA board, to which I would connect oscilloscopes or logic analyzers to hunt for signs that illustrated one or more logic design flaws. The time it takes to reselect signals and regenerate the FPGA is about half an hour of wall clock time, but may recur many times as I iterate through signal sets before finally seeing the definitive evidence. 

After poking around at the documentation for Xilinx's Vivado toolchain, I came to the conclusion that investing some time now would not only offset all those random walks through sets of signals in half hour sessions, but would be a useful tool for all future FPGA designs. I thus began reading and experimenting to figure out the capabilities.

INTEGRATED LOGIC ANALYZER CORES

Once a design has been synthesized, it is relatively easy to open the netlist of signals and mark all those that you would want to have available for a logic analyzer - casting a wide flung 'net' to grab any signal that might potentially be viewed or used as a trigger. Then the debug setup tool will prepare debug cores to capture these signals. It groups signals into their clock domains and produces one core for each clock. In my case, one core for the signals operating at the normal logic frequency of 50MHz and another for the RAM operating clock of 100MHz. 

These are instantiated onto the FPGA alongside my logic and are connected to over the USB cable from Vivado to the FPGA board once I program the board with the new bitfile. A logic analyzer program runs under Vivado and communicates with the ILA core that was produced. 

VIRTUAL INPUT OUTPUT CORES

Another type of core can be produced, where signals are marked that you wish to read or change from the Vivado software while the FPGA logic is running. This VIO core talks to the Vivado software over the USB cable and with it one can vary inputs and read outputs purely by software. This is interesting because it could be used in lieu of the working disk drive and 1130 computer to produce the control signals and therefore drive my logic through its paces. 

AXI DRIVING CORES

ARM developed the Advanced eXtensible Interface (AXI) to allow a more standardized way for modular logic components to interact with others and with processor cores. Many bits of available functionality and intellectual property for use with FPGAs exist with an AXI interface option. Because the interface is so well described, a new design that uses AXI can be debugged by driving it with an AXI core talking to Vivado that will handle all the interface details. I don't expect to need AXI cores, but I discussed it here to round out the list of capabilities built into Vivado.

STEPS TO INTERACT WITH MY SIGNALS

It didn't take long to open my synthesized design, mark the signals I wanted routed to the ILA core, then set up the cores themselves to be included on my board. From that point forward, I just implemented, generated a bit file and created the memory load to put that file into the flash memory on the board in the same way I worked without the debug cores. 

Select netlist signals for use with ILA core

Signals are 'probes' in the ILA

When I connect to the FPGA board to download the bit file to the flash memory, from whence it is loaded whenever the FPGA board is powered up or reset, it also shows me any debug cores that are implemented. Selecting one or more will bring up the software that interacts with ILA or VIO cores. The ILA software is very similar to the logic analyzer capabilities in HP and Tektronix physical analyzers, thus easy to figure out and work. The resulting display is exactly like the waveforms displayed by the simulator built into Vivado, which I have quite a bit of experience using. 

Friday, September 2, 2022

Implemented LED panels plus added diagnostic output pins to debug FPGA

512 LED ARRAYS, TWO AT 16x16, SET UP TO OUTPUT INTERNAL SIGNALS

Due to the error in the design of the leveraged logic for driving the panels, I am only using 510 of the 512 pixels. I hooked up just about every signal in the design that might make sense to observe statically or at slow speed and still had plenty of capacity. I chose to set up the second panel of 256 with a light for each of the 203 cylinder locations, plus lights for the head selected and which of the four sectors is active. 
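The second panel's layout can be described as a simple frame builder. The pixel positions here are hypothetical: cylinder lamps occupy chain positions 0 to 202, followed by two head lamps and four sector lamps, leaving the rest of the 256 pixels dark.

```python
PANEL_PIXELS = 256
CYLINDERS = 203
HEADS = 2
SECTORS = 4
HEAD_BASE = CYLINDERS            # hypothetical: pixels 203-204 show the head
SECTOR_BASE = HEAD_BASE + HEADS  # hypothetical: pixels 205-208 show the sector


def second_panel_frame(cylinder: int, head: int, sector: int) -> list:
    """Return 256 on/off values with one lamp each for cylinder, head and sector."""
    if not (0 <= cylinder < CYLINDERS and 0 <= head < HEADS and 0 <= sector < SECTORS):
        raise ValueError("location out of range")
    frame = [False] * PANEL_PIXELS
    frame[cylinder] = True
    frame[HEAD_BASE + head] = True
    frame[SECTOR_BASE + sector] = True
    return frame


lit = [i for i, on in enumerate(second_panel_frame(202, 1, 3)) if on]
print(lit)  # [202, 204, 208]
```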

ADDITIONAL OUTPUT PINS ON PMOD CONNECTORS PUT INTO SERVICE

I knew that some of the pins on the PMOD connectors of the Arty FPGA board were shared with the other input-output connectors I was already using, but upon close inspection I found that 16 signals, those on the A and B PMODs, were not shared and thus available for additional diagnostic outputs.

These are appropriate for timing or short activation detections where the LEDs are not suitable. I brought out the logic clock as well which will allow me to hook up a logic analyzer if I need to see the interplay of many signals at the same time. 

FUTURE POSSIBILITIES WITH LEARNING CURVE BUT MORE POWER

The Vivado toolchain includes some powerful tools to let me inspect and even inject signals without having to pre-select external connectors for this purpose. These are the Integrated Logic Analyzer (ILA) and Virtual IO (VIO) functionality that I can add into my design. ILA is what it sounds like, a logic analyzer with rich trigger conditions that I can operate from my PC over the USB cable to the board. VIO lets me see virtual pins and inject signals to drive my design. 

It takes hardware resources to implement these; the more signals you set up for potential connection, the more of the FPGA is consumed. I don't have any idea whether my entire Virtual 2315 logic could coexist with ILA or VIO or both on the Spartan 7 version on my board. The bigger factor holding me back is the time it takes to learn how to use each of these, time taken away from the project if I could otherwise finish debugging without these tools. 

Thursday, September 1, 2022

Having to debug and fix the Neopixel logic I leveraged to drive the panels from the FPGA

USING SOMEONE'S DONATED LOGIC TO DRIVE THE NEOPIXEL ARRAYS

This logic was developed by Blaž Rolih and shared under the MIT License on github. It seemed like a great fit thus I made use of it to add the two 16x16 pixel arrays for diagnostic displays.

# fpga-neopixel

FPGA module for NeoPixel led-strip written in VHDL. Works with ws2812b (RGB) and sk6812 (RGBW).

Made and tested in Xilinx Vivado, with Nexys A7 Artix-7 CSG324 FPGA board with 100 MHz clock but should work on other boards with minor adjustments as well.

My logic is operating at 50MHz and the first issue I discovered and had to fix was that its customization for different clock speeds wasn't complete. That is, it allowed modification of the number of cycles for the on state of the signal line for both a 1 and a 0 bit but not of the remaining off time. The result was that my overall time per bit was longer than the 1.25 microseconds specified for the WS2812B based Neopixels. With some modifications to the logic, it worked properly.
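To show why the off time has to scale along with the on times, here is a small Python sketch that converts the nominal WS2812B timings (1.25 microsecond bit period, 0.4 microsecond high for a 0 bit, 0.8 microsecond high for a 1 bit) into clock cycles. The function names are hypothetical and this is not the module's VHDL; the idea it illustrates is deriving each low time from the total period rather than leaving it fixed.

```python
BIT_NS = 1250  # nominal WS2812B bit period
T0H_NS = 400   # high time for a 0 bit
T1H_NS = 800   # high time for a 1 bit


def ns_to_cycles(ns: int, clock_hz: int) -> int:
    """Whole clock cycles in ns nanoseconds, truncating."""
    return clock_hz * ns // 1_000_000_000


def bit_timing(clock_hz: int) -> dict[str, int]:
    """High and low cycle counts for 0 and 1 bits at the given clock.
    The low times come from the period minus the high time, so the whole
    bit scales with the clock instead of only its high portion."""
    total = ns_to_cycles(BIT_NS, clock_hz)
    t0h = ns_to_cycles(T0H_NS, clock_hz)
    t1h = ns_to_cycles(T1H_NS, clock_hz)
    return {"total": total, "t0h": t0h, "t0l": total - t0h,
            "t1h": t1h, "t1l": total - t1h}


print(bit_timing(100_000_000))  # {'total': 125, 't0h': 40, 't0l': 85, 't1h': 80, 't1l': 45}
print(bit_timing(50_000_000))   # {'total': 62, 't0h': 20, 't0l': 42, 't1h': 40, 't1l': 22}
```

At 100 MHz this reproduces the nominal 0.4/0.85 and 0.8/0.45 microsecond split; at 50 MHz the truncated period is 62 cycles (1.24 microseconds) rather than one that stretches because the low count was left at its 100 MHz value.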

The next issue is a latent defect when the number of pixels in a string being driven is an exact power of two. The logic that tests for when the last pixel was transmitted will fail and it loops forever, leaving the pixels frozen with their initial value. This is due to a counter set up to hold the count of pixels, origin zero, thus in my case of 512 pixels, the counter runs from 0 to 511 and needs only 9 bits to hold that value. 

The logic tests to see when the current pixel is not less than the configured count, thus testing that the counter is not less than 512. It will always be less than 512 because it rolls over from 511 to 0 due to the counter width. I don't need all 512 pixel positions thus I will configure this for 510 which should work fine with the current logic. 
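The wraparound can be reproduced with a small Python model of the counter. It assumes the counter is exactly as wide as the highest pixel index requires and that the end test is "counter not less than configured count", as described above; it is an illustration, not the module's code.

```python
def last_pixel_detected(num_pixels: int, max_clocks: int = 10_000) -> bool:
    """Model a pixel counter that is only wide enough for indices
    0..num_pixels-1 and report whether the end test 'counter >= num_pixels'
    ever fires. A simplified model, not the module's VHDL."""
    width = max(1, (num_pixels - 1).bit_length())  # 512 pixels -> 9 bits
    mask = (1 << width) - 1
    counter = 0
    for _ in range(max_clocks):
        if counter >= num_pixels:           # end-of-string test
            return True
        counter = (counter + 1) & mask      # counter wraps at its bit width
    return False


print(last_pixel_detected(510))  # True
print(last_pixel_detected(512))  # False: 511 wraps to 0, never reaches 512
```

Under these assumptions every power of two fails the same way. Comparing against the highest index (count minus one), or widening the counter by one bit, would avoid the trap without giving up any pixels.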

Saturday, August 27, 2022

Adding diagnostic output from FPGA via Neopixel arrays

NEOPIXEL ARRAYS

Neopixels are tricolor LEDs with an integrated controller, wired in a long string. The single data line that connects all of the LEDs allows a string of pulses that address every LED so that one can control any or all of them by sending the proper pulse stream. 

Each of the three colored LEDs inside a single Neopixel takes an eight bit brightness value, thus we are shifting a 24 bit quantity into the Neopixel to set the brightness of the three colors. When we have a long string of Neopixels, we begin by sending the 24 bit value for the first Neopixel, followed by the second and so forth until the last value we send is for the final Neopixel in the string. 

Each Neopixel keeps the first 24 bits that reach it and passes all the following bits out to the next, thus the first Neopixel has seen the bits for all its successors flow through it. The final Neopixel only sees the 24 bits intended for it, while the next to last sees 48 bits. 
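The data format can be sketched in Python. This is a minimal model rather than the FPGA logic: it assumes the WS2812B datasheet's green, red, blue byte order with the most significant bit sent first, and the function names are hypothetical. It also counts how many bits arrive at each position in the chain, matching the 24 and 48 bit figures above.

```python
def pixel_word(red: int, green: int, blue: int) -> int:
    """Pack one pixel into the 24-bit word a WS2812B expects: green, red, blue."""
    for level in (red, green, blue):
        if not 0 <= level <= 255:
            raise ValueError("each color is an 8-bit brightness")
    return (green << 16) | (red << 8) | blue


def word_bits(word: int) -> list[int]:
    """The 24 bits of a word in transmission order, most significant bit first."""
    return [(word >> i) & 1 for i in range(23, -1, -1)]


def bits_arriving(position: int, chain_length: int) -> int:
    """Bits reaching the pixel at `position` (0 = nearest the FPGA) per update."""
    return 24 * (chain_length - position)


print(hex(pixel_word(0x12, 0x34, 0x56)))  # 0x341256
print(bits_arriving(511, 512), bits_arriving(510, 512))  # 24 48
```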

I bought arrays of Neopixels in a grid of 16 by 16 positions, these are addressed as 256 linear units. They can be chained, thus I will start with two arrays that consist of 512 addressable Neopixels. 

DRIVING THE LIGHTS

The wire that runs through our 512 Neopixels is clocked at 800 kHz, sending out 24 bit values at the rate of about 33,333 positions per second. The string of pulses begins with a minimum low interval of 50 microseconds to reset the devices then the pulses go out. The proportion of on versus off time for each of the bit positions encodes whether that bit is a 1 or a 0. 

Each bit position is 1.25 microseconds long. The full 24 bits takes 30 microseconds and my string of 512 Neopixels will therefore consume 15.41 milliseconds to set their values including the initial quiet period. If the chain is updated as fast as possible we will have a refresh rate of almost 65 per second. 
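The frame time and refresh rate above can be checked with a few lines of Python (constant and function names are just for illustration; the 50 microsecond reset is the figure used here):

```python
BIT_US = 1.25        # one WS2812B bit period at 800 kHz
BITS_PER_PIXEL = 24  # 8 bits each for three colors
RESET_US = 50.0      # quiet low interval before the pulse train


def frame_time_us(num_pixels: int) -> float:
    """Time to push one full update down a chain of num_pixels Neopixels."""
    return RESET_US + num_pixels * BITS_PER_PIXEL * BIT_US


def max_refresh_hz(num_pixels: int) -> float:
    """Updates per second if frames are sent back to back."""
    return 1_000_000 / frame_time_us(num_pixels)


print(frame_time_us(512))              # 15410.0
print(round(max_refresh_hz(512), 1))   # 64.9
```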

The single data line that drives the chain of Neopixels requires 5V logic levels, thus we will be passing it through a level shifter from the 3.3V of the FPGA. The Neopixels are also fed +5V and ground to feed the LEDs themselves.

USEFUL ONLY FOR STATIC OR ONE TIME LATCHED SIGNALS

Some debugging requires me to capture fleeting signals or to see the time relationship. These would be quite difficult to capture via the light arrays, but any static, long lasting or latched signals can be routed to the arrays for display. As an example, I can use one position for each state of a finite state machine I want to monitor. If the FSM sticks in some position I will immediately see which one by the light that is illuminated. 

I can also track interesting values such as the cylinder and track active for the drive. I may latch the last address on the SPI link as another idea for helpful data to show on the array. I only give up one output pin to drive all 512 lights, a good tradeoff even with the limitation to slow changing or static information. 

Friday, August 26, 2022

Aha - spotted a defect in my logic that was stalling the SPI link state machine

SPOTTED A FLAWED APPROACH IN MY STATE MACHINE

I should have realized that since I saw the two RAM access controlling commands - RAM to drive and RAM to SPI - working successfully but the load and unload commands stalling, the error had to be in parts of the SPI link state machine that weren't involved in the two good transaction types.

It was then that I spotted my error. I was triggering requests to read or write RAM and then waiting for a signal that was generated by the RAM state machine, but the done signal is in the faster clock domain unique to the RAM and not valid in the general logic clock domain. 

My simulation worked okay but there must have been just enough phase difference in the real world that it never caught the go-ahead signal. That pulse was one cycle long in the 100MHz clock domain but if it wasn't high at the beginning of a 50MHz clock cycle we could miss it. Apparently we did miss it consistently. 

Imagine that our clock rises a few nanoseconds before the RAM clock rises. When we look at the rising edge of our clock, the done signal from RAM has not yet been emitted. It goes high while we are in our 20 ns clock period, but we only look at our rising edge. The RAM clock advances twice as fast, thus it has already dropped the done signal before we get to our next clock edge in our clock domain. Signal missed entirely. 
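A small Python model makes the timing argument concrete. It is idealized (no setup or hold window, no metastability) and the names are hypothetical; it simply asks whether any rising edge of the 50 MHz clock falls inside a done pulse that is high for one 10 ns cycle of the 100 MHz clock.

```python
RAM_PERIOD_NS = 10    # 100 MHz memory-interface clock
LOGIC_PERIOD_NS = 20  # 50 MHz general logic clock


def pulse_seen(pulse_start_ns: int, logic_phase_ns: int = 0) -> bool:
    """True if a rising edge of the logic clock (at logic_phase_ns + k * 20 ns)
    lands while a done pulse, high for one 10 ns RAM cycle, is asserted.
    Idealized: no setup/hold window and no metastability."""
    pulse_end_ns = pulse_start_ns + RAM_PERIOD_NS
    # index of the first logic edge at or after the pulse rises (ceiling division)
    k = -(-(pulse_start_ns - logic_phase_ns) // LOGIC_PERIOD_NS)
    first_edge_ns = logic_phase_ns + k * LOGIC_PERIOD_NS
    return first_edge_ns < pulse_end_ns


missed = [start for start in range(LOGIC_PERIOD_NS) if not pulse_seen(start)]
print(missed)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

In this model half of the possible alignments miss the pulse, so if the real alignment happens to be one of the unlucky ones, the done signal is missed every time.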

TESTING THE IMPROVED LOGIC

The solution was to emit a signal in our clock domain - from the FIFO that fetches the responses from RAM. That FIFO is loaded under the RAM's clock and thus is synchronized with the done signal I originally used, so that side works fine. The output of the FIFO is running under our slower general logic clock and the state machine for that side of the FIFO can emit a new type of done signal in the proper clock domain. 

The SPI link state machine now watches for the new done signal and will see it because it has a common clock domain. This should avoid the deadlocked condition we experienced earlier. 
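The idea behind the fix can be sketched with the same kind of idealized model: instead of a one-cycle pulse, the RAM side leaves an entry in a queue that persists until the logic side takes it. Python's `collections.deque` stands in for the dual-clock FIFO, synchronizer latency is not modelled, and the function name is hypothetical.

```python
from collections import deque

LOGIC_PERIOD_NS = 20  # 50 MHz general logic clock


def done_seen_via_fifo(response_ns: int, logic_phase_ns: int = 0,
                       horizon_ns: int = 200) -> bool:
    """Idealized model of the fix: the RAM side pushes its response into a FIFO
    at response_ns, and the entry waits there until a logic-clock edge pops it
    and the logic side raises its own done. Synchronizer latency is ignored."""
    fifo = deque()
    for t in range(horizon_ns):
        if t == response_ns:
            fifo.append("response")  # written under the RAM clock
        on_logic_edge = (t - logic_phase_ns) % LOGIC_PERIOD_NS == 0
        if on_logic_edge and fifo:
            fifo.popleft()           # read under the logic clock
            return True              # done now exists in the logic clock domain
    return False


print(all(done_seen_via_fifo(start, phase)
          for start in range(LOGIC_PERIOD_NS) for phase in range(LOGIC_PERIOD_NS)))  # True
```

Because the entry waits in the queue, every alignment produces a done in the logic clock domain, which is the property the SPI link state machine relies on.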

While I was looking over the logic, I came up with an improved method of pulling the two bytes off the SPI link for each word, recoded it into two state machines, one for each direction on the link, and was happy with the simulation results.

I set up to test in the real world once again and this time I found it stalling only during the unload transactions. Will be digging into this.

First word x0009 - unload transaction

Green is RAM fetch completion before transmitting first word

FINISHED BUILDING THE LEVEL SHIFTER PCB AND CABLE TO WIRE WRAP ADAPTER

Level shifter board

Ribbon cable to wire wrap adapter


Wednesday, August 24, 2022

Watching state machines and hunting for defects

INDICATIONS ADDED TO SEE STATE MACHINES NOT AT IDLE

I set up a set of indications to display on the four LEDs for each of the state machines, illuminated when it is not at its idle or starting position. That way I can quickly find any machine that is stalled or active when it shouldn't be. 

I also set up the red color of one of the multicolor LEDs to light when ANY of the state machines are not at idle, a very immediate visual cue that something odd is happening. While it is processing transactions it should turn red, reflecting the activity, but then turn off in quiet times. 
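The indicator logic amounts to one comparison per machine plus an OR across all of them. A Python sketch, with machine and state names that are hypothetical:

```python
def status_lights(states: dict[str, str], idle: str = "IDLE") -> tuple[dict[str, bool], bool]:
    """One 'not idle' indicator per state machine, plus a summary light
    (the red LED) that is on when any machine is away from idle."""
    busy = {machine: state != idle for machine, state in states.items()}
    return busy, any(busy.values())


# Hypothetical machine and state names, for illustration only.
lights, red = status_lights({
    "spi_transaction": "WAIT_WORD",
    "byte_assembly": "IDLE",
    "ram_request": "IDLE",
    "ram_response": "IDLE",
})
print(red)  # True
```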

OBSERVATIONS DURING TESTING

I could see that the main SPI transaction state machine was not returning to idle. This made no sense, as every state where it can wait includes a top priority test that returns it to idle if the SPItransaction signal is off. I will be instrumenting this more thoroughly, as it is consistent with the flawed behavior I was seeing: only the first RAM access was ever attempted. 

Obviously I will be studying this one closely for any situation that might cause it to lock up, but also planning on the best instrumentation I can install to locate the problem. I have some ideas on how I can accomplish this without having to make multiple 25 minute runs reprogramming the FPGA board in between investigations. 

PCBS ARRIVED AND ONE NIT DISCOVERED

My PCBs arrived and look great. There is an extraneous REF comment on the silkscreen next to the four mounting holes, but that is irrelevant to its functionality. When I began looking closely at my connectors and cables for the links to the 1130 drive electronics I realized that I had bought IDE connectors not generic 40 pin connectors.

This is bad because there is a blank pin position on an IDE cable, right where I intended to run a signal. Both the IDE cable and the connector I bought to anchor the wire wrap lines have this missing pin. Fortunately the connectors for the PCB itself are full 40 pin units.

I have new cables coming in two days which resolve the blank position issue, but I do have to work around the missing pin on the anchor board for the wire wrap leads. It appears that I can press in a pin and solder that onto the board, restoring the connector to full functionality, but it will depend on what pins I can scrounge up. 

Finished PCB

Tuesday, August 23, 2022

Digging into stall in state machines driving RAM access; switch cover mounted; header for backplane wiring prepped

BACKPLANE WIRING HEADER PREPPED

The connection for the Virtual 2315 Cartridge Facility to the disk drive electronics inside the 1130 is by wire wrap to the backplane on the drive, with those signals routed over an IDE ribbon cable to the level shifter board I designed. 

Wire wrapping uses individual 30 gauge wires which I will solder to a header with an IDE connector. I found a gender changer for IDE that uses a small PCB between the two connectors. By unsoldering and removing one of the connectors, I would have solder pads to attach the wire wrap lines. 

I completed the removal of one connector and cleaned the pads, ready to attach wires. I do need to design and install a mount to hold this firmly near the backplane, otherwise someone yanking on the IDE cable might rip the wires off the backplane pins or bend the pins. 

SWITCH COVER MOUNTED ON VIRTUAL/REAL MODE SWITCH

I installed the red safety cover over the DPDT switch that converts the disk drive between real and virtual mode. To protect the disk cartridges and heads, we want to stay in Virtual mode unless we are exceedingly certain that we want to risk the actual heads flying on a platter. 

In real mode, the pick signal from the electronics activates the solenoid, lowering the heads onto the spinning disk surface and via a microswitch indicating to the electronics that the load is complete. In virtual mode, the pick signal is instead sent to the Arduino in my facility and the Arduino tells the electronics when the heads are 'loaded', even though they don't actually move down in virtual mode since the solenoid is not connected.

LOOP TIME TO EXPOSE SIGNALS FOR SCOPE MONITORING IS 25 MINUTES

The FPGA board has only a few exposed input-output pins that are not used by my logic, to which I can route various state information to be displayed on a scope. Currently I have access to only five of them, thus every time I have a new area I need to examine, I have to select five signals or signal states, then run the Vivado toolchain to reprogram the FPGA board. 

The longest step by far is the routing process. The tool synthesizes my VHDL into logic elements, places these at locations across the FPGA chip, then generates the signal routing between all the elements to complete the circuits I have designed. Routing hundreds of thousands of gates is a slow process, well over twenty minutes on a pretty fast 'gaming' laptop. 

This means that each time I watch the current set of diagnostic signals and form a hypothesis, another group of signals must be monitored to progress onward with debugging. That means another 25 minutes staring at the wall, waiting on the toolchain. 

PROCESS TO DRIVE RAM ACCESS IS COMPLEX

The memory controller for the DDR3 RAM on this board operates with multiple clocks at 100 and 200 MHz, actually slow compared to the speed this memory can attain. It produces a 50MHz clock that drives all the rest of the logic on the FPGA.

However, this means that signals are changing in different clock domains, an issue because a signal from one domain may be switching right at the clock edge where a flip-flop in the other domain samples it. To deal with this, clock domain crossing techniques are necessary. In this case, I am using FIFO logic built into the FPGA chip which operates with different clocks on each side of the queue. 

Thus I need state machines to push a request into the request FIFO, another to see that a request entered and pull it out under the memory controller clock. A state machine sees that request and drives the signals to the memory controller to accomplish the read or write. 

The results of the memory access appear in the controller clock domain, so they must be pushed into a response FIFO by yet another state machine. The output side of the response FIFO, running in the FPGA logic clock domain, sees a response, pulls it and puts the data (for a read) on the appropriate register for further processing. 

These must all run appropriately and not deadlock, smoothly handing requests through to the memory controller and pulling out the responses. This all seems to work in simulation, but since it works in sim and not in real life, I evidently wasn't able to accurately model the different clock domains and transitions in my simulation. 
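To make the chain of handoffs easier to follow, here is a hypothetical Python model of the two FIFOs and the state machines on each side. It deliberately ignores clock-domain timing, which is exactly the limitation just described: each `tick_*` call stands for one clock edge in that domain, and the class and method names are invented for illustration.

```python
from collections import deque
from typing import Optional


class RamPath:
    """Hypothetical model of the request and response FIFOs between the 50 MHz
    logic domain and the memory-controller domain. Each tick_* call stands for
    one clock edge on that side; clock-crossing timing is not modelled."""

    def __init__(self) -> None:
        self.request_fifo: deque = deque()   # written by logic side, read by RAM side
        self.response_fifo: deque = deque()  # written by RAM side, read by logic side
        self.memory: dict = {}
        self.results: list = []              # what the logic side received, in order

    def submit(self, op: str, addr: int, data: Optional[int] = None) -> None:
        """Logic domain: push a read or write request."""
        self.request_fifo.append((op, addr, data))

    def tick_ram(self) -> None:
        """Memory-controller domain: pull one request, perform it, push a response."""
        if not self.request_fifo:
            return
        op, addr, data = self.request_fifo.popleft()
        if op == "write":
            self.memory[addr] = data
            self.response_fifo.append(None)  # write acknowledged, no data
        else:
            self.response_fifo.append(self.memory.get(addr, 0))

    def tick_logic(self) -> None:
        """Logic domain: pull one response and hand its data onward."""
        if self.response_fifo:
            self.results.append(self.response_fifo.popleft())


path = RamPath()
path.submit("write", 0x100, 0x0009)
path.submit("read", 0x100)
for _ in range(4):
    path.tick_ram()    # two memory-controller edges...
    path.tick_ram()
    path.tick_logic()  # ...for every logic-clock edge
print(path.results)    # [None, 9]
```

A model like this checks the ordering of the handoffs but, by construction, cannot reveal a pulse that one domain raises and drops between the other domain's clock edges.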

SEEING ONLY ONE PASS THROUGH RAM STATE MACHINES

What I have observed so far is that we get one request for RAM access: it passes in as a request, the memory controller signals completion with the ram_done signal, and the request FIFO is reset to its idle position. 

However, there is only one request for RAM. We never move on to send the next read or write and are stalled somewhere. I am going to have to figure out a way to instrument the various state machines to my diagnostic outputs to see which state machine(s) are stalled. 

Sunday, August 21, 2022

SPI link debugged, transactions working; next up is testing whether RAM is properly written and read back

FINAL TESTS IDENTIFIED PROBLEM

As I thought, the issue was resynchronizing after the first transaction had ended. I spotted a defect that wasn't caught by the particular signal timing I had set up in the simulation. The fix was obvious and quickly made.

RESOLUTION TESTED WITH DRIVE ON AND OFF COMMANDS

My test setup sent a repeating sequence of transactions that flipped the FPGA between the SCspi and SCdrive states, that is with the RAM access controlled by the SPI link or by the drive emulation side of the logic. It was observable by the color of one of the color LEDs, where the blue color would be on when the RAM was set to SCspi. 

Indeed, it nicely switched between the two states, every time I started it up. I am happy that the link is working well and moved on to check out the access to RAM from the SPI link side. 

TEST SENDING DUMMY DATA TO RAM AND READBACK

I set up a routine that, during startup of the Arduino, will send a defined pattern to the RAM, a different pattern to a different location, then read back the original sector and verify it. If that works we know that we are accessing RAM, uniquely addressing it and the Load and Unload transactions are working at least to some level. 
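A sketch of that startup check in Python, with hypothetical `load` and `unload` callables standing in for the Load and Unload transactions over the SPI link. The point of writing a second pattern to a second location is that a RAM path which ignored the address would hand back pattern B and fail the comparison.

```python
WORDS_PER_SECTOR = 321  # one IBM 1130 disk sector


def make_pattern(seed: int) -> list[int]:
    """A recognizable 16-bit pattern for one sector, distinct for each seed."""
    return [(seed * 0x101 + i) & 0xFFFF for i in range(WORDS_PER_SECTOR)]


def startup_ram_test(load, unload, sector_a: int, sector_b: int) -> bool:
    """Load pattern A to one sector and pattern B to another, then unload the
    first sector and compare. `load(sector, words)` and `unload(sector)` are
    hypothetical stand-ins for the Load and Unload SPI transactions."""
    pattern_a = make_pattern(0xA5)
    pattern_b = make_pattern(0x5A)
    load(sector_a, pattern_a)
    load(sector_b, pattern_b)
    return unload(sector_a) == pattern_a


# Usage against an in-memory stand-in for the FPGA's RAM.
ram: dict[int, list[int]] = {}
ok = startup_ram_test(lambda s, w: ram.__setitem__(s, list(w)), lambda s: ram[s], 3, 7)
print(ok)  # True
```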

The Arduino code that is testing the data coming back from the unload is indicating a flaw. The FPGA side is happy with the format of the transaction, thus not indicating any error. However, the Arduino side is giving the indication that the data was not what was expected. 

I looked superficially at the MISO and MOSI lines during the transactions - the pattern I saw coming back from the Unload transaction appeared to match what I had sent during the first Load transaction and not what was sent in the second Load. I was only looking at one word near the end of the transaction and therefore may have missed an error with the first or 321st words, or it may not really have matched.

I am back at my home working out better diagnostic information. I suspect that I will have to use the USB serial link to my laptop to view better information about what is coming back on the link. I will also switch the five diagnostic outputs of the FPGA board to signals that will show me key memory controller signals. 

Virtual/Real switch being wired into drive


Saturday, August 20, 2022

More battles with SPI link debugging; built bracket to hold the Virtual/Real mode switch for the disk drive

TWO STEPS BACKWARDS IN TESTING

Because of obligations I had elsewhere I only got to the shop late in the day to run the tests with my new and improved diagnostic outputs. When last I tested, the first transaction received was successfully executed, switching the FPGA RAM access to the drive from the SPI link. However, it seemed to hang up at that time and didn't indicate the subsequent transactions, nor execute them.

I thought I had made changes that would ensure resynchronization to avoid the observed behavior. When I hooked up and tested, however, I no longer saw the transaction command code latching at all. My diagnostics showed that the SPI transaction state machine was not reaching various states I had instrumented, nor was the state machine that assembled the two transmitted bytes into a 16 bit word. 

CAUSE WAS A MINOR TYPO, DISCOVERED AFTER INVESTIGATING SUBTLE MESSAGES

Because I mistyped the state of the SPI machine that would trigger the byte assembly machine to start, they were deadlocked. This is why neither reached the states I was monitoring. When I looked at the warning messages from Vivado when it was synthesizing my logic, I found messages that it was trimming SPI link FSM One-Hot register bits 18 to 2 from the design, as well as most of the FSM One Hot Register bits from the byte assembly state machine.

FSM is Finite State Machine, and it is implemented by setting bits in a register. The One-Hot method assigns one bit to each possible state of the machine, which speeds up testing to see if the machine is in any given state as only one bit is interrogated. 

This message was subtly telling me that my state machine only had a few states it could ever reach, therefore there was no need to implement hardware to hold the other one-hot bits. It doesn't say "Your FSM can never reach all the states", it refers to FSM One-Hot Register bits.
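The reasoning behind those warnings can be approximated with a reachability search: assign one bit per state, walk every transition from the idle state, and any state never reached has a bit that can never be set. The Python below is a sketch with hypothetical state names, not Vivado's actual algorithm.

```python
from collections import deque


def one_hot(states: list[str]) -> dict[str, int]:
    """One register bit per state: the i-th state is encoded as 1 << i."""
    return {state: 1 << i for i, state in enumerate(states)}


def unreachable(start: str, transitions: dict[str, list[str]]) -> set[str]:
    """States that no sequence of transitions from `start` can reach. Their
    one-hot bits can never be set, so they are candidates for trimming."""
    seen = {start}
    todo = deque([start])
    while todo:
        for nxt in transitions.get(todo.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return set(transitions) - seen


# Hypothetical SPI machine; the "typo" sends CMD to the wrong next state.
good = {"IDLE": ["CMD"], "CMD": ["START_ASM"], "START_ASM": ["WAIT_WORD"],
        "WAIT_WORD": ["DONE"], "DONE": ["IDLE"]}
typo = dict(good, CMD=["WAIT_WORD"])

print(one_hot(list(good))["WAIT_WORD"])  # 8
print(unreachable("IDLE", good))         # set()
print(unreachable("IDLE", typo))         # {'START_ASM'}
```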

I confirmed in simulation that it would deadlock. I then corrected the typo, simulated to successful outcome, then synthesized the logic once again. When it was complete, I looked VERY carefully over the hundreds of somewhat obscure warning messages to be sure that nothing like this was occurring elsewhere in the logic. 

Next test session I hope to have more success. I have prepared a test routine so that once I can reliably flip the FPGA between drive and SPI access to RAM, I can move on to write data into RAM in a few locations and verify that one of them is read back as written. 

MODE SWITCH FOR DISK DRIVE MOUNTED ON BRACKET BEHIND THE DRIVE

Most of the connections I am making to the internal disk drive of the IBM 1130 are passive, they simply detect what various signals are doing. There are two that are passive as long as I don't assert them by pulling them low - the raw read pulses I produce and the write error halt I inject if errors occur during capture of written data from the CPU to the drive.

However, there are some active changes that must be made for my Virtual 2315 Cartridge Facility to work. I must block the signal from the drive electronics that activates the head loading solenoid. I must instead route that signal (pick) to the Arduino. The microswitch that detects when the heads have physically been lowered onto the platter must be unwired from the drive electronics and instead the Arduino must cause the drive electronics to believe the microswitch has activated. 

I want the drive to be capable of working in either mode - Virtual or Real - thus there is a toggle switch with those two settings. In the Real mode, it connects the pick signal to the head loading solenoid and leaves the microswitch hooked to the drive electronics. The heads will really load down onto the platter in Real mode. In Virtual mode, the active changes in the prior paragraph are implemented so that the Arduino sees the pick signal and it simulates the microswitch closing. 

I am using an aircraft style toggle switch, one with a red plastic guard that protects against inadvertently flipping the switch to Real mode. The operator must lift the red guard to throw the switch, otherwise it stays in Virtual mode. I chose this scheme because of the risk of head crash if the heads are actually down on the platter surface. 

I formed a bracket to hold the switch in a convenient location just behind the drive, then drilled holes in the machine frame for the mounting screws. I have begun wiring the switch to the drive side - the solenoid, pick signal line, microswitch and microswitch signal line to the electronics. I have not yet connected the wires that will go to the Arduino as they have to be connected to the level shifter board which is still in fabrication in China.