Monday, July 24, 2023

User interface test verification part 18

DISTRACTIONS WITH VISITORS HOWEVER STILL STUCK

I am finally able to put some time back into the effort to restore a U-boot that brings up my Linux image on the DE10-Nano board. I have been unsuccessful recreating whatever I discovered or did last time that produced a working version. Sadly, after some progress testing the design, I had to make changes that would require an updated U-boot and preloader. 

This is because it is the U-boot software that implements the pin multiplexing settings, starts up the DDR RAM and does other initialization such as the F2SDRAM bridge before Linux can begin booting. With changes to the chip configuration, the U-Boot files had to be updated. 

I did figure out something that regained the booting when I last built and installed U-boot, so somehow I will find it again. Alas I didn't write down what I discovered nor document all the permutations and adjustments I had tried during that journey, so I have only hazy memories to guide me this time. If you can all be patient, I will be relentless until I get this fired up again and can continue perfecting the design of the Virtual 2315 Cartridge Facility. 

Tuesday, July 18, 2023

User interface test verification part 17 - one step backward

DECISION TO REBUILD PRELOADER AND U-BOOT

The mechanism to create a working System on a Chip (SOC) using the Cyclone V involves a number of tools and software products that are hooked together with a patchwork of scripts and steps, versus an integrated tool that would accomplish all of it in one product. This means that there are important times when you have to repeat steps to reflect important changes else the SOC won't work properly.

Since I had made changes to the address widths of some parts of the HPS to FPGA Lightweight (H2F LW) bridge, it might have changed the four files that are produced by Quartus that have to be manipulated and transferred to the U-boot and Preloader software before those components are built. In particular, there is a file that sets the multiplexing state of connections between the sides of the SOC. 

I therefore had to locate the script that grabs these from the Quartus folders, massages them and saves them in the proper place in the U-Boot folders. I found one and ran it. From that point I issued a make command to hopefully correctly rebuild the files I needed. There is one file, u-boot-with-spl.sfp that is copied to a special partition on the SD card to allow it to boot up.

REDISCOVERING PATH TO A GOOD VERSION SINCE THIS ONE WON'T BOOT

When I tried to boot after updating the Preloader/U-boot on the card, it sat with no sign of life, just as it had when I first attempted to build these components. I had figured out the issues and eventually was able to make a proper version. I simply had to rediscover what I did in order to restore my system to booting up. 

So far I am not getting the Preloader to boot and transfer to U-boot. No sign of life at all. I have only a couple of days before house guests arrive and take all my time through to early next week. 

Monday, July 17, 2023

User interface test verification part 16

BOOTING AND SEEMS TO BE OPERATING PROPERLY

I chose to do some verification that my functions were still working before I dove into more general debugging.

DO THE KEY BRIDGES WORK? 

Indeed I appear to be successfully reading and writing over the H2FLW and F2SDRAM bridges. Responses are reasonable and I do see the telltale LED turn off when my loop code in the FPGA advances after having completed a write to RAM of the block sent down. 

I started walking through the code of a load with pauses to verify I had reasonable responses. If I have a problem with holding off the next block coming from Linux to FPGA while I am busy writing to RAM, I won't see it at this slow pace but that is easily checked by removing the pause statements later.

FIRST ISSUES DETECTED AND BEING HANDLED

I knew that the ARM and x86 systems were little-endian, which means that a 16 bit word that is a single long integer x01F6 will be stored as xF601 in memory. I didn't realize that it also flipped the two 16 bit words returned from a file access. In other words, the first two 1130 words in the virtual cartridge image file are x0000 and x0658 but when I fetch a block of 32 bits from the file I see them in memory as 0x5806 0x0000 meaning that both the bytes in a word and the words themselves are flipped. I had hoped that the two words in a block would be N and N+1 from left to right, not the N+1 and N I encountered.

I can easily fix this in the getdata module that reads and writes blocks from the Linux application over the bridge. I reshuffle the bytes so that the disk contents make sense from an IBM 1130, 16 bit big endian perspective. 
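The reshuffle can be sketched in a few lines of Python. This is only an illustration of the byte ordering, assuming the image file holds big-endian 1130 words and the 32 bit fetch is done by a little-endian processor; the function name words_from_le32 is hypothetical and is not taken from the actual getdata module.

```python
import struct

def words_from_le32(value):
    """Recover two big-endian 16 bit words, in file order, from a 32 bit
    value that was loaded little-endian from four consecutive file bytes."""
    raw = struct.pack("<I", value)      # back to the four bytes in file order
    return struct.unpack(">HH", raw)    # reinterpret as two big-endian words

# File bytes for the first two 1130 words, 0x0000 and 0x0658
file_bytes = bytes([0x00, 0x00, 0x06, 0x58])

# What a little-endian processor holds after a 32 bit fetch of those bytes
(fetched,) = struct.unpack("<I", file_bytes)

first, second = words_from_le32(fetched)
print(f"fetched 0x{fetched:08X} -> words 0x{first:04X} 0x{second:04X}")
```

The fetched value comes out as 0x58060000, matching the 0x5806 0x0000 seen in memory, and the helper returns the words in their original N, N+1 order.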

Next I discovered that my understanding of how to address the H2F LW bridge was flawed. Initially I intended that I would only send one 32 bit command word and retrieve one 32 bit status word, thus could ignore addresses. Lately I expanded it to allow me to interrogate up to three 32 bit values from the FPGA in addition to the status word, simply by using the associated address for my fetch or store in the Linux application. 

The addressing can be either symbol or word oriented, plus a given number of bits of address which constrains the maximum range of addressability. For the command/status channel of H2F LW bridge, I wasn't using the correct addresses thus when I tried to fetch my current RAM address from the second word of the channel I didn't get the intended value returned.

Once I got to full speed on this, I chose 4 address bits (1 of 16 addresses) with the address specifying the byte. My bridge channel transfers 32 bits (four bytes) at a time thus the relevant addresses for my intended data are b0000 to b0011 for the status word, b0100 to b0111 for the RAM address word, b1000 to b1011 for the cylinder address of the drive emulation, and b1100 to b1111 reserved for a future fourth word. 
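The layout above can be written down as a small table. The sketch below is illustrative only: the register names are hypothetical labels for the four words, and the code simply computes the byte address range of each 32 bit word under byte addressing with 4 address bits.

```python
WORD_BYTES = 4      # the bridge channel moves 32 bits at a time
ADDRESS_BITS = 4    # 16 byte addresses in total

# Hypothetical labels for the four words described in the text
REGISTERS = ["status", "ram_address", "cylinder", "reserved"]

def byte_range(name):
    """Return (first, last) byte address of the named 32 bit word."""
    index = REGISTERS.index(name)
    first = index * WORD_BYTES
    return first, first + WORD_BYTES - 1

for name in REGISTERS:
    lo, hi = byte_range(name)
    print(f"{name:12s} b{lo:04b} to b{hi:04b}")
```

The Linux application would use the first byte address of each range as the offset for its 32 bit fetch or store.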

I chose the Cylinder value based on a reader's suggestion that I consider showing the arm movement graphically while the drive is in operation. I am still not convinced this will be worth the effort, considering that the box will be hidden on the side of the machine inside its covers, but if I have spare time and can implement it economically, I will have the relevant data at hand in the Linux application. 

Friday, July 14, 2023

Battle with toolchain software part 15 - Linux now starting with modern loader and U-boot

LINUX BOOTS AT LAST!

As I suspected, relocating the loading points of the kernel and the device tree blob resulted in a normal bootup of the Linux image. At this point I expect that I will have workable bridges, but that is the next thing to check. There was a lot of minor tweaking of U-boot environmental variables as well, to get the card to the point where it autoboots successfully. 

I will have to add a command to have Linux mount the FAT partition automatically as that is where we store the virtual cartridge images and bitmaps for the project, but I mounted it manually and did a quick test. The user interface worked correctly on the LCD screen although I still need to debug whether my bridges and my logic are working properly. 

Battle with toolchain software part 14

ABSOLUTELY NO INDICATION OF LIFE FROM PRELOADER AND U-BOOT BECAUSE . . .

The configuration set up in the Altera opensource distribution of U-boot generates it to use flattened uImage tree (FIT) files rather than the separate kernel, U-boot and device tree blob files we used before. Without a proper FIT file the loader will do nothing - as it does supremely well in this case. 

At this point however I discovered that the recipes on Rocketboards.org and in the various repositories for the Terasic DE10-Nano board are not correct. They have me configure for the generic development board (socfpga_cyclone5_config) instead of the socfpga_de10_nano_config that relates to my board. I redid the build of the preloader and U-boot with the correct configuration and that gave me some signs of life.

U-BOOT BEGINS BUT DOESN'T CONTINUE TO RUN THE SCRIPT NOR BOOT LINUX

I was delighted. Preloader came up, initialized everything and then branched to U-boot. I got the proper messages consistent with prior successful boots except that it stopped at a prompt rather than trying to autoboot. This means that the problems are solely in the script and/or the environmental variables for U-boot. Both of these are easy to change.

I will use an old SD card to boot but stop autoboot, then list out all the environmental variables. In addition I will list out the contents of the working u-boot.scr script file. The only challenge I will have is that I must modify things to address the change from a 2017 version of U-boot to the current version I am working with. Since the documentation is sketchy and not set up to educate someone, it will take some poking into source code and experimentation. However, I hope to succeed incrementally.

INCREMENTAL PLAN FOR U-BOOT OPERATION

First I want to check that my commands to load the FPGA bitstream into the hardware will work, so I will manually issue them at the prompt. This is the first action we take before we start enabling bridges and before we try to load Linux.

Second I will work on the u-boot.scr script to make sure it issues its commands, first by manually proving them and then altering things until this executes when we start up. This is the key to the rest of the bootup sequence.

Third I will work on the changes I need to issue the modern bridge enable command instead of the broken bridge_enable_handoff version from 2017. This is key to being able to use the F2SDRAM and H2FLW bridges from my design. 

Last I will manually test the commands that must be issued to start the Linux bootup, get them configured in the proper environmental variables, manually trigger them once to see Linux come up and then verify I get an autoboot from powerup to login prompt for Linux.

CHANGES THAT HAD TO BE MADE

The environmental variables for the new U-boot do not have a bootcmd variable set. Instead it depends on the script file to execute a script that searches different device types in order and boots from the first one that works. It also has no capability to fetch my FPGA bitstream file and load it as it should. 

I added some scripts to do what I wanted and tested them. I am now able to find and run the u-boot.scr file, which in turn correctly finds and loads the FPGA bitstream file. We also find the proper device tree blob file. 

REMAINING ISSUE IS COMPLETING THE KERNEL STARTUP

The commands work and begin to run the kernel initialization which starts to bring up Linux, however it stops partway through. I believe this is due to a change in the location in RAM where the old scripts stored the kernel image and device tree blob, compared to the higher location used by the new version of U-boot. I suspect that the kernel initialization is stepping on some of U-boot or vice versa because of the changed addresses. I will modify them and see what happens, but I am pleased with the steady progress. 
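A quick way to reason about this kind of collision is to write down each load address and size and test the regions for overlap. The sketch below is only an illustration; the addresses and sizes are placeholders I made up for the example, not the values from my scripts or from U-boot.

```python
def overlaps(start_a, size_a, start_b, size_b):
    """True if [start_a, start_a + size_a) and [start_b, start_b + size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a

# Placeholder (load address, size) pairs, for illustration only
kernel = (0x01000000, 0x00800000)
dtb_bad = (0x01700000, 0x00010000)    # lands inside the kernel region
dtb_good = (0x02000000, 0x00010000)   # relocated above the kernel

print("bad layout collides: ", overlaps(*kernel, *dtb_bad))
print("good layout collides:", overlaps(*kernel, *dtb_good))
```

The same check applies to any pair of regions, such as the kernel image against wherever U-boot itself is running.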



Tuesday, July 11, 2023

Battle with toolchain software part 13

FINALLY HAVE EVERYTHING BUILT

I completed the work to get the preloader and U-boot compiled, ensuring it pointed at the output of my SoC generated system. I also worked to ensure that I had a proper device tree blob generated - it kept getting regressed or picking up some junk from a demonstration program that somehow got onto the board files.

I then built the boot script. The U-boot as generated will start up and look for a file called u-boot.scr in the FAT partition of the SD card. If it finds it, that is executed. Here I find and load the FPGA bitstream soc_fpga.rbf from my generated file. I run the new command bridge enable which in principle will start up the H2F LW and the F2SDRAM bridges for me before we boot Linux. Then there are the commands that grab the zImage kernel image, the soc_fpga.dtb device tree blob, and begin to execute the bootup of Linux. 

For some reason the web of nested Makefiles for the preloader/u-boot insisted on grabbing the wrong device tree file, not the one that I built to match my project. It took a couple of hours to detangle and understand it all before I was able to generate the full and complete system with the correct device tree along with the proper other files. 

These various files were copied onto the SD Card in the appropriate partition. 

The A2 partition holds the preloader and U-boot files. This is executed by the chip hardware bootloader, with the preloader running in the under 64K of RAM available to the hardware boot function. It in turn configures SDRAM and puts U-boot into RAM before branching to it. 

The FAT partition holds the  zImage,  soc_fpga.dtb, soc_fpga.rbf and u-boot.scr files. 

The ext4 partition is the root partition of the Linux system once it is running. 

TESTING AND DEBUGGING THE BOOTUP ELEMENTS - TO NO AVAIL

I first powered up with the new SD card installed, which prior to my update had booted properly into the console only Linux. Nothing happened at all, no messages on the console showing the startup of U-boot. I carefully checked over everything - everything appeared, and I stress the word appeared, to generate correctly and I installed the files on the proper partitions of the SD card. 

No messages on the console, which would have appeared early on as U-boot executed. No signs of flickering on the board. Complete radio silence. Apparently I am going to have to dig through hundreds of source files and many configuration and Makefiles to see if I can figure out why the board no longer brings up U-boot. I am not even stumbling over Linux issues, and miles and miles from the bridge activation challenges before I began this awful slog through the swamps of preloader and U-boot. 

HAVING A MANUAL ABOUT U-BOOT WOULD BE NICE

If in fact I had a manual that was even in near sync to the version of preloader and U-boot I tried to generate, I would dig through it to understand what changes might have been made and the necessary files and configuration options. What I have instead is mountains of open source code and reams of mostly useless web pages generally covering much older and changed versions. 



Monday, July 10, 2023

Battle with toolchain software part 12

UPDATING A COPY OF AN EXISTING SD CARD IMAGE

I chose the console only version of the images instead of the ones that attempted to fire up the graphical desktop over HDMI and USB-OTG ports. I guessed that there will be less to fight to get Linux running properly for this image.

I had to find the best possible source of preloader and U-boot files, as well as the best advice I could garner from among the many dozens of contradictory and mostly defective guides on Rocketboards.org, github and other repositories.

I did manage to generate the preloader and U-boot just as I had expected. Tomorrow I will move that to the copy of the console only system and begin the testing to get it to boot up. This version has the newer bridge enable command instead of the bridge_enable_handoff command from the old version which didn't properly set up my bridges. 


Battle with toolchain software part 11

RESOLVING SD CARD IMAGE

I first worked out the right incantation to have the Linux build understand which processor I am using. That added the ARCH_CFLAG entry to add -march=armv7-a for the compilations. Everything built so I was on to the next step, which is creating the root filesystem.

Now, it strikes me as sketchy to build the root filesystem with all its libraries and binaries without it being closely synchronized to the kernel I just generated. However, I began working on it. Oddly, it is set up with a Makefile that depends on an independent gcc instance rather than the gcc that is part of the SoC EDS toolsuite. I couldn't trick it into building with the software on my Ubuntu system. 

Building root filesystems for embedded Linux systems is apparently done using one of several tools designed for the purpose, including Yocto and Buildroot. One might imagine that if a Linux kernel needs to be recompiled in order to handle changes made to the SoC, all one need do is recompile the kernel from the source of the known image, drop it in and go. Instead, apparently, it is expected that you have to create an entirely new Linux system on your own. Amazing to me that one doesn't have the 'golden' Linux image to change minimally. 

I therefore had to download and install the Linaro ARM toolchain just for this step, then build the root filesystem for my embedded Linux system. Sounds easy? Well, Linaro no longer hosts this. Clues led me to ARM who hosted some versions but not the one that supported Linux with hard floating point, thus to a secondary spot where only one release was available. The configuration for building the root filesystem with buildroot requires me to enter choices from a narrow range of GCC versions, kernel header file versions and so forth, none of which are available with the toolchains and downloads now available.

Next up as I configured the buildroot configuration, the recipe describes running a make command to configure the busybox, which is the set of startup scripts and actions that the new Linux system will execute on startup. However, I got error messages and found no sign of busybox anywhere in the downloaded files. 

My level of confidence continues to sink into the abyss given all the mismatched versions, discrepancies and deviations. This seems like a fruitless journey that is certain to lead to a bad end.

THROWING AWAY THE RECIPES AND STRIKING OUT ON MY OWN

The difficulty with taking this approach is that it forces me to deeply and fully understand every bit of what has to be in place for the embedded Linux to run properly. Way further than I do now and frankly much more than I really want to have to deal with. 

I have a few Linux images on SD Cards that are self consistent, they just won't boot up correctly with my FPGA programming and give me the H2FLW and F2SDRAM bridges in operational status. There are a number of items on the card but not every part should need to be regenerated or built from scratch. 

  • Linux kernel
  • Root file system
  • Startup scripts and commands
  • Packages and libraries
  • Preloader
  • U-boot
  • U-boot variables and commands
  • U-boot start script
  • FPGA bitstream
  • Device tree blob
  • Flattened tree compilation of kernel and DTB possible

The kernel, root file system, startup, packages and libraries should NOT need to be recompiled. There is no new device driver or new function that needs to be added. 

The device tree blob is what links the components on the DE10-Nano board with possible Linux drivers. That is updated because of the changes I made to the System on a Chip (SOC) design using Platform Designer. 

The FPGA bitstream contains my new FPGA design so that is changed. There may be some changes to the U-boot start script and its variables and commands to support my new files. 

I have learned that the preloader and U-boot themselves must be updated as you make changes to the SoC design. My F2SDRAM and H2F bridges did not come up properly because the preloader and U-boot weren't correct on the existing SD card images. 

Therefore I should only need to build the preloader and U-boot, then tweak the variables, commands and U-boot start script. I also generated a new device tree blob representing my design. Nothing else should have to change. 

To test this theory, I scrubbed away all the attempts to build kernels and root file systems, giving me a clean environment to work on building U-boot and preloader. I had previously created the device tree blob and FPGA bitstream as part of the generation of my design. Once I have the two systems generated, I will copy an existing SD card image, replace just the FPGA bitstream and device tree blob before I update the preloader and U-boot. 

Fingers crossed, this will give me a running Linux system, with working H2FLW and F2SDRAM bridges, on a board that has my FPGA logic loaded. That is the work for the rest of today. 

Sunday, July 9, 2023

User interface test verification part 10 - death battle with everybody else's software in order to run mine

GETTING THROUGH THE PROCESS WITH A BOOTABLE SD CARD

I had to resolve many discrepancies between the various guides on the official Rocketboards.org as well as others. Some mention a filter script that must be run on the output of Qsys/Platform Designer, others do not. Some imply that the preloader builder (bsp generation programs) is included in the SoC EDS (System on Chip Embedded Development Suite), while others have me explicitly download code from various locations including Github. Similarly, the U-boot code has multiple sources. 

When I followed one set of recommendations the make command failed with a warning that I had to read the documentation for generating it for new chips but not the Cyclone V SoC. I don't know what versions are appropriate and of course need to be sure the versions of these piecemeal fetched bits of code are compatible with each other. 

Here is an example of the quality standards of the SoC tools. This is from the Intel website covering error messages using one of the tools necessary to create a bootable Linux image.

Description

When you are generating the device-tree source file (.dts) for an SoC HPS hardware design, you might see a large number of spurious error messages. The following list shows some of messages produced by sopc2dts:

sopc2dts --input soc_system.sopcinfo --output soc_system.dts --board soc_system_board_info.xml

Failed to find h2f_lw_reset

Failed to find f2h_reset

Component hps_0 of class altera_hps is unknown

Component hps_0_fpga_interfaces of class altera_interface_generator is unknown

Component hps_0_hps_io of class altera_hps_io is unknown

Component hps_0_clk_0 of class hps_clk_src is unknown

Component hps_0 of class altera_hps is unknown

Component hps_0_fpga_interfaces of class altera_interface_generator is unknown

.

.

.

DTAppend: Unable to find parent for compatible. Adding to root

DTAppend: Unable to find parent for leds. Adding to root

DTAppend: Unable to find parent for pmu0. Adding to root

Resolution

No workaround is necessary. You can safely disregard the warning messages and proceed to compile your device tree sources normally.

Ignore the heap of warning messages from the tool. Trust us, it worked okay. Hmmmmm.

Eventually I made it through to the point where I would build the Linux kernel. I selected the oldest production version of the source on the Altera github section to maximize the chance that it will work with Cyclone V. Doing the build brought up a problem that after googling turned out to be code problems in that Linux version that needed some changes to be made in a couple of source modules to overcome. It was a linking problem for the device tree compiler, essentially duplicate variable names being pulled into one final module. Judicious application of 'extern' is the 'patch' I found. 

Later however I ran into an issue with the build process. The recipes and comments suggest that all I need is to include the keyword assignment of ARCH=arm but there are many different Arm architectures. This was highlighted when the kernel failed to build because some instructions it wanted to use were not part of the default Arm architecture. These are indeed supported on the Cortex A9 cores on my board. 

If this were a simple makefile, adding -march= with the code for the A9 would fix things, but building Linux is a convoluted mess of makefiles and other scripts. Finding a place to add that where it will take effect for all compiled modules is the challenge. I couldn't find any direct documentation for this. It will be the subject of tomorrow's day of unnecessary labor, as I have other things to do for the rest of today. 


Friday, July 7, 2023

User interface test verification slog part 9

FIXING THE RAM BRIDGE PROBLEM - A LONG TALE WHOSE END IS NOT YET WRITTEN

I did come across a snippet that suggested an alternate set of commands during U-boot to turn on the F2SDRAM bridge. It still branches to that code in RAM which terminates, suggesting it isn't correctly loaded, but I was hopeful it might have gotten far enough to enable some RAM fetches. Instead, my test program blew up because the H2F LW bridge is now disabled. Somehow I can't get both of the bridges working simultaneously. 

Since the defect is in the U-boot startup, I then dove into the rabbit hole of regenerating the preboot and U-boot for the design. In order to do this I had to install quite a few tools that weren't previously on my Linux system: Bison and Flex, libssl-dev and, according to a few bits of documentation, the older Python 2. Then I had to fetch the U-boot source from Github and work through generating everything needed. 

Eventually I got everything to cleanly 'make', however the results did NOT include the bridge_enable_handoff and related scripts that actually enable the bridges. More surfing and extensive reading followed at Rocketboards.org, which looks suspiciously like a dumping site where Intel tossed all the System on a Chip stuff they bought from Altera when they lost commercial interest. Not saying it is, but . . . 

Some of the articles suggest that part of the process is a python script that will modify the U-boot source tree based on the particular SoC design I have, reading things from a folder in the project that is handoffs from Quartus. Sounded promising until I went to the directories where they said the python scripts would live, and found NOTHING. No scripts. All I can find in the official 'documentation' are references like this that are contradictory between posts and document pages.

One can take the Golden Reference Hardware Design, an example, and generate it EXACTLY as it sits with ease and success. Similarly there are some examples from Altera and from Terasic that other hobbyists simply implement like a formula, without modification, and post on YouTube as videos teaching how to use the SoC environment. Deviate one step and you are on quicksand without enough information to leap to safe ground. Enough ranting, back to my experiments, digging and hoping.

I will hunt to find the source for bridge_enable and bridge_enable_handoff scripts. I will hunt to find the source or at least explanation of what mysterious code is copied to a location in RAM by some other mysterious code or process such that it can be executed from the aforementioned scripts. If I can figure this out I may be able to rescue my current boot SD card or generate an alternative that will work for me. 

At last I have enough clues to determine that the preloader is what should have put the proper instructions into RAM at that address that is branched to by U-boot purportedly to configure and turn on the F2SDRAM interface. What this tells me is that I will indeed have to generate the ENTIRE chain of preloader, U-boot , Linux kernel, device tree blob and finally the flattened tree blob for this project. 

Secondarily, other clues led to the realization that in spite of what Rocketboards.org claims, I don't want to use the official U-boot stream from github but instead the special Altera version. This is where the bridge_enable_handoff and other statements are produced. 

WORKING WITH OTHER SD CARD BOOT IMAGES

The other images provided by Terasic are for desktop versions, meaning that it uses the HDMI output to provide a full graphical interface on a monitor. The problem with this is that the HDMI hardware is connected to the FPGA side of the chip not straight to Linux, thus it expects logic in the FPGA to handle this. That logic is proprietary IP and thus not available for me to integrate into my design, even if only to ignore it. 

Tantalizingly, these versions use a newer U-boot that has the improved bridge_enable command which MAY not blow up. It appears to turn on all the bridges between HPS and FPGA sides of the chip. The challenge is getting this boot of Linux to coexist with my FPGA load of logic. 

I spent about an hour hacking various U-boot environmental variables that control the boot process, after swapping in my FPGA output file for theirs. The code to initialize the HDMI interface tried to run and toggle various signals in the FPGA side, probably resulting in a hang. I tried to disable the configuration of the HDMI.

There are hang messages from the port into which the keyboard and mouse are plugged - which I am not using. I might be able to bypass this with physical items, but the device tree for my system does not have the kb and mouse connections hooked to the proper driver. I may have to blend the device tree blobs to get one that will match the Linux kernel but also my FPGA logic and my application requirements. A lot of 'if' involved here but I will keep plugging along. 

I do have access to the serial port on the UART to log in and control the Linux image, at least, if I can get it to start with a hacked up SD card in my desired configuration. There are still many permutations of the different variables I can change to try to get Linux to start up with my FPGA logic in control. 

The device tree blob includes all the peripherals in use, sets them as enabled or disabled and associates drivers to them. Perhaps I can modify the tree to shut down the use of HDMI and the USB OTG port that supports the mouse and keyboard. If it comes up without those, presumably it won't do bad things on pins of the chip that could affect my FPGA logic.

The results are the same - the flattened tree blob is invalid. This is yet another file that is used by U-boot and generated by building U-boot from scratch. I don't think there is any way to deconstruct and modify an existing one simply to update the device tree blob included within it. It appears that I will have to generate this at the tail end of my soup to nuts, ground up generation of everything. 

BUILDING THE SYSTEM FROM SCRATCH

I have found a fairly close guide, but it diverges sometimes from what actually occurs or where files are stored on my system. I downloaded and began generation of the preloader, the device tree blob and the U-boot loader. Generally I will sail along getting exactly what is expected until we run into a wall.

This guide was written to make trivial modifications to the "Golden Hardware Reference Design" which appears to be 99% of the universe's approach to working with this SoC - to hack on one of the few designs set up as magic incantations blindly followed, rather than actually documenting what is needed to create real work de novo. 

Thus one of the bumps in the road was the recipe suggestion to run the sopc2dts tool of the embedded SOC development kit, which purports to take the design information created by Qsys and Quartus and produce the device tree source. This points to two XML files which provide the key descriptions of all the components on the Terasic DE10-Nano system that are not inside the Cyclone V chip. Most of the peripherals like SPI, I2C, USB and so forth are described in these. They were not put into my project at all by Quartus!

I have to go back to the GHRD demonstration folder, find the two XML files and bring them over manually. Only then can I spit out a device tree source file. Alas, as soon as I tried to turn it into a blob (using the device tree compiler), I received errors about led_pio objects that were invalid. These are the signals set up for the GHRD project but not in mine or in the basic SoC system. I stripped them out and could get a blob file that is hopefully correct.

Parts of the process of generation involve selecting which tree of the github project you want to use - I selected the latest production release of U-boot for example, but when I try to make it I get messages that tell me it is suitable for the successor Stratix and Arria SoC chips but not the Cyclone V. Damn. Now I have to back all this out and figure out the last production release which supports my chip, then redo the work to build my preloader and U-boot.

At each step, guided by lots of rote recipes and documentation which simply points to the GHRD for everything, I run the risk of having picked the wrong release or missed something needed for a real project that is not included in GHRD. Further, all these pieces have to work together properly in order to have an SD Card which boots up a workable Linux with my bridges operational and all the functions like SPI operational. What are the chances?

Wednesday, July 5, 2023

User interface test verification failures part 8

MANAGING THE WAITREQUEST BETTER TO AVOID STALLS

I did a lot of thinking about how to increase ruggedness, detect and properly recover from errors and manage the synchronization between the Hard Processor System (HPS dual ARM Linux system) and Field Programmable Gate Array (FPGA) sides of the chip. 

I have careful control over the waitrequest signals as they can stall the ARM processors if the application attempts a read or write and the signal is held high. I ensure that reads and writes over the bridge channel don't stall, but are just thrown away if we are not actively in a load or unload operation. 

I developed a method of returning the loop count in the status word on H2F LW channel 1 to let the application on HPS determine when to issue its read or write on H2F LW channel 2. 

All of this was useful, but for any of it to work the bridge to RAM must work first. 

CAN'T GET ANY BRIDGE BUT H2F LW OPERATING - DEFECTIVE ENVIRONMENT

The default for the System on a Chip is to have the HPS to FPGA Lightweight (H2F LW) bridge operational. The boot process for the HPS involves several steps - preloader, U-boot and then Linux bootup. The Terasic DE10-Nano environment and the toolchains from Altera/Intel come with Linux images, preloader and U-boot set up. 

They don't work right. In particular, the version shipped executes a command in U-boot called bridge_enable_handoff that should switch on the bridges we are using in our implementation. It terminates in the middle of running and never finishes turning on bridges. 

I thought I had overcome this by modifying it to skip the troublesome step in the command, but I have been absolutely unable to communicate over either the H2F (full width HPS to FPGA) or F2SDRAM (FPGA to SDRAM) bridges. They just won't work. 

While there are supposedly ways to kick on the H2F bridge from within Linux, I couldn't get that to work. This is why I moved to using dual channels on the H2F LW bridge that is indeed operational. Sadly there is no easy way to replace the F2SDRAM bridge with another channel of H2F LW nor to directly move my image file from the Linux application to the last 1MB of physical SDRAM. 

Newer versions of U-boot for Cyclone V chips like mine change that to a bridge_enable command. It might work better. The issue in the existing command (kind of a script actually) is that it branches to a routine called FPGA_SDRAM_APPLY that should be code copied into a fixed place in RAM which will configure and turn on the SDRAM bridge. When we branch it terminates, although giving an unhelpful return code of 0. Since it terminates, the remainder of the bridge_enable_handoff script isn't executed. This means no H2F bridge and of course no F2SDRAM bridge. 

It appears that I will need to create my own bootloader and U-boot just to get the darned bridges operational. More time spent fighting tools I am not skilled with, grabbing github repositories, compiling objects and building the parts that should have been black boxes for my use. Kind of like having to use lathes to form your own screws and nuts because the basic parts don't work. Aaarrrrrrgggggghhhhhh.


Sunday, July 2, 2023

User interface test verification part 7 (this one not actually testing but refactoring)

REDESIGNED AND IMPLEMENTED THE BRIDGE ASSOCIATED FPGA LOGIC

I put in a dozen hours working on the most bulletproof schemes for transferring data over the H2F LW bridge as well as the F2SDRAM bridge that accesses the last megabyte of RAM. One weakness was the need to complete two H2F LW transfers for each F2SDRAM transfer since the bridge widths were 32 and 64 bits respectively.

I shrank the F2SDRAM width to 32, which triggered a warning that this is suboptimal for throughput, but it allows a very clean 1 to 1 handoff as I move data from one bridge to the other in the load or unload loop. The loop iterations doubled to 260,625 but I am now only pushing one transaction on the H2F LW bridge in the loop body. 

The small change of width also changes the addressing mechanisms for the RAM. Where before I transferred a 29 bit address to grab a block of eight bytes, I now have to send a 30 bit address to retrieve a block of four. That is, the bottom two or three bits of the RAM address are actually pointers to the byte within one block, and the addressing excludes that since you get the entire block in one transaction. Thus a 32 bit address becomes 29 for the 64 bit width, 30 for the 32 bit width, and would be 28 if the memory bridge were widened to 128bit. 
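The arithmetic above can be checked with a minimal sketch. The function names here are hypothetical and only illustrate how the byte-within-block bits fall out of the address for a given bridge width; they are not taken from the actual FPGA logic.

```python
def word_address_bits(byte_address_bits, bridge_width_bits):
    """Address bits left once the byte-within-block bits are dropped."""
    bytes_per_block = bridge_width_bits // 8
    # bytes_per_block is a power of two, so bit_length() - 1 is its log2
    offset_bits = bytes_per_block.bit_length() - 1
    return byte_address_bits - offset_bits


def split_byte_address(byte_address, bridge_width_bits):
    """Split a byte address into (block address, byte offset within block)."""
    bytes_per_block = bridge_width_bits // 8
    return byte_address // bytes_per_block, byte_address % bytes_per_block


if __name__ == "__main__":
    for width in (32, 64, 128):
        print(width, "bit bridge ->", word_address_bits(32, width), "address bits")
```

For a 32 bit byte address this gives 30, 29 and 28 bits for bridge widths of 32, 64 and 128 bits, matching the figures in the paragraph above.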

I calculate the relative 1130 word (a 16 bit value) within the cartridge from the cylinder, head, sector and word within the record. Each 1130 word is only 16 bits and our memory is fetched in units of 32 bits, i.e. two 1130 words at a time. The bottom bit of the calculated relative 1130 word is therefore dropped from the address, since it is simple selection logic for which half of a 32 bit block contains the desired word. 
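A rough sketch of that calculation follows. The `Geometry` values used in the example, the function names, and the choice of which half of the 32 bit block holds the even word are all assumptions made for illustration; the real values live in the FPGA logic and are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    heads: int              # surfaces per cylinder
    sectors_per_track: int
    words_per_sector: int


def relative_word(geo, cylinder, head, sector, word):
    """Index of a 16 bit 1130 word counted from the start of the cartridge."""
    track = cylinder * geo.heads + head
    record = track * geo.sectors_per_track + sector
    return record * geo.words_per_sector + word


def locate(geo, cylinder, head, sector, word):
    """Return (32 bit block index, half selector) for a given 1130 word."""
    rel = relative_word(geo, cylinder, head, sector, word)
    return rel >> 1, rel & 1   # drop the bottom bit; it only picks the half


def extract_word(block32, half, low_half_first=True):
    """Pick one 16 bit word out of a 32 bit block (ordering is an assumption)."""
    use_low = (half == 0) == low_half_first
    return block32 & 0xFFFF if use_low else (block32 >> 16) & 0xFFFF


if __name__ == "__main__":
    geo = Geometry(heads=2, sectors_per_track=4, words_per_sector=321)  # assumed
    print(locate(geo, cylinder=0, head=0, sector=1, word=0))
```

The point of the sketch is only that the bottom bit of the relative word index never reaches the RAM address; it is used after the fetch to select a half.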

The design of the bridges is such that they can transfer one block in every clock cycle and can stream continuously. The only way to hold off a transfer is for the slave side to raise the waitrequest signal. Essentially we will see the read or write request but the other side remains frozen until the clock cycle when we have dropped waitrequest.

My logic has to correctly handle the waitrequest signal to keep the applications running on the Linux processor side in sync with my logic in the FPGA. My prior state machines had points where there were extra cycles that could allow a write from the application to be presented and dropped by my side. 

My solution was a dual state machine that works within the single cycle constraints of the Avalon Memory Mapped protocol used with the bridges. One machine grabs or sends data in a single cycle. The other raises waitrequest immediately to hold off the master (Linux) side until we are sure we processed our transaction. Thus I am throttling the bridge, allowing data in only as fast as I am ready to handle it. The state machine managing waitrequest issues a signal that a read or write has occurred but doesn't drop waitrequest until the requesting logic raises its own signal that it received our results. 

The load and unload logic is a big loop in the FPGA which should be mirroring the loop in the application on Linux, running 260,625 times to move an entire 2315 cartridge image between RAM and the Linux side. The FPGA is the slave to the H2F LW bridge, since the Linux application controls when we load or unload. This means we issue the waitrequest up to the application. 

The RAM, on the other hand, has the FPGA as the master for the F2SDRAM bridge. That means that it is the RAM circuitry that raises or lowers its own waitrequest, to tell us that a particular attempt to write or read RAM is not yet complete.  The number of cycles is variable, a characteristic of DRAM, which is why we depend on the waitrequest signal. 

Load is conceptually just a matter of having the application write a block to us, which we then write to the RAM. The RAM may take a while to complete the write operation, but the Linux application may have already attempted to write the next block down to us on H2F LW. Thus, we need the waitrequest throttling to freeze the Linux application until we know we finished putting the last block into RAM. 

Unload is the inverse, where we read a block of RAM and then have it ready to provide to the application which is reading from us. Note that in this latter case, the application may be issuing the read way ahead of when we have the data from RAM. We need the throttling of the H2F LW waitrequest to make it freeze until we are ready to complete the read with the data we have received from RAM. 

The bridge state machine that handles the H2F LW waitrequest is the key to this working properly. For load, it tells the load logic that we have gotten a write from the Linux side; the load logic takes the data and issues the F2SDRAM write, but we don't confirm back to the H2F LW state machine that we got the written data until we see the RAM write complete. It is at that point that we confirm to the H2F LW state machine that we got the data and it correspondingly drops waitrequest.

For unload, we are always prefetching the next block from the F2SDRAM bridge and then waiting for the H2F LW side to inform us that it has had a read request. We only allow the waitrequest for H2F LW to drop if we have finished our prefetch. 

In all cases, unless our logic handling a load or unload loop is active, at most we will get one read or write transaction presented by the dual ARM processor running our application, because as soon as we see either the read or the write we freeze the processor using waitrequest until we can satisfy that transfer request. 

For a write down from the application to the FPGA, we don't accept the block contents until we release the waitrequest by sending the confirmation that we have used it (getdata_grabbed). In a load loop which is where the application does the write down to us, we would already see the contents of the block, pass it along to write it out to RAM using F2SDRAM, and only release the waitrequest after the RAM has successfully completed the write. This means that my state machine processing the read or write on H2F LW bridge must register the written block as soon as we see the operation and not wait until we let it complete by dropping waitrequest. This is a subtlety of the design that must be considered as it has to fit into the larger interlock scheme where the loop is being implemented.
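The load interlock described above can be modeled cycle by cycle in software. This is a simplified sketch, not the actual state machine: `LoadBridgeModel`, `run_load` and the `ram_latencies` list (one assumed RAM write latency, at least 1 cycle, per block) are hypothetical names, and the RAM is just a Python list. What it shows is the ordering: the block is registered the moment the write is seen, waitrequest goes up, and it only drops once the RAM write has finished.

```python
class LoadBridgeModel:
    """Toy model of the H2F LW slave side during a load loop."""

    def __init__(self, ram_latencies):
        self._latencies = iter(ram_latencies)  # cycles each RAM write takes
        self.ram = []                          # stands in for the SDRAM
        self.waitrequest = False
        self.latched = None                    # block registered from the master
        self._ram_cycles_left = 0

    def clock(self, write, writedata):
        """Advance one clock. Returns True on the cycle the write completes."""
        if self.latched is None:
            if write:
                # Register the data as soon as the write is seen, freeze master
                self.latched = writedata
                self.waitrequest = True
                self._ram_cycles_left = next(self._latencies)
            return False
        # A RAM write is in flight; the master stays frozen meanwhile
        self._ram_cycles_left -= 1
        if self._ram_cycles_left > 0:
            return False
        # RAM write done: this is the point where the confirmation
        # (getdata_grabbed in the text) would let waitrequest drop
        self.ram.append(self.latched)
        self.latched = None
        self.waitrequest = False
        return True


def run_load(blocks, ram_latencies):
    """Drive the model like a master that holds each write until accepted."""
    bridge = LoadBridgeModel(ram_latencies)
    cycles = 0
    for block in blocks:
        while not bridge.clock(write=True, writedata=block):
            cycles += 1
        cycles += 1
    return bridge.ram, cycles


if __name__ == "__main__":
    print(run_load([0x11, 0x22, 0x33], ram_latencies=[1, 3, 2]))
```

Even though the model's master presents the next write immediately after each one completes, every block lands in RAM exactly once and in order, which is the property the throttling is meant to guarantee.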

For a read by the application seeking to get the next block that had been extracted from RAM, we have to first request the block over F2SDRAM and wait until it is complete. At that point we set up the data and release the waitrequest to let the read operation over H2F LW complete. All of this logic for read and write by H2F LW has to consider that the master is the application and it is able to issue a write or read at any time relative to where our state machines might be inside the FPGA. Thus we always have to see the request, freeze it, and only release at the end of each loop iteration of the load or unload. 

The state machine for F2SDRAM is easier than for H2F LW because we are not the side that generates the waitrequest when the slave is not ready to complete any transfer. It is the RAM controller circuitry up in the Hard Processor System (HPS) side of the chip that does this and has implemented a slave that can read or write to RAM independent of the accesses by the two ARM processors or other mechanisms using DMA. 

As there are multiple users of RAM and because DRAM itself has periods when it is refreshing cells, the time it takes to complete a RAM access is not predictable from the FPGA side. Thus waitrequest is essential to honor in the state machine. It is our state machine for F2SDRAM that heeds the signal.

All we have to do is set up a block of data and address, then raise the write signal to F2SDRAM. We keep that raised as long as we see waitrequest asserted, then we drop it and our write transaction is complete. If reading, we set up the address and raise the read signal. We keep it asserted as long as the waitrequest is active, then we wait for the readdatavalid signal to be sent by the slave on the HPS side. This tells us it is time to latch the data as the read from RAM has completed. 
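The master-side sequence can be sketched against a toy slave. `SlaveModel`, its `accept_delay` and `read_latency` parameters, and the two master functions are hypothetical stand-ins for illustration; the real SDRAM controller's timing is variable and is not modeled here beyond a fixed stall and a fixed data delay.

```python
class SlaveModel:
    """Toy slave: stalls each request, then answers reads some cycles later."""

    def __init__(self, accept_delay, read_latency):
        self.mem = {}
        self.accept_delay = accept_delay   # cycles waitrequest stays high
        self.read_latency = read_latency   # cycles from accept to readdatavalid
        self._stall = accept_delay
        self._pending = []                 # [cycles_left, data] per accepted read

    def clock(self, address=0, read=False, write=False, writedata=0):
        """One clock. Returns (waitrequest, readdatavalid, readdata)."""
        valid, data = False, 0
        if self._pending:
            self._pending[0][0] -= 1
            if self._pending[0][0] <= 0:
                valid, data = True, self._pending.pop(0)[1]
        waitrequest = False
        if read or write:
            if self._stall > 0:
                self._stall -= 1
                waitrequest = True
            else:
                self._stall = self.accept_delay
                if write:
                    self.mem[address] = writedata
                else:
                    self._pending.append(
                        [self.read_latency, self.mem.get(address, 0)])
        return waitrequest, valid, data


def master_write(slave, address, data):
    """Hold write, address and data until waitrequest is seen low."""
    cycles = 0
    while True:
        cycles += 1
        waitrequest, _, _ = slave.clock(address=address, write=True,
                                        writedata=data)
        if not waitrequest:
            return cycles          # accepted; the write signal can drop


def master_read(slave, address):
    """Hold read while stalled, then wait for readdatavalid and latch."""
    cycles = 0
    while True:
        cycles += 1
        waitrequest, _, _ = slave.clock(address=address, read=True)
        if not waitrequest:
            break                  # request accepted; drop the read signal
    while True:
        cycles += 1
        _, valid, data = slave.clock()
        if valid:
            return data, cycles    # latch the data on readdatavalid


if __name__ == "__main__":
    slave = SlaveModel(accept_delay=2, read_latency=3)
    print(master_write(slave, 0x10, 0xDEADBEEF))
    print(master_read(slave, 0x10))
```

The two phases of the read are the part worth noticing: waitrequest only governs when the request is accepted, while readdatavalid governs when the data may be latched.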

For safety reasons, I can put in a mechanism to unfreeze the ARM processor by forcing waitrequest to drop. The challenge is to find a way to detect a failure where we are in a deadly embrace, drop the request and then inform the application that the load or unload has failed so it doesn't corrupt the virtual cartridge file or continue attempting to do a load or unload. 

The reason this is challenging is that the application thread is frozen - it executed a store or load instruction to a memory address that is mapped to drive our H2F LW bridge. The processor hangs and thus the application can't process interrupts or read status over the other channel of the H2F LW bridge. We don't have a good means of informing the application that its most recent transfer request failed and it needs to abandon efforts. 

The easy part is implementing something like a timer that is fired off when we start a loop iteration and reset when we finish each iteration. Detecting a hang is therefore not the problem; the problem is getting the application in sync with the failure condition so that it doesn't blindly continue to attempt the load or unload. I will be musing over methods to make this happen reliably, because that is required before I implement this watchdog mechanism.
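For the detection half only, a per-iteration timer might look like the following sketch. `IterationWatchdog` and its `limit_cycles` threshold are hypothetical, and the sketch deliberately says nothing about the harder question of how to notify the frozen application.

```python
class IterationWatchdog:
    """Counts clocks within one loop iteration and flags an overrun."""

    def __init__(self, limit_cycles):
        self.limit = limit_cycles   # assumed threshold, to be tuned
        self.count = 0
        self.running = False
        self.expired = False

    def start(self):
        """Called when a load or unload loop iteration begins."""
        self.count = 0
        self.running = True

    def finish(self):
        """Called when the iteration completes normally."""
        self.running = False
        self.count = 0

    def tick(self):
        """Called every clock; returns True once the iteration has hung."""
        if self.running and not self.expired:
            self.count += 1
            if self.count >= self.limit:
                self.expired = True   # sticky until the logic is reset
        return self.expired


if __name__ == "__main__":
    wd = IterationWatchdog(limit_cycles=5)
    wd.start()
    print([wd.tick() for _ in range(6)])
```

Once `expired` is set, the FPGA logic could force waitrequest low to unfreeze the processor, but what the application then sees on that transfer remains the open design question.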