CROSS COMPILING CONFUSION
I am using a laptop to build the software for the DE10-Nano board, so I am cross compiling: my laptop has an x86 processor but the board has ARM Cortex-A9 processors. This should be straightforward since Intel/Altera provides SoC EDS, the development environment for cross compiling for boards such as mine. The tool starts a shell terminal window with the cross compilation environment already set up. When I build my applications to run under Linux on the board, it properly compiles them for the ARM A9 chips.
Apparently the build process for U-Boot overrides this somehow and makes different cross compiling choices, which produced code that won't execute on my board. ARM chips come in a vast number of versions and feature sets, so one must be very explicit about the chip used in order for the compiler and assembler to emit the proper ARM instructions.
On my laptop, the SoC EDS tool runs under Linux in a virtual machine and has the cross compile tools installed under prefixed names. Thus, the C compiler gcc exists in versions such as arm-none-linux-gnueabihf-gcc, arm-none-linux-gnueabi-gcc and many others. The cross compilation process has to be set up with the right prefix, such as arm-none-linux-gnueabihf-, in order to work properly.
The Cortex A9 chip has hard floating point, meaning that it implements ARM instructions to do floating point operations. This is indicated by the last two characters hf in the prefix. The prefixes without those letters are 'soft floating point' meaning that the code calls a subroutine to perform the operations without using the hardware instructions. The subroutines must be provided somehow in the module in order for those calls to run successfully.
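One way to see which choices a build actually made is to look at the ELF header of the file it produced. The following is a minimal sketch, assuming the output is a 32-bit ELF file; the function name describe_elf_header is made up for illustration, while the numeric constants are the standard ELF values for ARM. Some toolchains leave both float bits clear, in which case the sketch reports "unspecified".

```python
import struct

EM_ARM = 40                      # e_machine value for 32-bit ARM
EF_ARM_ABI_FLOAT_SOFT = 0x200    # e_flags bit: soft-float calling convention
EF_ARM_ABI_FLOAT_HARD = 0x400    # e_flags bit: hard-float calling convention


def describe_elf_header(data: bytes) -> dict:
    """Report machine type and float ABI from the first 52 bytes of an ELF32 file.

    Hypothetical helper for illustration, not part of any toolchain.
    """
    if len(data) < 52 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:
        raise ValueError("only 32-bit ELF is handled in this sketch")
    # Byte 5 of the identification block gives the byte order (1 = little endian).
    endian = "<" if data[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(endian + "H", data, 18)
    (e_flags,) = struct.unpack_from(endian + "I", data, 36)
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        float_abi = "hard"
    elif e_flags & EF_ARM_ABI_FLOAT_SOFT:
        float_abi = "soft"
    else:
        float_abi = "unspecified"
    return {"is_arm": e_machine == EM_ARM, "float_abi": float_abi}
```

It would be called on the first 52 bytes of a build output, for example describe_elf_header(open(path, "rb").read(52)), where path points at whichever ELF file the build left behind.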
Also, the code generated by GCC and set up as executable modules differs depending on whether the code will run under Linux or runs standalone. Under Linux there are system calls which provide services the code can request, but they do not exist or must be explicitly created for standalone use. This is the meaning of the linux in the midst of the cross compilation prefixes.
Each combination of ARM processor version, hard versus soft floating point, and Linux versus standalone requires its own version of the cross compile tools. This can involve installing multiple packages, as they are not all provided out of the box by the SoC EDS tool.
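The naming scheme described above can be decoded mechanically. This is only a sketch of the convention as I understand it: the function parse_prefix and the dictionary keys are made up, and the rule for telling the vendor field from the operating system field is a heuristic, not a formal grammar.

```python
def parse_prefix(prefix: str) -> dict:
    """Split a cross compile prefix such as 'arm-none-linux-gnueabihf-' into parts.

    Heuristic sketch: the first field is taken as the architecture, the last
    as the ABI, and anything in between is either 'linux' or a vendor name.
    """
    parts = prefix.rstrip("-").split("-")
    if len(parts) < 2:
        raise ValueError("expected at least an architecture and an ABI field")
    arch, abi = parts[0], parts[-1]
    middle = parts[1:-1]
    target_os = "linux" if "linux" in middle else None
    vendor = next((m for m in middle if m != "linux"), None)
    return {
        "arch": arch,
        "vendor": vendor,
        "os": target_os,
        "abi": abi,
        "hard_float": abi.endswith("hf"),   # the trailing 'hf' discussed above
        "bare_metal": target_os is None,    # no 'linux' field means standalone code
    }
```

For arm-none-linux-gnueabihf- this reports a Linux target with hard float, while arm-none-eabi- comes back as standalone with soft float.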
Once I recognized which version of the cross compile programs was needed, I made sure that I had it installed on my Linux system and set the cross compile environment variable to point to it. This was important, but not enough to restore U-Boot to operation.
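A quick sanity check before starting a long build is to confirm that the prefixed tools can actually be found on the PATH. A minimal sketch follows; the helper names are made up, and it assumes the prefix is held in CROSS_COMPILE, the variable name the U-Boot and Linux makefiles conventionally read.

```python
import os
import shutil


def find_cross_tools(prefix, tools=("gcc", "as", "ld", "objcopy")):
    """Map each tool name to the full path of prefix+tool, or None if absent."""
    return {tool: shutil.which(prefix + tool) for tool in tools}


def missing_tools(prefix, tools=("gcc", "as", "ld", "objcopy")):
    """List the tools that cannot be found on PATH for this prefix."""
    found = find_cross_tools(prefix, tools)
    return [tool for tool, path in found.items() if path is None]


# Example use: read the prefix from the environment and report any gaps.
# prefix = os.environ.get("CROSS_COMPILE", "")
# print(missing_tools(prefix))
```

An empty list means every tool was found; anything else names the pieces that still need installing for that prefix.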
DEEP GOOGLING TRIPS OVER A VAGUE HINT
In one of the myriad conversations I read about people whose U-Boot preloader (SPL) hangs without printing anything, there was a mention of two reasons that the SPL might go into a loop - incorrect specification of memory or of the pmmu details in the device tree. This highlights that there is in fact more than one device tree for my design, the second one hidden inside the process that makes the preloader.
A device tree is a structure of keywords that the Linux kernel uses to identify the various features, facilities and components of the board, understand the addresses that control them and load the appropriate software drivers. It is written as text, in the dts format, then compiled by the device tree compiler dtc into a binary dtb file.
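Since the binary dtb is what actually gets passed around, it helps to be able to check that a file really is one. Here is a minimal sketch that reads the 40-byte header found at the start of a current-format dtb (ten big-endian 32-bit fields beginning with the magic number 0xd00dfeed); the function name is made up.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_HEADER_FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings", "off_mem_rsvmap",
    "version", "last_comp_version", "boot_cpuid_phys",
    "size_dt_strings", "size_dt_struct",
)


def parse_fdt_header(data: bytes) -> dict:
    """Decode the header of a flattened device tree blob (illustrative helper)."""
    if len(data) < 40:
        raise ValueError("too short to hold a device tree header")
    # All header fields are stored big endian regardless of the target CPU.
    values = struct.unpack_from(">10I", data, 0)
    header = dict(zip(FDT_HEADER_FIELDS, values))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("bad magic number, not a dtb file")
    return header
```

The totalsize field is handy when worrying about how much room a given dtb takes up.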
Why not just use the one true device tree for the preloader, you might ask. Well, the preloader runs out of the very limited RAM inside the processor chip, before the SDRAM main memory is initialized and U-Boot is copied into that SDRAM. Thus, there is precious little space available for a full device tree.
A Rube Goldberg (Heath Robinson for my UK oriented readers) mechanism takes the compiled dtb file and runs a version of the grep command that works on the binary format (fdtgrep) to strip out all entries not needed for the preloader and U-Boot. This essentially opaque process shrinks the device tree to its smallest form so it can be included with the preloader and U-Boot. Apparently there are tags in the device tree source file dts that indicate what must be included in the preloader version. These tags are set up by Altera/Intel in the files they provide with their toolkits, which should in theory produce a fully usable stripped down dtb.
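To make the stripping idea concrete, here is a toy sketch of pruning a tree down to tagged nodes plus the path leading to them. It is not how fdtgrep is implemented: the tree is a plain nested dictionary rather than a dtb, and the tag name keep-in-spl is invented for illustration.

```python
MARKER = "keep-in-spl"   # hypothetical tag name, for illustration only


def prune(node: dict):
    """Return a copy holding only tagged nodes and their ancestors, else None.

    Each node is {"props": {...}, "children": {name: node, ...}}.
    Untagged children of a tagged node are dropped in this sketch.
    """
    kept_children = {}
    for name, child in node.get("children", {}).items():
        pruned = prune(child)
        if pruned is not None:
            kept_children[name] = pruned
    if MARKER in node.get("props", {}) or kept_children:
        return {"props": dict(node.get("props", {})), "children": kept_children}
    return None
```

A tree with a tagged serial port and an untagged Ethernet node under the same parent comes back with the parent and the serial port only, which is the kind of shrinkage the preloader needs.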
The device tree compiler dtc can be run in reverse to convert the binary dtb back into source form dts files. I did this, receiving error messages about faulty formats for the memory and the serial port entries. Both of these are extraordinarily central to the preloader initializing RAM, loading U-Boot and printing the first of the boot time messages on the console.
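The reverse run of dtc can be wrapped so that its warnings are captured for later inspection. A minimal sketch, assuming dtc is installed and on the PATH; the helper names and file paths are placeholders, while -I, -O and -o are dtc's options for input format, output format and output file.

```python
import shutil
import subprocess


def dtc_decompile_command(dtb_path, dts_path):
    """Build the argument list that turns a binary dtb back into dts source."""
    return ["dtc", "-I", "dtb", "-O", "dts", "-o", dts_path, dtb_path]


def decompile(dtb_path, dts_path):
    """Run dtc and return (exit code, text dtc wrote to stderr)."""
    if shutil.which("dtc") is None:
        raise FileNotFoundError("dtc is not on PATH")
    result = subprocess.run(
        dtc_decompile_command(dtb_path, dts_path),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Complaints about the dtb contents arrive on stderr, so keeping that text next to each decompiled file makes it easier to compare what dtc objects to at each stage.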
I therefore have to figure out what might be wrong with the stripped down file, correct it, and see if I can just build the preloader and U-boot from this point forward. All this assuming that this, perhaps the one hundredth possible cause I had to investigate, is actually the reason I can't run U-boot.
COMPARING OTHER BOARD CONFIGURATION DTBS TO THE DE10-NANO VERSION
I chose to configure and make the software as if I was running on the Intel Cyclone SDK board rather than the Terasic DE10-Nano. I am hoping that whatever fails for the DE10 will not occur on the most mainstream board they have.
Comparing the two, however, didn't reveal any obvious differences or issues, so I am still nowhere in the quest to get the darned thing to boot up. All I need is to get U-Boot to work; the rest is something I can work out.
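A comparison like this is easier to trust when it is done mechanically on the decompiled source text. A small sketch using Python's difflib; the function name and the default file labels are placeholders.

```python
import difflib


def diff_dts(text_a, text_b, name_a="a.dts", name_b="b.dts"):
    """Return unified diff lines between two decompiled device tree sources."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

An empty list means the two sources are textually identical; otherwise the lines starting with - and + show what differs between the two boards.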
NEXT WILD STAB - PERHAPS THE ENVIRONMENT VARIABLES CAUSE CONFLICT
The environment variables used by U-Boot are saved in flash memory on the board. My next desperate attempt to figure out the problem involves looking into or resetting the values. What I suspect might happen is that some value specifying a load address is clobbering the preloader before it is done, or that the preloader is failing to branch to the properly loaded U-Boot.
To fix this, I will use another SD card I have that boots up, use it to restore the environment variables to their default state and then retry my failing SD card. Nope, that made no difference. I did write down the load addresses it set in the environment variables.
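With the load addresses written down, checking whether any two images would land on top of each other is simple arithmetic. A minimal sketch; the function name is made up and the region names and numbers in the usage note are invented examples, not the values from my board.

```python
def find_overlaps(regions):
    """Return pairs of region names whose address ranges overlap.

    regions maps a name to (start_address, size_in_bytes).
    """
    # Sort by start address so each region only needs checking against later ones.
    items = sorted(regions.items(), key=lambda kv: kv[1][0])
    overlaps = []
    for i, (name_a, (start_a, size_a)) in enumerate(items):
        end_a = start_a + size_a
        for name_b, (start_b, _size_b) in items[i + 1:]:
            if start_b < end_a:
                overlaps.append((name_a, name_b))
    return overlaps
```

For instance, a made-up 8 MB image at 0x01000000 and a 64 KB blob at 0x01700000 would be reported as overlapping, because the first one does not end until 0x01800000.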
FOCUSING ON DEVICE TREE PROCESS
The process of creating a device tree that can be used by U-Boot as well as Linux involves many, many convoluted steps in the toolchain. Some processing has to occur to take the output of the Quartus toolchain and fold in other entries to create the source tree. It then has to be somehow merged into the U-Boot building process.
Complicating this is the fact that the preloader/U-Boot works with a stripped down version of the device tree. Further complicating debugging, the device trees are created, modified and used in their binary form (dtb files), so at every step I investigate I have to run the device tree compiler dtc to back out to a representative source (dts file) version.
The files are built by compiling, applying macros, running scripts, running the fdtgrep tool and generally shuffling things around like a skilled practitioner of Three-card Monte. If indeed my U-Boot won't run because the stripped down dtb file created for it is defective, then to uncover the source of the defect I have to inch my way backwards.
The recipes and how-to guides suffer from the massive problem of procedural evolution. That is, the Rube Goldberg processes by which the device tree and other data slither from Quartus to the U-Boot programs have changed many times. Some use a BSP editor. That went away. Some use shell scripts, other versions use Python scripts. Some of these take the files needed to set pin mux tables but not the device tree. Other things fill out and move the device tree information.
This renders the system a nightmare to work with, since every reference document, recipe or script says something different. I am going to have to reverse engineer the whole process and perhaps even the entire code base of U-Boot in order to produce a loader that will bring up my board to let me in turn boot Linux.
I was just wondering whether you think a different CPU/FPGA combination (assuming others are available) might have a more reasonable tool set? Not that you would likely want to change now, but just as something to think about for the future.
I first was working with the Xilinx toolchain for the FPGA side and Arduino for a standalone microprocessor, then switched to the Altera/Intel toolchain. Two of the major toolchains.
It's been long enough that I've forgotten. I take it the Xilinx/Arduino combo was about as bad?
The toolchain itself was much flakier.
The environments both have hundreds of thousands of pages of documentation, thus quite a bit to figure out and not well organized to learn. They are references.
Generally these environments have some very superficial simple tutorials and projects, but no methodical way to work down in depth to more of the complexity. Thus you will find hundreds of people who essentially reproduce the simple examples and confidently upload them to GitHub, YouTube and elsewhere, but almost nothing any deeper.
What is frustrating is that you can work on a project that involves some logical problem - solved with code or hardware descriptions - but spend 99% of your time and effort on issues that have ZERO to do with the design itself. They are all complexities, details, subtleties, flaws, obscurities . . . having to do with the toolchain and the process rather than the function of your design.