Sunday, February 26, 2023

Building up and testing the logic to swap virtual cartridge file between HPS and SDRAM - first module

BOTTOM-UP MODULAR CREATION

It is easier to grasp and debug functionality if it can be logically grouped and divided into reasonably compact VHDL files. I chose to create such modules to handle each of the three bridges between the Hard Processor System (HPS) and Field Programmable Gate Array (FPGA) sides of the System on a Chip (SoC) - HPS to FPGA, FPGA to SDRAM, and Lightweight HPS to FPGA. 

Thus I built the modules getdata, getsdram and getcommand, which handle those three bridges respectively. Each controls the signals to the bridge interface, using the Avalon Memory Mapped protocol. The first name of each bridge indicates which side is the controlling element (master), with the other side its vassal.

One level up, I wrote a module loopcart that would perform the entire process to load or unload a virtual 2315 cartridge into the SDRAM. A cartridge is organized as 521,304 words of 16 bits each, which are grouped into 321 words per sector, eight sectors per cylinder and 203 cylinders for the entire cartridge. This requires just under 1MB of SDRAM, which I assigned to the tail end of the 1GB of SDRAM installed on my Terasic DE10-Nano board.
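The arithmetic behind those figures can be checked with a few lines of Python. This is only a sanity check on the numbers quoted above; the constant names are mine and do not come from the VHDL.

```python
# Geometry of a 2315 cartridge image, using the figures quoted in the post.
WORDS_PER_SECTOR = 321
SECTORS_PER_CYLINDER = 8
CYLINDERS = 203
BYTES_PER_WORD = 2            # 16-bit words

total_words = WORDS_PER_SECTOR * SECTORS_PER_CYLINDER * CYLINDERS
total_bytes = total_words * BYTES_PER_WORD
ONE_MB = 1024 * 1024          # size of the reserved SDRAM region

print(total_words)            # 521304
print(total_bytes)            # 1042608
print(ONE_MB - total_bytes)   # 5968 bytes to spare in the 1MB region
```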

When a user selects a particular file on the SD Card attached to the HPS side, the user interface program will send a load command over the Lightweight HPS to FPGA bridge. That command kicks off the mechanism which will accept the stream of 521,304 words and write them onto SDRAM in the final 1MB.

Our HPS to FPGA and FPGA to SDRAM bridges transfer 64 bits at a time - four words - thus we actually need only 130,326 writes from the HPS side to transfer the entire cartridge image down to the FPGA. My loop module starts at the beginning of the 1MB zone on SDRAM, then waits for the HPS side to write all those blocks of four words sequentially. As each block arrives, we write it to the SDRAM and advance our address.
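As a rough sketch of the address stepping, the Python below maps a block number to an SDRAM address. Two things here are assumptions of mine rather than facts from the VHDL: that the reserved zone begins exactly 1MB below the top of the 1GB, and that the SDRAM interface counts in 64-bit units rather than bytes. The function names are hypothetical.

```python
# Map a block index (0..130325) to an address in the reserved 1MB zone.
TOTAL_WORDS = 521_304
WORDS_PER_BLOCK = 4                     # 64-bit bridge / 16-bit words
BYTES_PER_BLOCK = 8
REGION_BASE = (1 << 30) - (1 << 20)     # assumed placement: 0x3FF00000

TOTAL_BLOCKS = TOTAL_WORDS // WORDS_PER_BLOCK

def block_byte_address(block):
    """Byte address in SDRAM of a given four-word block."""
    if not 0 <= block < TOTAL_BLOCKS:
        raise ValueError("block index outside the cartridge image")
    return REGION_BASE + block * BYTES_PER_BLOCK

def block_unit_address(block):
    """Same location counted in 64-bit units (assumed bridge addressing)."""
    return block_byte_address(block) >> 3

print(TOTAL_BLOCKS)                     # 130326
print(hex(block_byte_address(0)))       # 0x3ff00000
print(hex(block_unit_address(0)))       # 0x7fe0000
```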

The FPGA side has no control over when a block of four words will be shoved at us by the HPS side, thus we depend on a pacing signal, waitrequest, for the HPS to FPGA bridge. Asserting waitrequest tells the master side of the bridge that we cannot accept its write request, so it remains waiting patiently until we briefly deassert the signal to let the write complete.

We need this because the process of writing the block to SDRAM might not proceed as fast as the HPS side can attempt to write groups of four words. In that case, the master side is held waiting until we have completed the write to SDRAM and can accept the next block. A status word is emitted when the loop module has completed the entire cartridge worth of transfers, which then can be read over the Lightweight HPS to FPGA bridge as confirmation that we have the virtual cartridge 'loaded' onto the disk drive.
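The pacing can be pictured with a toy cycle-by-cycle model in Python. This is not the VHDL, just a sketch under simplifying assumptions: the master always has the next block ready, and every SDRAM write takes a fixed number of cycles (the sdram_cycles parameter, which is hypothetical).

```python
def simulate_load(blocks, sdram_cycles):
    """Return (blocks stored in order, cycles the master spent stalled)."""
    stored = []
    stalled = 0
    busy = 0                   # cycles left on the current SDRAM write
    pending = list(blocks)     # blocks the master still wants to write
    while pending:
        waitrequest = busy > 0
        if waitrequest:
            stalled += 1       # master holds its write request steady
            busy -= 1
        else:
            stored.append(pending.pop(0))   # write accepted this cycle
            busy = sdram_cycles             # SDRAM write now in flight
    return stored, stalled

print(simulate_load(["b0", "b1", "b2"], 2))   # (['b0', 'b1', 'b2'], 4)
```

Nothing is lost or reordered however slow the SDRAM side is; the only cost is time spent by the master waiting.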

After the IBM 1130 has made use of that virtual cartridge by reading and writing, eventually the user flips off the power switch to unload the disk drive. We see this and begin an unload operation, again using our loop module. 

The unload begins at the start of the last 1MB of SDRAM, fetching blocks of four words, then dropping waitrequest so that the read from the HPS side can complete. The HPS side reads 130,326 blocks of four words to fetch back the entire virtual cartridge file into the Linux application. We write this updated file back to SD Card at the end of this unload operation.

SIMULATING MODULE BY MODULE

I set up some testbench code for each module and developed suitable signals and timings to validate the operation of each of the modules - getdata, getsdram and getcommand. Varying times and conditions let me prove out how they performed against my understanding of the Avalon MM protocols. Later of course this must be proven to work properly on actual hardware. 

Here is the simulation of getsdram under various read and write requests, also testing the waitrequest functionality. This module is a master type interface and the vassal side over in HPS will use waitrequest when it is unable to accept another read or write because SDRAM is busy. 




Thursday, February 23, 2023

Discovered that I may not need the onchip RAM approach at all

MY ORIGINAL ISSUE AND ASSUMPTION

The original challenge I saw was that the Linux program in the HPS side is driving the timing of the read and write of the file contents over to the FPGA, specifically for the unload operation at the end of using a virtual 2315 disk cartridge. 

My assumption was that there was no straightforward way for the HPS side to know that the SDRAM contents had been read in the FPGA and were ready to be fed to the HPS side when it does the read. The H2F bridge picks the cycle when the read command signal will arrive at the FPGA.   

When we do an unload of the updated contents of the SDRAM back to the Linux file, we have the master issuing reads to get the next word (actually groups of four) but in the FPGA we might not have successfully completed the read of the word (four words) from SDRAM yet. The HPS side has no visibility into whether our read over the F2SDRAM channel has completed yet.

As a result, I was going to set up a complex system making use of 64K words of embedded RAM in the FPGA. At the start of an unload operation, the FPGA would read 64K words from SDRAM into this buffer, then signal to the HPS side that it is ready to deliver 64K words when the H2F channel issues those reads. The HPS side then asks for the next 64K block of words, we preread them from SDRAM, we indicate that we are ready, and the HPS side issues another 64K words worth of reads. This continues for the eight groups of 64K words needed to unload the entire disk cartridge image from SDRAM. 
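A quick Python sketch of how the image would divide into buffer fills, using the 521,304 word cartridge size. The helper name is hypothetical; it only illustrates the arithmetic.

```python
TOTAL_WORDS = 521_304
CHUNK_WORDS = 64 * 1024        # 65,536 words of embedded RAM

def chunks(total, size):
    """Yield (first_word, word_count) for each fill of the buffer."""
    start = 0
    while start < total:
        count = min(size, total - start)
        yield start, count
        start += count

plan = list(chunks(TOTAL_WORDS, CHUNK_WORDS))
print(len(plan))               # 8
print(plan[-1])                # (458752, 62552)
```

Seven full groups plus a partly filled eighth cover the whole image, so the final group needs its own word count.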

The way we signal back and forth with the HPS side is via the H2F LW bridge which is used to send a command word to the FPGA. When that word is Unload, we start the preread of the first 64K words into the embedded RAM. The FPGA side then sets up a status word indicating we are ready for the HPS side to read our words - this is interrogated over the H2F LW channel when the HPS side reads that command/status word. 

I added the embedded RAM because the alternative of using the H2F LW command/status word to synchronize every single word of the unload would involve over half a million such protocol exchanges. Still, it is a bit of a Rube Goldberg/Heath Robinson sort of mechanism.

WAITREQUEST SIGNAL CAN HOLD OFF THE MASTER

Part of the memory mapped Avalon protocol used to read and write on the H2F LW, H2F, and F2H bridge channels is a signal called waitrequest which I suddenly realized was the much simpler mechanism that I needed to solve the synchronization issue. It is a standard part of the protocol and by controlling that signal, I could make the HPS side wait when it issued a read for a word, so that I could complete the read from SDRAM before deasserting waitrequest to let the HPS side read proceed. 

Showing waitrequest delaying the read

This does stall the channel, as the controlling side of the channel in HPS will remain active waiting for its read operation to complete. If I pause a long time or never release the waitrequest, the controlling side (M side) remains busy. This could be a problem if that H2F channel were connected to multiple mechanisms in my FPGA which independently could be attempting to communicate, but I can control this. 

Thus, I will go back to a simple loop for all 521,304 words, which is 130,326 times when I read a group of four words from SDRAM and then release waitrequest so that the read of those four words from the HPS side can go forward. If I happen to complete the read from SDRAM before the HPS side does a read of the H2F channel, all is good and it immediately satisfies the read by HPS. If I am not done yet, the HPS side read will hang as long as waitrequest remains asserted.
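Both cases can be seen in a small Python model of the loop. It is a sketch with made-up timing parameters: sdram_cycles is how long each SDRAM fetch takes and master_gap is how long the HPS side pauses between reads. Neither number comes from real hardware.

```python
def simulate_unload(sdram, sdram_cycles, master_gap):
    """Return (blocks seen by the master, cycles the master was stalled)."""
    received = []
    stalled = 0
    index = 0
    fetch_left = sdram_cycles      # fetch of block 0 starts immediately
    idle_left = master_gap         # cycles before the master issues a read
    while index < len(sdram):
        data_ready = fetch_left == 0
        reading = idle_left == 0
        if reading and data_ready:
            received.append(sdram[index])   # waitrequest low: read completes
            index += 1
            fetch_left = sdram_cycles       # begin fetching the next block
            idle_left = master_gap
            continue
        if reading:
            stalled += 1                    # waitrequest high: master hangs
        if fetch_left > 0:
            fetch_left -= 1
        if idle_left > 0:
            idle_left -= 1
    return received, stalled

print(simulate_unload([10, 20, 30], 3, 1))   # ([10, 20, 30], 6)
print(simulate_unload([10, 20, 30], 1, 5))   # ([10, 20, 30], 0)
```

When the SDRAM fetch is slower than the master, the master stalls a couple of cycles per block; when the fetch finishes first, the read is satisfied at once.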

That is a much simpler and more straightforward way to handle the unload process, relieving the need for a ballet of command words and status signals over the H2F LW channel and eliminating the need for advance reading into embedded RAM. I am pleased with this approach. 

Wednesday, February 22, 2023

REFINING THE DESIGN FURTHER

HANDLING THE UNLOAD OF CARTRIDGE

The control pattern of the bridges offered between the Hard Processor System (HPS) and Field Programmable Gate Array (FPGA) sides of the Cyclone V System on a Chip (SoC) is absolute dictator to vassal, formerly known as m**ter-s**ve, where the M side is the initiator of all transactions. There is no way for the S side to wait with a read pending until the M side writes data.

The S side will see a control signal go high to tell it that it must respond to a read or write, otherwise it must do nothing. The design challenge with the prior approach, where I would fetch the data from SDRAM by reading on the F2SDRAM interface then write that data back on the H2F link, is that the M side decides when to do a read of the word I am ready to write and it isn't synchronized with my SDRAM reading activity. I can't tell it to wait; the M side will determine when it asks for that word and I had better have it ready.

Instead, I am setting up an embedded RAM on the FPGA side which will hold an eighth of the cartridge file at a time. When doing an unload, I will fill the RAM with the 65,536 words, then respond to a status interrogation being repeated from Linux that it is time for the master to loop and read those words and store them in the memory mapped virtual cartridge file. A continue command on the H2FLW channel tells us to read the next eighth of the data from SDRAM into the embedded RAM buffer, after which we tell the HPS side it is time to fetch those.

Evolving design as I learn more about SoC communications

ADDRESSING DETAILS ACROSS THE HPS-FPGA BRIDGES

One might naively think that addresses are a straightforward and universal feature of the SoC system, but there are in fact very many distinct address mappings that you will deal with in this type of device. Some can be modified to a degree while others are fixed aspects of the hardware design of the SoC. 

The Hard Processor System (HPS) is a dual core ARM system with the ability to talk to SDRAM and various on chip memories on the SoC. Its level 3 interconnect has interfaces to the bridges that allow communication with the Field Programmable Gate Array (FPGA) side of the chip and all the devices that may be accessible through the FPGA. 

To begin with, none of the hardware addresses you see from the FPGA or the bridges into the HPS side are virtual addresses. Thus, even if you have a mechanism to read some location in the memory of the Linux image running on the HPS, you would need to translate to the real SDRAM address in order to access it. 

When looking from FPGA side over to the HPS, there are multiple address mappings you will see depending on which mechanism or bridge you use for the access. There is an MPU view, a level 3 view and an SDRAM view. Within the L3 or MPU address mapping, locations are reserved for various hardware devices on the HPS side including the various bridges that communicate with FPGA. Further, one can have a mapping within the subset assigned to a particular bridge which would be recognized on the FPGA side as addressing some particular device or logical signal. 

The most straightforward is the SDRAM mechanism, which sees all 4GB of possible connected SDRAM and has therefore a straightforward addressing map. Not virtual, but if you know the real RAM address that the processor has associated with a location under Linux, you can read or write to it. This must be coordinated with Linux on the HPS which is also reading and writing to SDRAM while you are doing so from the FPGA. 

Boot ROM and onboard RAM on the HPS side do take up some of the MPU and L3 address mapping, but these can be, to a limited degree, remapped in the address ranges. A bit less than 3GB of the possible SDRAM space is visible in the L3 mapping. The MPU mapping sees 1GB of the SDRAM where it maintains cache coherency - thus updates from either side are viewed appropriately by the other (ACP window). This can be changed to some degree to pick the 1GB range which achieves coherency. 

In the MPU, another subset of SDRAM, less than 2GB, is visible but not cache coherent, thus one has to be careful about coordinating access from Linux and your bridge. All L3 map addressing of SDRAM is not cache coherent but some may be from the MPU, thus set the ACP window to 'protect' coherency of some part of the SDRAM.

The hardware for the HPS side has assigned ranges of memory that are mapped to control hardware devices. These include ethernet, SPI, SDcard, SDRAM hardware signals, and the three bridges. Only the processor running Linux sees the MPU view of memory. 

H2F is a high speed bridge with the HPS side determining when transactions take place, used to access logic and devices attached to the FPGA side. The H2F bridge has a block of the L3 address range assigned to it, where any read or write to that range will result in read or write transactions over the H2F bridge. 

A lightweight H2F bridge was also implemented, intended for rapid control signaling between the sides, thus it has its own address range assigned in the L3 mapping. Reads or writes to this address range become reads or writes over the H2FLW bridge. The hardware does convert the address used, thus if we write to the first word of the H2FLW bridge range in the MPU space - xFF200000 - that appears on the FPGA side's H2FLW bridge interface as address 0. Thus there is lots of mapping one has to do on top of the range complexity I mentioned. 
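The conversion amounts to subtracting the base of the bridge's window. Here is a minimal Python sketch of that; the xFF200000 base is from the text above, while the 2MB window size and the function name are my assumptions.

```python
H2FLW_BASE = 0xFF200000    # base of the lightweight bridge in the MPU view
H2FLW_SPAN = 0x00200000    # assumption: the window is 2MB wide

def h2flw_offset(mpu_address):
    """Address the FPGA side would see for a given MPU-view address."""
    offset = mpu_address - H2FLW_BASE
    if not 0 <= offset < H2FLW_SPAN:
        raise ValueError("address is outside the lightweight bridge window")
    return offset

print(hex(h2flw_offset(0xFF200000)))   # 0x0
print(hex(h2flw_offset(0xFF200010)))   # 0x10
```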

The F2H bridge has the FPGA determining when transactions take place. This bridge has a range of addresses from x00000000 to x3FFFFFFF that corresponds to the L3 address mapping. If the addresses are where HPS side devices, such as ethernet, are mapped, then the FPGA can directly control those devices. If the address used is within the SDRAM visible to L3 mapping, then the FPGA can read or write to the SDRAM there. It can also read or write to the ROM and other boot time memories that are visible in the L3 range.

My original design concept was to have the virtual 2315 cartridge file from the SD card be memory mapped by Linux and send the start address over the H2FLW bridge to my logic in the FPGA. I would then read and write from the SDRAM addresses visible in the L3 map view over the F2H bridge. 

The twist here is that all I see are physical SDRAM addresses, but the memory mapped file is in the virtual address space of Linux. I would need to force Linux to assign the block of memory for the memory mapped file to contiguous blocks of physical memory whereas normally the contiguous range of virtual addresses is strewn around in different physical blocks (pages). Even worse, due to demand paging these can change over time, with the same virtual address being held in different physical addresses. 
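To put numbers on the problem, here is a small Python illustration. The 4KB page size is an assumption about the Linux configuration, and the page table entries are invented purely to show the scattering.

```python
PAGE_SIZE = 4096                       # assumption: 4KB pages
IMAGE_BYTES = 521_304 * 2              # cartridge image, 16-bit words

pages = -(-IMAGE_BYTES // PAGE_SIZE)   # ceiling division
print(pages)                           # 255

# Invented mapping of virtual page number to physical frame number.
page_table = {0: 0x1A2B3, 1: 0x00F10, 2: 0x2C001}

def translate(vaddr):
    """Virtual address to physical address using the toy page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

print(hex(translate(4095)))            # 0x1a2b3fff
print(hex(translate(4096)))            # 0xf10000
```

Two adjacent virtual addresses land in unrelated physical locations, and all 255 pages would have to be pinned and made contiguous for the FPGA to walk the file by physical address.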

It would take quite a bit of Linux wizardry to pull off the feat of having my contiguous virtual address range for the memory mapped file correspond to a contiguous physical address range that would never be paged out. Different aspects of this are possible but the effort is not straightforward.

EVOLVED DESIGN CONCEPT

An alternative approach which I settled upon is to reserve the last 1MB of the 1GB SDRAM for the sole use of my FPGA logic. Boot time parameters tell Linux to not use that last megabyte, so that the HPS side never reads or writes to those addresses. My F2H bridge can issue reads or writes to that range and merrily make use of it.
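The reserved window works out as follows. This Python sketch assumes the 1GB of SDRAM starts at physical address 0; the mem= kernel argument it prints is one common way of capping the memory Linux uses, and is my guess at the sort of boot parameter involved rather than the exact setting used here.

```python
SDRAM_BYTES = 1 << 30                  # 1GB on the DE10-Nano
RESERVED_BYTES = 1 << 20               # last 1MB kept for the FPGA logic
RESERVED_BASE = SDRAM_BYTES - RESERVED_BYTES

def in_reserved(address, length=1):
    """True if [address, address + length) lies wholly in the reserved 1MB."""
    return RESERVED_BASE <= address and address + length <= SDRAM_BYTES

print(hex(RESERVED_BASE))              # 0x3ff00000
print(f"mem={RESERVED_BASE >> 20}M")   # mem=1023M (assumed boot argument)
print(in_reserved(0x3FF00000, 8))      # True
print(in_reserved(0x3FEFFFFC, 8))      # False
```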

The open issue is how I will get the file from the SD Card to this reserved memory in SDRAM and how updated parts of the file can be written back to the SD Card. Normally the SD Card is controlled by Linux on the HPS side, but then there is no method for Linux to write to the reserved 1MB of RAM.

Two possibilities exist. First, I can loop data through the FPGA to move a file from Linux over to the reserved 1MB. Second, I can implement full control over the SD card from VHDL in the FPGA, which means that Linux no longer can use that card.

I planned a rich user interface running under Linux on the HPS side, able to display and select various virtual 2315 cartridge files on the SD card. This requires that Linux control the SD card. That makes the second method, direct access from FPGA, undesirable. 

I plan to make use of several bridges and links between the HPS and FPGA sides. I can use the F2SDRAM connection that gives me straightforward access to the SDRAM controller on the HPS side so that I can read or write any SDRAM address. I can use the H2F bridge so that Linux can read and write the virtual file data for me then exchange it with my logic in the FPGA. Finally, the H2FLW bridge lets me trigger a load of the cartridge file to my reserved SDRAM, or trigger a fetch of the updated file from SDRAM. 

SETTING UP THE BRIDGE MODULES

The bridges between HPS and FPGA sides are somewhat complex interfaces. Here is the list of signals one must interact with to use just one of those native memory mapped interfaces:

h2f_awid                              : out   std_logic_vector(11 downto 0);
h2f_awaddr                            : out   std_logic_vector(29 downto 0);
h2f_awlen                             : out   std_logic_vector(3 downto 0);
h2f_awsize                            : out   std_logic_vector(2 downto 0);
h2f_awburst                           : out   std_logic_vector(1 downto 0);
h2f_awlock                            : out   std_logic_vector(1 downto 0);
h2f_awcache                           : out   std_logic_vector(3 downto 0);
h2f_awprot                            : out   std_logic_vector(2 downto 0);
h2f_awvalid                           : out   std_logic;
h2f_awready                           : in    std_logic                     := 'X';
h2f_wid                               : out   std_logic_vector(11 downto 0);
h2f_wdata                             : out   std_logic_vector(63 downto 0);
h2f_wstrb                             : out   std_logic_vector(7 downto 0);
h2f_wlast                             : out   std_logic;
h2f_wvalid                            : out   std_logic;
h2f_wready                            : in    std_logic                     := 'X';
h2f_bid                               : in    std_logic_vector(11 downto 0) := (others => 'X');
h2f_bresp                             : in    std_logic_vector(1 downto 0)  := (others => 'X');
h2f_bvalid                            : in    std_logic                     := 'X';
h2f_bready                            : out   std_logic;
h2f_arid                              : out   std_logic_vector(11 downto 0);
h2f_araddr                            : out   std_logic_vector(29 downto 0);
h2f_arlen                             : out   std_logic_vector(3 downto 0);
h2f_arsize                            : out   std_logic_vector(2 downto 0);
h2f_arburst                           : out   std_logic_vector(1 downto 0);
h2f_arlock                            : out   std_logic_vector(1 downto 0);
h2f_arcache                           : out   std_logic_vector(3 downto 0);
h2f_arprot                            : out   std_logic_vector(2 downto 0);
h2f_arvalid                           : out   std_logic;
h2f_arready                           : in    std_logic                     := 'X';
h2f_rid                               : in    std_logic_vector(11 downto 0) := (others => 'X');
h2f_rdata                             : in    std_logic_vector(63 downto 0) := (others => 'X');
h2f_rresp                             : in    std_logic_vector(1 downto 0)  := (others => 'X');
h2f_rlast                             : in    std_logic                     := 'X';
h2f_rvalid                            : in    std_logic                     := 'X';
h2f_rready                            : out   std_logic;

The above 36 signals are used on five distinct protocol channels. Signals that begin with h2f_ar are the address to be used in a read. Those starting h2f_r are the data used in a read. Analogously, h2f_aw and h2f_w are for the write address and write data interchanges respectively. Finally, there is a h2f_b set of signals which control the channel where status responses are exchanged. Within each channel there are details like caching, strobes, locking and protection that would have to be handled. 
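The grouping by prefix can be tallied mechanically. This Python sketch uses only the signal names from the listing (with the h2f_ prefix dropped); the channel labels are my shorthand for the five channels described above.

```python
from collections import Counter

# Signal names from the listing above, without the h2f_ prefix.
SIGNALS = """
awid awaddr awlen awsize awburst awlock awcache awprot awvalid awready
wid wdata wstrb wlast wvalid wready
bid bresp bvalid bready
arid araddr arlen arsize arburst arlock arcache arprot arvalid arready
rid rdata rresp rlast rvalid rready
""".split()

# Two-letter prefixes are listed first so "aw" is not mistaken for "w".
CHANNELS = [("aw", "write address"), ("ar", "read address"),
            ("w", "write data"), ("r", "read data"), ("b", "response")]

def channel(name):
    """Return the channel a signal belongs to, judged by its prefix."""
    for prefix, label in CHANNELS:
        if name.startswith(prefix):
            return label
    raise ValueError(f"unrecognized signal {name}")

counts = Counter(channel(s) for s in SIGNALS)
print(sum(counts.values()))        # 36
print(counts["write address"])     # 10
print(counts["response"])          # 4
```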

Quartus provides a memory mapped pipeline bridge module which handles the complexity of interacting with the native interface, exposing a simpler set of signals and required protocol that I can make use of:

f2sdram_address                       : in    std_logic_vector(28 downto 0) := (others => 'X');
f2sdram_burstcount                    : in    std_logic_vector(7 downto 0)  := (others => 'X');
f2sdram_waitrequest                   : out   std_logic;
f2sdram_readdata                      : out   std_logic_vector(63 downto 0);
f2sdram_readdatavalid                 : out   std_logic;
f2sdram_read                          : in    std_logic                     := 'X';
f2sdram_writedata                     : in    std_logic_vector(63 downto 0) := (others => 'X');
f2sdram_byteenable                    : in    std_logic_vector(7 downto 0)  := (others => 'X');
f2sdram_write                         : in    std_logic                     := 'X';

The interface immediately above with its 9 signals is much simpler and more straightforward. Supply an address, set a signal to ask for a read and grab the data when the readdatavalid signal is received. Qsys offers the simple protocol for reading and writing as a guide to the logic designer:

My FPGA logic sees the MM pipeline bridges to both the H2F and the H2FLW bridges. The F2SDRAM interface is itself set up to the simpler protocol of the pipeline bridge, thus I don't need to add a bridge for that channel of communication and still get the simple nine signal interface. 
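The read sequence on that simple interface (present an address, raise read, hold while waitrequest is high, then take the data when readdatavalid arrives) can be sketched as a toy Python model. The ToySlave class, its delays and the function names are all hypothetical; this is my reading of the protocol, not the real SDRAM controller's timing.

```python
class ToySlave:
    """Stand-in for the slave side: stalls, then returns data after a delay."""
    def __init__(self, memory, accept_delay=2, latency=3):
        self.memory = memory
        self.accept_delay = accept_delay   # cycles of waitrequest per read
        self.latency = latency             # cycles from accept to data (>= 1)
        self.waited = 0
        self.countdown = None              # cycles until readdatavalid
        self.latched = None                # address captured on acceptance

    def clock(self, address, read):
        """Advance one cycle; return (waitrequest, readdatavalid, readdata)."""
        valid, data = False, None
        if self.countdown is not None:
            self.countdown -= 1
            if self.countdown == 0:
                valid, data = True, self.memory[self.latched]
                self.countdown = None
        waitrequest = False
        if read:
            if self.waited < self.accept_delay:
                self.waited += 1
                waitrequest = True         # not ready to take the request
            else:
                self.waited = 0
                self.latched = address     # request accepted this cycle
                self.countdown = self.latency
        return waitrequest, valid, data

def master_read(slave, address, max_cycles=100):
    """Issue one read; return (data, cycle on which the data arrived)."""
    read = True
    for cycle in range(1, max_cycles + 1):
        waitrequest, valid, data = slave.clock(address, read)
        if read and not waitrequest:
            read = False                   # accepted, so drop the read signal
        if valid:
            return data, cycle
    raise TimeoutError("no readdatavalid seen")

print(master_read(ToySlave({0x10: 0xCAFE}), 0x10))   # (51966, 6)
```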

MY TASK IS TO READ FROM ONE BRIDGE AND WRITE TO THE OTHER

The logic behind loading a virtual 2315 cartridge file into the reserved 1MB of SDRAM is pretty straightforward. I begin receiving words from the Linux program over the H2F bridge. I take each word received and write it to the SDRAM area over the F2SDRAM bridge, converting to the proper address of the word within that last megabyte of SDRAM. The virtual cartridge consists of roughly 500,000 16 bit words, but with a read and write width over my bridges of 64 bits, that means I have to read from H2F and write to F2SDRAM about 125,000 times. 

When the Linux program implementing the user interface has a new file selected to load as the virtual cartridge, it sends a control command over the H2FLW bridge which triggers my logic to iterate over reads from master and writes to SDRAM.

Once the drive goes not ready, the user interface program sends a command over the H2FLW bridge to unload the presumably updated virtual cartridge contents. This triggers my logic to read the SDRAM words over F2SDRAM and write them to the Linux program using the H2F bridge. The Linux program updates the memory mapped file which causes the changes to be written back to the SD card file. 

Sunday, February 19, 2023

HWLIB is for bare metal, i.e. not running under Linux; barking up the wrong tree

BARE METAL VERSUS LINUX IMAGE ON HPS

The Hard Processor System (HPS) side of the board is a processor system, dual ARM core, plus peripherals, which can run either Linux or standalone programs. The system running Linux boots through a boot loader and some startup modules, then fires up Linux itself. The boot loader and startup environment can also transfer control to code that runs without an operating system - this is bare metal mode.

ROLE OF HWLIB

The API and library functions of HWLIB are designed to access and control the SoC resources when running in bare metal mode. One would code applications in C for this purpose, compile them and launch them under the boot loading system. 

I had found some makefiles that included the HWLIB modules such that the source code of HWLIB would have been compiled as part of the make process for the example project. I could have made use of this to get my alt_bridge_manager.c code compiled, but this is not the route I want to go. Rather, I want the full Linux environment to support mmap access to files as well as hosting the user interface for my project. 

LINUX DRIVERS FOR ACCESS TO SOC RESOURCES

A version of Yocto Linux was created to support the SoC systems such as my Cyclone V based Terasic DE10-Nano board. It provides device drivers to manage all the functions of the SoC, enabling a memory mapped means of controlling everything. Thus, no HWLIB is needed if you will run Linux on the system. 
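As an illustration of what memory mapped access from a Linux program can look like, here is a Python sketch that maps a physical address through /dev/mem. Treat all of it as assumption: I have not confirmed that /dev/mem is the mechanism the supplied drivers expect, the xFF200000 base is the lightweight bridge address from my February 22 post, and the helper names are invented. The write_word function needs root on the board and is not called here.

```python
import mmap
import os
import struct

H2FLW_BASE = 0xFF200000        # lightweight bridge base in the MPU view

def map_window(phys_addr, length, page=mmap.PAGESIZE):
    """Return (page-aligned base, offset into mapping, mapping length)."""
    base = phys_addr - (phys_addr % page)
    offset = phys_addr - base
    return base, offset, offset + length

def write_word(phys_addr, value):
    """Write one 32-bit little-endian word at a physical address."""
    base, offset, size = map_window(phys_addr, 4)
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        with mmap.mmap(fd, size, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE, offset=base) as m:
            m[offset:offset + 4] = struct.pack("<I", value)
    finally:
        os.close(fd)

print(map_window(H2FLW_BASE + 0x10, 4, page=4096))   # (4280287232, 16, 20)
```

The page alignment step matters because mmap offsets must fall on a page boundary, so the register of interest is reached as an offset inside the mapped page.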

DOCUMENTATION IS SPLIT

I realized why I was having so much difficulty finding this documented. Intel has documented their hardware and toolchains, but points people at a separate rocketboards.org site for anything related to Linux for the SoC.

I had assumed this was just for those who wanted to modify Linux or were interested in how it was built to run on the HPS, but it appears that anything to do with using the SoC under Linux is shoved off to this site. I had worked through several Intel eLearning courses on SoC trying to find anything about accessing the SoC features through apps on Linux, finally realizing that the schism existed.

/RANT

Frankly, the entire experience exposes the lack of concern about the user, with all the decisions for how to organize toolchains (multiple parts not well integrated) and even the divide between Intel and Rocketboards.org, emblematic of a focus on fiefdoms or legal department separations in spite of the fact that this impairs the experience for the user/purchaser of the product. 

Convenient for Intel; don't care about the impact on the end user. They could easily have provided some overview level information that describes this taxonomy, artificial to the user but obviously meaningful to various organizations, managers and legal relationships inside Intel. Similarly, they could have provided integrative tutorials and documentation to show the user how to pull together the bits, but this is chaotically strewn through the website. 

I hate having my time wasted unnecessarily. I care about the projects, technology and getting things done. It is annoying to have my time diverted from my goals for specious reasons.

/END RANT

Saturday, February 18, 2023

Compiled program, now ready to set up the test on the DE10-Nano board

COMPILER ISSUE RESOLVED

I decided that part of my issue was the existence of the separately downloaded Quartus and SoC EDS products, which resulted in divided path names FPGA and FPGA-lite. I removed all and reinstalled just the SoC EDS with its option to include Quartus. 

To test this I verified that I could open my FPGA project and regenerate it successfully. From there I would move on to test the compilation. 

This did NOT work; couldn't open the project nor generate the FPGA side. Apparently there are two Quartus products, one for compiling the code and another for the FPGA, but confusingly named. Thus, I did have to download and install a second product. At least I could make sure that the path name was the same, rather than FPGA versus FPGA-lite.

Before I loaded the other Quartus, I did try a make of my C code. It did run the compiler and attempt to compile my code but it was missing the special headers for the Altera HWLIB. I assume this was due to the lack of the second product installation, so I fixed that issue before focusing on the C code side.

The term FPGA-lite comes from one of the three licensing levels set up by Intel: Professional, Standard and Lite. Only the Lite version is free for hobbyists, but it comes with a default path name which differs from that of the SoC EDS tool. That latter tool can be run without a license as a hobbyist but does NOT have a lite version. Unaligned product naming conventions are the issue here.

Loading the FPGA tool Quartus, not the same as the Quartus Prime used for the ARM side processing, I changed the path name and installed over the path for the other toolchain. I was able to open the project, open Qsys, then compile the FPGA code once again successfully.

Back to the HPS side compilation, I reran the make on my C program. Previously it threw up errors related to the lack of the headers for the Altera HWLIB, where I was using library calls and defined names for the bridge.

The same errors came up with the SOC EDS command shell. I switched over to the NIOS II shell which failed with an inability to find the compiler. Sigh. Back to the SOC shell, where I realized that the socal and hwlib includes and libraries were not in the path for the make. 

This is apparently another popular problem people encounter often, as it came up with a quick Google search. First step was to locate the actual libraries and include files, so I knew how to point to them when compiling. There are three command shells, confusingly, NIOS II, SOC EDS and ARM DS, with no clear direction as to which to use.

I found the files at C:\intelFPGA\17.0\embedded\ip\altera\hps\altera_hps\hwlib\include\soc_cv_av, thus the fix is just to point to them for my make process. When I looked at the makefile it appeared to do just this, but there seems to be some weird interaction between the Cygwin Linux-under-Windows behavior and the command shell.

CLEAN MAKE OF THE PROGRAM

I was able to bodge things temporarily by copying all the include files which got me to compile and then I discovered there were no compiled versions of the HWLIB modules available to link. I am being run ragged by vendors pointing at vendors pointing at vendors. I can't get the ARM compiler to run without a serial number but since it was provided as part of the Intel SOC EDS I don't have a serial number. 

Nowhere do I find any references to where I grab the object code associated with the HWLIB. I do see a github project but it requires an entirely different set of pseudo Linux tools to be set up in order to compile the source. Sheesh. 

HOW TO TRANSFER C PROGRAM AND FPGA IMAGE TO THE BOARD

Once the object module is built, I can transfer it over to the SD card image on my DE10-Nano board where it will be visible to the Linux image after it boots up. I can use the programming links to get the FPGA bitstream loaded onto that side of the board as well. 

SETTING UP THE PHYSICAL TEST IS EASY

After hooking some cables to do the file transfers, then booting the board, all I would need to do is issue the command to execute my C program, which will transfer the address and print it out for me at the same time. 

Using the slide switches I can pick the four bytes of the latched register. I should see it matching the address from the C program execution as proof that I accomplished the task. 
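The check amounts to picking one byte out of a 32-bit value. A Python sketch of the comparison follows; how the switches actually encode the byte number is not described here, so the select value (0 for the least significant byte) is a made-up convention, as is the sample address.

```python
def selected_byte(register, select):
    """Return byte `select` (0 = least significant) of a 32-bit value."""
    if select not in range(4):
        raise ValueError("select must be 0..3")
    return (register >> (8 * select)) & 0xFF

address = 0xBEF01234           # example value standing in for the address
for select in range(4):
    print(select, hex(selected_byte(address, select)))
# 0 0x34
# 1 0x12
# 2 0xf0
# 3 0xbe
```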

Once I get the program linked this can be done. Back to the library run around. 

Built header after sorting out issues and wrote program - issue in compilation

MORE NAME CHANGING TOMFOOLERY BUT FIGURED IT OUT

My Quartus install (FPGA Lite) did not have the command shell that comes with the SoC EDS, but it did have a shell labeled Nios II Command Shell. On a whim, I fired that up, navigated to my project folder, and successfully generated the header file!

Shifting names - Qsys or Platform Designer, FPGA or FPGA-lite - but I did manage to build the header and could move on to coding the C program.

WRITING SIMPLE PROGRAM TO SEND VIRTUAL ADDRESS OVER TO FPGA

Altera/Intel provided a programming interface to control the hardware from the ARM HPS side, SoCAL and HWLIB. It took me a while to chase down the documentation of the interface but once I had it, I could work with the various headers and code up my program.

I had to open up the lightweight HPS to FPGA bridge and once it was done, all the program needed to do was to pick up the address of a known word, pass it to the FPGA side over the bridge, and output the address to the console port. 
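A rough sketch of such a program's core is below. It is not the HWLIB interface; it assumes the lightweight bridge window can be reached by mapping /dev/mem at physical address 0xFF200000, and the names map_lw_bridge and send_address are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed values: physical base and span of the lightweight HPS to FPGA
 * bridge window. Check them against the generated header before use. */
#define LW_BRIDGE_BASE 0xFF200000u
#define LW_BRIDGE_SPAN 0x00200000u

/* Map the bridge window through /dev/mem (needs root).
 * Returns NULL if the device cannot be opened or mapped. */
volatile uint32_t *map_lw_bridge(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, LW_BRIDGE_SPAN, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, LW_BRIDGE_BASE);
    close(fd);                      /* the mapping survives the close */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Write the address of known_word to word 0 of the bridge window and
 * echo the same value on the console. Returns the value that was sent. */
uint32_t send_address(volatile uint32_t *bridge, const void *known_word)
{
    assert(bridge != NULL);
    /* Pointers are 32 bits on the ARM; the cast truncates on a 64-bit host. */
    uint32_t addr = (uint32_t)(uintptr_t)known_word;
    bridge[0] = addr;
    printf("sent address 0x%08x\n", (unsigned)addr);
    return addr;
}
```

On the board the two calls would be chained: map the window, then call send_address with the mapped pointer and the address of some known variable. Passing an ordinary array instead of the mapped pointer lets the same function be exercised on a PC.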

ATTEMPTED TO RUN MAKE BUT COMPILER NOT FOUND

I imagine this is more of the same, where the installation of the Windows version doesn't really set up the environment and paths correctly, but alternatively it could be that I need to fetch the ARM compiler and install it since this toolchain does seem to involve many independent pieces that must be managed. 

Friday, February 17, 2023

Tweaking toolchain to compile code for HPS system

FIRST STEP IS BUILDING THE HEADER FILE FOR THE HPS CODE

The toolchain provides commands that read the files which generated my FPGA side for the SoC - pulling out the hardware components and the assigned addresses in memory to reach them from the program running in the HPS side. 

It establishes various constants that are used to create the virtual addresses when one has to control the bridges or access other peripherals that are attached to the FPGA side. The resulting header file is included in my user code that will run under Linux. 

FIGHTING PATH ISSUES RUNNING THE HEADER FILE CREATION SCRIPT

The toolchain is built from nested batch files, scripts and programs, but somehow it doesn't install itself in a way that will actually work immediately under Windows 11. First up, the shell script create_hps_qsys_header.sh fails because the system can't find sopc-create-header-files. When I explicitly point at the folder holding this, it fails internally when calling sopcinfo2swinfo.exe, and I suspect there would be further failures if I did a spot fix for that. 

ROOT ISSUE - SHIMS AND RUBE GOLDBERG ADAPTIONS

At heart all the toolchains appear to be Linux executables. Graphical interfaces are shimmed above them even for the Linux environment, but then in Windows we introduce the Cygwin environment to run Linux programs under Windows, yet another shim. 

This obscures what is actually taking place and introduces many failure points with the shimming. In the Xilinx toolchain I would find that when things went awry it would leave files or not properly update files which are essentially undocumented and hidden by all the shims. If this were a simple command line environment then all the files would be overt and much easier to dig into when chasing bizarre behavior.

The toolchains for the Intel/Altera SoC environment are split into two parts - one for compiling code to run on the HPS side and one to build logic for the FPGA side. These two, Quartus Prime and SoC EDS, are distributed separately but sometimes bundled in the same download. 

I suspect one of my issues right now is that I downloaded Quartus Prime 17.0 which installs under the FPGA Lite path, while SoC EDS 17.0 installed under a plain FPGA path. The materials on the web sites and documents are frankly terrible at specifying what order or what products to download, thus it is unclear whether I should have just installed SoC EDS and accepted Quartus as part of it, or if I did the sequence right with Quartus first, then SoC EDS. 

In any case, more completely wasted time chasing issues that have zero to do with my design or my own code. Zero. 

Thursday, February 16, 2023

Linux side test set up

ADDITIONAL TOOLCHAIN FOR THE HPS SIDE - WENT FOR SAME BACK LEVEL 17.0

The Intel SoC FPGA Embedded Development Suite (SoC EDS) is a big toolchain that does the compiling and building of the software which will run in the Linux image on the ARM processors in the HPS side. I loaded the same level as my Quartus tools, 17.0, which is as close as I could come to the version 16.0 used to create the Terasic demonstrations. 

STARTING A SIMPLE C PROGRAM TO SEND VIRTUAL ADDRESS AND ECHO CONTENTS

The first step of this process is to create the appropriate header file. The Intel toolchain comes with a program that reads the detailed SOC description I created with Qsys and produces a header file for use with C programs. 

Alas the scripts to run this are not working correctly out of the box. Not a surprise given the generally shoddy nature of toolchains, but something I will need to fix before I can whip up the relatively simple program. 

The means of accessing the bridges and other hardware elements attached to the Linux processor on the HPS side is through memory mapping. These all have addresses within the 32 bit virtual space that you access to control the hardware.

GPV registers activate and control the three bridges between the HPS and FPGA sides - thus the first step is to point to the proper location and store a magic value activating the bridge. Then the actual writing through the bridge involves similar stores to locations in memory, so that I can send the correct data over to my FPGA logic.

These magic addresses are based on address arithmetic from various header constants thus I need the header to be right before I can make much headway.
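The address arithmetic can be sketched as below, under assumptions: the program maps one large peripheral region starting at 0xFC000000, and the lightweight bridge window sits at 0xFF200000 within it. The macro and function names are stand-ins for whatever the generated and SoCAL headers actually supply.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for constants the headers would provide. The values are
 * assumptions for illustration and must be checked against the headers. */
#define HW_REGS_BASE   0xFC000000u          /* start of the region to mmap() */
#define HW_REGS_SPAN   0x04000000u          /* size of that region */
#define HW_REGS_MASK   (HW_REGS_SPAN - 1u)
#define LW_BRIDGE_OFST 0xFF200000u          /* lightweight bridge window */

/* Byte offset, inside the mapped region, of a component whose base
 * address on the lightweight bridge is component_base. */
uint32_t lw_component_offset(uint32_t component_base)
{
    /* The LW bridge in this design only addresses x0000 to x3FFF. */
    assert(component_base <= 0x3FFFu);
    return (LW_BRIDGE_OFST + component_base) & HW_REGS_MASK;
}

/* Turn that offset into a usable pointer once the region has been
 * mapped and virtual_base holds the address mmap() returned. */
volatile uint32_t *lw_component_ptr(void *virtual_base,
                                    uint32_t component_base)
{
    return (volatile uint32_t *)((uint8_t *)virtual_base +
                                 lw_component_offset(component_base));
}
```

With these values a component at bridge address 0 lands 0x03200000 bytes into the mapped region, which is why a wrong constant in the header sends every store to the wrong place.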

MEANWHILE SLOGGING THROUGH THE SOC/FPGA SIDE

All my code synthesized just fine but I began to encounter errors with the fit and route functions of instantiating the design for the FPGA. This seemed to be because Quartus didn't have enough information about some of the hardware connections. 

I realized that the TCL scripts, although named for the SDRAM function I am not including in my system, do provide some pin assignments and parameters that are essential to properly building the system. 

After running this, the build completed successfully. I have an FPGA bitstream that should set up the FPGA side to communicate with the HPS side and my program over there. 

Creating my first testbed - transfer of virtual address in Linux to the FPGA side

BUILDING MY MINIMAL TEST SYSTEM TO PROVE OUT A FEW KEY CAPABILITIES

My first cut at this is a simple SOC where I set up the HPS to FPGA Lightweight (LW) bridge as well as the FPGA to HPS (F2H) bridge. The intent is to pass one message over the LW bridge giving the FPGA a virtual address of a given word in the Linux system. I will use the LEDs and slide switches on the board to display the address latched into the FPGA, which I can compare to the address I write out of the UART port from the Linux image in the HPS side. 

If this works, a quick evolution of this will make use of the F2H bridge to fetch the word from the memory of the Linux image and latch it, using the address latched by the LW bridge. I can then compare it to be certain that I have the ability to read and write memory in Linux from the FPGA logic.

Assuming both of those work, I can use the same testbench with new programming in the Linux image to open a file, memory map it and pass along the address of the first word to my FPGA for display. The Linux side will mmap() the file and prepopulate it, since I don't want to have page faults slowing down response when the FPGA is actively reading or writing sectors in the virtual 2315 cartridge file which is the ultimate purpose of these transfer mechanisms. 

CONFIGURED THE SOC AND BRIDGES PROPERLY

I was able to set up the SOC with the Linux (HPS) side implementing two of the three possible bridges - the lightweight link from HPS to FPGA side and the full FPGA to HPS link. These expose the two memory mapped Avalon interfaces to my FPGA logic. The top level wrapper links all the board pins to the proper hardware and connects the SOC to either the pins or my logic to control all those signals. 

Configuration of my SOC

The connections from Qsys are visible in the diagram above along with the names of the signals exported from the SOC out to my FPGA logic. It also shows the address range of the two bridges, with the FPGA to HPS (F2H) bridge able to generate any 32 bit address but the lightweight HPS to FPGA (LW) bridge only addressing x00000000 to x00003FFF. 

WRITING MY LOGIC IN FPGA TO OPERATE THE INTERFACES

Next up I have to code logic in my FPGA which implements the appropriate Avalon Memory Mapped interface which is how you drive the bridges. I will begin with a static idle set of signals for the F2H bridge as I will only be receiving transactions from the LW bridge in the first version of this project.

That logic will capture a single word sent over the bridge to location 00000000 which will be the virtual memory address from the Linux side. We latch that and have it available for display on the LEDs of the board. 

Since the board only has eight LEDs but the address is 32 bits, we will use the slide switches to select which of the four bytes are visible. I begin operation with all bits on in the latched register, but expect a different value to be sent from my Linux code.
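The latch and byte selection are implemented in HDL, but the intended behavior is easy to model in C as a check on my own thinking. This is only a model under assumptions: the low two switches are taken as a binary byte number with 0 meaning the least significant byte, and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the FPGA-side register; the real design is HDL. */
typedef struct {
    uint32_t latched;   /* last word received at bridge location 0 */
} addr_latch;

/* Power-on state: all bits on, so any real address is distinguishable. */
void latch_reset(addr_latch *l)
{
    assert(l != NULL);
    l->latched = 0xFFFFFFFFu;
}

/* A write arriving over the LW bridge. Only location 0 is captured. */
void latch_write(addr_latch *l, uint32_t bridge_addr, uint32_t data)
{
    assert(l != NULL);
    if (bridge_addr == 0u)
        l->latched = data;
}

/* The eight LEDs show the byte picked by the low two slide switches
 * (assumed encoding: 0 selects bits 7..0, 3 selects bits 31..24). */
uint8_t latch_leds(const addr_latch *l, unsigned switches)
{
    assert(l != NULL);
    unsigned byte_sel = switches & 0x3u;
    return (uint8_t)(l->latched >> (8u * byte_sel));
}
```

Stepping the switches through 0 to 3 and reading the LEDs each time reconstructs the full 32 bit address for comparison with the value printed by the Linux program.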

LINUX SIDE CODE NEEDED

In addition to the HDL there is a very tiny program that must execute in the Linux image to drive the other side of the LW bridge. It must first activate that bridge by writing control values to some bridge control registers, selecting a specific address defined by the Cyclone V chip. It can then drive one write transaction over the bridge to the FPGA side, passing a virtual address which it also writes on the serial port for my confirmation.
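The activation step might be sketched as follows. It leans on my recollection of the Cyclone V register map, which must be verified against the handbook: a reset manager register named brgmodrst at physical address 0xFFD0501C, with one bit per bridge that holds the bridge in reset while set. The function name is invented. Making the bridge visible in the L3 address map is a separate step that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed register location and bit layout; verify before relying on it. */
#define RSTMGR_BRGMODRST_ADDR 0xFFD0501Cu
#define BRG_RST_H2F   (1u << 0)   /* HPS to FPGA bridge */
#define BRG_RST_LWH2F (1u << 1)   /* lightweight HPS to FPGA bridge */
#define BRG_RST_F2H   (1u << 2)   /* FPGA to HPS bridge */

/* Release the selected bridges from reset with a read-modify-write,
 * leaving the other bits of the register as they were. The caller
 * passes a pointer to the mapped register. */
void release_bridges_from_reset(volatile uint32_t *brgmodrst,
                                uint32_t bridges)
{
    assert(brgmodrst != NULL);
    uint32_t value = *brgmodrst;
    value &= ~bridges;            /* 0 in a bit position means running */
    *brgmodrst = value;
}
```

For the first test system the call would name BRG_RST_LWH2F and BRG_RST_F2H, the two bridges configured in the SOC. The boot loader may already have released them, in which case the write changes nothing.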

Wednesday, February 15, 2023

Regressed toolchain to old version of Quartus - back on track since this works properly

FAILURES WITH CURRENT 22.x OF QUARTUS MAY BE DUE TO EVOLUTION

Since I have encountered a slew of issues with the TCL and shell scripts that came with the demonstration/reference projects, it may be that the issue is incompatibility between them and the current version of the toolchain I am running. 

Nothing on the Terasic web site addresses my issues nor can I find others with similar issues using web searches, so I will experiment with this theory. If I can get to the version of the toolchain used with the demo projects, which was 16.0, I may get things to behave better. 

LOCATED OLD VERSION 17, ALMOST WHAT WAS USED WITH TERASIC DEMOS

The oldest version hosted by Intel is 17.0, so I picked that up. Removed the new version and regressed to the old flavor. With that in place, it was time to attempt to build the GHRD (Golden Hardware Reference Design) from the Terasic distribution CD image. 

ACID TEST - WILL IT BUILD DEMO VERSIONS SUCCESSFULLY?

My first run through the tools went reasonably well. Not a completely successful build but much less awry than with the current toolchain. I looked over the issues and it was a very simple matter of when to run the supplied TCL scripts that set the pin locations and key parameters for the board. I did get a clean build, exactly what I needed to feel comfortable introducing my own design and logic. 



Tuesday, February 14, 2023

Second laptop also cannot generate the 'golden reference design'

WILL NOT SUCCESSFULLY GENERATE HDL FROM QSYS FOR UNALTERED GHRD

I moved to my newer laptop as a way of eliminating corruption or mis-installed software as a cause of the difficulty I had rebuilding the Golden Hardware Reference Design.  Alas this too failed to generate HDL for the project from the unmodified content I got from Terasic. 

We may have issues that arose because Terasic produced the CD around an older version of the Quartus toolchain, with the evolution of the tools introducing various issues blocking successful generation. If that is the only issue I suppose I could try to locate the older version of the toolchain corresponding to the Terasic demonstrations.

Another possibility is to look for a current GHRD design directly from Intel that presumably will generate on the current versions of their toolchain. Frankly, I just need to have the GHRD work before I begin introducing my logic and modifications into the mix. 

ONE ERROR CONCERNING NEED FOR WINDOWS SUBSYSTEM FOR LINUX

I did notice that one of the error messages stated that I was missing Windows Subsystem for Linux, which I installed from the Windows App Store but the error continued. This may be a path issue for the running software or a misleading error driven by a deeper problem. 

Monday, February 13, 2023

Setting up to pass address to FPGA and display that word - but gremlins pop up

SIMPLE WAY TO SHOW THE ADDRESS SENT TO THE FPGA SIDE

I decided to make use of the four DIP switches and eight LEDs on the board as a way of displaying the address latched in my logic, that address being updated when the HPS side Linux program sends a message with a memory address. 

The essential part for this first half of the project is to activate the lightweight HPS to FPGA bridge that supports communication between the Linux system and my logic. I only need this to send one word over the link, which is either all zeros or the virtual address in Linux where my memory mapped file is placed. 

Even more simply, I only need to get the virtual address of some one known word value, then send that address over the bridge to the FPGA side. If I observed that the address was properly transferred, I could consider this essential functionality operational.

FIGHTING TOOLCHAIN TO TEST

After many battles defining a system from scratch for the experiment, resulting in many failures in TCL scripts and failures to generate the VHDL for the system, I threw in the towel and decided to hack on the existing 'golden' reference files as everyone else seemed to do. 

That didn't work - similar errors. Okay, let's just generate the golden reference file as it sits, unmodified, to make sure that the environment is sound before I introduce any of my own changes. Failure! 

At this point I am going to transition to full on internet idiot mode. I will follow the reference document step by step, clicking and typing exactly as they do, hoping to at least get the demonstration project to be generated. If so, then I can differentially analyze it to figure out where I went wrong. If it won't work here, perhaps my platform is corrupt and I should work on another laptop. 

Sunday, February 12, 2023

Gradually determining the names of the interface IP necessary in Qsys for my SOC

MYSTERY NAMES AND DESCRIPTIONS

I imagine that formal names for programs and IP have changed over the years, perhaps when Intel acquired Altera but certainly it could have happened from time to time based on marketing or design scope changes. What I know for certain is that there are inconsistencies in how various items are named that can be confusing to a new user. 

For example, the tool Qsys at some point became Platform Designer, which is the current name in the menu of Quartus. However, many documents refer to this as Qsys and older videos show that the menu item was labeled Qsys in earlier versions. 

More importantly, I find that the various interface IP provided as part of Qsys have names that don't align with the descriptions of functionality in the Cyclone documentation nor in the Qsys/Platform Designer documentation. Figuring out what IP provides a function described across the breadth of all the manuals is an unnecessary challenge.

Even the names that show up when a particular IP is included in a demonstration or reference project don't fully enlighten. Some of the interconnect IP used for the F2H, H2F and H2F lightweight bridges have descriptions that read JTAG to Avalon Master Bridge, but what does JTAG have to do with these allowing FPGA and HPS cross communication of memory mapped addresses? 

SLOGGING THROUGH VARIOUS TOMES TO BUILD UP MY LIST

The HPS manual as part of Cyclone is 706 pages. The Platform Designer manual is 777 pages. The Quartus II handbook is 1,681 pages long. Nowhere do the names one sees in the IP list of the running Qsys show up across all these pages. 

I am reduced to watching videos, reading web posts and examining various reference/demo designs to learn what names relate to each item. When is the HPS_Only_Master used? When is the Avalon_MM_Master_Agent used? When is the Avalon_MM_Pipelined_Bridge used? 

READY TO START EXPERIMENTS 

At this point it makes sense to try to use some of the IP that I see in reference designs to do simple things that are similar to or the same as what will be needed in my full design. Once I see these work properly I can try alternatives or get more sophisticated. 

SIMPLEST COMMUNICATIONS EXPERIENCE PASSING ADDRESS AND GRABBING WORD

The new areas for me in this version of the project are the bridge communications between the HPS and FPGA sides as well as the memory mapped file locked into physical memory within Linux. The starter experiment is just to see that I can look up an address in Linux and send that successfully to the FPGA side where I can access it. 

This is what I will build and play with next. Once I get that working, I can add the next level which is use of the F2H bridge to read various words from the area whose address I just passed along. The combination gives me confidence in the first of the new areas described above, leaving just the Linux bashing experiments to get the virtual cartridge file opened in a contiguous set of physical pages and memory mapped within Linux. 

Saturday, February 11, 2023

Making headway on Qsys, understanding the issue with uneven depths of required knowledge

FAILURE OF ABSTRACTIONS AND THE DETAILS PROBLEM WITH FPGA/SOC

Much of the environment to develop with FPGA and SOC provides abstractions that present a straightforward and relatively easy to grasp means of dealing with the devices, but they are not perfect abstractions, which from time to time require the user to plummet to the stygian depths of minutiae. These diversions come up suddenly.

At the first level, we design logic for FPGA and SOC using high level hardware description languages, such as VHDL or Verilog. The actuality of the FPGA chips is many replicated tiles of look up tables, multiplexors, PLLs and clock modules, RAM units, and a byzantine distribution system for both general and clock signals. 

The toolchain synthesizes the HDL into logical schemes and then places the logic onto specific tiles on the chip. It routes all the signals and validates a myriad of rules for their placement and interaction. At the end it generates a bitstream file which is shifted into the chip so that the relevant bits control the function of each tile and signal router. Mostly this happens invisibly, thanks to the abstractions.

The challenge comes when errors arise which are presented as a violation of those inner rules and construction details. With Vivado I received errors about challenges routing clock signals. At the HDL level, I specified clocks and used those in logic. Suddenly I had to know exactly what was the chip's switching matrix for clock signals, which tiles held various clock and buffer functions, and the way these are configured based on my HDL. 

That failure of abstraction leads to hours of study of trivia and specific chip details - a leap of light years from HDL to chip structure. It makes the errors a bit incomprehensible and introduces a long learning curve delay at random moments within a project. 

With Qsys, we also have a tool that abstracts and eases the configuration and interconnection of IP to form the system desired. However, there are those hundreds of very detailed parameters and timings that are needed, the polar opposite on the spectrum from abstraction to bloody detail. 

In my last post I mentioned that so many hobbyists seemed to lean on the crutch of reference designs, copying and pasting, rather than using any guide that steers one to the logic behind the parameters for the specific board, chip and design project. 

PRESETS AND REFERENCE DESIGNS - A SOLUTION TO WILDLY VARYING COMPLEXITIES

I did find that Qsys comes with preset files for various boards containing the Cyclone SOC chip, allowing a user to select that preset and have all the parameters appropriate to the board's implementation be populated into the design. 

Thus, the user can grab a preset file, just like they can open a reference example file and grab parameter values, avoiding the need to understand any of the depths behind those values. 

Alas, not every board maker has provided a preset file, or Intel has chosen not to share every one. Further, in FPGA land cases will arise where one needs different settings, and these necessitate a trip into the abyss of gritty hardware details. 

PRACTICAL SOLUTIONS EXIST BUT QUANTUM LEAPS TO THE ABYSS LURK

I can imagine a world where the stride is not so extreme - where there exists a stairstep of abstractions or explanations where one can avoid the leap into the basement and face a lesser learning curve and less extensive set of issues to address. I imagine a world where linkages are suggested to guide one from known references to handle portions of the parameters. 

For example, the datasheet for the specific DRAM chips on a board contains some of the parameters required in Qsys, but others depend on the murky IP wrapped around it that designs clocking structures, buffers and the like which also require parameterization in Qsys. Rant off for now, on to the project. 

Friday, February 10, 2023

Working my way down through the toolchain and identifying areas for further study

TOOLCHAIN IS THREE SEPARATE PARTS

The toolchain can be grossly divided into three portions that are used to design and debug projects on the Terasic DE10-Nano and other boards using Intel/Altera System on a Chip (SOC) components. These are Quartus,  Qsys and System Builder.

QUARTUS IS LIKE VIVADO AND OTHER FPGA SYNTHESIS AND SIMULATION TOOLS

Quartus is the tool focused on the FPGA side on the chip, where high level language inputs define the logic and functionality. These are synthesized and then can be implemented, producing a bit stream to load into the chip to instantiate the functionality desired. It provides a simulation capability to test out logic design. There is a large library of existing intellectual property (IP) which are functions that can be introduced into a project to provide functionality that otherwise might need to be designed from bare metal upwards. 


QSYS IS SPECIFIC TO THE SYSTEM ON CHIP COMPONENT

Qsys deals with the unique nature of an SOC, which has portions of the chip whose logic and structure are fixed and immutable, with others like a standard FPGA. The SOC chip has both a fixed hard processor system (HPS) which is a computing system based on ARM processors, and FPGA logic that is customizable. 

The two sides, HPS and FPGA, are tied together by various signals and interface circuits. Qsys is concerned with this level of the chip - the interconnections with the HPS and also the integration of both IP and user written FPGA logic.

Qsys rather uniquely presents a graphical view of the various control signals going into and coming out of each bit of logic - IP, HPS, SOC specific interfaces and user logic modules. Mouse clicks can cause these to be connected as desired, plus the detailed parameters for each bit of logic are configured at this point. 


SYSTEM BUILDER IS A LAYER ABOVE QSYS WITH BOARD MAKER ORIENTED DETAILS

The Intel/Altera SOC chip, for example the Cyclone V used in my project, can be implemented on my own PCB or bought as part of various boards. One maker of boards, Terasic, produces a specific board that contains the Cyclone V SOC chip along with many useful adjunct devices. 

The Quartus tool set defines general FPGA logic, where look up tables and other elements of an FPGA are configured and interconnected to instantiate the logical functionality being designed. The Qsys tool adds recognition of the HPS side and the integration circuits between HPS and FPGA. However, neither of them has any idea what RAM, ethernet or other chips were placed on the DE10-Nano board by Terasic. 

The System Builder knows the external pins of the board and all the chip functions included on that board. It is used to pick which are going to be used in a particular project, and the outputs of this builder tool are the starting states for both Qsys and Quartus. 


DIGGING INTO QSYS

Now I understand the overall concepts and controls of Qsys (which is called Platform Designer within Quartus because if the menu said Qsys it would be too easy to find). What I need to understand better are the interconnection specifics for the SOC chip and all the major IP elements that would be used to drive the SOC chip (and wider, to make use of other chips on the Terasic board). 

Intel and Terasic provide example projects along with the files to produce them, such as the "Golden Hardware Reference Design" (GHRD). One can follow along and implement the example project exactly or make small tweaks, but most of the understanding needed to accomplish more sophisticated projects does not come from those breezy introductory materials. 

Every YouTube video I have found about these Terasic DE SOC boards involves hobbyists following along with the example project and more or less implementing it without seeming to understand the depths behind it. The board is accessible to many hobbyists but only at this level. Take an example that blinks LEDs or communicates over Ethernet, hack it to your purposes, and the job is done. 

I need to understand this much more deeply. The details are available, but in multiple documents covering Qsys, Cyclone and other aspects, each of which may involve many hundreds of pages. The challenge here is going from the very superficial coverage of the examples down to the weeds of the details. Starting with abstract concepts and gradually building into the myriad details is the path I want to follow to understand this SOC and board system. Unfortunately, I only seem to find the weeds and the fluff, not the explanatory manuals or videos or courses that bridge that gulf elegantly. 

The HPS component in Qsys, by itself, has many signals and a staggering number of parameters that have to be configured. I am recording the choices made in various example projects such as the GHRD but without understanding of what they are and why these choices were picked, I can't build general projects using the system. 

one of many configuration panels for HPS

I won't need total understanding in order to move forward with this specific project, but I need enough to know how to drive the bridge links between HPS and FPGA sides and to access specific memory over on the running Linux image on the HPS side. Much of this intercommunication works by memory mapped addressing, thus one has to build up the addressing scheme as well. 

Monday, February 6, 2023

Workflow for the Intel/Altera based Terasic DE10-Nano board

SET UP PINS AND OTHER DETAILS USING TERASIC SUPPLIED APPLICATION

The Terasic DE10-Nano board has a Cyclone V System on a Chip (SOC) chip along with quite a few peripheral chips providing functions like DRAM, HDMI video, USB and the like. The Quartus toolchain is fully generic for anyone who makes use of the Altera/Intel FPGA or SOC chips, including this one, but is unaware of how various product and board makers have packaged the chip with other components.

Terasic offers a System Builder tool that configures the high level HDL file and the settings file appropriate to their DE10-Nano board. The user selects checkboxes for particular components or functions on the board and those are included or excluded, generating a project based on any subset they choose of the board's capability. 

The settings file for Quartus has all the important configuration information - which pins of the SOC are connected to various devices or external connectors on the DE10-Nano board, what direction and IO parameters it uses, plus other details that will customize the project to this specific board. 

USE QUARTUS TOOLCHAIN FOR SUBSEQUENT DEVELOPMENT

The output of the System Builder is the start for our Quartus project, where we continue the development and ultimately load the functionality into both FPGA and HPS sides of the chip. Starting from this point, we first configure the SOC chip itself using the QSYS tool inside Quartus. 

With that done, my user logic coded in HDL files is added to the project to complete the FPGA side programming. Any code that has to run on the HPS side, for example within Linux, is also produced by this toolchain, compiled and introduced into the Linux image sitting on the SD card on the DE10-Nano board. 

The programmer tool can load the FPGA or our Linux image can load the FPGA side after Linux boots up. 

I will go through details of each step as I proceed, in future blog posts.


Saturday, February 4, 2023

Complications in using the Linux side to hold the virtual disk cartridge file

BASIC CONCEPT FOR ACCESSING DISK CARTRIDGE FILE

The files sit on an SD card, from which the operator selects one to be mounted on the disk drive. When the actual disk drive in the IBM 1130 is switched on, it has a dummy cartridge inserted which spins up. As it hits speed, the drive goes ready and the software on the 1130 begins to position the arm and attempt to read or write words. 

My logic in the diskmodel module will know when it needs a particular word of the disk cartridge image in order to stream the appropriate bit pattern into the actual disk drive hardware and eventually move into memory in the 1130 system. The address of the cylinder, head, sector and word that is needed is used to figure out the byte index into the virtual cartridge file. That index is added to the start of the file, which was memory mapped in the Linux system, in order to read that word from the Linux system memory and send it over the bridge to our FPGA logic.
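The index calculation can be sketched as below. The totals come from the cartridge geometry already described (321 words per sector, eight sectors per cylinder, 203 cylinders, 16 bit words). The split of each cylinder into two heads of four sectors, and the ordering of cylinder, then head, then sector within the file, are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

#define CYLINDERS          203u
#define HEADS                2u   /* assumed: two surfaces per cylinder */
#define SECTORS_PER_TRACK    4u   /* assumed: 2 x 4 = eight per cylinder */
#define WORDS_PER_SECTOR   321u
#define BYTES_PER_WORD       2u

/* Byte offset of one word within the virtual cartridge file, assuming
 * the file is laid out cylinder by cylinder, head by head, sector by
 * sector, with no padding between sectors. */
size_t cartridge_byte_index(unsigned cyl, unsigned head,
                            unsigned sector, unsigned word)
{
    assert(cyl < CYLINDERS && head < HEADS);
    assert(sector < SECTORS_PER_TRACK && word < WORDS_PER_SECTOR);

    size_t sector_no =
        ((size_t)cyl * HEADS + head) * SECTORS_PER_TRACK + sector;
    return (sector_no * WORDS_PER_SECTOR + word) * BYTES_PER_WORD;
}
```

Adding the result to the start address of the mapped file gives the address to read. The last word of the cartridge, at cylinder 202, head 1, sector 3, word 320, falls at byte 1,042,606, just under the 1MB set aside.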

The challenge is that the FPGA can address memory on the Linux side but it is addressing physical memory, not virtual memory. The file was mapped to virtual memory, which may be paged into noncontiguous areas all across physical memory. It can move at any time as other activity in the system forces old pages out. 

HOW I MIGHT RESOLVE THIS COMPLICATION

There are three steps to this process, in order to have an address I can send to the FPGA that points at a contiguous block of physical memory pages where the virtual disk cartridge file is mapped. First is to ensure I have the 1MB block as contiguous memory. Second is to ensure that my file memory mapping is assigned to the exact same range of virtual addresses. We must verify that the mapping remains 1 to 1, that is that a given virtual address has the exact same physical address at all times. Third, I have to be certain that the entire file is populated so that the contents of the file are paged into the physical memory already, so that when I read the physical memory I am sure I am seeing the contents of the file mapped to that address. 
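One way to check the virtual to physical relationship from user space is the /proc/self/pagemap file, which holds one 64 bit entry per virtual page. The sketch below assumes the documented entry layout (bit 63 set when the page is present, page frame number in bits 0 to 54) and uses invented function names. On recent kernels the frame number reads as zero unless the program runs as root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)          /* page is resident in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1ULL) /* bits 0..54: frame number */

/* Decode one pagemap entry. Returns 0 and stores the physical address
 * in *phys when the page is present, or -1 when it is not resident. */
int pagemap_to_phys(uint64_t entry, uint64_t vaddr,
                    uint64_t page_size, uint64_t *phys)
{
    assert(phys != NULL && page_size != 0);
    if (!(entry & PM_PRESENT))
        return -1;
    *phys = (entry & PM_PFN_MASK) * page_size + (vaddr % page_size);
    return 0;
}

/* Look up the physical address behind a virtual address of this process. */
int virt_to_phys(const void *vaddr, uint64_t *phys)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t va = (uint64_t)(uintptr_t)vaddr;
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* Entry N of the file describes virtual page N. */
    ssize_t n = pread(fd, &entry, sizeof entry,
                      (off_t)((va / page_size) * sizeof entry));
    close(fd);
    if (n != (ssize_t)sizeof entry)
        return -1;
    return pagemap_to_phys(entry, va, page_size, phys);
}
```

Calling virt_to_phys on the first byte of every page of the mapping and confirming that each result is one page beyond the previous would show whether the block really is physically contiguous; repeating the check later would show whether it stays put.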

The Linux system reserves a large block of contiguous memory as part of the Contiguous Memory Allocation process, that block divided up and given to various device drivers and other functions that require physically adjacent pages. My needs are only for 1MB, whereas the system should have substantially more than that available after bootup. 

The system call that memory maps a file to virtual address - mmap() - has options that can insist that the map start at specific addresses unless they are already taken. I will have assigned my 1MB block from physical memory at boot up, thus I can point the mmap() at this location to get the file placed properly. A parameter used to request this is MAP_FIXED_NOREPLACE.

The Linux system will preread all of the file and thus ensure it is available in physical memory by means of the MAP_POPULATE parameter; however, that requires that this file become a MAP_PRIVATE type.  The problem with the private type is that when a write is done to the memory location, it is not written back to the disk file on the SD card automatically. Further, this suggests a 'copy on write' semantic which might assign a different noncontiguous page for this address once we update it. 

The solution to the last issue may be the use of the msync() call once the virtual disk cartridge is unmounted - in other words once we see that the drive is no longer holding the heads down either because of the operator switching off the drive or due to some error we detected. 
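
As a rough illustration of the write-back step, here is a minimal Python sketch. It is not the project's code, and it makes two assumptions: the mapping is MAP_SHARED (msync() only writes back pages of a shared mapping, so a private mapping would not reach the file this way), and words are stored in the file in little-endian byte order. Python's mmap.flush() issues msync() on Linux. The names update_word and make_blank_image are hypothetical helpers for this example.

```python
import mmap
import os
import tempfile

IMAGE_BYTES = 203 * 8 * 321 * 2  # 1,042,608 bytes in a full cartridge image


def make_blank_image():
    """Create a zero-filled stand-in for a cartridge file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".dsk")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(IMAGE_BYTES))
    return path


def update_word(path, byte_offset, word):
    """Overwrite one 16-bit word through a shared mapping, then sync it to the file."""
    with open(path, "r+b") as f:
        # Assumption: MAP_SHARED, so that stores are eligible for write-back.
        with mmap.mmap(f.fileno(), IMAGE_BYTES, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            # Assumption: little-endian byte order within each word.
            m[byte_offset:byte_offset + 2] = word.to_bytes(2, "little")
            m.flush()  # msync() over the whole mapping
```

In the real design the flush would happen once, at unmount time, rather than after every word.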

There are a number of assumptions that are necessary in order for all of this to work as I desire. It will take some careful testing until I am certain that I can make this work properly. The alternative is an uglier mechanism where application code in Linux must take each transaction and deal appropriately with the virtual cartridge image file, perhaps keeping a 'dirty sector' map and flushing the sector back to SD card whenever I see a request in another sector. 

Developing the HPS to FPGA lite signaling path

STRUCTURE OF THE CYCLONE V CHIP ON MY DE10-NANO BOARD

The Altera/Intel Cyclone V chip consists of two sides - Field Programmable Gate Array (FPGA) and Hard Processor System (HPS). The HPS side is a processor with dual Cortex-A9 ARM cores which may run a version of Linux. It has 1GB of DRAM and plenty of peripherals. The FPGA side contains 110K configurable logic elements along with PLLs and other features common to modern FPGA devices. This too has its own rich set of peripheral connections. 

The two sides, HPS and FPGA, can communicate using bridges that are engineered on the chip to support communications between the halves. These are fast and flexible AXI interconnects, which have a controlling and a controlled (previously master and slave) end. The chip offers three bridges, HPS to FPGA, FPGA to HPS and a lite HPS to FPGA version. For these, the first name is the controlling side which initiates all transactions across that bridge. 

The HPS side of each bridge is linked into the ARM level 3 interconnect. Thus instructions running on the Linux side can emit transactions over to the FPGA side on one bridge, the FPGA side can emit transactions over to the ARM system on the second bridge, and instructions on the Linux side can send transactions over the lite bridge which is the third type on the Cyclone V chip. 

MY INTENDED USE OF BRIDGES

I will use only two of the three bridges for my project. These are the FPGA to HPS bridge and the HPS to FPGA Lite bridge. 

The HPS to FPGA Lite bridge will send over the start address in Linux memory of the memory mapped virtual 2315 cartridge file, or it will send over an address of zero to indicate that no virtual cartridge is currently spinning. 

The FPGA to HPS bridge will be used to access memory in the Linux system, which is how the data words of our disk drive are read and written. The FPGA logic grabs each word from the HPS side as the disk model logic determines it is time to begin emitting pulses into the disk drive read electronics to let the disk controller and IBM 1130 computer read data from its virtual disk cartridge. When the disk controller is writing words to the disk drive, we capture those, build up each word and write it onto the HPS side using the same bridge. 

FIRST SECTION BEING CODED AND TESTED IS THE LITE BRIDGE TRAFFIC

The Linux side only has to send transactions with a memory address or with a zero address depending on whether a virtual cartridge file is spinning and ready to be accessed. Eventually this will take place when the Pick signal is activated or dropped, with a user interface allowing an operator to have selected which of a number of virtual cartridge images on the SD card that they have 'mounted' into the 1130 disk drive. 

Initially I will just code up a single transaction to send a known memory address to the FPGA side. The goal is to have successful transmission of that message to the FPGA and proper reception and decoding on that side. Thus my first step is a teeny bit of code for the Linux system and some logic in the FPGA to receive the address. I will find a way to easily observe the results of testing so that I can verify it works on a real board, once I complete all the simulation testing of my FPGA side logic. 

Friday, February 3, 2023

Developed address computation module to generate RAM address for read or write of a word

VIRTUAL CARTRIDGE FILES ON A SD CARD

The ubiquitous simh simulator package has been used to create simulators for a wide range of computers, among them the IBM 1130. Disk images for use with that simulator are in a specific format which I have chosen to use for this project. That allows interchange of disk files easily between the simulator running on a PC and the actual 1130 system using this virtual cartridge system. 

The disk cartridge consists of up to 203 cylinders, each having a head on both its upper and lower surfaces. Each surface holds four sectors of data in one rotation, each sector consisting of 321 words of data. The IBM 1130 word size is 16 bits, consisting of the high order bit 0 down to the least significant bit 15. 

Therefore, the disk files used with simh have groups of 642 bytes that hold the 321 contiguous words of a sector. These groups or sectors are arranged with sector numbers 0 to 3 under head 0 followed by sectors 0 to 3 under head 1. That constitutes a cylinder of information, eight sectors in total, which is 2,568 words or 5,136 bytes. 

Cylinder 0 starts at the beginning of the disk file, cylinder 1 is at byte 5,136, cylinder 2 is at 10,272 and the final cylinder (202) begins at byte 1,037,472 and extends to the end of the file at byte 1,042,607. Within any cylinder we see sector 0 of head 0 at the start, sector 3 of head 0 is at 1,926, sector 0 of head 1 is at 2,568 and sector 3 of head 1 is at 4,494 into that cylinder. 
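
The layout arithmetic can be checked with a few lines of Python. This is only a sketch of the byte offsets described above; the constant and function names are my own for illustration.

```python
WORDS_PER_SECTOR = 321
BYTES_PER_SECTOR = WORDS_PER_SECTOR * 2            # 642 bytes
SECTORS_PER_HEAD = 4
HEADS = 2
BYTES_PER_CYLINDER = BYTES_PER_SECTOR * SECTORS_PER_HEAD * HEADS  # 5,136 bytes
CYLINDERS = 203
FILE_BYTES = BYTES_PER_CYLINDER * CYLINDERS        # 1,042,608 bytes


def sector_offset(cylinder, head, sector):
    """Byte offset in the disk file of the first word of a sector."""
    within_cylinder = (head * SECTORS_PER_HEAD + sector) * BYTES_PER_SECTOR
    return cylinder * BYTES_PER_CYLINDER + within_cylinder


if __name__ == "__main__":
    print(sector_offset(1, 0, 0))    # start of cylinder 1
    print(sector_offset(202, 0, 0))  # start of the final cylinder
    print(FILE_BYTES - 1)            # last byte of the file
```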

INDEXING TO A SPECIFIC WORD FOR READING OR WRITING

Cylinders are eight bit binary values, heads are single bit, sectors are two bit values and words within a sector require a 9 bit binary index. Simply combining the 11 bits of cylinder & head & sector gives us the relative sector number inside the file, which when multiplied by 642 yields the first byte address of the sector. 

Adding the word number (times 2) indexes within that sector to the word itself we are interested in fetching. This must be added to the base of our memory mapped file over inside the Linux system, which points us at the word we want to read or write. 
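
A software model of this computation, useful for generating expected values for a testbench, might look like the following. It is a sketch in Python rather than the VHDL module itself, and the function name and range checks are my own additions.

```python
def word_address(base, cylinder, head, sector, word):
    """Byte address of one 16-bit word, given the base of the mapped file."""
    if not (0 <= cylinder <= 202 and head in (0, 1)
            and 0 <= sector <= 3 and 0 <= word <= 320):
        raise ValueError("index outside the cartridge geometry")
    # Concatenate 8 bits of cylinder, 1 bit of head and 2 bits of sector
    # into the 11 bit relative sector number.
    relative_sector = (cylinder << 3) | (head << 2) | sector
    return base + relative_sector * 642 + word * 2


if __name__ == "__main__":
    # Highest valid word in the file, relative to a base of zero.
    print(word_address(0, 202, 1, 3, 320))
```

The highest valid word sits two bytes below the end of the 1,042,608 byte file, which is a handy boundary case for simulation.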

In operation, the operator will select one of the virtual 2315 disk images from the SD card, it will be opened under Linux and memory mapped. The resulting location for the memory mapped file is sent over to the FPGA where it is kept as the base for all this arithmetic. 

SIMULATION TO VERIFY CORRECT OPERATION

I set up a testbench which provides a base address, which will eventually come from the Linux side but is arbitrarily chosen for unit testing purposes. I then vary the word, sector, head and cylinder values, watching the generated memory address to verify that it is the correct location in Linux memory to grab our intended word of the opened virtual cartridge file. 

I registered the address with our FPGA clock but let the multiplier and adders run outside so they settled down by the time I latched in the final result. 

VERY SIMPLE AND WORKS PROPERLY

The simulations showed this producing correct addresses with clean transitions at clock edges, including properly generating a result with the highest possible valid word address in the disk file. 

It is time now to move on to prepare the Terasic DE10-Nano board with the appropriate Linux image on the hard processor side (HPS) so that I can begin to design and test the bridge traffic between the HPS and the FPGA sides. 

Thursday, February 2, 2023

Finished verification of write data capture logic

ADDED ADDITIONAL THREE BIT CELLS OF ZERO

Since my testbench reads lines from a text file that are twenty bits long, my testing has had a preamble length that is an exact multiple of 20 bit times, but it is more likely in the real world that we have a different number that isn't zero modulo 20. 

I added three extra bit cells with data value 0 at the beginning of the simulation. The logic worked just fine, syncing properly solely based on a long string of 0 bits followed by a sequence 1 1 1 1 0 that represents the sync word. 

TESTED SHUTOFF OF WRITEGATE EARLY IN SECTOR

I first set the de-assertion of WriteGate at a random point in word 267 of a sector. The result was that the capture of that word was abandoned and the partial word was NOT captured to RAM. 

I then adjusted the drop to exactly the point after it had completed a word and put it to RAM, 

TESTED DELAYED SWITCHON OF WRITEGATE

This test assumes that for some reason the WriteGate is not turned on at the proper time at the start of a sector. This also delays the start of the clock phase A and phase B signals. I set up the start of this behavior at about 2 ms into the 10 millisecond sector time. The result on a real system is that the sector will extend past the start of the next sector which would corrupt that next sector. 

Indeed, with this test setup we were in the midst of writing word 268 when the Sector Mark starting the following sector (3) dropped, thus we did see the situation expected.

TESTED MALFORMED SYNC WORD BEHAVIOR

I wanted to verify that if we see any 1 bits in the stream before we read a valid sync word, we will react accordingly. That worked properly as well, locking the drive into the error condition where it would take a power cycle to resume operation. 

DISK MODELING MODULE IS CONSIDERED VERIFIED

Based on all the testing I could accomplish via simulation, the disk modeling side of the project is working as expected and ready for live testing on the IBM 1130. Time to move on to the logic that interfaces with the bridge for communication between the FPGA side and the ARM/Linux (HPS) side of the board.

Wednesday, February 1, 2023

Data capture during disk write is working, next to explore pathological cases

INPUT AND OUTPUT TEXT FILES CREATED

I set up a file stream.txt to hold the words to send to my bit capture state machine, each line consisting of 20 discrete bits coded as the ASCII characters 0 or 1. I set up an initial line of all zeros to reflect the end of the preamble and then the sync word pattern as the next line. Following that I coded 321 words of input to capture the words that would be transmitted if the disk controller were writing a full 321 word sector. 

Another file, got.txt, was written by the testbench under simulation, storing away each word as it is captured by my state machine. The output matched what I intended to be captured.

SOME ERROR TESTING COMPLETED SUCCESSFULLY

I set up some input words with incorrect error detection code bit patterns and these indeed led to the state machine locking into the error state. As well I attempted various lengths of preamble and ensured that the state machine still properly captured the sector's worth of data whether it was shorter or a bit longer than the target duration.

REMAINING TESTING FOR TOMORROW

I am going to verify the operation in various conditions:

  • Sync word is malformed
  • WriteGate turned off prematurely
  • WriteGate turned off in mid word
  • WriteGate asserted before sector starts

Setting up for verification of capture of data written by 1130 disk controller

BASIC SCHEME FOR CAPTURE OF DISK WRITES

The disk controller writes both clocking signals and data pulses to the disk drive when a write is active, signaled by the assertion of the WriteGate signal. The clock consists of sequences of 720 ns long pulses as phase A and phase B. Phase A is the interval when clock pulses are written on the disk, and phase B is used to write a pulse if the data bit value is 1; otherwise no pulse is sent.

The controller will begin with a preamble of about 250 microseconds of zero data bits. That is, we have pulses on the WriteDataBit line during clock phase A but nothing during clock phase B times. This is used by the reading circuitry of a disk drive to align its data separator. That circuit in the drive routes some pulses out the ReadClock line and others out the ReadData line, based on whether they occur during the clock (phase A) or the data (phase B) times. 

I will recognize this unambiguously because the disk controller sends the phase A and phase B signals along with the WriteDataBit pulses. Thus I don't need to do anything special to align with the clock versus data periods. My state machine just follows the signals of the two phases so that I know when a pulse arrives in the data (B) phase, it is time to move on to the sync word part of the sector. 

The sync word is the pattern 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 and this is what tells the disk controller reading the sector later where the breaks are between words in what is otherwise an undifferentiated stream of bits. After the end of the sync word, we know that the following bit cell is the beginning of word 1 of the sector and that every 20 bit cells after is the start of the next word. 

I will need to spot a pulse arriving in a data period (phase B), then cycle through three more data pulses and one phase B with no pulse to arrive at the beginning of the first word coming from the disk controller. My state machine will advance through the four 1 bits and the subsequent 0 bit, matching this exact sequence and going into an error state if the data values vary from 11110. 
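
The sync search can be modeled in software as a scan over the phase B values, one entry per bit cell. The Python sketch below is only a behavioral model of the steps just described, not the VHDL state machine; SyncError and find_first_data_cell are hypothetical names.

```python
class SyncError(Exception):
    """Raised when the bits after the preamble are not exactly 1 1 1 1 0."""


def find_first_data_cell(bits):
    """Return the index of the first bit cell of word 1.

    bits holds one 0 or 1 per bit cell: 1 if a pulse arrived in phase B.
    """
    bits = list(bits)
    i = 0
    # Preamble: any number of zero bit cells.
    while i < len(bits) and bits[i] == 0:
        i += 1
    if i == len(bits):
        raise SyncError("no sync word found")
    # bits[i] is the first 1; the next four cells must be 1 1 1 0.
    if bits[i + 1:i + 5] != [1, 1, 1, 0]:
        raise SyncError("malformed sync word")
    return i + 5


if __name__ == "__main__":
    stream = [0] * 23 + [0] * 15 + [1, 1, 1, 1, 0] + [1, 0, 1]
    print(find_first_data_cell(stream))
```

Because the scan keys only on the first 1 bit, the preamble length does not need to be a multiple of 20 bit cells.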

My state machine will proceed into a loop of up to 321 words of data capture. For each word, we detect and shift in the sixteen bits of the transmitted word, while accumulating a count of 1 bits we have seen. Based on the count of 1 bits, we then detect and match the corresponding error detection code of four bitcells. The possible values we will see are 0000, 1000, 1100, or 1110 with any other pattern representing an error. 
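
For building test vectors, the check of one 20 bit cell group can be sketched in Python as below. The mapping from the count of 1 bits to the code is an assumption on my part: it supposes the check bits pad the number of 1 bits in the group to a multiple of four, with the 1 bits sent first. It also assumes the data bits arrive most significant bit first. CHECK_CODES and decode_cell_group are hypothetical names.

```python
# Assumed mapping: (count of 1 bits in the data word) mod 4 -> check code.
CHECK_CODES = {0: "0000", 1: "1110", 2: "1100", 3: "1000"}


class CheckCodeError(Exception):
    """Raised when the four check cells do not match the data word."""


def decode_cell_group(cells):
    """Decode a 20 character string of '0'/'1' into a 16-bit word.

    The first 16 characters are the data bits (assumed MSB first),
    the last 4 are the error detection code.
    """
    if len(cells) != 20 or set(cells) - {"0", "1"}:
        raise ValueError("expected 20 characters of 0 or 1")
    data, check = cells[:16], cells[16:]
    expected = CHECK_CODES[data.count("1") % 4]
    if check != expected:
        raise CheckCodeError(f"got {check}, expected {expected}")
    return int(data, 2)


if __name__ == "__main__":
    print(hex(decode_cell_group("11110000000000000000")))
```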

After we shift in the 16 bits of a word and verify its error code is good, we will emit the write request along with the data word; this is passed to another module which writes that data word in the appropriate RAM location over on the ARM/linux side of the board. 

If I detect an error, either because the sync word pattern isn't correct or because the error code doesn't match what is sent by the disk controller, I trigger the disk drive into an error state. It no longer responds to commands and must be power cycled to resume operation. 

SIMULATION TESTBENCH TO PROVE OUT THE WRITE CAPTURE LOGIC

I used the testbench to produce the correct phase A and phase B clock signals once WriteGate is asserted, then injected various pulses on WriteDataBit to validate my state machine and functionality. A text file will let me feed various patterns of bits to the state machine. I will use this to verify that I deal properly with minor shifts in timing, such as a preamble that is not exactly 250 microseconds long. 

The file will also let me attempt both correct and improperly formatted sync words, along with data words that have both correct and incorrect error detection codes.