Category: coding

Writing and Understanding RISC-V Assembly Code

For quite a while I have followed the RISC-V ISA with growing intereset. Now that RISC-V is becoming more and more popular and catching a lot of public attention, it is time to get my hands dirty with some low level RISC-V assembly coding.

An instruction set architecture (ISA) is a specification of the commands/instructions a specific processor or processor architecture family does understand and that it can execute. For this tutorial we will be looking at the RISC-V ISA.
Assembly code (often abbreviated asm) is textual low-level program code, which can be directly translated to machine code. We will be working with the RISC-V assembly language.
The connection between an ISA and assembly code is such that assembly language/code is used to write a program for a corresponding ISA. In theory there could be more than one assembly language for the same ISA (like there are multiple high level programming language for the same machine architecture), but this is seldom the case (having different “flavors” of the same assembly language is more common).

I think there is no need to reproduce any details about the RISC-V ISA at this point. There are plenty of good sources online, the official specification of the RISCV user-level ISA of course, especially the chapters “Instruction Set Listings” and “RISC-V Assembly Programmer’s Handbook”, or one of the RISCV reference cards.

Another good tutorial for beginners, which is a little slow paced imho, is the Western Digital RISC-V ASM tutorial on Youtube (the tutorial really gets started with part 6). For beginners it is nice to see how the base addresses for peripherals can be found using the datasheet and how to “talk to” those peripheral blocks using assembly code.

Note that all examples presented in the following are created/run on Ubuntu 18.04 (WSL) and platformio is used for convenience (no need to set up the toolchains manually). On Ubuntu platformio can be obtained with apt:

$> sudo apt install platformio

Now lets start by creating a new folder for our platformio project and initializing it.

$> mkdir riscv-asm-examples
$> cd riscv-asm-examples 
$> platformio init

I use the following platformio.ini file to define the project parameters, since I have a HiFive1 board lying around somewhere. Feel free to use any other supported RISC-V board or framework instead. If you do not own a RISC-V board you can still run the assembler and have a look at the machine code. But I really recommend to get a board, the Longan Nano costs only 5$.
The platformio.ini file I am using is very simple:

$> cat ./platformio.ini
[env:freedom-e300-hifive1]
platform = riscv
board = freedom-e300-hifive1
framework = freedom-e-sdk

To check that platformio is installed and the platformio.ini file is correct run a sanity check using an empty main function:

$> echo "int main(void) {return 13;}" > ./src/main.c
$> platformio run

Running platformio for the first time can take a while because all the required packages and toolchains will be downloaded, so do not get impatient if its takes a little while.
If everything went fine a [SUCCESS] message should show up after a while.

Okay, the main function has been compiled. To see the assembler code which was produced we can used objdump -d. The -d option stands for disassemble, which means that the given object file will be “translated” to human readable assembler code.
The object file of interest is hidden away by platformio in .pio/build/freedom-e300-hifive1/src/ (folder may vary if you used a different build framework).
To disassemble the almost empty main function from before do like so:

$> objdump -d .pio/build/freedom-e300-hifive1/src/main.o
.pio/build/freedom-e300-hifive1/src/main.o:     file format elf32-little
 objdump: can't disassemble for architecture UNKNOWN!

Uh oh! Why isn’t it working? The tutorial said that this should work. Liar!
Here is why:

$> ls -lha $(which objdump)
lrwxrwxrwx 1 root root 24 May  8  2019 /usr/bin/objdump -> x86_64-linux-gnu-objdump*

The objdump we called is for the x86 ISA, not for RISC-V. We need to call the correct objdump executable from our RISC-V toolchain. The toolchain is hidden away by platformio inside the user’s home folder under ~/.platformio.

~/.platformio/packages/toolchain-riscv/riscv64-unknown-elf/bin/objdump -d .pio/build/freedom-e300-hifive1/src/main.o
Disassembly of section .text.startup:
 00000000 
:
    0:   4535                    li      a0,13
    2:   8082                    ret

Nice. The general structure of the assembly code is:

<memory_address>: <opcode>    <instruction> [<parameter(s)>]

So the assembly code above tells us that an immediate value 13 is loaded into register a0, then we return. Register a0 is the register which holds the function return value according to the RISC-V calling convention. The value 13 was given as return value in the main function. So this is really the assembly code corresponding to our main function. Hurrah!

To make things easier I recommend to set a symbolic link to the correct objdump executable, e.g. inside the platformio project folder.

$> ln -s ~/.platformio/packages/toolchain-riscv/riscv64-unknown-elf/bin/objdump objdump
$> ./objdump --version
GNU objdump (SiFive Binutils 2.32.0-2019.08.0) 2.32

OK, time to start with some assembly coding examples. Most examples are very minimalistic and use the main function as entry point from which the assembly code is invoced. The main goal, to show that assembly code can achieve the same things as C code, is proven nontheless.

Oh and to upload an example to your board (e.g. HiFive1) run the following command inside the project root directory (folder where platformio.ini is located):

$> platformio run -t upload

On my Windows 10 machine this only worked with VisualStudioCode (with the platformio plugin intalled). WSL did not work (USB driver stuff). Using a Linux VM should not pose any problem though.
Also the HiFive1 requires a push of the red reset button to reset the board and load the new program after it was uploaded.

Example 1: Superblink

This example just reproduces the WesternDigital RISC-V assembler tutorial from Youtube. Some GPIO pins are initialized and LEDs are toggled between on and off.
Check out the code on github.

Cycling through the RGB LEDs with superblink.

Example 2: Superfade

This example is an extension to superblink and uses PWM signals to control the intensity of LED’s. Three PWM channels are used to control the intensity of the 3 RGB color LEDs. One-by-one the LEDs are ramped up to full intensity and then reverted back to 0 intensity, creating a (incomplete) rainbow pattern.
Check out the code on github.

Fading the RGB LEDs with PWM signals.

Example 3: Simple UART Echo

This example waits to receive data in the UART RX FIFO and subsequently sends the same data out to the UART TX FIFO, thus echoing back any characters that are received.
Check out the code on github.

uartecho in action.

That’s all for the moment. See you next time.


References:

  1. https://github.com/riscv/riscv-asm-manual/blob/master/riscv-asm.md
  2. https://rv8.io/asm.html
  3. https://github.com/rv8-io/rv8
  4. https://www.youtube.com/watch?v=KLybwrpfQ3I
  5. https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
  6. https://www.cl.cam.ac.uk/teaching/1617/ECAD+Arch/files/docs/RISCVGreenCardv8-20151013.pdf

Realizing Arbitrary Functions with ROM-Based Lookup Tables

When tasked with the implementation of a rather complex function, e.g. a polynomial of higher order, the resource utilization quickly shoots through the roof if implemented straight forward (also called the naïve implementation).
To avoid this it is often easier, simpler and faster to use a lookup table (LUT) solution.

Instead of doing a lot of calculations and mathematics, the results for a given function argument is just read from a read-only memory (ROM) which contains precalculated results.
Oftentimes a LUT/ROM based implementation can be used in place of a “proper” implementation during early prototyping. In a later stage of the project the LUT/ROM can be replaced with an optimized implementation.

Arbitrary function realized using a LUT/ROM

Most people will encounter a ROM based lookup table solution when dealing with sine and cosine functions. The technique is then often called direct digital synthesis (DDS), because a waveform is generated by digital logic instead of analog circuitry, as it was done in ancient times.
Since this is a more or less common task I wrote an Octave script that takes a function as input and generates a memory initialization file for a LUT/ROM solution. The parameter range of interest can be specified and the width and depth of the LUT/ROM can be defined.
Periodic functions will likely result in problems if not handled carefully. E.g. a lot of people will specify the parameter range for a sine LUT/ROM go from 0 to 2*Pi. However, since the boundaries of the interval are always part of the LUT/ROM the start/end value of the period will appear twice, once at the highest LUT/ROM address and once at the lowest LUT/ROM address (because sin(0) = sin(2*Pi) = 0). This gotcha does not hold for non-periodic functions. It can also easily be fixed.

The quality of the result this technique yields depends on both the LUT/ROM’s memory depth and the LUT/ROM’s word width. The former defines the number of available sampling points and thus the quantization of the function parameter(s). The latter defines the quantization of the function result, which means how close a single LUT/ROM data value is to the exact result of the function. Both kinds of quantization contribute to the error of the LUT/ROM implementation, i.e. the deviation of the LUT/ROMs result from the precise function result value.

The total size of the LUT/ROM is memory depth * word width bits.
Another performance characteristic is the maximum operating frequency under which an implementation can run. On FPGAs the maximum operating frequency will depend on the number of BRAMs required to realize the LUT ROM. If only one BRAM is used there is not need for additional routing or coupling glue logic and thus the operating frequency will be maximized.
Using more than one BRAM requires “chaining”/”coupling” of BRAMs and thus will reduce the maximum operating frequency due to additional routing and/or logic delays.

LUT/ROM composed of multiple BRAMs

STM32 Programming Entry

For the STM32 family of microcontrollers a number of different support libaries are available: Cortex Microcontroller Software Interface Standard (CMSIS), Standard Peripheral Library (SPL), Hardware Abstraction Layer (HAL).

CMSIS (by ARM) is basically just a bunch of header files with common defines for all registers of the microcontroller and its periperal devices. There are no API functions or drivers included at all. The CMSIS gives the programmer the most control and performance, but requires the most work.

SPL (by ST Microelectronics) consists of libraries allowing to use peripheral devices of the STM32 microcontroller much easier and abstracting away most of the wiggle work of register bits. The SPL is officially obsolete and will no longer be supported or maintained.

HAL (by ST Microelectronics) is the most recent library to support STM32 microcontrollers. It’s similar to SPL but offers some advanced features. Code written for HAL is supposed to be easily portable between different microcontrollers. On the other hand performance may not be at maximum since a lot of different hardware is supported by the same functions/API.

There is also a new thing called the Low Layer Library (LLL). The LLL defines different abstraction layers and allows the programmer to decide how much low level control she wants to take.

For more details check out the articles below.

References: https://blog.domski.pl/spl-vs-hal-which-one-should-you-use/

Notepad++ Column Mode

For a long time I did not like when people declare their ports like this in VHDL:

entity example is
  port (
    foo    : in        std_logic;
    bar    : out     std_logic;
    glarp : inout std_logic
  );

I was fine with vertical alignment along the colons, but the additional whitespace after the in/out keywords just looked horrific to me.
The same goes, to a less extent, for Verilog multibit wires/registers:

module example (
  input  wire [7:0] foo,
  output reg        bar
)

This all changed when I found out about column mode editing in Notepad++.
To enter column mode editing the Alt-key must be held while selecting text with the mouse. Since I know about this feature the vertical alignment of in/out keywords and port types makes perfect sense.
Only with the port types vertically aligned can I use column mode editing to quickly change the type of a bunch of ports.
Column mode editing has even more to offer, check out Edit &gt; Column Editor in Notepad++’s menu bar.

columnMode

Of course column mode editing can not replace all the super magic features of high-end (and high-price) IDEs like Sigasi. But it is a great help for all those who do not have access to expensive IDEs.

Ternary operator for VHDL

On occasion it can be super useful to have the ternary operator ? : at hand. Many programming languages like C have it and – without proof – I dare claim that it is much appreciated by many coders out there.

In VHDL there is no such thing. Instead a VHDL developer must always bring out the big gun and apply our beloved if-then-else(-endif) statement.

if (fifo_empty = '1') then
    read_fifo <= '0';
else
    read_fifo <= '1';
end if;

However, it would be really nice to have a ternary short-hand operator available instead sometimes.

Here’s a surrogate for the ternary operator which works without changing the VHDL LRM. It is a simple function ite, which is short for if-then-else 🙂 The function does nothing more but wrap an if-then-else statement. How ingenious of me.

function ite(b: boolean; x, y: integer) return integer is
begin
    if (b) then
        return x;
    else
        return y;
    end if;
end function ite;

Of course this function will only work for integer arguments, but it can easily be overloaded for other data types (I smell an application for a VHDL-2008 type generic).

 

P.S.: Verilog has a ternary operator.

The Latest and Greatest vs. Outdated But Robust and Reliable

There are often multiple choices when we are looking for a software tool to do <something&gt;. Some of the possible softwares may be quite archaic, have been around for decades and were used for many different tasks and in many projects throughout their lifetime. Others may be the latest newcomers, have not even hit version 1.0 yet, are built on top of brand new frameworks, libraries and APIs, and come with a modern look-and-feel.

Of course we always want the latest features which make our lives easier and our coding more productive.

On the other hand our code should remain portable among tool chains and vendors, so we are able to migrate easily (many reasons for why this could become necessary).

Following the first guideline will always restrict us to the minimum feature set offered by all vendors in question. Even if some vendors already support a certain feature it can not be used for compatibility reasons if another vendor does not support it yet. That’s because doing so could prevent migration to this vendor at a later point. Hoping that by then this feature will be supported everywhere is not an viable option.

In practice the decision is mostly made on a case-by-case basis, considering the value of a new feature against the danger of portability issues.

 

Semantic Versioning

A nice read about Semantic Versioning.

VHDL package use

Try to avoid

use work.mypack.all;

This can cause name conflicts, e.g. if two packages define a type of the same name. It is far less likely to have conflicting type and package names. Even if so, its a hell lot easier to fix, e.g. by renaming one of the packages.
This problem may be found when integrating 3rd party code as a black box (i.e. not knowing or caring whats inside).
Therefor choose package names as descriptive as possible, while keeping them reasonably short.

I am aware that name conflict are unlikely in VHDL. Very few widely used packages exist and the whole library concept is omitted by most developers. I bet 99% of digital design engineers never ever used a different library than work for their packages. But that’s no reason to ignore best practices.

What Is a Hardware Description Language Anyway?

What is a HDL?

A hardware description language (HDL) is a computer language used to describe the structure and behavior of digital (and sometimes analog) electronic circuits. A HDL may look quite similar to a traditional programming language and indeed will behave similar as well in some aspects, however in other aspects there are important differences that one must be aware of when writing HDL code.

This article is meant to give an extremely brief overview of HDLs, their origins, what’s so special about them and what state of the art HDLs of today can offer.

Where do HDLs originate from?

The origins of HDLs go back as far as the early 1970s when the complexity of digital integrated circuits was increasing dramatically and quickly became too much for traditional engineering workflows to handle. At that time the forefathers of todays HDLs stepped into play. These first HDLs were designed under the intention to allow circuit designers to create register-transfer level (RTL) descriptions from a high level perspective (for the time), independent of the technology (e.g. CMOS) used for production later on.

Main players: VHDL and Verilog

It was during the mid to late 1980s that the main players of today emerged in the industry: VHDL and Verilog. Even though both languages had been around in some form for a while then, it was the standardization by the IEEE which solidified their overall acceptance in the industries.

Since then numerous revisions of both standards where published, the latest being VHDL-2008 and SystemVerilog-2012.

What’s a HDL good for?

A HDL is a domain specific language, specifically designed to support the description of digital logic circuits and clock driven sequential logic. As such a HDL contains special constructs to enable the description of digital hardware and RTL elements.

One example of a hardware related language construct is the description of the rising edge of a clock signal, which is used to model the behavior of sequential circuits.

A D-Flipflop in VHDL would be described like the following:

process (clk) is
begin
    if rising_edge(clk) then
        q <= d;
    end if;
end process;

In Verilog on the other hand the very same D-Flipflop description looks like this:

always @ (posedge clk)
begin
    q <= d;
end

What do HDLs lack?

Usually a HDL alone is not sufficient to specify all aspects of a digital design, e.g. for an FPGA implementation (even more so for ASICs). A lot of meta information is required to turn HDL code into hardware. Some of this meta information can be presented in form of constraints which define physical or timing requirements the resulting hardware must fulfill. To describe these constraints special description languages exist outside the scope of HDLs. Some constraints can be embedded into the HDL code as well, but many engineers (including myself) like to keep the two things apart.

For a simple example, one constraint which practically all digital designs contain is the timing constraint for a clock signal. Such a clock constraint may look like the following (using XDC syntax, a Xilinx flavor of Synopsys SDC syntax):

create_clock -name "system_clock" -period 10.0 -waveform {5.0 5.0} [get_ports "sys_clk_i"]

Or using the older UCF syntax (hmm yes, 100% Xilinx proprietary non-portability):

NET "sys_clk_i" TNM_NET = system_clock;
TIMESPEC TS_system_clock = PERIOD "system_clock" 10 ns HIGH 50%;

Another area where HDLs have not reached their full potential yet is the huge field of verification. During simulation and verification a designer needs to create abstracted models and command sequences to see if the written HDL code behaves as expected. Writing such abstracted code can be very hard in traditional HDLs. Among the reasons for this are a lack of native support for highly abstracted code in HDLs and a lack of standard libraries (like in C or Python).

A lot of specialized verification languages contest to fill this gap, including e, OpenVera, SystemC and SystemVerilog.

Especially SystemVerilog tries to be a jack of all trades, trying to handle RTL coding, supporting various advanced forms of verification and offering a lot of high level constructs. However all these constructs have made SystemVerilog very complex, which in turn caused a lot of best practices to evolve, which constrict the allowed language subset to proven features and coding styles. Ironic.

The future of HDLs?

Maybe HDLs may become obsolete in the next decade. HDLs do not offer the same high productivity known from high level programming languages. This is partly due to the very limited support for abstraction and little code reuse and portability issues.

One approach to this problem is the idea of high level synthesis (HLS). Instead of extending and improving HDL languages to support more abstract constructs, why not use existing high level languages like C and adapt the implementation tools? A HLS tool will not gobble up HDL code, but instead accept C code written by a COTS programmer. The HLS tool will then do its best to create a hardware implementation which performs the same task as the program code.

In my opinion this approach sounds a lot better in theory than it works in practice (this may change). There are too many unsupported constructs and a lot of constraints must be tossed at the tools to get the result you want. Since the resulting hardware is not the most optimized, HLS may be seen as a trade of productivity vs. efficiency.

HLS languages include: SystemC (based on C++), Bluespec (based on Haskell), Chisel (based on Scala) and MyHDL (based on Python), among others.

A likely development could be that HLS is added as an additional layer on top of HDL code, similar to high level programming languages added as a top layer onto assembler code. If you want productivity write HLS code, if you need performance and efficiency write HDL code. Since most (all?) existing HLS tools do not directly output a netlist, but synthesizable HDL code, this forecast seems not all that wrong.

[references]

 

 

© bananatronics.org