Category: verilog

Anlogic TANG PriMER dev board

Recently I purchased a Sipeed TANG PriMER development board featuring an Anlogic EG4S20 FPGA (codenamed Eagle S20). The only reason I bought the board was to see what Anlogic FPGAs are capable of, since I had never heard of that FPGA vendor before. No need to think twice when the board costs less than 20$.

The TANG PriMER board is officially marketed as a RISC-V development board and comes with a Hummingbird E200 RISC-V softcore design preloaded into the onboard configuration flash. The Hummingbird is basically a slightly modified variation of the SiFive E2 core.
Setting up the tool chain was a bit of a hustle until I found this site which hosts both the TD IDE and the required license files. There are also some datasheets and schematics. Most of the official documentation is only available in Chinese, therefor I strongly recommend the inofficial english translation.

I have not done a lot with this board yet but verify the tool chain with a simple blink LED example. The design included a 32 bit counter which resulted in an estimated maximum frequency of 252 MHz. Not too bad. The TD IDE also comes with an IP wizard to generate IP cores, but it seems to just generate a wrapper for some primitive instantiations. It’s worth mentioning that the EG4S20 has an on-chip oscillator (250 or 266 MHz, documentation and IDE do not agree), on-chip SDRAM (64Mbit) and an 8-channel ADC (1MHz sample rate). Still need to figure out how to configure the board and which programmer can be used.

Since Anlogic FPGAs are not listed on or other distributor websites, it seems unlikely they will become widely available outside of China anytime soon.

Clock Enables vs. Multiple Clocks


In advanced FPGA systems which require different clock frequencies for different parts of the design, there is often a shortage of global clock buffers. Often several of the clocks are related (see below) and it becomes possible to use a single clock plus several clock enable signals, instead of several dedicated clocks. This article tries to shed some light on the impact these two alternatives can have on an FPGA system.
For the rest of this article, let’s assume all clocks C_i with frequency F_i are derived from the same reference clock C_ref with frequency F_ref and are strongly related, fulfilling the equation F_i = F_ref / D_i. This means that the frequency of a related clock is an integer fraction of the reference clock frequency. In that case things look a little different.
I use the terms strongly/weakly related to differentiate between the two basic ways the relationship of two clocks can be constituted. Weakly related clocks would be those which are linked by the equation F_i = F_ref * M_i / D_i. This means that all weakly related clocks have frequencies which are (possibly non-integer!) fractions of the reference clock frequency.
Note that asynchronous clocks or weakly related clocks have to be treated differently, and that clock enables as described here are not applicable for those.

Main Part

Virtually all FPGAs offer D-flip-flops which have an enable input, also called a clock enable (CE) since it controls the effect a rising or falling clock edge has on the content of the D-flip-flop. If the CE input is deasserted, changes to the D input of the flip-flop are not propagated to the Q output after an active clock edge. Only if the CE is asserted the value on the D input does propagate through to the Q output when an active clock edge arrives.
When N related clocks must be derived from one common reference clock there are two major options:

(1) Instantiate a PLL or DCM (Xilinx FPGA primitive) which uses the reference clock C_ref to generate all required clocks C_i. The reference clock C_ref has to be the clock with the highest frequency (see my constraints and assumptions above). The related clocks are generated by dividing the reference clock by an integer value. If a large number of related clocks are required this can lead to a dead end, because a PLL/DCM has a limit to the number of clocks it can generate (usually somewhere between 4-8).
This limitation could be circumvented by using one of the generated clocks C_i,1 of the first PLL/DCM as a reference clock C_ref,2 for a second PLL/DCM and then in turn use the second PLL/DCM to generate additional related clocks C_i,2. However, this will only work if the phase relationship between the reference clock of a PLL/DCM and the generated clock outputs can be adjusted, e.g. in general it may be required to adjust this phase offset to become 0 (or a value which is an integer multiple of the reference clock with the highest frequency).

Clock generation using one clock buffer per clock

(2) Generate only one clock signal, which has the highest frequency that is required. All other related clocks would be obtained by dividing this reference clock frequency by an integer multiple (as explained in 1) above). However, instead of dividing the reference clock, a clock enable signal CE_i is created, which is only active every D_i-th clock cycle. This clock enable signal serves as an enable for all flip-flops which would be located in the domain of the corresponding related clock C_i. This way a clock enable signal CE_i for each related clock C_i can be created.
So there is only one primary clock signal and all other “clocks” are logically represented by an enable signal which is only asserted every other clock cycle.
The tricky part then is to tell the timing analyzer tool to treat the clock enables correctly, so the place and route tool is aware of the timing requirements. Otherwise the design will be over-constrained, since all clock domains are treated as if they had the same frequency (which would be the maximum frequency). This would result in unnecessarily strict timing requirements for all the logic which would normally run at a lower frequency. Thus timing closure will be more difficult to achieve.

Clock generation with multiple clock enables using a single clock buffer


The decision to use multiple clocks over a single clock plus clock enables boils down to a resource trade-off.
On the one hand, multiple PLLs/DCMs and multiple (global) clock buffers are used to generate multiple related clocks. This requires more (global) clocking resources, but no additional fabric resources at all. Each clock domain is defined by a physical clock signal.
On the other hand, only one global clock with one global clock buffer is used. The different clock domains are logically defined by means of enable signals which are only asserted every other clock cycle. This approach requires more fabric resources (LUTs, CLBs, FFs) to generate and distribute the enable signal nets which define the various clock domains. On the other hand only a single clock must be generated which saves global clocking resources.
Bingo bango there you have it.

Notepad++ Column Mode

For a long time I did not like when people declare their ports like this in VHDL:

entity example is
  port (
    foo    : in        std_logic;
    bar    : out     std_logic;
    glarp : inout std_logic

I was fine with vertical alignment along the colons, but the additional whitespace after the in/out keywords just looked horrific to me.
The same goes, to a less extent, for Verilog multibit wires/registers:

module example (
  input  wire [7:0] foo,
  output reg        bar

This all changed when I found out about column mode editing in Notepad++.
To enter column mode editing the Alt-key must be held while selecting text with the mouse. Since I know about this feature the vertical alignment of in/out keywords and port types makes perfect sense.
Only with the port types vertically aligned can I use column mode editing to quickly change the type of a bunch of ports.
Column mode editing has even more to offer, check out Edit > Column Editor in Notepad++’s menu bar.


Of course column mode editing can not replace all the super magic features of high-end (and high-price) IDEs like Sigasi. But it is a great help for all those who do not have access to expensive IDEs.

Simulation Advice

Here is some general advice for simulation of HDL code. No respect is paid to verification methodologies like UVM or OSVVM. Most of it is obvious, but it helps my memory when I write these things down.

  • Use assert statements to catch error events. Your eyes can miss even the most obvious error when scanning over some simulation waveforms after a long day in front of the screen.
  • Use log files and/or report statements to save information about the status and progress of simulation, errors or any other noteworthy event. This will speed up the task of locating events of interest and will allow you to do text searches over those files.
  • Use colors, the right radix for numbers and hierarchical structure in your waveform viewer for optimal data representation. Unless you prefer to look at heaps of green lines with loads of 0’s and 1’s around them.
  • Save your simulator/waveform settings. At some point you will come back and won’t have to repeat the tedious task of setting up a neat waveform view. The simulator/waveform settings should also go into the repository (separate folder for each simulator), but it’s a matter of taste.


Testbench != Simulation

There is a difference between testbench files and simulation files:
Testbench files are independent of the simulator and comparable tools. They include testbench configuration files, test case descriptions or stimuli and golden reference output files.

Simulation files are setup and command files for a simulator or similar vendor tool. Related configuration files include waveform settings and simulation scripts.

A project’s folder structure should also draw this distinction to keep simulator/vendor independent files separate. Log files and results should therefor be put in an output folder alongside the testbench files.


What Is a Hardware Description Language Anyway?

What is a HDL?

A hardware description language (HDL) is a computer language used to describe the structure and behavior of digital (and sometimes analog) electronic circuits. A HDL may look quite similar to a traditional programming language and indeed will behave similar as well in some aspects, however in other aspects there are important differences that one must be aware of when writing HDL code.

This article is meant to give an extremely brief overview of HDLs, their origins, what’s so special about them and what state of the art HDLs of today can offer.

Where do HDLs originate from?

The origins of HDLs go back as far as the early 1970s when the complexity of digital integrated circuits was increasing dramatically and quickly became too much for traditional engineering workflows to handle. At that time the forefathers of todays HDLs stepped into play. These first HDLs were designed under the intention to allow circuit designers to create register-transfer level (RTL) descriptions from a high level perspective (for the time), independent of the technology (e.g. CMOS) used for production later on.

Main players: VHDL and Verilog

It was during the mid to late 1980s that the main players of today emerged in the industry: VHDL and Verilog. Even though both languages had been around in some form for a while then, it was the standardization by the IEEE which solidified their overall acceptance in the industries.

Since then numerous revisions of both standards where published, the latest being VHDL-2008 and SystemVerilog-2012.

What’s a HDL good for?

A HDL is a domain specific language, specifically designed to support the description of digital logic circuits and clock driven sequential logic. As such a HDL contains special constructs to enable the description of digital hardware and RTL elements.

One example of a hardware related language construct is the description of the rising edge of a clock signal, which is used to model the behavior of sequential circuits.

A D-Flipflop in VHDL would be described like the following:

process (clk) is
    if rising_edge(clk) then
        q <= d;
    end if;
end process;

In Verilog on the other hand the very same D-Flipflop description looks like this:

always @ (posedge clk)
    q <= d;

What do HDLs lack?

Usually a HDL alone is not sufficient to specify all aspects of a digital design, e.g. for an FPGA implementation (even more so for ASICs). A lot of meta information is required to turn HDL code into hardware. Some of this meta information can be presented in form of constraints which define physical or timing requirements the resulting hardware must fulfill. To describe these constraints special description languages exist outside the scope of HDLs. Some constraints can be embedded into the HDL code as well, but many engineers (including myself) like to keep the two things apart.

For a simple example, one constraint which practically all digital designs contain is the timing constraint for a clock signal. Such a clock constraint may look like the following (using XDC syntax, a Xilinx flavor of Synopsys SDC syntax):

create_clock -name "system_clock" -period 10.0 -waveform {5.0 5.0} [get_ports "sys_clk_i"]

Or using the older UCF syntax (hmm yes, 100% Xilinx proprietary non-portability):

NET "sys_clk_i" TNM_NET = system_clock;
TIMESPEC TS_system_clock = PERIOD "system_clock" 10 ns HIGH 50%;

Another area where HDLs have not reached their full potential yet is the huge field of verification. During simulation and verification a designer needs to create abstracted models and command sequences to see if the written HDL code behaves as expected. Writing such abstracted code can be very hard in traditional HDLs. Among the reasons for this are a lack of native support for highly abstracted code in HDLs and a lack of standard libraries (like in C or Python).

A lot of specialized verification languages contest to fill this gap, including e, OpenVera, SystemC and SystemVerilog.

Especially SystemVerilog tries to be a jack of all trades, trying to handle RTL coding, supporting various advanced forms of verification and offering a lot of high level constructs. However all these constructs have made SystemVerilog very complex, which in turn caused a lot of best practices to evolve, which constrict the allowed language subset to proven features and coding styles. Ironic.

The future of HDLs?

Maybe HDLs may become obsolete in the next decade. HDLs do not offer the same high productivity known from high level programming languages. This is partly due to the very limited support for abstraction and little code reuse and portability issues.

One approach to this problem is the idea of high level synthesis (HLS). Instead of extending and improving HDL languages to support more abstract constructs, why not use existing high level languages like C and adapt the implementation tools? A HLS tool will not gobble up HDL code, but instead accept C code written by a COTS programmer. The HLS tool will then do its best to create a hardware implementation which performs the same task as the program code.

In my opinion this approach sounds a lot better in theory than it works in practice (this may change). There are too many unsupported constructs and a lot of constraints must be tossed at the tools to get the result you want. Since the resulting hardware is not the most optimized, HLS may be seen as a trade of productivity vs. efficiency.

HLS languages include: SystemC (based on C++), Bluespec (based on Haskell), Chisel (based on Scala) and MyHDL (based on Python), among others.

A likely development could be that HLS is added as an additional layer on top of HDL code, similar to high level programming languages added as a top layer onto assembler code. If you want productivity write HLS code, if you need performance and efficiency write HDL code. Since most (all?) existing HLS tools do not directly output a netlist, but synthesizable HDL code, this forecast seems not all that wrong.