Category: fpga Page 1 of 2

First Steps with the Tang Nano FPGA Development Board

The Tang Nano is a very low-cost FPGA development board by Sipeed featuring a GW1N-1-LV FPGA produced by GOWIN Semiconductor. GOWIN is a Chinese chip manufacturer and, like Efinix and Anlogic, a relative newcomer to the FPGA arena.
The GW1N-1-LV is the smallest member of GOWIN’s “Little Bee” series, which consists of small-footprint instant-on FPGA devices for IoT and interfacing solutions.

The Tang Nano FPGA development board

The Tang Nano offers the following features:

  • GW1N-1-LV FPGA (QFN48 package)
    • 1152 LUT’s
    • 864 FF’s
    • 4 BRAM’s à 18 kbit (72 kbit total)
    • 96 kbit flash memory
    • 1 PLL
  • 34 user IO pins
  • LCD interface connector (ZIF FFC)
  • 2 user buttons
  • 24 MHz oscillator
  • 64 MBit QSPI PSRAM
  • USB-JTAG downloader/debugger (via USB-C connector)
  • 4 pin JTAG header (split into 2 x 2 pins left/right of the USB-C connector)
  • costs just about 5 $ (like the Longan Nano)

Like most of Sipeed’s latest development boards the Tang Nano comes with a USB-C connector, thus a USB-C cable is required to use the board without unnecessary tinkering.
Regarding the LCD interface I am unsure how this would be useful, except for generating some fancy test patterns. Almost all user IO pins are utilized by the LCD interface if it is used, so there are not many pins left which could be used to get a video signal into the FPGA anymore.

The Toolchain

All steps from HDL to bitstream are handled by GOWIN FPGA Designer, a graphical IDE which offers a project based design flow. Both VHDL and Verilog are supported (allegedly up to SystemVerilog-2017).
Timing constraints are defined in industry-standard SDC files; however, it is not clear which subset of SDC commands is supported.
Pin assignments and IO constraints are defined in a proprietary CST file. Alternatively UCF syntax is supported, as known from Xilinx ISE.

The design flow is very simple and offers just the most basic options.
The synthesis tool can be set to either GowinSynthesis (included with GOWIN FPGA Designer) or Synplify Pro (external synthesis tool by Synopsys). I strongly doubt that anyone will use Synplify Pro for GOWIN FPGA’s. In the following GowinSynthesis is assumed to be the selected synthesis engine.
The PAR step is simple yet effective as well, however I did not test (yet) how PAR performs once the device utilization reaches a higher percentage.
Timing analysis and power analysis reports can be generated once implementation is complete.
GOWIN FPGA Designer also comes with a programmer utility which can be used to program the Tang Nano and read out some device IDs and status registers. The FPGA configuration bitstream can be programmed into embedded flash memory, into external flash memory or can be written directly to the (volatile) SRAM cells of the FPGA.

The Example

To test that the Tang Nano board is working properly I implemented the infamous blinky LED example.

The Tang Nano offers 2 different clock sources: an on-board 24 MHz oscillator and an integrated oscillator running at roughly 240 MHz.
The external 24 MHz clock reference should be used in conjunction with the PLL inside the FPGA, in order to generate an adequate clock signal to drive the logic on the FPGA fabric. To start with, the PLL is generated with the IP Generator tool, which takes care of finding the correct PLL parameters. However, the 24 MHz reference clock can also be used to drive logic directly, without using the PLL.

/* On-Chip PLL 108 MHz. *****************************************************/
wire clk_108M;
wire pll_lock;
wire pll_reset = 1'b0;

assign clk_24M = XTAL_IN;

Gowin_rPLL Gowin_rPLL_inst(
    .clkout(clk_108M),
    .lock(pll_lock),
    .reset(pll_reset),
    .clkin(clk_24M));

To make use of the integrated oscillator another IP core is generated with the IP Generator tool. Alternatively the respective device primitive can be instantiated explicitly.

/* On-Chip Oscillator 2.5 MHz (240 MHz / 96). ********************************/
wire clk_2M5;
Gowin_OSC_div96 Gowin_OSC_div96_inst (.oscout(clk_2M5));

Finally one of the generated clocks can be used to drive a counter, which will get the RGB LED blinking.

/* Blink RGB LED. ***********************************************************/
localparam CNT_LEN = 26;
reg [CNT_LEN-1:0] cnt = 0;

always @ (posedge clk_108M, negedge rstn)
begin
    if (rstn == 1'b0) begin
        cnt <= 'd0;
    end else begin
        cnt <= cnt + 1;
    end
end

always @ (*)
begin
    /* Combinational logic, so blocking assignments are used. */
    LED_R = ~cnt[CNT_LEN-1];
    LED_G = ~cnt[CNT_LEN-2];
    LED_B = ~cnt[CNT_LEN-3];
end

I use the 3 MSB’s of the counter to blink the red, green and blue color channel of the RGB LED respectively. Thus the RGB LED will cycle through all possible combinations of colors. The blinking speed may vary depending on the clock driving the counter.
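
As a rough cross-check of the blinking speed, the period of each LED channel follows from the counter width and the clock frequency: bit k of a free-running counter completes one on/off cycle every 2^(k+1) clock cycles. The snippet below is a small Python calculation, not part of the FPGA project, and the helper name bit_period_s is made up for illustration.

```python
CNT_LEN = 26  # counter width from the Verilog example above


def bit_period_s(bit: int, f_clk_hz: float) -> float:
    """Full on/off period in seconds of counter bit `bit` at clock f_clk_hz."""
    return 2 ** (bit + 1) / f_clk_hz


# The three MSBs drive red, green and blue respectively.
for name, f_clk in (("PLL, 108 MHz", 108e6), ("OSC, 2.5 MHz", 2.5e6)):
    periods = [bit_period_s(CNT_LEN - 1 - i, f_clk) for i in range(3)]
    print(name, ["%.3f s" % p for p in periods])
```

With the 108 MHz PLL clock the red channel toggles with a period of roughly 0.62 s, while the same counter on the 2.5 MHz oscillator clock would take about 26.8 s per cycle, so a shorter counter would be the better choice there.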

The final step to get the blinky LED design running on the Tang Nano is to define the pin assignment and timing constraints.
The timing constraints consist of two create_clock commands for the 24 MHz reference clock input and the internal oscillator, which is set to 2.5 MHz.

# tang_nano.sdc
create_clock -name CLK -period 41.667 [get_ports {XTAL_IN}]
create_clock -name SLOW -period 400 [get_pins {Gowin_OSC_div96_inst/osc_inst.OSCOUT}]
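
The -period values are given in nanoseconds and are simply the reciprocal of the clock frequency. A quick sanity check in Python (the function name period_ns is my own, it is not an SDC or GOWIN command):

```python
def period_ns(f_hz: float) -> float:
    """Clock period in nanoseconds for a frequency given in Hz."""
    return 1e9 / f_hz


print(round(period_ns(24e6), 3))        # 24 MHz crystal oscillator
print(round(period_ns(240e6 / 96), 3))  # on-chip oscillator divided by 96
```

These are the two numbers used in tang_nano.sdc above: 41.667 ns for XTAL_IN and 400 ns for the divided on-chip oscillator.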

The pin assignments can be created using the Floorplanner tool and the Tang Nano schematic. For simplicity the constraints can be copied over from the Tang Nano examples repository or my own example code (file tang_nano.cst). Note that I tried to stay as close as possible to the net names of the schematic when naming the top level ports.

// tang_nano.cst
//Copyright (C)2014-2019 Gowin Semiconductor Corporation.
//All rights reserved. 
//File Title: Physical Constraints file
//GOWIN Version: V1.9.3Beta
//Part Number: GW1N-LV1QN48C6/I5
//Created Time: Tue 12 24 23:51:04 2019

IO_LOC "USER_BTN_A" 15;
IO_PORT "USER_BTN_A" IO_TYPE=LVCMOS33;
IO_LOC "XTAL_IN" 35;
IO_PORT "XTAL_IN" IO_TYPE=LVCMOS33;
IO_LOC "USER_BTN_B" 14;
IO_PORT "USER_BTN_B" IO_TYPE=LVCMOS33;
IO_LOC "LCD_R[0]" 27;
IO_PORT "LCD_R[0]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_R[1]" 28;
IO_PORT "LCD_R[1]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_R[2]" 29;
IO_PORT "LCD_R[2]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_R[3]" 30;
IO_PORT "LCD_R[3]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_R[4]" 31;
IO_PORT "LCD_R[4]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[0]" 32;
IO_PORT "LCD_G[0]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[1]" 33;
IO_PORT "LCD_G[1]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[2]" 34;
IO_PORT "LCD_G[2]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[3]" 38;
IO_PORT "LCD_G[3]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[4]" 39;
IO_PORT "LCD_G[4]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_G[5]" 40;
IO_PORT "LCD_G[5]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_B[0]" 41;
IO_PORT "LCD_B[0]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_B[1]" 42;
IO_PORT "LCD_B[1]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_B[2]" 43;
IO_PORT "LCD_B[2]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_B[3]" 44;
IO_PORT "LCD_B[3]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_B[4]" 45;
IO_PORT "LCD_B[4]" IO_TYPE=LVCMOS33;
IO_LOC "LCD_CLK" 11;
IO_PORT "LCD_CLK" IO_TYPE=LVCMOS33;
IO_LOC "LCD_DE" 5;
IO_PORT "LCD_DE" IO_TYPE=LVCMOS33;
IO_LOC "LCD_HSYNC" 10;
IO_PORT "LCD_HSYNC" IO_TYPE=LVCMOS33;
IO_LOC "LCD_VSYNC" 46;
IO_PORT "LCD_VSYNC" IO_TYPE=LVCMOS33;
IO_LOC "LED_R" 16;
IO_PORT "LED_R" IO_TYPE=LVCMOS33 SLEW_RATE=SLOW;
IO_LOC "LED_G" 17;
IO_PORT "LED_G" IO_TYPE=LVCMOS33 SLEW_RATE=SLOW;
IO_LOC "LED_B" 18;
IO_PORT "LED_B" IO_TYPE=LVCMOS33 SLEW_RATE=SLOW;

GOWIN FPGA Designer: Floorplanner

So far so good. The only thing left now is to generate a programming file for the Tang Nano and toss it on the board.

GOWIN FPGA Designer: Programmer

All done. Time to get impressed by the pinnacle of modern human civilization: a blinking LED.

The Tang Nano skillfully flashing its RGB LED

As usual, the example from this article is available on my Tang Nano github repository.

That’s it for today. See you next time.


References:

  1. https://www.seeedstudio.com/Sipeed-Tang-Nano-FPGA-board-powered-by-GW1N-1-FPGA-p-4304.html
  2. https://github.com/sipeed/Tang-Nano-Doc
  3. https://github.com/sipeed/Tang-Nano-examples
  4. http://dl.sipeed.com/TANG/Nano
  5. https://xesscorp.github.io/tang_nano_user/docs/_site/
  6. https://www.gowinsemi.com/en/product/detail/2/

Gigabit Transceiver(s) for a Cheap FPGA Development Board

There are a lot of FPGA development boards out there to buy. Official vendor boards with the latest advanced devices on it can easily cost several thousand Euros.
Hobbyists and makers are more interested in FPGA development boards within an affordable price range (well below 100 $/€). The logic resources and feature set of the FPGA devices on these boards, on the other hand, are not that important. The main application for makers/hobbyists is small projects and self-learning, I assume, and not rolling out their own 5G equipment.

There are already a lot of affordable entry level FPGA boards available and more are being released every year. The one thing missing on these boards is usually gigabit transceivers (GT’s). Of course the cheapest FPGA devices do not come with pricey extras like GT’s, but is the price difference really that big?

So I started to search for low cost FPGA devices which include GT’s. I only looked at the three major vendors: Xilinx, Intel/Altera and Lattice. I tried to focus on the latest FPGA families in the low cost segment. The prices from Mouser are as of December 2019 and are valid for single quantity purchase.

Intel/Altera Cyclone IV GX EP4CGX15 with two 2.5 Gbps GT’s
Xilinx Spartan-6 XC6SLX25T with two 3.2 Gbps GT’s
Xilinx Artix-7 XC7A12T with two 6.6 Gbps GT’s
Lattice ECP5 LFE5UM-25 with two 3.2 Gbps GT’s

The comparison shows that Lattice is the first choice when aiming for a low cost FPGA with GT’s. The prices are only from one distributor and only for single quantities, so the price sample must be taken with a grain of salt.
Interestingly the old Spartan-6 is much more expensive than the newer Artix-7.

However, there are other aspects to consider than just the price and transceiver speed, e.g. if a PCIe endpoint IP is available for free or not (the same goes for all IP cores utilizing the GT’s). Without the support of IP cores and tools the GT’s won’t be any good anyhow.

An interesting board announced on CrowdSupply right now is a new member of the TinyFPGA series, the TinyFPGA EX with an EX85-5G FPGA. This one will have two GT’s with up to 5 Gbps. Looking at the board layout it’s not clear to me if the full capability of the GT’s will be usable in the end, because there are only plain pin headers for board IO, no BNC or F-connectors.
The end user price has not been announced yet. Let’s wait and see what the GT’s are going to cost in a “commercial” product.


References:

  1. https://www.mouser.de/Search/Refine?Keyword=LFE5UM-25&Ns=Pricing%7C0
  2. https://www.mouser.de/_/?Keyword=XC7A12T&Ns=Pricing%7c0&FS=True
  3. https://www.mouser.de/Intel/Semiconductors/Programmable-Logic-ICs/FPGA-Field-Programmable-Gate-Array/_/N-3oh9p?P=1yy6lwuZ1y9hztvZ1yy1mfxZ1yy1mdnZ1yy1mfwZ1yy1mcsZ1yy1mcqZ1yy1mcp&Ns=Pricing%7C0
  4. https://www.mouser.de/datasheet/2/225/FPGA-DS-02012-2-1-ECP5-ECP5G-Family-Data-Sheet-1022822.pdf
  5. https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf
  6. https://www.mouser.de/datasheet/2/612/cv_51001-1098989.pdf
  7. https://www.mouser.de/Xilinx/Semiconductors/Programmable-Logic-ICs/FPGA-Field-Programmable-Gate-Array/_/N-3oh9p?P=1y8eeklZ1y8efd7&Keyword=XC6SLX25T&Ns=Pricing%7c0
  8. https://www.xilinx.com/support/documentation/data_sheets/ds160.pdf

First Steps with the iCEBreaker FPGA Development Board

The iCEBreaker board is the first FPGA development board with a fully open-source toolchain, which makes it possible to go all the way from HDL code to configuration bitstream. All the schematics and hardware information are openly available at no extra cost.

The Board

The iCEBreaker board features a Lattice iCE40UP5K FPGA with the following integrated components:

  • 5280 logic cells, each consisting of one 4-input LUT, a carry chain and one FF
  • 120 kbit dual-port BRAM’s
  • 1 Mbit single-port BRAM’s
  • 8 DSP’s
  • 1 programmable PLL and 2 on-chip oscillators
  • 2 hard-IP’s for SPI and I2C each
  • 3 hard-IP’s for PWM
  • Up to 32 user IO pins. The IO pins are available as 3 PMOD edge connectors
  • One PMOD module with 3 push-buttons and 5 LEDs is included
  • On-board FTDI FT2232H for easy FPGA configuration and debugging via USB
  • 128 Mbit Winbond QSPI flash, programmable over USB or via a separate SPI pin-header

Overview of the iCEBreaker.

The Toolchain

The remainder of this article will guide you through the procedure of setting up the open-source toolchain for the iCEBreaker. No Lattice tools or software will be used; maybe that will be covered another time. In a final step the toolchain will be tested by implementing and loading a simple example design. The OS is assumed to be Ubuntu 18.04 for the rest of this article.

Let’s begin by installing the dependencies which are required to build the toolchain. The command below will install the superset of all dependencies which are required for the steps that follow.

$> sudo apt install build-essential clang bison flex libreadline-dev gawk tcl-dev libffi-dev git mercurial graphviz xdot pkg-config python python3 libboost-system-dev libboost-python-dev libboost-filesystem-dev libftdi-dev qt5-default python3-dev libboost-all-dev cmake libeigen3-dev

Next clone the github repository for yosys, the synthesis tool of the toolchain. The latest stable release/tag of yosys is recommended, but the current master branch can be used instead.

$> git clone https://github.com/yosyshq/yosys
$> cd yosys
$> git checkout yosys-v0.9
$> make -j$(nproc)
$> sudo make install

Continue with the bitstream generation tool icestorm.

$> git clone https://github.com/cliffordwolf/icestorm
$> cd icestorm
$> make -j$(nproc)
$> sudo make install

Finally we have to choose between the place-and-route tools nextpnr and arachne-pnr. Since arachne-pnr is considered obsolete, its successor nextpnr should be used. Since I like to play around and compare results between different tools I will install both, but note that only one of the two is required.

$> git clone https://github.com/yosyshq/nextpnr
$> cd nextpnr
$> cmake -DARCH=ice40 .
$> make -j$(nproc)
$> sudo make install
$> git clone https://github.com/yosyshq/arachne-pnr
$> cd arachne-pnr
$> make -j$(nproc)
$> sudo make install

That concludes the setup of the toolchain. Time for an example, to see if everything works.

The Example

Now that we are all set, let’s try to synthesize a simple example design, run the place-and-route tool and generate a bitstream for the iCEBreaker.

$> yosys -p "synth_ice40 -top icebreaker_top -blif icebreaker_top.blif" src/icebreaker_top.v
$> arachne-pnr -d 5k icebreaker_top.blif -o icebreaker_top.asc
$> icepack icebreaker_top.asc icebreaker_top.bin
$> iceprog icebreaker_top.bin

Aaaand the board is bricked (for the moment)! The iCEBreaker is no longer properly recognized, neither by Ubuntu nor by my Windows 10 machine. It took me a few moments to understand what had gone wrong, but nowhere near as long as I feared it would take.
Well, the problem was that no physical constraints file (.pcf) was passed to arachne-pnr. The .pcf file defines which top module port is connected to which physical package pin. Without this information arachne-pnr will choose the pin assignments arbitrarily, like most place-and-route tools do in such a situation. After asking arachne-pnr to output the pin assignments it had chosen (-w option) the problem became very clear.

$> arachne-pnr -d 5k icebreaker_top.blif -o icebreaker_top.asc -w icebreaker_top.asc.pcf
... (some irrelevant output)
$> cat icebreaker_top.asc.pcf 
# arachne-pnr 0.1+328+0 (git sha1 c40fb22, g++ 7.4.0-1ubuntu1~18.04.1 -O2)
set_io led_o 35

Cross-checking this with the iCEBreaker pinout sheet or schematic, we can see that pin 35 is actually connected to the output of the external 12 MHz oscillator. Unfortunately this oscillator also drives the FTDI chip (FT2232H) which takes care of the USB connection to the outside world. Driving a dumb blinky LED signal on that pin will wreak havoc on the 12 MHz clock signal and mess with the FTDI chip.
To fix this issue the CRESET jumper on the board must be closed. Closing the CRESET jumper keeps the FPGA in reset. This way no output pin of the FPGA is driven and we have time to replace the bitstream in the SPI flash with one that doesn’t do any harm.
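
A mistake like this can be caught before programming the board by checking the .pcf file that arachne-pnr writes with the -w option. The following is only a minimal sketch in Python and not part of the toolchain: the helper names parse_pcf and check_pins and the RESERVED table are made up for illustration, and only plain "set_io <port> <pin>" lines are handled.

```python
import re


def parse_pcf(text: str) -> dict:
    """Return {port_name: pin} for every plain `set_io <port> <pin>` line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        m = re.match(r"set_io\s+(\S+)\s+(\S+)$", line)
        if m:
            pins[m.group(1)] = m.group(2)
    return pins


def check_pins(pins: dict, reserved: dict) -> list:
    """List a warning for every port that was placed on a reserved pin."""
    return [f"{port} is on pin {pin} ({reserved[pin]})"
            for port, pin in pins.items() if pin in reserved]


# Pins which must never be driven by the design (taken from the text above).
RESERVED = {"35": "12 MHz oscillator, shared with the FTDI chip"}

auto_pcf = "# written by arachne-pnr -w\nset_io led_o 35\n"
print(check_pins(parse_pcf(auto_pcf), RESERVED))
```

For the automatically chosen assignment shown above this reports led_o on pin 35, which is exactly the conflict that took down the USB connection.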

So let’s try again, this time with the correct .pcf file.

$> arachne-pnr -d 5k icebreaker_top.blif -o icebreaker_top.asc -p constr/icebreaker_top.pcf -w icebreaker_top.asc.pcf
... (some irrelevant output)
$> icepack icebreaker_top.asc icebreaker_top.bin
$> iceprog icebreaker_top.bin 
Behold, the iCEBreaker blinking like a boss! Imagine the possibilities…

The example from this article – and hopefully some more advanced stuff in the near future too – will be available on my iCEBreaker github repository.

That’s it for today. See you next time.


References:

  1. https://github.com/icebreaker-fpga/icebreaker
  2. https://github.com/icebreaker-fpga/icebreaker-examples
  3. https://github.com/YosysHQ/yosys
  4. http://www.clifford.at/icestorm/#install

Getting Started with GHDL

If you haven’t heard of GHDL, it is *the* free open-source VHDL simulator out there.
GHDL stands for “G Hardware Description Language” (the G has no meaning). GHDL is mainly implemented in Ada and can be built with different backends: mcode, LLVM and GCC. The different backends provide different performance levels and vary in build complexity. I recommend LLVM since it performs well and is still quite straightforward to build. Building GHDL from the latest sources of its github project is probably the best way to go.

Despite its free nature GHDL provides very good support for all major VHDL-LRM releases: VHDL-1987/1993/2002/2008 (partial). Unfortunately GHDL is a pure VHDL simulator, so there is no support for Verilog at all. This is understandable as there are already some very good simulators for Verilog out there.

Compiling GHDL

The following guide assumes an Ubuntu 18.04 environment (a native installation, docker or WSL will do).
Clone the latest GHDL sources or any stable release from github:

$> git clone https://github.com/ghdl/ghdl

Descend into the ghdl working copy and run the configure script with options to use LLVM as backend (--with-llvm-config) and a custom install path (--prefix):

$> cd ghdl
$> ./configure --prefix=/opt/ghdl-llvm --with-llvm-config

Install some dependencies:

$> sudo apt install -y bison flex

Afterwards you can build and install ghdl (installing below /opt usually requires root privileges):

$> make
$> sudo make install

The ghdl main executable is located at /opt/ghdl-llvm/bin/ghdl (below the path given to the --prefix option). I usually create a symbolic link to make the ghdl command directly available in the $PATH:

$> sudo ln -s /opt/ghdl-llvm/bin/ghdl /usr/bin/ghdl

That’s it for now. If you are familiar with docker, there is an easy to use docker image for ghdl available on dockerhub.

Anlogic TANG PriMER dev board

Recently I purchased a Sipeed TANG PriMER development board featuring an Anlogic EG4S20 FPGA (codenamed Eagle S20). The only reason I bought the board was to see what Anlogic FPGAs are capable of, since I had never heard of that FPGA vendor before. No need to think twice when the board costs less than 20 $.

The TANG PriMER board is officially marketed as a RISC-V development board and comes with a Hummingbird E200 RISC-V softcore design preloaded into the onboard configuration flash. The Hummingbird is basically a slightly modified variation of the SiFive E2 core.
Setting up the tool chain was a bit of a hassle until I found this site which hosts both the TD IDE and the required license files. There are also some datasheets and schematics. Most of the official documentation is only available in Chinese, therefore I strongly recommend the unofficial English translation.

I have not done a lot with this board yet, other than verifying the tool chain with a simple blinking LED example. The design included a 32 bit counter which resulted in an estimated maximum frequency of 252 MHz. Not too bad. The TD IDE also comes with an IP wizard to generate IP cores, but it seems to just generate a wrapper for some primitive instantiations. It’s worth mentioning that the EG4S20 has an on-chip oscillator (250 or 266 MHz, documentation and IDE do not agree), on-chip SDRAM (64 Mbit) and an 8-channel ADC (1 MHz sample rate). I still need to figure out how to configure the board and which programmer can be used.

Since Anlogic FPGAs are not listed on digikey.com or other distributor websites, it seems unlikely they will become widely available outside of China anytime soon.

Clock Enables vs. Multiple Clocks

Introduction

In advanced FPGA systems which require different clock frequencies for different parts of the design, there is often a shortage of global clock buffers. Often several of the clocks are related (see below) and it becomes possible to use a single clock plus several clock enable signals, instead of several dedicated clocks. This article tries to shed some light on the impact these two alternatives can have on an FPGA system.
For the rest of this article, let’s assume all clocks C_i with frequency F_i are derived from the same reference clock C_ref with frequency F_ref and are strongly related, fulfilling the equation F_i = F_ref / D_i with an integer divider D_i. This means that the frequency of a related clock is an integer fraction of the reference clock frequency.
I use the terms strongly/weakly related to differentiate between the two basic ways in which two clocks can be related. Weakly related clocks are those which are linked by the equation F_i = F_ref * M_i / D_i with integers M_i and D_i. This means that the frequencies of weakly related clocks are rational (possibly non-integer!) multiples or fractions of the reference clock frequency.
Note that asynchronous clocks and weakly related clocks have to be treated differently; clock enables as described here are not applicable to them.

Main Part

Virtually all FPGAs offer D-flip-flops which have an enable input, also called a clock enable (CE) since it controls the effect a rising or falling clock edge has on the content of the D-flip-flop. If the CE input is deasserted, changes to the D input of the flip-flop are not propagated to the Q output after an active clock edge. Only if the CE input is asserted does the value on the D input propagate through to the Q output when an active clock edge arrives.
When N related clocks must be derived from one common reference clock there are two major options:

(1) Instantiate a PLL or DCM (Xilinx FPGA primitive) which uses the reference clock C_ref to generate all required clocks C_i. The reference clock C_ref has to be the clock with the highest frequency (see my constraints and assumptions above). The related clocks are generated by dividing the reference clock by an integer value. If a large number of related clocks are required this can lead to a dead end, because a PLL/DCM has a limit to the number of clocks it can generate (usually somewhere between 4 and 8).
This limitation could be circumvented by using one of the generated clocks C_i,1 of the first PLL/DCM as a reference clock C_ref,2 for a second PLL/DCM and then in turn use the second PLL/DCM to generate additional related clocks C_i,2. However, this will only work if the phase relationship between the reference clock of a PLL/DCM and the generated clock outputs can be adjusted, i.e. in general it may be required to adjust this phase offset to 0 (or to an integer multiple of the period of the reference clock with the highest frequency).

Clock generation using one clock buffer per clock

(2) Generate only one clock signal, which has the highest frequency that is required. All other related clocks would be obtained by dividing this reference clock frequency by an integer (as explained in (1) above). However, instead of actually dividing the reference clock, a clock enable signal CE_i is created, which is only active every D_i-th clock cycle. This clock enable signal serves as an enable for all flip-flops which would be located in the domain of the corresponding related clock C_i. This way a clock enable signal CE_i for each related clock C_i can be created.
So there is only one primary clock signal and all other “clocks” are logically represented by enable signals which are only asserted every D_i-th clock cycle.
The tricky part then is to tell the timing analyzer to treat the clock enables correctly (typically by means of multicycle path constraints), so the place and route tool is aware of the actual timing requirements. Otherwise the design will be over-constrained, since all clock domains are treated as if they had the same frequency (which would be the maximum frequency). This would result in unnecessarily strict timing requirements for all the logic which would normally run at a lower frequency. Thus timing closure will be more difficult to achieve.
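
To make the behaviour of option (2) a bit more tangible, here is a small cycle-based model in Python. It is only a behavioural sketch, not HDL and not tied to any vendor tool; the function names clock_enable and run_register are made up for illustration.

```python
def clock_enable(div: int, n_cycles: int) -> list:
    """Model a CE pulse that is high for one cycle out of every `div` cycles."""
    ce, cnt = [], 0
    for _ in range(n_cycles):
        ce.append(cnt == div - 1)               # CE asserted on the last count
        cnt = 0 if cnt == div - 1 else cnt + 1  # wrap-around counter
    return ce


def run_register(d_values: list, ce: list) -> list:
    """Model a D flip-flop with enable: Q only takes D on enabled clock edges."""
    q, trace = 0, []
    for d, en in zip(d_values, ce):
        if en:
            q = d
        trace.append(q)  # value of Q after this clock edge
    return trace


ce3 = clock_enable(div=3, n_cycles=9)
print([int(x) for x in ce3])                  # one pulse every third cycle
print(run_register(list(range(1, 10)), ce3))  # Q changes only on CE cycles
```

The register is clocked by the fast reference clock on every cycle, but its content only changes once every D_i cycles, just as if it were clocked by C_i with F_i = F_ref / 3.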

Clock generation with multiple clock enables using a single clock buffer

Summary

The decision to use multiple clocks or a single clock plus clock enables boils down to a resource trade-off.
On the one hand, multiple PLLs/DCMs and multiple (global) clock buffers can be used to generate multiple related clocks. This requires more (global) clocking resources, but no additional fabric resources at all. Each clock domain is defined by a physical clock signal.
On the other hand, a single global clock with one global clock buffer can be used. The different clock domains are then logically defined by means of enable signals which are only asserted every D_i-th clock cycle. This approach requires more fabric resources (LUTs, CLBs, FFs) to generate and distribute the enable signal nets which define the various clock domains. In return, only a single clock must be generated, which saves global clocking resources.
Bingo bango there you have it.

Realizing Arbitrary Functions with ROM-Based Lookup Tables

When tasked with the implementation of a rather complex function, e.g. a polynomial of higher order, the resource utilization quickly shoots through the roof if implemented in a straightforward way (also called the naïve implementation).
To avoid this it is often easier, simpler and faster to use a lookup table (LUT) solution.

Instead of doing a lot of calculations and mathematics, the result for a given function argument is just read from a read-only memory (ROM) which contains precalculated results.
Oftentimes a LUT/ROM based implementation can be used in place of a “proper” implementation during early prototyping. In a later stage of the project the LUT/ROM can be replaced with an optimized implementation.

Arbitrary function realized using a LUT/ROM

Most people will encounter a ROM based lookup table solution when dealing with sine and cosine functions. The technique is then often called direct digital synthesis (DDS), because a waveform is generated by digital logic instead of analog circuitry, as it was done in ancient times.
Since this is a more or less common task I wrote an Octave script that takes a function as input and generates a memory initialization file for a LUT/ROM solution. The parameter range of interest can be specified and the width and depth of the LUT/ROM can be defined.
Periodic functions will likely result in problems if not handled carefully. E.g. a lot of people will specify the parameter range for a sine LUT/ROM to go from 0 to 2*Pi. However, since the boundaries of the interval are always part of the LUT/ROM, the start/end value of the period will appear twice, once at the lowest LUT/ROM address and once at the highest LUT/ROM address (because sin(0) = sin(2*Pi) = 0). This gotcha does not apply to non-periodic functions. It can also easily be fixed by excluding the upper boundary, i.e. by sampling the half-open interval from 0 up to, but not including, 2*Pi.
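
The duplicated sample is easy to reproduce. The following Python sketch is not my Octave script, just a stripped-down illustration with made-up names (sine_lut, include_endpoint) that samples one sine period into signed integers, once with and once without the upper boundary.

```python
import math


def sine_lut(depth: int, width: int, include_endpoint: bool = False) -> list:
    """Sample one sine period into `depth` signed integers of `width` bits."""
    steps = depth - 1 if include_endpoint else depth
    amplitude = 2 ** (width - 1) - 1  # largest positive signed value
    return [round(amplitude * math.sin(2 * math.pi * n / steps))
            for n in range(depth)]


naive = sine_lut(depth=16, width=8, include_endpoint=True)  # 0 ... 2*Pi
fixed = sine_lut(depth=16, width=8)                         # 0 ... 2*Pi*15/16
print(naive[0], naive[-1])  # first and last entry hold the same sample
print(fixed[0], fixed[-1])  # last entry is the sample just before 2*Pi
```

When the address counter wraps around, the naive table outputs the zero crossing twice in a row, which shows up as a small glitch once per period. The half-open variant continues seamlessly.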

The quality of the result this technique yields depends on both the LUT/ROM’s memory depth and the LUT/ROM’s word width. The former defines the number of available sampling points and thus the quantization of the function parameter(s). The latter defines the quantization of the function result, which means how close a single LUT/ROM data value is to the exact result of the function. Both kinds of quantization contribute to the error of the LUT/ROM implementation, i.e. the deviation of the LUT/ROMs result from the precise function result value.

The total size of the LUT/ROM is memory depth * word width bits.
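
To get a feeling for the trade-off between size and accuracy, the error can simply be measured for a few configurations. The Python sketch below assumes a sine function and an address that is obtained by truncating the argument; the names lut_size_bits and max_abs_error as well as the chosen depths and widths are arbitrary examples, not recommendations.

```python
import math


def lut_size_bits(depth: int, width: int) -> int:
    """Total LUT/ROM size in bits."""
    return depth * width


def max_abs_error(depth: int, width: int, oversample: int = 16) -> float:
    """Worst-case |sin(x) - LUT(x)| when x is truncated to the LUT address."""
    scale = 2 ** (width - 1) - 1
    lut = [round(scale * math.sin(2 * math.pi * n / depth))
           for n in range(depth)]
    worst = 0.0
    for k in range(depth * oversample):
        x = 2 * math.pi * k / (depth * oversample)
        approx = lut[k // oversample] / scale  # address = truncated argument
        worst = max(worst, abs(math.sin(x) - approx))
    return worst


for depth, width in ((256, 8), (1024, 8), (1024, 12)):
    print(depth, width, lut_size_bits(depth, width),
          max_abs_error(depth, width))
```

Increasing only the word width quickly stops helping once the error is dominated by the coarse sampling of the argument, and vice versa, so depth and width should be chosen together.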
Another performance characteristic is the maximum operating frequency at which an implementation can run. On FPGAs the maximum operating frequency will depend on the number of BRAMs required to realize the LUT/ROM. If only one BRAM is used there is no need for additional routing or coupling glue logic and thus the operating frequency will be maximized.
Using more than one BRAM requires “chaining”/“coupling” of BRAMs and thus will reduce the maximum operating frequency due to additional routing and/or logic delays.

LUT/ROM composed of multiple BRAMs

Interesting Read about IP-XACT

In case you are interested in FPGA/ASIC design and/or HDL coding you may find this blog entry about IP-XACT worthy of reading.

Simulation Advice

Here is some general advice for simulation of HDL code. No respect is paid to verification methodologies like UVM or OSVVM. Most of it is obvious, but it helps my memory when I write these things down.

  • Use assert statements to catch error events. Your eyes can miss even the most obvious error when scanning over some simulation waveforms after a long day in front of the screen.
  • Use log files and/or report statements to save information about the status and progress of simulation, errors or any other noteworthy event. This will speed up the task of locating events of interest and will allow you to do text searches over those files.
  • Use colors, the right radix for numbers and hierarchical structure in your waveform viewer for optimal data representation. Unless you prefer to look at heaps of green lines with loads of 0’s and 1’s around them.
  • Save your simulator/waveform settings. At some point you will come back and won’t have to repeat the tedious task of setting up a neat waveform view. The simulator/waveform settings should also go into the repository (separate folder for each simulator), but it’s a matter of taste.

 

Ternary operator for VHDL

On occasion it can be super useful to have the ternary operator ? : at hand. Many programming languages like C have it and – without proof – I dare claim that it is much appreciated by many coders out there.

In VHDL there is no such thing. Instead a VHDL developer must always bring out the big gun and apply our beloved if-then-else(-end if) statement.

if (fifo_empty = '1') then
    read_fifo <= '0';
else
    read_fifo <= '1';
end if;

However, it would be really nice to have a ternary short-hand operator available instead sometimes.

Here’s a surrogate for the ternary operator which works without changing the VHDL LRM. It is a simple function ite, which is short for if-then-else 🙂 The function does nothing more than wrap an if-then-else statement. How ingenious of me.

function ite(b: boolean; x, y: integer) return integer is
begin
    if (b) then
        return x;
    else
        return y;
    end if;
end function ite;

Of course this function will only work for integer arguments, but it can easily be overloaded for other data types (I smell an application for a VHDL-2008 type generic).

 

P.S.: Verilog has a ternary operator.


© bananatronics.org