Implementing a True Random Number Generator on Xilinx 7 Series FPGAs

Intro

The Harmon Instruments platform uses a Xilinx Zynq (ARM Cortex A9 + FPGA) running Linux. Booting from a RAM disk and having a very deterministic boot process results in little entropy available network services are starting and generating encryption keys. There were numerous warnings like uninitialized urandom read in the boot logs and SSH logins were slow.

I wanted to use existing hardware to supply the Linux RNG. Possibilities included the Zynq's built in ADC with it's noisy temperature sensor, the embedded controller's built in ADC which measures power supply voltages or doing something in the FPGA fabric.

Doing it in the FPGA seemed the best and cleanest for the software architecture as it could be accessed via a memory mapped register.

There are several FPGA logic based TRNG papers. This one seemed the most promising.

Random number generation is not a field in which I am an expert. My assumptions may be flawed. I expect it is a big improvement over what entropy Linux had available in the 3 seconds between beginning to boot and starting network services.

Approach

TRNG block diagram

Implementing the programmable delay lines in the paper mentioned above resulted in minimal delay variation. Perhaps, the LUTs on 7 series have more balanced delays than the older parts. I opted to use the MMCME2 variable fine phase shift available in the Xilinx 7 Series FPGAs. Presumably, the fine phase shift is implemented similarly to the delay in the IDELAY which uses a chain of inverters with a variable supply voltage. The supply voltage is adjusted by a PLL to achieve the required propagation delay. The IDELAY is explained in XAPP707.

The MMCME2 fine phase shift resolution is about 17.8 picoseconds. There is perhaps 50 ps of phase variation due to the switching power supply on VCCINT in addition to the random jitter of the MMCME2.

The carry chain of an adder is used to get a series of closely spaced taps in time. One adder input is all 1's and the other is all 0's except the least significant bit (LSB) which is connected to a 100 MHz clock via the MMCME2. The capture flip flops are clocked with the same 100 MHz clock as is provided to the MMCM input. The MMCM phase is adjusted so that the capture flip flops catch the carry propagation at approximately the half way point. The average incremental carry delay though the adder is about 25 ps per bit.

Raw adder output looks like this:

000111111111
000001011111
000101011111
000001011111
000101011111
000000011111
000000011111
000001011111
000001011111
000001011111
000001011111
000101011111
000111111111
000101011111
000001011111
000111111111
000000011111
000001011111
000101111111
000001011111
000101111111
000001011111
000000011111
000000011111
000001011111
000001011111
000111111111
000001011111
000001011111
000101011111
000001011111
000101011111
000001011111
000001011111
000001011111
000000011111
000000011111
000111111111
000000011111
000001011111
000001011111
000001011111
000000011111
000111111111
000101011111
000000011111
000101011111
000000011111
000000011111
000101011111
000101011111
000001011111
000101111111
000111111111
000111111111
000111111111
000111111111
000111111111
000001011111
000001011111
000000011111
000000011111
000000011111
000000011111

Noise source

The noise being measured derives from many sources, some of which are random and others which are not. The biggest random source is clock jitter added by the MMCME2 which is mostly due to thermal noise. The biggest deterministic source is power supply ripple on VCCINT. There appears to be little to no metastability in the capture flip flops which is surprising.

Post Processing

The addition results are reduced to a single bit after the capture flip flops by XORing all of the bits together.

A second stage of post processing XORs the single bit result of 32 addition operations together. This gives a final data rate of 3.125 Mb/s.

Analysis: Dieharder

This one requires a lot of data. I captured 8 GiB which took over 6 hours and over 2 trillion operations of the timing skewed adder. All tests pass. The -Y 1 options tells it to try again until it passes or fails in the case of a result labeled 'weak'. It did that in one case. Output is too much to list.

$ dieharder -a -g 201 -f ~/vna/software/zynqsw/dump -Y 1
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|/home/dlharmon/vna/software/zynqsw/dump|  3.97e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.66926960|  PASSED
      diehard_operm5|   0|   1000000|     100|0.75113475|  PASSED
  diehard_rank_32x32|   0|     40000|     100|0.92027413|  PASSED

A second run showed that one passing and a few others as "weak". It seems to report at least one test as weak even with the internal PRNGs or /dev/urandom.

Calling this one is tricky: $ dieharder -a -g 201 -f $DUMPFILE. Without the -g 201 which tells it to use raw binary data in the file specified with -f, it uses an internal PRNG. With just -f, it silently uses an internal PRNG.

Analysis: ent

$ ent $DUMPFILE -b
Entropy = 1.000000 bits per bit.

Optimum compression would reduce the size
of this 68719476736 bit file by 0 percent.

Chi square distribution for 68719476736 samples is 3.29, and randomly
would exceed this value 6.99 percent of the times.

Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.141582468 (error 0.00 percent).
Serial correlation coefficient is 0.000001 (totally uncorrelated = 0.0).

And without the second stage of post processing:

Entropy = 0.994544 bits per bit.

Optimum compression would reduce the size
of this 1048576 bit file by 0 percent.

Chi square distribution for 1048576 samples is 7920.65, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4565 (0.5 = random).
Monte Carlo value for Pi is 3.474296178 (error 10.59 percent).
Serial correlation coefficient is -0.001865 (totally uncorrelated = 0.0).

Analysis: rngtest

Some failures are expected with completely random data.

$ rngtest <$DUMPFILE
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 68719476736
rngtest: FIPS 140-2 successes: 3433210
rngtest: FIPS 140-2 failures: 2763
rngtest: FIPS 140-2(2001-10-10) Monobit: 342
rngtest: FIPS 140-2(2001-10-10) Poker: 338
rngtest: FIPS 140-2(2001-10-10) Runs: 1064
rngtest: FIPS 140-2(2001-10-10) Long run: 1038
rngtest: FIPS 140-2(2001-10-10) Continuous run: 1
rngtest: input channel speed: (min=1.488; avg=8354.665; max=19073.486)Mibits/s
rngtest: FIPS tests speed: (min=1.364; avg=79.565; max=93.497)Mibits/s
rngtest: Program run time: 832147763 microseconds

Feeding /dev/random

Simply writing to /dev/random does not increase the systems entropy bit count. Details are on this man page. I set the bits of entropy to the bit count of the data written divided by 8 to be conservative.

// Copyright 2019 Harmon Instruments, LLC
// MIT license

#include <linux/random.h>
#include <sys/ioctl.h>

class DevRandom {
private:
        int _fd;

public:
        DevRandom() {
                _fd = open("/dev/random", O_RDWR);
                if (_fd < 0) {
                        throw std::runtime_error("failed to open /dev/random");
                }
        }
        ~DevRandom() { close(_fd); }
        // returns the Linux entropy pool count in bits
        int get_entropy_count() {
                int rv = 0;
                int iocrv = ioctl(_fd, RNDGETENTCNT, &rv);
                if (iocrv != 0) {
                        perror("get_entropy_count failed");
                        throw std::runtime_error("get_entropy_count failed");
                }
                return rv;
        }
        // bits is the number of bits of entropy e is assumed to contain
        void add_entropy(std::vector<uint32_t> &e, uint32_t bits) {
                std::vector<uint32_t> d;
                d.resize(e.size() + 2);
                d[0] = bits;                    // number of bits of entropy
                d[1] = e.size() * sizeof(e[0]); // number of bytes in buffer
                for (size_t i = 0; i < e.size(); i++) // buffer
                        d[2 + i] = e[i];
                int iocrv = ioctl(_fd, RNDADDENTROPY, &d[0]);
                if (iocrv != 0) {
                        perror("add_entropy failed");
                        throw std::runtime_error("add_entropy failed");
                }
        }
};