Linux Userspace IO Interrupts on Xilinx Zynq

Intro

It's now possible to use a complex peripheral with interrupts and DMA under Linux using UIO and the generic-uio driver rather than having to write a kernel module.

There was a kernel module for the Harmon Instruments VNA which was working well, but with UIO that module no longer has to be maintained.

This example uses 3 system calls per interrupt due to limitations in generic-uio. With a kernel module, only 1 system call per interrupt would be required for this hardware. It might be desirable to add an ioctl() call to the UIO driver which simply blocks until the interrupt count is at least 1 more than at the last call, leaves the interrupt enabled and has a timeout. In the Harmon Instruments VNA case, only about 50 interrupts per second are processed and the time per interrupt is about 20 microseconds (1 ms of CPU time per second), so only about 0.1 % of a CPU is being wasted.

Linux 4.9 is used in the example.

FPGA

The logic in the Zynq PL (FPGA side) presents a rising edge to the IRQ_F2P[0] port on the Zynq PS (CPU side) on completion of DMA operations. One IRQ_F2P interrupt is enabled of a possible 16. Vivado confusingly states "The MSB is assigned the highest interrupt ID of 91" and "[91:84], [68:61]" for the interrupt numbers. Here's a Xilinx Answer Record that implies it should be interrupt 91. After a bit of grepping through the generated wrapper, I found a parameter, IRQ_F2P_MODE, which is set to "DIRECT" by default in my case but can be set to "REVERSE", in which case it would be interrupt 91. Following the logic in the generated wrapper and consulting the documentation, it is evident that it is interrupt 61. Xilinx should have just brought out all 16 interrupts, avoiding the interrupt mapping confusion of the wrapper.

Kernel Config

The UIO driver needs to be enabled in the kernel configuration. It is under Device Drivers -> Userspace I/O drivers in menuconfig.

CONFIG_UIO=y
CONFIG_UIO_PDRV_GENIRQ=y

There is also CONFIG_UIO_DMEM_GENIRQ if dynamically allocated DMA buffers are required. This example uses a static DMA buffer allocation at boot time.

Device Tree

chosen {
        bootargs = "console=ttyPS0,115200 earlyprintk root=/dev/ram mem=384M uio_pdrv_genirq.of_id=generic-uio";
        linux,stdout-path = "/amba/serial@e0000000";
        stdout-path = "serial0:115200n8";
};

&amba {
      hififo: hififo@40000000 {
                      compatible = "generic-uio";
                      interrupt-parent = <&intc>;
                      interrupts = <0 29 1>;
                      reg = <0x40000000 0x1000 0x18000000 0x8000000>;
              };
};

The "mem=384M" in bootargs tells the system to only use the lower 384 MiB of the 512 MiB of available RAM. A large contiguous DMA buffer is required and this is a convenient way to get it on an embedded system. The "uio_pdrv_genirq.of_id=generic-uio" is required to enable the generic-uio driver, as explained in this GitHub discussion.

Hififo is our device using the UIO driver. It has 4 KiB of registers at address 0x40000000 (mmap index 0) and 128 MiB of DMA space at 0x18000000 (mmap index 1). Above, I stated our interrupt is 61, and here it is 29. There's an offset of 32 for a shared peripheral interrupt in the device tree, so 61 - 32 = 29. The 1 after the 29 indicates that it is a rising edge interrupt, triggering only once on the completion signal.
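
The interrupt arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ID ranges are the ones Vivado lists ([68:61] and [91:84]), assuming the "DIRECT" ordering described in the FPGA section.

```python
# Hypothetical helpers; the ID ranges are the ones quoted from Vivado above.
SPI_OFFSET = 32  # shared peripheral interrupts start at GIC ID 32


def irq_f2p_to_gic_id(index):
    """GIC interrupt ID for IRQ_F2P[index], assuming IRQ_F2P_MODE = "DIRECT"."""
    if not 0 <= index <= 15:
        raise ValueError("IRQ_F2P index must be in 0..15")
    # IRQ_F2P[7:0] -> IDs 61..68, IRQ_F2P[15:8] -> IDs 84..91
    return 61 + index if index < 8 else 84 + (index - 8)


def device_tree_interrupts(index, trigger=1):
    """Cells for the 'interrupts' property: <0 (SPI), number, trigger type>."""
    return (0, irq_f2p_to_gic_id(index) - SPI_OFFSET, trigger)


print(device_tree_interrupts(0))  # (0, 29, 1), matching interrupts = <0 29 1>
```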

Userspace

This blog post was very helpful with the user space code. I chose to wrap the UIO syscalls in a C++ class to avoid "goto fail" like error handling.

#include <cstdint>
#include <cstddef>
class UIO {
private:
        int _fd;

public:
        explicit UIO(const char *fn);
        ~UIO();
        void unmask_interrupt();
        void wait_interrupt(int timeout_ms);
        friend class UIO_mmap;
};

class UIO_mmap {
private:
        size_t _size;
        void *_ptr;

public:
        UIO_mmap(const UIO &u, int index, size_t size);
        ~UIO_mmap();
        size_t size() const { return _size; }
        void *get_ptr() const { return _ptr; }
};
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <cstdio> // perror
#include <cstdlib>
#include <cstdint>
#include <poll.h>
#include <errno.h>
#include <stdexcept>

UIO::UIO(const char *fn) {
        _fd = open(fn, O_RDWR);
        if (_fd < 0)
               throw std::runtime_error("failed to open UIO device");
}

UIO::~UIO() { close(_fd); }

void UIO::unmask_interrupt() {
        uint32_t unmask = 1;
        ssize_t rv = write(_fd, &unmask, sizeof(unmask));
        if (rv != (ssize_t)sizeof(unmask)) {
                perror("UIO::unmask_interrupt()");
        }
}

void UIO::wait_interrupt(int timeout_ms) {
        // wait for the interrupt
        struct pollfd pfd = {.fd = _fd, .events = POLLIN};
        int rv = poll(&pfd, 1, timeout_ms);
        // clear the interrupt
        if (rv >= 1) {
               uint32_t info;
               read(_fd, &info, sizeof(info));
        } else if (rv == 0) {
               // this indicates a timeout, will be caught by device busy flag
        } else {
               perror("UIO::wait_interrupt()");
        }
}

UIO_mmap::UIO_mmap(const UIO &u, int index, size_t size) : _size(size) {
        // UIO selects the mapping by offset: map N is at N * page size
        _ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                u._fd, index * getpagesize());
        if (_ptr == MAP_FAILED) {
                perror("UIO_mmap");
                throw std::runtime_error("UIO_mmap construction failed");
        }
}

UIO_mmap::~UIO_mmap() { munmap(_ptr, _size); }

A struct with std::atomic members is a convenient way to represent the device registers. This provides memory barriers to ensure proper ordering of reads and writes.

#include <cstdint>
#include <atomic>
struct FPGAregs {
        std::atomic<uint32_t> interrupt_status;
        std::atomic<uint32_t> busy;
        std::atomic<uint32_t> trigger_dma_ops;
        std::atomic<uint32_t> led;
};

This is a simplified example of usage:

UIO uio("/dev/uio0");
UIO_mmap mmap_regs(uio, 0, 4096);
UIO_mmap mmap_dma(uio, 1, 128*1024*1024);

// access the hardware registers
FPGAregs *regs = reinterpret_cast<FPGAregs *>(mmap_regs.get_ptr());
regs->led = 1;

// DMA operations with interrupt
// not shown: write outgoing data to the DMA buffer
uio.unmask_interrupt(); // we need to unmask first so we don't miss the interrupt
regs->trigger_dma_ops = 1; // trigger our hardware, creating an interrupt on completion
uio.wait_interrupt(1000);
// check the registers to verify operation completed and access result in the DMA buffers
if(regs->busy)
       ...

OSHPark 4 Layer PCB Measurements

Intro

OSHPark PCBs are convenient for various RF test boards at a low cost given the relatively low loss Isola FR408 material. Trying an idea is much more appealing when the cost is $50 for the board and parts versus $1000+ for a custom RF material stack up prototype. As you will see, there is some variation in properties over the board. These are built with prepreg for outer layers, whereas RF and microwave PCBs typically have cores for outer layers for consistent dielectric thickness. These have been used successfully at Harmon Instruments to beyond 50 GHz. Loss is extremely high at those frequencies, so lines must be kept short. There is a bypassable frequency doubler in the VNA that measured this data on an OSHPark PCB, providing stimulus from 26.5 GHz to 50 GHz.

Thanks to OSHPark for providing this board and the previous revisions at no charge.

The test board

the test PCB

The first revision used line widths calculated from the values on the OSHPark site. Impedances were consistently high (56 ohms typical) over two orders of the board. This one uses the assumption of 0.19 mm thick prepreg with an Er of 3.3 (versus 0.17 mm and Er of 3.66 given in the OSHPark docs). See the previous blog post for the Er of just the material. The missing corner of this board was the sample used there. The prepreg in the cross section measures approximately 0.177 mm, but likely varies throughout the board. The total thickness varies about 25 µm over a board.

Microstrip

Time Domain

Time domain of 53 mm unmasked microstrip

0.42 mm wide lines are used for 50 ohm microstrip on this board. As can be seen in the plot above, the impedance is, on average, a bit low. Perhaps 0.41 mm would be a good design value. PCB #1 had the most variation and the impedance correlates with PCB thickness variation along the line. There is a short thick area around the high impedance peak. The other two show a thickness taper along the length. I repeated the measurement of PCB #1 after #2 and #3 and got data nearly indistinguishable from the original measurement.
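
As a rough cross-check of the line width, here is a minimal sketch using the common zero-thickness closed-form microstrip approximation (Hammerstad, as found in most microwave textbooks) with the 0.19 mm and Er = 3.3 assumptions from above. This is not how the widths on this board were derived. It ignores copper thickness, solder mask and surface roughness, all of which pull the real impedance lower, so treat it as a starting point only.

```python
import math


def microstrip_z0(width, height, er):
    """Zero-thickness microstrip impedance and effective Er (valid for w/h >= 1)."""
    u = width / height
    if u < 1.0:
        raise ValueError("this closed form is only used here for w/h >= 1")
    e_eff = (er + 1.0) / 2.0 + (er - 1.0) / 2.0 / math.sqrt(1.0 + 12.0 / u)
    z0 = 120.0 * math.pi / (
        math.sqrt(e_eff) * (u + 1.393 + 0.667 * math.log(u + 1.444)))
    return z0, e_eff


# widths in mm discussed in this post, on 0.19 mm prepreg with Er = 3.3
for w in (0.39, 0.41, 0.42):
    z0, e_eff = microstrip_z0(w, 0.19, 3.3)
    print(f"w = {w:.2f} mm: Z0 = {z0:.1f} ohm, effective Er = {e_eff:.2f}")
```

The formula comes out a little above 50 ohms at 0.42 mm, which is consistent with a real line of finite copper thickness measuring somewhat lower.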

Time domain of 53 mm masked microstrip

Using the same 0.42 mm width, but adding solder mask, gives a lower impedance of around 47 ohms. A good design value to achieve 50 ohms might be 0.39 mm. Covering microstrip in solder mask does save some insertion loss, but the difference is minimal, so unmasked may still be preferred.

Insertion loss (no solder mask)

Microstrip insertion loss plot

Insertion loss (with solder mask)

Microstrip insertion loss plot

Effective Relative Permittivity (Er)

Effective Er is simply the square of the ratio of the speed of light to the propagation velocity in the line.

Microstrip Er plot

The unmasked microstrip should have an effective Er of around 2.5 given the assumptions about Er and the line geometry. It measures much higher. I presume this is due to metal loss and surface texture, but am not certain of it. Comments are welcome. I measure ErEff=1.002 for 3.5 mm air lines at 20 GHz using the same differential phase calculation. EM simulation shows that a textured conductor has slower propagation than a smooth one and this is rather rough copper. The plot below shows time domain step responses of both 53 mm lines with the masked line reaching 50% at 309.4 ps and the unmasked at 296.4 ps. Those correspond to effective Er values of 3.07 and 2.81. The effect of the mask slowing propagation is clearly visible. The stimulus rise time (10-90%, set by the window function) is about 14 ps; the masked line's is 25 ps and the unmasked line's is 36 ps.

time domain
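
The conversion from step delay to effective Er can be sketched as below. The function name is made up for this example, and it assumes the full 53 mm is the electrical length between the reference planes. The results land close to the quoted values; small differences depend on the exact length and on how the 50% point is referenced.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def effective_er(length_m, delay_s):
    """Effective Er = (c / v)^2 with v = length / delay."""
    velocity = length_m / delay_s
    return (C0 / velocity) ** 2


LINE_LENGTH = 53e-3  # assumed electrical length of the test lines in meters

for name, delay in (("masked", 309.4e-12), ("unmasked", 296.4e-12)):
    print(f"{name}: effective Er = {effective_er(LINE_LENGTH, delay):.2f}")
```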

Stripline

The striplines on this board are 0.3 mm wide, on inner2, with copper on inner1 and the bottom as ground references. It's very close to 50 ohms. The vias need a bit more clearance (0.2 mm used on this board) as can be seen by the low impedance spikes in the time domain plot.

Time Domain

Stripline time domain plot

The 2.4/1.85 dip is the discontinuity resulting from the connection of a 2.4 mm and 1.85 mm connector. A similar discontinuity results from connecting a 2.92 to a 3.5 mm connector. The Harmon Instruments VNA supports removal of these discontinuities, but it was not enabled here. The via dips are the vias transitioning from microstrip to stripline.

Insertion Loss

Stripline insertion loss plot

The spikes at roughly 2 GHz intervals are artifacts of the on-PCB LRL calibration (they occur at frequencies where the phase difference between the lines is a multiple of 180 degrees) and should be ignored. More lines could have been added to avoid these.
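
To see where such artifacts fall for a given pair of lines, here is a small sketch. The length difference and effective Er below are hypothetical values picked only because they give a spacing near 2 GHz. They are not the dimensions of the lines on this board.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def lrl_singular_frequencies(delta_length_m, er_eff, f_max_hz):
    """Frequencies where two lines differ in phase by n * 180 degrees."""
    # phase difference = 2*pi*f*dL*sqrt(er_eff)/c = n*pi  ->  f = n*c/(2*dL*sqrt(er_eff))
    spacing = C0 / (2.0 * delta_length_m * math.sqrt(er_eff))
    count = int(f_max_hz // spacing)
    return [n * spacing for n in range(1, count + 1)]


# hypothetical: 40 mm length difference, effective Er of 3.6, up to 10 GHz
for f in lrl_singular_frequencies(40e-3, 3.6, 10e9):
    print(f"{f / 1e9:.2f} GHz")
```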

Effective Er

Stripline Er

The Er of the core is about 4.0 and the prepreg 3.3. Electromagnetic simulation without metal loss gives an effective Er of 3.55 for this geometry. Like the microstrip case, these values are higher than expected. The values are higher at low frequencies primarily due to metal losses. Stripline is often stated to be free of dispersion, but that assumes lossless materials.

Components

50 ohm series resistor time domain

This one is simply a 49.9 ohm 0402 resistor in series with 1 mm of 50 ohm line on either side and was done as a check on the impedance calibration. The result is as expected.

Murata BLM15GG471 ferrite bead s11 vs current

Murata BLM15HD601 ferrite bead s11 vs current

I'd previously measured these two ferrite beads with no DC bias, but had questioned how much that would change the inductance. The effect is significant at low frequency and small at higher frequencies. It's interesting that the order of the traces reverses above the self resonant frequency as the decreasing shunt inductance reduces the effective capacitance. These ferrite beads are mostly capacitive above the parallel resonance. The VNA bias tees contribute only a negligible amount (0.002 dB insertion loss increase at 200 mA) to this data.

Murata BLM15GG471

Murata BLM15HD601

These parts are both capacitive at higher frequencies and the above performance could be improved by adding short high impedance sections to either side of the ferrite bead. About 120 pH on either side should give the BLM15GG471 20 dB return loss to 30 GHz.

Measuring Material Properties in Waveguide

I saw the paper "Dielectric constant measurement for thin material at microwave frequencies" and wanted to give it a try. This was done in WR-42 waveguide from 18 to 26.5 GHz. Samples were cut to 0.17 x 0.42 inches to fit the waveguide.

Relative Permittivity (Er)

Er of samples

Material                  Thickness  Er (measured)  Er (ref)
Polypropylene (pp)        736 μm     2.14           2.26 at 9.4 GHz
Polycarbonate (pc)        583 μm     2.7            2.77 at 11 GHz
Rogers RO4350B            509 μm     3.59           3.48/3.66 at 10 GHz
OSHPark 4 layer prepreg   205 μm     3.3            3.26 - 4.01
OSHPark 4 layer core      531 μm     4.02           3.26 - 4.01
GE Silicone II (white)    1.875 mm   2.64           unspecified

OSHPark PCB

The OSHPark 4 layer PCBs are made from Isola FR408. As shown in this document, the thicker cores and prepregs have higher glass content and thus higher Er. The 47 mil thick core is not listed in the above PDF, but would presumably be the heaviest glass and have an Er of around 4.0. The prepreg used for the outer layers is a fine glass weave with high resin content, so the measured value of 3.3 seems reasonable. The prepreg sample is very thin and has some taper, so that value may be the least reliable of the group. The core and prepreg samples were obtained by mechanically delaminating a finished PCB.

RO4350

The RO4350B has a 10 GHz dielectric constant of 3.48 ± 0.05 using a clamped stripline method that tends to underestimate Er on hard materials like RO4350B. The recommended value for design is 3.66, so this measured value of 3.59 seems reasonable.

Silicone II

This is white caulk from a hardware store which I had purchased for my bathtub. The data on this one is questionable as it was hard to cut a precise block of this soft material. A silicone might be interesting as a directional coupler overlay.

Loss Tangent

These values are less reliable and noisy as the dissipation loss of the samples is on the order of 0.015 dB. The polypropylene has such low loss that errors are causing it to appear to have a negative loss tangent. The waveguide shim used as a sample holder was bare brass which certainly didn't help repeatability. The spikes in the silicone and osh_core traces are likely due to resonance within the sample as the section of guide containing the sample can support higher order modes.

loss tangent of samples

Approximation not valid

Plot of differing results with approximate solution and exact solution

The plot above shows the amount of error introduced by the assumption in the paper that the samples are thin. The data with _est uses the approximations for small values, \(\sin(k_{z2}\tau) \approx k_{z2}\tau\). With the smaller guide and higher frequencies used here versus the original paper, the assumption is not valid with reasonable thickness samples.
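
To get a feel for how quickly the thin-sample assumption degrades, the sketch below generates S21 from the exact expression used in the script under Code and then inverts it in closed form after replacing sin(z) with z and cos(z) with 1. The paper's exact simplifications are not reproduced here, so the cos(z) step and the closed-form inversion are assumptions of this example, as are the function names and the test values (Er = 3.5 at 22 GHz).

```python
import numpy as np

C0 = 299_792_458.0
A = 0.42 * 0.0254  # WR-42 broad wall dimension in meters


def kz_air(f):
    """Propagation constant of the TE10 mode in the air filled guide."""
    lam = C0 / f
    return (2 * np.pi / lam) * np.sqrt(1 - (lam / (2 * A)) ** 2)


def s21_exact(f, t, er):
    """S21 of a sample of thickness t, same expression as in the script."""
    lam = C0 / f
    kz = kz_air(f)
    kz2 = np.sqrt((2 * np.pi / lam) ** 2 * complex(er) - (np.pi / A) ** 2)
    z = kz2 * t
    return 1.0 / (np.cos(z)
                  + 0.5j * kz * t * (1 + (z / (kz * t)) ** 2) * np.sin(z) / z)


def er_thin_estimate(f, t, s21):
    """Er assuming sin(z) ~ z and cos(z) ~ 1 (assumed form of the approximation)."""
    lam = C0 / f
    kz = kz_air(f)
    # 1/s21 ~ 1 + 0.5j*kz*t*(1 + (z/(kz*t))**2), solved for z**2
    z_sq = (kz * t) ** 2 * ((1.0 / s21 - 1.0) / (0.5j * kz * t) - 1.0)
    kz2_sq = z_sq / t ** 2
    return (kz2_sq + (np.pi / A) ** 2) * (lam / (2 * np.pi)) ** 2


f, er_true = 22e9, 3.5
for t in (50e-6, 205e-6, 583e-6):
    est = er_thin_estimate(f, t, s21_exact(f, t, er_true))
    print(f"t = {t * 1e6:.0f} um: estimated Er (real part) = {est.real:.3f}")
```

Under these assumptions the estimate is close for a very thin sample and drifts noticeably at the thicknesses measured here.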

The value of Er is solved for using SciPy optimize.minimize. It's a bit ugly as it doesn't directly handle complex numbers, but gets the job done.

Future work

Try to do something similar in 7 mm coax.

Code

The SParameter library (used for opening .s2p files, wgsection, S parameter concatenation) has not yet been released, so this is only useful for reference. The library is in C++ with SWIG wrappers and will be released as open source (GPLv3) when it is ready.

#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append("../../")
import SParameter as scpp
from scipy.constants import c, inch
import scipy.optimize

a = 0.42 * inch # wide waveguide dimension in meters

samples = [ # file name, thickness in meters
    ['pc.s2p', 583.0e-6],
    ['pp.s2p', 736.0e-6],
    ['ro4350b.s2p', 509e-6],
    ['osh_prepreg.s2p', 205e-6],
    ['osh_core.s2p', 531e-6],
    ['silicone.s2p', 1875e-6],
]

for sample in samples:
    fn = sample[0]
    t = sample[1] # thickness
    sp = scpp.SParameter(fn)
    sp.reducefreqrange(50e6, 18e9, 26.5e9)
    # add an air filled section of guide the length of the sample
    te = scpp.wgsection(sp, a, t)
    sp = scpp.SParameter(te, sp)
    s = np.array(sp.s)
    s21 = 0.5 * (s[1::4] + s[2::4]) # average of s21, s12
    f = np.array(sp.f)
    e = np.zeros(len(f), dtype=complex)

    for i in range(len(f)):
        wavelength = c / f[i] # free space wavelength
        kz = (np.pi/(wavelength*a)) * np.sqrt( 4.0*np.square(a)-np.square(wavelength))

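        def func(z):
            # squared magnitude of the error between measured and modeled s21
            # (z = kz2 * t); named err to avoid shadowing the imported c
            err = s21[i] - 1.0 / (np.cos(z) + 0.5j*kz*t*(1+np.square(z/(kz*t))) * np.sin(z)/(z))
            return err.real*err.real + err.imag*err.imag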

        b0 = [kz*t, 0.0] # initial estimate of z = kz2*t, assumes Er = 1.0
        res = scipy.optimize.minimize(lambda z: func(complex(z[0],z[1])), b0, tol=1e-6)
        kz2 = complex(res.x[0], res.x[1])/t
        e[i] = (np.square(kz2 * wavelength * a / np.pi) + np.square(wavelength))/(4*a*a)

    if len(sys.argv) < 2 or sys.argv[1] != 'loss': # plot Er unless 'loss' is given
        plt.plot(f/1e9,np.real(e), label=fn)
        plt.ylabel('Er')
    else:
        losstan = -np.imag(e)/np.real(e)
        plt.plot(f/1e9,losstan, label=fn)
        plt.ylabel('loss tangent')

plt.xlabel('Frequency (GHz)')
plt.legend(loc=4)
plt.grid(True)
plt.show()

Unit Testing VNA Calibration Code

The Problem

There is an alphabet soup of VNA calibration types (SOLT, TRL, LRL, TRM, SOLR, QSOLT, many more), some of which will be supported by the Harmon Instruments VNA and run by the end user. Testing all of the supported calibration types through the normal user interface and measuring verification components for each could easily take an entire day. Using physical calibration standards, uncertainties would be limited to those of the standards. An additional concern with testing on hardware is the wear and tear on the fragile and expensive calibration standards.

Photo of VNA Calibration Standards

Breaking it Down

The data path from ADC values to corrected S-parameters for display is split into layers with independent test coverage. The lowest level, implemented in an FPGA, taking signals from the ADCs has test coverage using the excellent CocoTB package. The next layer, written in C++, takes integer vector voltage data from the FPGA and produces raw S-parameters. These first two layers are relatively simple mathematical operations. The final layer, correcting the raw S-parameters, is where the complexity lies and is what will be discussed here.

Modeling the VNA and Calibration Standards

For simplicity, we will focus on 2 port VNA calibrations using an 8 term error model (TRL, SOLR, similar). This model contains a fictitious 2 port "error adapter" to either side of the device under test (DUT). These represent internal leakages, losses, gain imbalances, etc. as well as the test port cables. The raw S-parameters are simply those of the DUT in cascade with the two error adapters. In a calibration, sufficient standards are measured to solve for the error adapters.

Knowing the contents of the error adapters, it is possible to run the calibration backwards and calculate the raw S-parameters from the true S-parameters of the devices being measured. Realistic models of the calibration standards are used, with airlines, opens and shorts having loss. Simulated raw data is generated for the required set of standards as well as a few challenging DUTs, and the data is sent through the calibration acquisition and correction software. The corrected data is compared with the originals and verified to be within reasonable rounding error. The tests on the calibration algorithms complete in less than 1 second and run every time the software is built, causing the build to fail if any test does.
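
The round trip described above can be sketched with NumPy. This is only an illustration of the idea, not the Harmon Instruments code: the function names are invented, the adapter and DUT values are made up, and the adapters are applied directly rather than solved for from simulated standards as a real calibration would.

```python
import numpy as np


def s_to_t(s):
    """2 port S-parameters to cascadable T-parameters."""
    s11, s12, s21, s22 = s[0, 0], s[0, 1], s[1, 0], s[1, 1]
    return np.array([[-(s11 * s22 - s12 * s21) / s21, s11 / s21],
                     [-s22 / s21, 1.0 / s21]])


def t_to_s(t):
    """Inverse of s_to_t."""
    t11, t12, t21, t22 = t[0, 0], t[0, 1], t[1, 0], t[1, 1]
    return np.array([[t12 / t22, (t11 * t22 - t12 * t21) / t22],
                     [1.0 / t22, -t21 / t22]])


def apply_error_adapters(adapter_a, dut, adapter_b):
    """Raw S-parameters: the DUT in cascade with both error adapters."""
    return t_to_s(s_to_t(adapter_a) @ s_to_t(dut) @ s_to_t(adapter_b))


def correct(adapter_a, raw, adapter_b):
    """Remove known error adapters from raw S-parameters."""
    inv = np.linalg.inv
    return t_to_s(inv(s_to_t(adapter_a)) @ s_to_t(raw) @ inv(s_to_t(adapter_b)))


# made-up values at a single frequency point
adapter_a = np.array([[0.05 + 0.02j, 0.9 - 0.1j], [0.9 - 0.1j, 0.1j]])
adapter_b = np.array([[-0.03j, 0.8 + 0.2j], [0.8 + 0.2j, 0.04]])
dut = np.array([[0.3 - 0.2j, 0.5 + 0.4j], [0.5 + 0.4j, -0.1 + 0.3j]])

raw = apply_error_adapters(adapter_a, dut, adapter_b)
recovered = correct(adapter_a, raw, adapter_b)
print(np.allclose(recovered, dut))
```

In the real tests the correction step is the software under test, so any disagreement between the recovered and original S-parameters beyond rounding error points to a bug.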

For the purposes of testing the calibration software, the VNA is presumed to be linear and time invariant. In the future, for uncertainty analysis, it may be desirable to add drift and nonlinearity based on characterization of real hardware.

Offline Processing of Acquired Data

By saving raw data for many types of calibration and verification standards, it is possible to generate corrected S-parameters with any calibration type that uses those standards. QSOLT had not yet been coded when this data set was acquired, but it is possible to generate QSOLT corrected data as all of the required standards are present in the data dumps. This is mostly useful to see how the algorithms compare with imperfect standards.

Plot of Maury Microwave 60 mm airline S22 with QSOLT, TRL, TRM and data based QSOLT calibrations

The above plot shows significant differences between calibrations assuming the load standards are ideal (QSOLT, TRM), TRL which uses airlines rather than loads and a data based QSOLT calibration which uses more accurate models of the short, open and load. These differences are not due to software, but the physical standards and their definitions.

Implementation

The S-parameter handling code (C++) on the Harmon Instruments VNA has a SWIG generated Python wrapper and the Python unit testing framework is used to run the tests. Google Test would have been another option. Python works well here due to libraries like NumPy, SciPy and Matplotlib providing mathematical functions and plotting. For instance, to test a C++ implementation of the Kaiser window, I added a test to compare it with one generated by SciPy.
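
A test of that kind might look like the sketch below. The wrapped C++ function is not available here, so kaiser_under_test is a pure Python stand-in written from the textbook Kaiser window definition. The test sizes, beta values and tolerances are arbitrary choices for this example.

```python
import unittest

import numpy as np
from scipy.signal.windows import kaiser


def bessel_i0(x, terms=40):
    """Modified Bessel function I0 by its power series, sum of ((x/2)^k / k!)^2."""
    x = np.asarray(x, dtype=float)
    term = np.ones_like(x)
    total = np.ones_like(x)
    for k in range(1, terms):
        term = term * (x / 2.0) / k
        total = total + term * term
    return total


def kaiser_under_test(n_points, beta):
    """Stand-in for the wrapped C++ Kaiser window implementation."""
    n = np.arange(n_points)
    r = 2.0 * n / (n_points - 1) - 1.0  # runs from -1 to 1 across the window
    arg = beta * np.sqrt(np.clip(1.0 - r * r, 0.0, None))
    return bessel_i0(arg) / bessel_i0(beta)


class KaiserWindowTest(unittest.TestCase):
    def test_matches_scipy(self):
        for n_points, beta in ((16, 2.0), (101, 8.6), (256, 14.0)):
            np.testing.assert_allclose(
                kaiser_under_test(n_points, beta),
                kaiser(n_points, beta),
                rtol=1e-9, atol=1e-12)
```

Saved to a file, this runs with python3 -m unittest like any other test case.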