Linux Userspace IO Interrupts on Xilinx Zynq

Intro

It's now possible to use a complex peripheral with interrupts and DMA under Linux using UIO and the generic-uio driver rather than having to write a kernel module.

There was a kernel module for the Harmon Instruments VNA which was working well, but switching to UIO means that module no longer has to be maintained.

This example uses 3 system calls per interrupt (write() to unmask, poll() to wait and read() to clear) due to limitations in generic-uio. With a kernel module, only 1 system call per interrupt would be required for this hardware. It might be desirable to add an ioctl() call to the UIO driver which simply blocks until the interrupt count is at least 1 more than at the last call, leaves the interrupt enabled and has a timeout. In the Harmon Instruments VNA case, only about 50 interrupts per second are processed and the time per interrupt is about 20 microseconds, so only about 0.1 % of a CPU is being wasted.
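
The overhead estimate is a single multiplication. As a minimal sketch using the figures quoted above (the name cpu_fraction is made up for illustration):

```cpp
// Fraction of one CPU spent handling interrupts:
// (interrupts per second) * (seconds of CPU time per interrupt).
constexpr double cpu_fraction(double interrupts_per_second,
                              double seconds_per_interrupt) {
    return interrupts_per_second * seconds_per_interrupt;
}

// 50 interrupts/s * 20 us each = 1 ms of CPU time per second, i.e. 0.1 %.
constexpr double vna_overhead = cpu_fraction(50.0, 20e-6);
static_assert(vna_overhead > 0.0009 && vna_overhead < 0.0011,
              "roughly 0.1 % of one CPU");
```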

Linux 4.9 is used in the example.

FPGA

The logic in the Zynq PL (FPGA side) presents a rising edge to the IRQ_F2P[0] port on the Zynq PS (CPU side) on completion of DMA operations. Only one of the 16 possible IRQ_F2P interrupts is enabled. Vivado confusingly states "The MSB is assigned the highest interrupt ID of 91" and "[91:84], [68:61]" for the interrupt numbers. Here's a Xilinx Answer Record that implies it should be interrupt 91. After a bit of grepping through the generated wrapper, I found a parameter, IRQ_F2P_MODE, which is set to "DIRECT" by default in my case but can be set to "REVERSE", in which case it would be interrupt 91. Following the logic in the generated wrapper and consulting the documentation, it is evident that with the "DIRECT" setting it is interrupt 61. Xilinx should have just brought out all 16 interrupts, avoiding the interrupt mapping confusion of the wrapper.
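
The "[91:84], [68:61]" ranges can be written down as a small lookup. This is only a sketch of the mapping on the PS side, assuming the full 16-bit IRQ_F2P port with bits [7:0] on interrupt IDs 61 to 68 and bits [15:8] on IDs 84 to 91. The helper name irq_f2p_to_gic_id is made up for illustration, and the sketch does not model the bit shuffling the Vivado wrapper does in "REVERSE" mode.

```cpp
#include <stdexcept>

// Hypothetical helper: GIC interrupt ID for one bit of the 16-bit PS
// IRQ_F2P port. Assumes bits [7:0] -> IDs 61..68, bits [15:8] -> IDs 84..91.
inline unsigned irq_f2p_to_gic_id(unsigned bit) {
    if (bit > 15)
        throw std::out_of_range("IRQ_F2P has only 16 bits");
    return bit < 8 ? 61 + bit : 84 + (bit - 8);
}
```

With a single interrupt enabled in "DIRECT" mode, the signal lands on bit 0, so irq_f2p_to_gic_id(0) gives 61.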

Kernel Config

The UIO driver needs to be enabled in the kernel configuration. It is under Device Drivers -> Userspace I/O drivers in menuconfig.

CONFIG_UIO=y
CONFIG_UIO_PDRV_GENIRQ=y

There is also CONFIG_UIO_DMEM_GENIRQ if dynamically allocated DMA buffers are required. This example uses a static DMA buffer allocation at boot time.

Device Tree

chosen {
        bootargs = "console=ttyPS0,115200 earlyprintk root=/dev/ram mem=384M uio_pdrv_genirq.of_id=generic-uio";
        linux,stdout-path = "/amba/serial@e0000000";
        stdout-path = "serial0:115200n8";
};

&amba {
      hififo: hififo@40000000 {
                      compatible = "generic-uio";
                      interrupt-parent = <&intc>;
                      interrupts = <0 29 1>;
                      reg = <0x40000000 0x1000 0x18000000 0x8000000>;
              };
};

The "mem=384M" in bootargs tells the kernel to use only the lower 384 MiB of the 512 MiB of available RAM, leaving the upper 128 MiB free for the device. A large contiguous DMA buffer is required and this is a convenient way to get it on an embedded system. The "uio_pdrv_genirq.of_id=generic-uio" is required to enable the generic-uio driver as explained in this GitHub discussion.

Hififo is our device using the UIO driver. It has 4 KiB of registers at address 0x40000000 (mmap index 0) and 128 MiB of DMA space at 0x18000000 (mmap index 1), which is exactly the RAM left unused by "mem=384M". Above, I stated our interrupt is 61, and here it is 29. There's an offset of 32 for a shared peripheral interrupt in the device tree, and 61 - 32 = 29. The 1 after the 29 indicates that it is a rising edge interrupt, triggering only once on the completion signal.
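
The numbers in the device tree node can be cross-checked in a few lines. This is a sketch rather than anything the kernel requires: spi_cells and the two constants are hypothetical names, and the leading 0 cell is assumed to be the usual GIC binding marker for a shared peripheral interrupt.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

constexpr uint32_t kSpiOffset = 32;        // GIC IDs below 32 are not SPIs
constexpr uint32_t kIrqTypeEdgeRising = 1; // trigger flag used in the node

// Hypothetical helper: the three `interrupts` cells for a GIC interrupt ID.
inline std::array<uint32_t, 3> spi_cells(uint32_t gic_id, uint32_t trigger) {
    if (gic_id < kSpiOffset)
        throw std::invalid_argument("not a shared peripheral interrupt");
    return std::array<uint32_t, 3>{{0u, gic_id - kSpiOffset, trigger}};
}

// The `reg` sizes and the DMA base match the prose and the mem=384M bootarg.
static_assert(0x1000 == 4 * 1024, "register window is 4 KiB");
static_assert(0x8000000 == 128 * 1024 * 1024, "DMA window is 128 MiB");
static_assert(0x18000000 == 384 * 1024 * 1024, "DMA base is the 384 MiB mark");
```

Calling spi_cells(61, kIrqTypeEdgeRising) yields the cells 0, 29 and 1 used in the node above.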

Userspace

This blog post was very helpful with the user space code. I chose to wrap the UIO syscalls in a C++ class to avoid "goto fail" like error handling.

#include <cstdint>
#include <cstddef>
class UIO {
private:
        int _fd;

public:
        explicit UIO(const char *fn);
        ~UIO();
        void unmask_interrupt();
        void wait_interrupt(int timeout_ms);
        friend class UIO_mmap;
};

class UIO_mmap {
private:
        size_t _size;
        void *_ptr;

public:
        UIO_mmap(const UIO &u, int index, size_t size);
        ~UIO_mmap();
        size_t size() const { return _size; }
        void *get_ptr() const { return _ptr; }
};

#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <cstdio>   // perror()
#include <cstdlib>
#include <cstdint>
#include <poll.h>
#include <errno.h>
#include <stdexcept>

UIO::UIO(const char *fn) {
        _fd = open(fn, O_RDWR);
        if (_fd < 0)
               throw std::runtime_error("failed to open UIO device");
}

UIO::~UIO() { close(_fd); }

void UIO::unmask_interrupt() {
        uint32_t unmask = 1;
        ssize_t rv = write(_fd, &unmask, sizeof(unmask));
        if (rv != (ssize_t)sizeof(unmask)) {
                perror("UIO::unmask_interrupt()");
        }
}

void UIO::wait_interrupt(int timeout_ms) {
        // wait for the interrupt
        struct pollfd pfd = {};
        pfd.fd = _fd;
        pfd.events = POLLIN;
        int rv = poll(&pfd, 1, timeout_ms);
        if (rv >= 1) {
               // clear the interrupt by reading the 32-bit interrupt count
               uint32_t info;
               if (read(_fd, &info, sizeof(info)) != (ssize_t)sizeof(info))
                      perror("UIO::wait_interrupt() read");
        } else if (rv == 0) {
               // this indicates a timeout, will be caught by device busy flag
        } else {
               perror("UIO::wait_interrupt()");
        }
}

UIO_mmap::UIO_mmap(const UIO &u, int index, size_t size) : _size(size) {
        _ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                u._fd, index * getpagesize());
        if (_ptr == MAP_FAILED) {
                perror("UIO_mmap");
                throw std::runtime_error("UIO_mmap construction failed");
        }
}

UIO_mmap::~UIO_mmap() { munmap(_ptr, _size); }
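
The size passed to UIO_mmap has to match the reg entry in the device tree. Rather than hard-coding it, the size can be read back from sysfs, where the UIO core exposes each mapping under /sys/class/uio/uioN/maps/mapM/size as a hexadecimal string. The following is a sketch under that assumption; parse_map_size and uio_map_size are hypothetical helper names and are not part of the classes above.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Parse the text of a sysfs "size" file, e.g. "0x1000".
// Base 0 lets stoull detect the 0x prefix.
inline size_t parse_map_size(const std::string &text) {
    return static_cast<size_t>(std::stoull(text, nullptr, 0));
}

// Read the size of mapping `map_index` of /dev/uio<uio_index> from sysfs.
inline size_t uio_map_size(int uio_index, int map_index) {
    const std::string path = "/sys/class/uio/uio" + std::to_string(uio_index) +
                             "/maps/map" + std::to_string(map_index) + "/size";
    std::ifstream f(path);
    std::string text;
    if (!std::getline(f, text))
        throw std::runtime_error("cannot read " + path);
    return parse_map_size(text);
}
```

The register window could then be mapped with UIO_mmap mmap_regs(uio, 0, uio_map_size(0, 0)); so that the size follows the device tree.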

A struct with std::atomic members is a convenient way to represent the device registers. This provides memory barriers to ensure proper ordering of reads and writes.

#include <cstdint>
#include <atomic>
struct FPGAregs {
        std::atomic<uint32_t> interrupt_status;
        std::atomic<uint32_t> busy;
        std::atomic<uint32_t> trigger_dma_ops;
        std::atomic<uint32_t> led;
};
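
Because the struct is overlaid directly on the mapped registers, its layout must match the hardware. A few compile-time checks catch surprises early. This sketch assumes the four registers are consecutive 32-bit words starting at offset 0, which is what the struct order suggests but is not otherwise confirmed here, and that uint32_t has the same lock-free property as unsigned int on the target.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

struct FPGAregs {
        std::atomic<uint32_t> interrupt_status;
        std::atomic<uint32_t> busy;
        std::atomic<uint32_t> trigger_dma_ops;
        std::atomic<uint32_t> led;
};

// The atomic wrapper must not add padding or a hidden lock, otherwise the
// struct would not line up with the 32-bit hardware registers.
static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t),
              "atomic<uint32_t> must be exactly one register wide");
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "32-bit atomics must be lock-free for register access");
// Assumed register offsets: 0x0, 0x4, 0x8, 0xC.
static_assert(offsetof(FPGAregs, busy) == 4, "busy at offset 0x4");
static_assert(offsetof(FPGAregs, led) == 12, "led at offset 0xC");
static_assert(sizeof(FPGAregs) == 16, "four 32-bit registers");
```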

This is a simplified example of usage:

UIO uio("/dev/uio0");
UIO_mmap mmap_regs(uio, 0, 4096);
UIO_mmap mmap_dma(uio, 1, 128*1024*1024);

// access the hardware registers
FPGAregs *regs = reinterpret_cast<FPGAregs *>(mmap_regs.get_ptr());
regs->led = 1;

// DMA operations with interrupt
// not shown: write outgoing data to the DMA buffer
uio.unmask_interrupt(); // we need to unmask first so we don't miss the interrupt
regs->trigger_dma_ops = 1; // trigger our hardware, creating an interrupt on completion
uio.wait_interrupt(1000);
// check the registers to verify operation completed and access result in the DMA buffers
if(regs->busy)
       ...
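
wait_interrupt() above throws away the value it reads from the device. That value is the driver's running 32-bit interrupt count, so comparing two successive readings shows how many interrupts arrived in between, which is the same idea as the blocking ioctl() proposed in the intro. A minimal sketch, with interrupts_since as a hypothetical helper name:

```cpp
#include <cstdint>

// Number of interrupts between two successive counts read from the UIO
// file descriptor. Unsigned subtraction wraps modulo 2^32, so the result
// stays correct when the counter rolls over.
inline uint32_t interrupts_since(uint32_t previous_count,
                                 uint32_t current_count) {
    return current_count - previous_count;
}
```

A result of 1 is the normal case. Anything larger means the counter advanced more than once since the previous read, which would be worth logging.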