
A/B OTA Updates That Actually Work: RAUC on systemd-boot

12 min read · Tags: Yocto, RAUC, OTA Updates, Embedded Linux, systemd-boot, A/B Updates

Embedded devices in the field need updates: a security patch, a kernel bug fix, a new feature. Without OTA (over-the-air) updates, every change requires physical access. But OTA updates carry risk: a power loss during the write, a corrupted bundle, a kernel that panics on the new rootfs. Without a fallback mechanism, a failed update means a bricked device [1].

RAUC (Robust Auto-Update Controller) is a widely used OTA framework for embedded Linux [2]. It provides signed update bundles, A/B redundant root filesystems, automatic rollback, and a well-defined bootloader interface. But when I tried to integrate RAUC with real hardware, I hit problems the documentation did not prepare me for.

This article walks through what I built — an A/B update system on x86-64 UEFI hardware with systemd-boot. It focuses on the problems I actually ran into: EFI boot variables that did not work, kernel and rootfs that drifted apart, and state that I expected to survive an update but did not.


The A/B layout

An A/B update system maintains two complete copies of the root filesystem. One slot is active (currently booted); the other is inactive. An update writes the new image to the inactive slot, marks it as the primary boot target, and reboots. If the new slot boots successfully, it becomes the active slot. If it fails to boot, the bootloader falls back to the previous slot [3].

The canonical partition layout:

[ESP 256MB] [rootfs-a 4GB] [rootfs-b 4GB] [data ~remaining]
  • ESP (EFI System Partition): holds the kernel, systemd-boot loader, and per-slot boot entries.
  • rootfs-a / rootfs-b: two identical root filesystem partitions. Only one is active at a time.
  • data: persistent storage that survives rootfs updates (SSH keys, VPN state, application data, RAUC slot state).

The data partition turned out to be critical. Without it, every OTA update wiped my SSH host keys, regenerated them, and forced me to re-accept the new keys on every client. Tailscale state, Docker volumes, and anything else that needs to survive an update must live on the data partition [4].
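
The layout above can be written down as an sfdisk script. This is a minimal sketch rather than the layout tool I used: the function name print_layout, the partition names, and the default sizes are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: emit an sfdisk script for the four-partition A/B layout.
# print_layout, the partition names, and the default sizes are illustrative.
print_layout() {
    esp_size="${1:-256MiB}"
    slot_size="${2:-4GiB}"
    cat <<EOF
label: gpt
size=${esp_size}, type=U, name="esp"
size=${slot_size}, type=L, name="rootfs-a"
size=${slot_size}, type=L, name="rootfs-b"
type=L, name="data"
EOF
}
```

The last line omits size, so sfdisk gives the data partition the remaining space. Piping the output into sfdisk for a target disk rewrites its partition table, so review the text first by calling print_layout on its own.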


Why EFI boot variables fail on real hardware

RAUC's EFI bootloader backend uses efibootmgr to set the BootOrder and BootNext EFI variables that tell the firmware which slot to boot. This works on development boards, in QEMU, and on most consumer PCs.

It fails on many industrial and embedded x86-64 platforms. The Silicom Cordoba BIOS, and others like it, ignores EFI NVRAM boot variables entirely. The firmware has a fixed boot order, and no amount of efibootmgr manipulation changes which slot boots [5].

The symptom is maddening: rauc status shows the correct slot marked as primary, efibootmgr shows the correct BootNext, and the device still boots the old slot after an update. The BIOS simply does not read BootNext.
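
One way to confirm this behavior is to record BootNext before rebooting and compare it with BootCurrent afterwards. The helpers below are a sketch under that assumption; the names efi_field and bootnext_was_honored are hypothetical, and the parsing relies on the "Name: value" lines that efibootmgr prints.

```shell
#!/bin/sh
# Sketch: pull a field out of efibootmgr output and check whether the
# firmware honored BootNext. Helper names are hypothetical.

efi_field() {
    # $1 = field name (BootNext, BootCurrent, ...); stdin = efibootmgr output
    sed -n "s/^$1: *//p" | head -n 1
}

bootnext_was_honored() {
    # $1 = BootNext recorded before the reboot
    # $2 = BootCurrent read after the reboot
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

On the device, the value from `efibootmgr | efi_field BootNext` would be saved somewhere persistent before the reboot and compared against `efibootmgr | efi_field BootCurrent` afterwards. If the two differ on every attempt, the firmware is not reading BootNext.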

My fix was to abandon the EFI bootloader backend entirely and write a custom bootloader handler for systemd-boot. Instead of EFI variables, slot selection is controlled by loader entry files on the ESP.


Custom bootloader handler with systemd-boot

systemd-boot reads loader entries from the ESP at /loader/entries/ [6]. Each entry is a simple text file that specifies a title, the kernel, an optional initrd, and the kernel options:

# /loader/entries/A.conf
title   Slot A
linux   /bzImage-A
options root=PARTUUID=<uuid-a> rootwait console=ttyS0,115200

# /loader/entries/B.conf
title   Slot B
linux   /bzImage-B
options root=PARTUUID=<uuid-b> rootwait console=ttyS0,115200

The active slot is determined by loader.conf on the ESP:

default A.conf
timeout 3

To switch slots, the custom handler script modifies loader.conf to point to the other entry:

#!/bin/sh
# rauc-boot-handler.sh — custom bootloader backend for systemd-boot
 
ESP_MOUNT="/boot"
LOADER_CONF="${ESP_MOUNT}/loader/loader.conf"
 
case "$1" in
    get-primary)
        # Extract the default entry from loader.conf and strip the
        # ".conf" suffix so the output is the bare bootname (A or B)
        grep "^default" "$LOADER_CONF" | awk '{print $2}' | sed 's/\.conf$//'
        ;;
 
    set-primary)
        # Switch default entry to the new slot
        NEW_SLOT="$2"  # "A" or "B"
        sed -i "s/^default .*/default ${NEW_SLOT}.conf/" "$LOADER_CONF"
        ;;
 
    get-state)
        # Read slot state from persistent storage
        SLOT="$2"
        STATE_FILE="/data/rauc/slot-state/${SLOT}"
        if [ -f "$STATE_FILE" ]; then
            cat "$STATE_FILE"
        else
            echo "good"
        fi
        ;;
 
    set-state)
        # Write slot state to persistent storage
        SLOT="$2"
        STATE="$3"  # "good" or "bad"
        mkdir -p /data/rauc/slot-state
        echo "$STATE" > "/data/rauc/slot-state/${SLOT}"
        ;;
esac

Slot state is persisted to /data so it survives rootfs overwrites. The state directory is created the first time a state is written; until then, get-state reports every slot as good.
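
The loader.conf handling is easy to try off-target if the path is passed in as an argument. The following is a sketch along those lines, not the handler itself; the function names get_default_slot and set_default_slot are hypothetical.

```shell
#!/bin/sh
# Sketch: read and rewrite the "default" line of a loader.conf.
# Function names are hypothetical; the path is an argument so the
# helpers can be run against a scratch copy.

get_default_slot() {
    # Prints "A" for a line such as "default A.conf".
    awk '$1 == "default" { sub(/\.conf$/, "", $2); print $2; exit }' "$1"
}

set_default_slot() {
    conf="$1"
    slot="$2"
    tmp="${conf}.tmp"
    # Build the new file next to the old one, then rename it into place.
    sed "s/^default .*/default ${slot}.conf/" "$conf" > "$tmp" || return 1
    sync
    mv "$tmp" "$conf"
}
```

Calling set_default_slot with a scratch loader.conf and then get_default_slot on the same file shows the round trip without touching the real ESP.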

The RAUC system.conf references the custom handler. The compatible string in system.conf must match the one in the bundle manifest, or RAUC will reject the update [15]:

[system]
compatible=My Distro
bootloader=custom
 
[handlers]
bootloader-custom-backend=/usr/lib/rauc/rauc-boot-handler.sh
 
[slot.rootfs.0]
device=/dev/sda2
type=ext4
bootname=A
 
[slot.rootfs.1]
device=/dev/sda3
type=ext4
bootname=B

The bootname field maps to the slot letter used in the handler script and in the systemd-boot loader entry file names (A.conf, B.conf) [7].
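
Because the mapping is by file name, a mismatch between system.conf and the ESP is easy to miss. A small consistency check, sketched here with a hypothetical function name (check_bootnames) and with both paths passed as arguments, could run at image build time or on first boot:

```shell
#!/bin/sh
# Sketch: verify that every bootname in system.conf has a matching
# loader entry file. check_bootnames is a hypothetical helper.

check_bootnames() {
    system_conf="$1"
    entries_dir="$2"
    status=0
    for name in $(sed -n 's/^bootname=//p' "$system_conf"); do
        if [ ! -f "${entries_dir}/${name}.conf" ]; then
            echo "missing loader entry for bootname ${name}" >&2
            status=1
        fi
    done
    return "$status"
}
```

On the device this would be called with /etc/rauc/system.conf (assuming that is where the file is installed) and the entries directory under the ESP mount point.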


Per-slot kernels on the ESP

A common early failure mode: a single kernel is shared by both slots, the update replaces only the rootfs, and the old kernel then boots a rootfs whose modules were built for the new kernel, so module loading fails or the system panics. Or the reverse: the bootloader loads the new kernel, but the system boots the old rootfs because the kernel command line still points to the old partition.

What worked for me was storing per-slot kernels on the ESP [8]. Each slot's loader entry references a separate kernel image:

/boot/bzImage-A                 # Kernel for slot A
/boot/bzImage-B                 # Kernel for slot B
/boot/loader/entries/A.conf     # References bzImage-A + rootfs-a
/boot/loader/entries/B.conf     # References bzImage-B + rootfs-b

During an update, the handler's set-primary branch copies the kernel from the newly installed rootfs to the ESP before it switches the default entry:

set-primary)
    NEW_SLOT="$2"
    # Determine which partition corresponds to this slot
    case "$NEW_SLOT" in
        A) ROOTFS_DEV="/dev/sda2" ;;
        B) ROOTFS_DEV="/dev/sda3" ;;
        *) echo "unknown slot: $NEW_SLOT" >&2; exit 1 ;;
    esac

    # Mount the new rootfs read-only
    mkdir -p /mnt/new-rootfs
    mount -o ro "$ROOTFS_DEV" /mnt/new-rootfs || exit 1

    # Copy the kernel to a temporary name on the ESP and flush it
    if ! cp /mnt/new-rootfs/usr/lib/kernel/bzImage "${ESP_MOUNT}/bzImage-${NEW_SLOT}.tmp"; then
        umount /mnt/new-rootfs
        exit 1
    fi
    sync
    umount /mnt/new-rootfs

    # Rename into place only after the copy is complete
    mv "${ESP_MOUNT}/bzImage-${NEW_SLOT}.tmp" "${ESP_MOUNT}/bzImage-${NEW_SLOT}" || exit 1
    sync

    # Switch the default entry only after the kernel is in place
    sed -i "s/^default .*/default ${NEW_SLOT}.conf/" "$LOADER_CONF"
    ;;

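The write-then-rename pattern (copy to .tmp, then mv) keeps a half-written kernel from ever appearing under the name the loader entry references. A partial write leaves a .tmp file that the bootloader ignores; only the completed mv exposes the new kernel. One caveat: vfat has no journal, so the rename is not strictly atomic across a power loss, but the exposed window is far smaller than the copy itself. Without this pattern, a power loss during the copy produces a truncated kernel that panics at boot [9].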
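
Since the rename on vfat is only a best effort, it can help to verify the temporary copy before exposing it. The sketch below adds a checksum comparison to the same pattern; install_kernel is a hypothetical helper name, and the source and destination paths are arguments rather than the fixed paths used in the handler.

```shell
#!/bin/sh
# Sketch: copy a kernel image with write-then-rename plus a checksum
# comparison before the rename. install_kernel is a hypothetical helper.

install_kernel() {
    src="$1"
    dst="$2"
    tmp="${dst}.tmp"

    cp "$src" "$tmp" || return 1
    sync    # flush the copy before it can appear under the final name

    src_hash=$(sha256sum "$src" | awk '{print $1}')
    tmp_hash=$(sha256sum "$tmp" | awk '{print $1}')
    if [ "$src_hash" != "$tmp_hash" ]; then
        rm -f "$tmp"
        return 1
    fi

    mv "$tmp" "$dst" || return 1
    sync
}
```

If the copy fails or the hashes differ, the existing kernel under the final name is left untouched, so the slot's previous kernel remains bootable.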

The kernel image is stored on the rootfs at /usr/lib/kernel/bzImage by a custom Yocto recipe that installs it during image creation:

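# kernel-image-rootfs.bb
# Installs the deployed kernel image into the rootfs so the boot
# handler can copy it to the ESP. The image is read from
# ${DEPLOY_DIR_IMAGE}, so the kernel's do_deploy task must run first.
do_install[depends] += "virtual/kernel:do_deploy"

do_install() {
    install -d ${D}/usr/lib/kernel
    install -m 0644 ${DEPLOY_DIR_IMAGE}/bzImage ${D}/usr/lib/kernel/bzImage
}

# Package the image; /usr/lib/kernel is not in the default FILES list.
FILES:${PN} += "/usr/lib/kernel/bzImage"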

Post-boot kernel verification and automatic rollback

After an update, the device boots the new slot. If the kernel is corrupt, or the rootfs packages are incompatible, the device might boot but enter a degraded state. The usual boot success signal, marking the slot good once the system reaches multi-user.target, does not check that the kernel matches the rootfs.

A post-boot verification service adds a stronger check [10]:

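# rauc-verify-boot.service
[Unit]
Description=Verify RAUC boot integrity
After=local-fs.target
# Run before the slot is marked good. Do not order this unit before
# rauc.service: the script calls the rauc CLI, which talks to that service.
Before=rauc-mark-good.service

[Service]
Type=oneshot
ExecStart=/usr/lib/rauc/verify-boot.sh
RemainAfterExit=no

[Install]
WantedBy=multi-user.target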

The verification script compares the kernel image on the ESP for the booted slot against the kernel shipped in the rootfs:

#!/bin/sh
# verify-boot.sh: verify that the ESP kernel for the booted slot
# matches the kernel shipped in the rootfs

# PARTUUIDs of the two rootfs partitions (placeholders, as in the loader entries)
UUID_A="<uuid-a>"
UUID_B="<uuid-b>"

# Portable alternative to grep -oP (Busybox grep does not support -P)
ROOT_UUID=$(grep -o 'root=PARTUUID=[^ ]*' /proc/cmdline | cut -d= -f3)

# Determine which slot we booted
case "$ROOT_UUID" in
    "$UUID_A") CURRENT_SLOT="A" ;;
    "$UUID_B") CURRENT_SLOT="B" ;;
    *) echo "Cannot map root PARTUUID '${ROOT_UUID}' to a slot" >&2; exit 1 ;;
esac

# Hash the kernel on the ESP and the kernel in the rootfs
ESP_HASH=$(sha256sum "/boot/bzImage-${CURRENT_SLOT}" | awk '{print $1}')
ROOTFS_HASH=$(sha256sum /usr/lib/kernel/bzImage | awk '{print $1}')

if [ "$ESP_HASH" != "$ROOTFS_HASH" ]; then
    echo "Kernel mismatch between ESP and rootfs for slot ${CURRENT_SLOT}"
    echo "Marking slot as bad to trigger rollback"
    # rauc expects "booted", "other", or a slot name such as rootfs.0,
    # not the bootname
    rauc status mark-bad booted
    exit 1
fi

echo "Kernel verification passed for slot ${CURRENT_SLOT}"

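If the hashes do not match, the script marks the slot as bad. Note that systemd-boot, not RAUC, picks the entry at boot time, and it only follows loader.conf. The rollback on the next reboot therefore depends on the default entry being pointed back at the other slot when the state changes to bad, for example from the handler's set-state branch [11].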
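
A set-state branch that also switches the default entry could look roughly like the sketch below. It is an illustration under assumptions, not the handler shown earlier: the function names other_slot and mark_bad_and_fall_back are hypothetical, and the state directory and loader.conf path are passed as arguments.

```shell
#!/bin/sh
# Sketch: record a slot as bad and point loader.conf at the other slot.
# Function names and argument order are hypothetical.

other_slot() {
    case "$1" in
        A) echo B ;;
        B) echo A ;;
        *) return 1 ;;
    esac
}

mark_bad_and_fall_back() {
    slot="$1"
    state_dir="$2"
    conf="$3"

    fallback=$(other_slot "$slot") || return 1
    mkdir -p "$state_dir" || return 1
    echo bad > "${state_dir}/${slot}"

    # Do not fall back onto a slot that is itself marked bad.
    if [ "$(cat "${state_dir}/${fallback}" 2>/dev/null)" = "bad" ]; then
        echo "both slots are marked bad; leaving loader.conf unchanged" >&2
        return 1
    fi

    sed "s/^default .*/default ${fallback}.conf/" "$conf" > "${conf}.tmp" || return 1
    mv "${conf}.tmp" "$conf"
}
```

The guard against falling back onto a bad slot is a design choice: when both slots are bad, booting the current one again is at least no worse than switching.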


Persistent state: what must survive an update

The rootfs is completely replaced during an OTA update. Anything on the rootfs that is not on the data partition is lost. The following must be moved to /data and symlinked or bind-mounted:

| Data | Why it must survive |
|------|---------------------|
| SSH host keys (/etc/ssh/ssh_host_*) | Changing host keys breaks client trust |
| Tailscale state (/var/lib/tailscale/) | Re-authenticating to the tailnet on every update is a denial-of-service risk |
| Docker volumes (/var/lib/docker/volumes/) | Application data stored in volumes |
| RAUC slot state | Slot health tracking must persist across rootfs swaps |

A data-persistence service runs on every boot, before SSH and Tailscale start [12]:

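[Unit]
Description=Ensure persistent data symlinks
DefaultDependencies=no
# The script writes to /data and to /etc, so wait for the /data mount
# and for the root filesystem to be remounted read-write.
RequiresMountsFor=/data
After=systemd-remount-fs.service
Before=sshd.service tailscaled.service

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/scripts/data-persistence.sh
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target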

The script migrates data on first boot and maintains symlinks on subsequent boots:

#!/bin/sh
# data-persistence.sh

# Migrate SSH host keys to /data while /data holds none yet
# (first boot, or keys that were generated after an earlier run).
mkdir -p /data/ssh
if ! ls /data/ssh/ssh_host_* >/dev/null 2>&1; then
    cp -a /etc/ssh/ssh_host_* /data/ssh/ 2>/dev/null || true
fi

# Point /etc/ssh at the keys on /data.
# Only touch /etc/ssh when /data actually holds keys (an unmatched glob
# is otherwise passed through as the literal /data/ssh/ssh_host_*).
for key in /data/ssh/ssh_host_*; do
    [ -f "$key" ] || continue
    ln -sf "$key" "/etc/ssh/$(basename "$key")"
done

# Tailscale state
if [ ! -d /data/tailscale ]; then
    mkdir -p /data/tailscale
    if [ -d /var/lib/tailscale ] && [ ! -L /var/lib/tailscale ]; then
        cp -a /var/lib/tailscale/. /data/tailscale/ 2>/dev/null || true
    fi
fi
# Replace the rootfs directory (or a stale link) with a link into /data
rm -rf /var/lib/tailscale
ln -sfn /data/tailscale /var/lib/tailscale

The service is ordered before sshd.service and tailscaled.service, so the migrated keys and state are in place before SSH and Tailscale start.
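
After an update it is worth confirming that the links really resolve into the data partition. The helper below is a small sketch for that check; the name points_into is hypothetical, and it relies on readlink -f, which both GNU coreutils and Busybox provide.

```shell
#!/bin/sh
# Sketch: succeed only if a path resolves to a location under a given
# directory. points_into is a hypothetical helper name.

points_into() {
    # $1 = path to check, $2 = directory the resolved path must live under
    target=$(readlink -f "$1") || return 1
    base=$(readlink -f "$2") || return 1
    case "$target" in
        "$base"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `points_into /var/lib/tailscale /data` should succeed on a correctly migrated system, and a failure after an update means the persistence service did not run or ran too late.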


RAUC bundle creation

RAUC update bundles are created on the build host with the rauc bundle command, which takes a directory containing the manifest and the rootfs image. The resulting bundle contains the new rootfs image, the manifest, and a cryptographic signature [13]:

```bash
# RAUC bundle syntax varies by version. Check your version with rauc bundle --help.
# This example uses the --cert/--key flags (RAUC 1.x).
# The input is a directory holding manifest.raucm and the rootfs image,
# not the image file itself.
rauc bundle \
  --cert=/path/to/development-1.cert.pem \
  --key=/path/to/development-1.key.pem \
  /path/to/bundle-content/ \
  update-bundle.raucb
```

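Do not rely on --no-verify for anything beyond inspecting a bundle. The flag (accepted by commands such as rauc info) skips the signature check entirely, so an unsigned or tampered bundle looks the same as a valid one. Installation on a device with a production keyring should always go through the signature check.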

The manifest describes the compatibility string and the image:

[update]
compatible=My Distro
version=2.0.4
 
[image.rootfs]
filename=rootfs.ext4

In production, tie compatible to your hardware revision rather than just the software distro name (e.g., compatible=myco-appliance-1.0). This prevents installing a bundle built for a different hardware SKU [14].
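
A mismatch is cheaper to catch on the build host than on the device. The following sketch compares the compatible line of a manifest with the one in system.conf before bundling; the helper names read_compatible and compatible_matches are hypothetical, and the parsing assumes the plain key=value lines shown above.

```shell
#!/bin/sh
# Sketch: compare the compatible string of two RAUC-style ini files.
# Helper names are hypothetical.

read_compatible() {
    # Prints the value of the first "compatible=" line in the file.
    sed -n 's/^compatible=//p' "$1" | head -n 1
}

compatible_matches() {
    # $1 = system.conf, $2 = manifest; fails if either value is empty.
    [ -n "$(read_compatible "$1")" ] &&
        [ "$(read_compatible "$1")" = "$(read_compatible "$2")" ]
}
```

In a build script, a failing compatible_matches call would stop the build before rauc bundle runs.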

RAUC verifies the bundle signature on the device before installation. Production devices use a production key pair that is never distributed with the development toolchain. The keyring on test devices trusts the development signing key, so development bundles install there; the keyring on production units trusts only the production signing chain, so development bundles are rejected [14].


The full update flow

Putting it all together:

  1. Build: Yocto produces a production rootfs image and a RAUC update bundle (.raucb).

  2. Deploy: The bundle is uploaded to the device (via SCP, HTTP streaming, or a deployment server like hawkBit [16]).

  3. Install: rauc install update-bundle.raucb writes the new rootfs to the inactive slot, copies the kernel to the ESP, and switches the systemd-boot default entry.

  4. Reboot: systemd-boot boots the new kernel and rootfs. The post-boot verification service checks kernel integrity.

  5. Success: The slot is marked good. The update is complete.

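  6. Failure: If the verification service marks the slot as bad, the device returns to the previous slot on the next reboot, provided the default entry is switched back at that point. A kernel that never reaches userspace cannot run the verification service, so that case needs bootloader-level boot counting or a watchdog to trigger the fallback. Either way, the previous slot stays intact and the device is not bricked.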


Summary

RAUC gave me the framework, but getting it working on real hardware took work beyond the defaults. The Silicom BIOS ignored EFI boot variables entirely, so I wrote a custom systemd-boot handler script. Storing per-slot kernels on the ESP prevented kernel-rootfs drift. Write-then-rename on the ESP (write to .tmp, then rename) kept truncated kernels from being booted after a power loss mid-update. A post-boot verification service caught kernel mismatches. Moving SSH keys and VPN state to a persistent data partition kept them working across rootfs swaps.

The update system I ended up with survives power loss at any point in the cycle and recovers without me needing to intervene. That was the goal.


References

[1] RAUC, "Updating Your Embedded Device," https://rauc.readthedocs.io/en/latest/updating.html, accessed June 2026.

[2] RAUC, "RAUC Basics," https://rauc.readthedocs.io/en/latest/basic.html, accessed June 2026.

[3] RAUC, "Scenarios — Symmetric A/B Setup," https://rauc.readthedocs.io/en/latest/scenarios.html, accessed June 2026.

[4] RAUC, "Advanced Topics — Data Storage," https://rauc.readthedocs.io/en/latest/advanced.html#sec-data-storage, accessed June 2026.

[5] RAUC, "Integration — Bootloader Interface," https://rauc.readthedocs.io/en/latest/integration.html, accessed June 2026.

[6] Freedesktop.org, "systemd-boot — A Simple UEFI Boot Manager," https://www.freedesktop.org/software/systemd/man/systemd-boot.html, accessed June 2026.

[7] RAUC, "Reference — System Configuration File," https://rauc.readthedocs.io/en/latest/reference.html#sec-ref-slot-config, accessed June 2026.

[8] Yocto Project, "Working with the Linux Kernel," https://docs.yoctoproject.org/kernel-dev/common.html, accessed June 2026.

[9] RAUC, "Integration — Kernel Configuration," https://rauc.readthedocs.io/en/latest/integration.html#kernel-configuration, accessed June 2026.

[10] RAUC, "Using RAUC — Boot Slot Selection," https://rauc.readthedocs.io/en/latest/using.html#boot-slot-selection, accessed June 2026.

[11] RAUC, "Design Checklist," https://rauc.readthedocs.io/en/latest/checklist.html, accessed June 2026.

[12] Freedesktop.org, "systemd.service — Service Unit Configuration," https://www.freedesktop.org/software/systemd/man/systemd.service.html, accessed June 2026.

[13] RAUC, "Using RAUC — Creating Bundles," https://rauc.readthedocs.io/en/latest/using.html#creating-bundles, accessed June 2026.

[14] RAUC, "Advanced Topics — Security," https://rauc.readthedocs.io/en/latest/advanced.html#security, accessed June 2026.

[15] RAUC, "Frequently Asked Questions," https://rauc.readthedocs.io/en/latest/faq.html, accessed June 2026.

[16] Eclipse Foundation, "hawkBit: IoT Update Server," https://eclipse.dev/hawkbit/, accessed June 2026.