A/B OTA Updates That Actually Work: RAUC on systemd-boot
Embedded devices in the field need updates. A security patch, a kernel bug fix, a new feature: without OTA (over-the-air) updates, every change requires physical access. But OTA updates carry risk: a power loss during the write, a corrupted bundle, a kernel that panics on the new rootfs. Without a fallback mechanism, a failed update means a bricked device [1].
RAUC (Robust Auto-Update Controller) is a widely used OTA framework for embedded Linux [2]. It provides signed update bundles, A/B redundant root filesystems, automatic rollback, and a well-defined bootloader interface. But when I tried to integrate RAUC with real hardware, I hit problems the documentation did not prepare me for.
This article walks through what I built — an A/B update system on x86-64 UEFI hardware with systemd-boot. It focuses on the problems I actually ran into: EFI boot variables that did not work, kernel and rootfs that drifted apart, and state that I expected to survive an update but did not.
The A/B layout
An A/B update system maintains two complete copies of the root filesystem. One slot is active (currently booted); the other is inactive. An update writes the new image to the inactive slot, marks it as the primary boot target, and reboots. If the new slot boots successfully, it becomes the active slot. If it fails to boot, the bootloader falls back to the previous slot [3].
The canonical partition layout:
```text
[ESP 256MB] [rootfs-a 4GB] [rootfs-b 4GB] [data ~remaining]
```
- ESP (EFI System Partition): holds the kernel, systemd-boot loader, and per-slot boot entries.
- rootfs-a / rootfs-b: two identical root filesystem partitions. Only one is active at a time.
- data: persistent storage that survives rootfs updates (SSH keys, VPN state, application data, RAUC slot state).
The data partition turned out to be critical. Without it, every OTA update wiped my SSH host keys, regenerated them, and forced me to re-accept them on the client side. Tailscale state, Docker volumes, and anything else that needs to survive an update must live on the data partition [4].
Why EFI boot variables fail on real hardware
RAUC's EFI bootloader backend uses efibootmgr to set EFI variables such as
BootNext that tell the firmware which slot to boot. This works on
development boards, in QEMU, and on most consumer PCs.
It fails on many industrial and embedded x86-64 platforms. The Silicom
Cordoba BIOS, like others of its kind, ignores EFI NVRAM boot variables
entirely. The firmware has a fixed boot order, and no amount of efibootmgr
manipulation changes which slot boots [5].
The symptom is maddening: rauc status shows the correct slot marked as
primary, efibootmgr shows the correct BootNext, and the device still
boots the old slot after an update. The BIOS simply does not read
BootNext.
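One way to confirm this behaviour rather than guess at it is to record the entry requested through BootNext before rebooting and compare it with BootCurrent afterwards. The sketch below assumes efibootmgr's usual `BootNext: 0002` and `BootCurrent: 0001` output lines; the helper names and the canned output are illustrative only and are not part of RAUC.

```sh
#!/bin/sh
# Hypothetical helper: pull one field (e.g. BootNext, BootCurrent) out of
# efibootmgr-style output supplied on stdin.
efi_field() {
    sed -n "s/^$1: *//p" | head -n 1
}

# Returns 0 if the firmware booted the entry that was requested.
# $1 = BootNext value saved before the reboot, $2 = BootCurrent after it.
firmware_honoured_bootnext() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Canned sample output; on a device, pipe in the output of efibootmgr instead.
before='BootCurrent: 0001
BootNext: 0002
BootOrder: 0001,0002'
after='BootCurrent: 0001
BootOrder: 0001,0002'

requested=$(printf '%s\n' "$before" | efi_field BootNext)
booted=$(printf '%s\n' "$after" | efi_field BootCurrent)

if firmware_honoured_bootnext "$requested" "$booted"; then
    echo "firmware honoured BootNext ($requested)"
else
    echo "firmware ignored BootNext: asked for $requested, booted $booted"
fi
```

With the sample data the requested entry and the booted entry differ, which is the pattern I saw on the affected hardware.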
My fix was to abandon the EFI bootloader backend entirely and write a custom bootloader handler for systemd-boot. Instead of EFI variables, slot selection is controlled by loader entry files on the ESP.
Custom bootloader handler with systemd-boot
systemd-boot reads loader entries from /loader/entries/ on the ESP [6].
Each entry is a simple text file that specifies a title, the kernel, an
optional initrd, and the kernel command-line options. Here there is one file per slot:
```text
# /loader/entries/A.conf
title Slot A
linux /bzImage-A
options root=PARTUUID=<uuid-a> rootwait console=ttyS0,115200

# /loader/entries/B.conf
title Slot B
linux /bzImage-B
options root=PARTUUID=<uuid-b> rootwait console=ttyS0,115200
```
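Because the two entries differ only in the slot letter and the PARTUUID, generating both from one template keeps them from drifting apart. This is a minimal sketch: the function name is hypothetical, the PARTUUIDs are made-up placeholders, and the demo writes to a scratch directory rather than a real ESP.

```sh
#!/bin/sh
# Hypothetical helper: write a systemd-boot loader entry for one slot.
# $1 = entries directory, $2 = slot letter (A or B), $3 = rootfs PARTUUID.
write_entry() {
    entries_dir=$1 slot=$2 partuuid=$3
    mkdir -p "$entries_dir"
    cat > "$entries_dir/$slot.conf" <<EOF
title Slot $slot
linux /bzImage-$slot
options root=PARTUUID=$partuuid rootwait console=ttyS0,115200
EOF
}

# Demo against a scratch directory. On a device the target would be
# /boot/loader/entries and the PARTUUIDs would be the real partition IDs.
demo_dir=$(mktemp -d)
write_entry "$demo_dir" A 11111111-1111-1111-1111-111111111111
write_entry "$demo_dir" B 22222222-2222-2222-2222-222222222222
cat "$demo_dir/B.conf"
```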
The active slot is determined by loader.conf on the ESP:
```text
default A.conf
timeout 3
```
To switch slots, the custom handler script modifies loader.conf to point
to the other entry:
```sh
#!/bin/sh
# rauc-boot-handler.sh: custom bootloader backend for systemd-boot
ESP_MOUNT="/boot"
LOADER_CONF="${ESP_MOUNT}/loader/loader.conf"

case "$1" in
    get-primary)
        # Print the bootname of the default entry ("default A.conf" -> "A")
        grep "^default" "$LOADER_CONF" | awk '{print $2}' | sed 's/\.conf$//'
        ;;
    set-primary)
        # Switch default entry to the new slot
        NEW_SLOT="$2"   # "A" or "B"
        sed -i "s/^default .*/default ${NEW_SLOT}.conf/" "$LOADER_CONF"
        ;;
    get-state)
        # Read slot state from persistent storage
        SLOT="$2"
        STATE_FILE="/data/rauc/slot-state/${SLOT}"
        if [ -f "$STATE_FILE" ]; then
            cat "$STATE_FILE"
        else
            # No record yet: treat the slot as good
            echo "good"
        fi
        ;;
    set-state)
        # Write slot state to persistent storage
        SLOT="$2"
        STATE="$3"      # "good" or "bad"
        mkdir -p /data/rauc/slot-state
        echo "$STATE" > "/data/rauc/slot-state/${SLOT}"
        ;;
esac
```

Slot state is persisted to /data so it survives rootfs overwrites. The
set-state action creates the state directory the first time it runs.
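The handler trusts whatever bootname it is handed. A slightly more defensive take on get-primary and set-primary might validate the slot name and rewrite loader.conf through a sibling file. This is a sketch under my own assumptions (two slots named A and B, a LOADER_CONF variable that can be overridden for off-device testing), not something RAUC prescribes.

```sh
#!/bin/sh
# Sketch of defensive get-primary / set-primary helpers.
# LOADER_CONF is overridable so the functions can be tried off-device.
LOADER_CONF=${LOADER_CONF:-/boot/loader/loader.conf}

get_primary() {
    # "default A.conf" -> "A"
    sed -n 's/^default[[:space:]]*\(.*\)\.conf[[:space:]]*$/\1/p' "$LOADER_CONF" | head -n 1
}

set_primary() {
    case "$1" in
        A|B) ;;
        *) echo "unknown slot: $1" >&2; return 1 ;;
    esac
    # Rewrite into a sibling file, then rename over the original.
    sed "s/^default .*/default $1.conf/" "$LOADER_CONF" > "$LOADER_CONF.tmp" &&
        mv "$LOADER_CONF.tmp" "$LOADER_CONF"
}

# Demo on a scratch copy of loader.conf.
LOADER_CONF=$(mktemp)
printf 'default A.conf\ntimeout 3\n' > "$LOADER_CONF"
set_primary B
get_primary
```

Rejecting unknown slot names means a typo in system.conf fails loudly instead of pointing the default at an entry that does not exist.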
The RAUC system.conf references the custom handler. The compatible
string in system.conf must match the one in the update bundle's manifest,
or RAUC will reject the update [15]:

```ini
[system]
compatible=My Distro
bootloader=custom

[handlers]
bootloader-custom-backend=/usr/lib/rauc/rauc-boot-handler.sh

[slot.rootfs.0]
device=/dev/sda2
type=ext4
bootname=A

[slot.rootfs.1]
device=/dev/sda3
type=ext4
bootname=B
```

The bootname field maps to the slot letter used in the handler script and
in the systemd-boot loader entry file names (A.conf, B.conf) [7].
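Since a bootname without a matching loader entry only shows up as a failed boot, it can be worth checking the two against each other at build or provisioning time. The following is a hypothetical consistency check, assuming entries are named after the bootname as in this setup; the function name and scratch files are mine.

```sh
#!/bin/sh
# Hypothetical consistency check: every bootname in system.conf should have
# a matching <bootname>.conf loader entry.
# $1 = path to system.conf, $2 = loader entries directory.
check_bootnames() {
    conf=$1 entries_dir=$2 status=0
    for name in $(sed -n 's/^bootname=//p' "$conf"); do
        if [ ! -f "$entries_dir/$name.conf" ]; then
            echo "missing loader entry for bootname $name" >&2
            status=1
        fi
    done
    return $status
}

# Demo with scratch files: slot B deliberately has no entry.
work=$(mktemp -d)
printf '[slot.rootfs.0]\nbootname=A\n[slot.rootfs.1]\nbootname=B\n' > "$work/system.conf"
mkdir "$work/entries"
: > "$work/entries/A.conf"
check_bootnames "$work/system.conf" "$work/entries" || echo "check failed"
```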
Per-slot kernels on the ESP
A common early failure mode is the kernel and rootfs drifting apart. If the kernel image is not updated together with the rootfs, the old kernel boots a new rootfs whose modules were built for a different kernel version, and they fail to load. Or the reverse: the bootloader loads the new kernel, but the system boots the old rootfs because the kernel command line still points to the old partition.
What worked for me was storing per-slot kernels on the ESP [8]. Each slot's loader entry references a separate kernel image:
```text
/boot/bzImage-A                # Kernel for slot A
/boot/bzImage-B                # Kernel for slot B
/boot/loader/entries/A.conf    # References bzImage-A + rootfs-a
/boot/loader/entries/B.conf    # References bzImage-B + rootfs-b
```
During an update, the RAUC boot handler copies the kernel from the newly
installed rootfs to the ESP during the set-primary hook:
```sh
set-primary)
    NEW_SLOT="$2"
    # Determine which partition corresponds to this slot
    if [ "$NEW_SLOT" = "A" ]; then
        ROOTFS_DEV="/dev/sda2"
    else
        ROOTFS_DEV="/dev/sda3"
    fi
    # Mount the new rootfs read-only
    mkdir -p /mnt/new-rootfs
    mount -o ro "$ROOTFS_DEV" /mnt/new-rootfs
    # Copy the kernel to a temporary name, flush it, then rename into place
    cp /mnt/new-rootfs/usr/lib/kernel/bzImage "${ESP_MOUNT}/bzImage-${NEW_SLOT}.tmp"
    sync
    mv "${ESP_MOUNT}/bzImage-${NEW_SLOT}.tmp" "${ESP_MOUNT}/bzImage-${NEW_SLOT}"
    sync
    umount /mnt/new-rootfs
    # Switch the default entry
    sed -i "s/^default .*/default ${NEW_SLOT}.conf/" "$LOADER_CONF"
    ;;
```

The write-then-rename pattern (copy to .tmp, sync, then mv) keeps a
half-written image from ever appearing under the final name. A partial
write leaves a .tmp file that no loader entry references; only the
completed mv exposes the new kernel. The ESP's vfat filesystem has no
journal, so the rename itself is not guaranteed to be crash-atomic the way
it is on ext4, but the exposure shrinks from the whole copy to a single
directory update. Without this, a power loss during the copy produces a
truncated kernel that panics at boot [9].
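The copy step can also check its own work before the rename. The sketch below is one way to do that, with a hypothetical function name and scratch files standing in for the rootfs kernel and the ESP. Note the limits: reading the copy back may be served from the page cache, so a matching comparison shows the copy completed, not that every block reached the flash.

```sh
#!/bin/sh
# Sketch: copy a file into place via a temporary name, verifying the copy
# before it is exposed under its final name.
# $1 = source file, $2 = destination path.
install_verified() {
    src=$1 dst=$2 tmp="$2.tmp"
    cp "$src" "$tmp" || return 1
    sync                                  # flush the copy before renaming
    if ! cmp -s "$src" "$tmp"; then       # byte-for-byte comparison
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$dst" && sync
}

# Demo on scratch files standing in for the rootfs kernel and the ESP.
work=$(mktemp -d)
printf 'pretend kernel image\n' > "$work/bzImage"
install_verified "$work/bzImage" "$work/bzImage-B" && echo "installed"
```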
The kernel image is stored on the rootfs at /usr/lib/kernel/bzImage by a
custom Yocto recipe that installs it during image creation:

```bitbake
# kernel-image-rootfs.bb
# This recipe depends on the kernel being built first.
# In production, use DEPENDS = "virtual/kernel" and reference
# ${STAGING_KERNEL_DIR} rather than ${DEPLOY_DIR_IMAGE}.
do_install() {
    install -d ${D}/usr/lib/kernel
    install -m 0644 ${DEPLOY_DIR_IMAGE}/bzImage ${D}/usr/lib/kernel/bzImage
}
```

Post-boot kernel verification and automatic rollback
After an update, the device boots the new slot. If the kernel is corrupt, or the rootfs packages are incompatible, the device might boot but enter a degraded state. RAUC's stock rauc-mark-good.service marks the slot good once boot reaches multi-user.target; it does not check that the kernel matches the rootfs.
A post-boot verification service adds a stronger check [10]:
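```ini
# rauc-verify-boot.service
[Unit]
Description=Verify RAUC boot integrity
After=local-fs.target
# Run before the slot is marked good. Ordering this unit before rauc.service
# itself would likely stall, because the script calls the rauc CLI, which
# talks to that service.
Before=rauc-mark-good.service

[Service]
Type=oneshot
ExecStart=/usr/lib/rauc/verify-boot.sh
RemainAfterExit=no

[Install]
WantedBy=multi-user.target
```

The verification script compares the booted slot's kernel on the ESP against the kernel shipped in the rootfs: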
```sh
#!/bin/sh
# verify-boot.sh: verify the ESP kernel for the booted slot matches the rootfs kernel

# Extract the root PARTUUID from the kernel command line.
# Plain grep -o is used because Busybox grep does not support -P.
ROOT_UUID=$(grep -o 'root=PARTUUID=[^ ]*' /proc/cmdline | cut -d= -f3)

# Determine which slot is booted. "<uuid-a>" is the same placeholder as in
# the loader entries; substitute the real PARTUUID of rootfs-a.
if [ "$ROOT_UUID" = "<uuid-a>" ]; then
    CURRENT_SLOT="A"
else
    CURRENT_SLOT="B"
fi

# Hash the slot's kernel on the ESP and the kernel shipped in the rootfs
ESP_HASH=$(sha256sum "/boot/bzImage-${CURRENT_SLOT}" | awk '{print $1}')
ROOTFS_HASH=$(sha256sum /usr/lib/kernel/bzImage | awk '{print $1}')

if [ "$ESP_HASH" != "$ROOTFS_HASH" ]; then
    echo "Kernel mismatch between ESP and rootfs for slot ${CURRENT_SLOT}"
    echo "Marking slot as bad to trigger rollback"
    # mark-bad takes "booted", "other", or a slot name such as rootfs.0,
    # not the bootname
    rauc status mark-bad booted
    exit 1
fi
echo "Kernel verification passed for slot ${CURRENT_SLOT}"
```

If the hashes do not match, the script marks the booted slot as bad, which reaches the handler as a set-state call. Because systemd-boot never reads that state, the fallback on the next reboot depends on loader.conf also being pointed back at the other slot [11].
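systemd-boot itself never looks at RAUC's slot state, so one way to make a mark-bad actually change what boots next is to have the set-state action repoint loader.conf. The sketch below assumes exactly two slots named A and B and uses overridable paths so it can be tried off-device; it is an illustration of the idea rather than the handler I shipped.

```sh
#!/bin/sh
# Sketch of a set-state action that also repoints loader.conf when the
# slot that is currently the default gets marked bad. Paths are overridable
# for off-device testing; the defaults mirror the handler shown earlier.
STATE_DIR=${STATE_DIR:-/data/rauc/slot-state}
LOADER_CONF=${LOADER_CONF:-/boot/loader/loader.conf}

other_slot() {
    if [ "$1" = "A" ]; then echo B; else echo A; fi
}

set_state() {
    slot=$1 state=$2
    mkdir -p "$STATE_DIR"
    echo "$state" > "$STATE_DIR/$slot"
    # Only switch when the bad slot is the one systemd-boot would boot next.
    if [ "$state" = "bad" ] && grep -q "^default $slot\.conf" "$LOADER_CONF"; then
        sed "s/^default .*/default $(other_slot "$slot").conf/" "$LOADER_CONF" \
            > "$LOADER_CONF.tmp" && mv "$LOADER_CONF.tmp" "$LOADER_CONF"
    fi
}

# Demo: B is the default and gets marked bad, so the default returns to A.
STATE_DIR=$(mktemp -d)
LOADER_CONF=$(mktemp)
printf 'default B.conf\ntimeout 3\n' > "$LOADER_CONF"
set_state B bad
cat "$LOADER_CONF"
```

This does not cover a kernel that hangs before userspace starts, and it does not check whether the other slot is itself marked bad; both would need handling outside this function.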
Persistent state: what must survive an update
The rootfs is completely replaced during an OTA update, so anything
stored only on the rootfs is lost. The following must be moved to /data
and symlinked or bind-mounted:
| Data | Why it must survive |
|---|---|
| SSH host keys (/etc/ssh/ssh_host_*) | Changing host keys breaks client trust |
| Tailscale state (/var/lib/tailscale/) | Re-authenticating to the tailnet on every update is a denial-of-service risk |
| Docker volumes (/var/lib/docker/volumes/) | Application data stored in volumes |
| RAUC slot state | Slot health tracking must persist across rootfs swaps |
A data-persistence service runs on every boot, before SSH and Tailscale
start [12]:
```ini
[Unit]
Description=Ensure persistent data symlinks
DefaultDependencies=no
# With DefaultDependencies=no nothing else guarantees /data is mounted
RequiresMountsFor=/data
Before=sshd.service tailscaled.service

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/scripts/data-persistence.sh
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target
```

The script migrates data on first boot and maintains symlinks on subsequent boots:
```sh
#!/bin/sh
# data-persistence.sh

# Migrate SSH host keys to /data the first time keys exist in /etc/ssh
mkdir -p /data/ssh
if ! ls /data/ssh/ssh_host_* >/dev/null 2>&1; then
    cp -a /etc/ssh/ssh_host_* /data/ssh/ 2>/dev/null || true
fi

# Point /etc/ssh at the persistent keys. Skip this while /data/ssh is still
# empty: removing the rootfs keys then would leave sshd without host keys,
# and the unexpanded glob would produce a broken literal symlink.
if ls /data/ssh/ssh_host_* >/dev/null 2>&1; then
    rm -f /etc/ssh/ssh_host_*
    for key in /data/ssh/ssh_host_*; do
        [ -f "$key" ] && ln -sf "$key" /etc/ssh/
    done
fi

# Tailscale state
if [ ! -d /data/tailscale ]; then
    mkdir -p /data/tailscale
    if [ -d /var/lib/tailscale ]; then
        cp -a /var/lib/tailscale/* /data/tailscale/ 2>/dev/null || true
    fi
fi
rm -rf /var/lib/tailscale
ln -sfn /data/tailscale /var/lib/tailscale
```

The unit is ordered before sshd.service and tailscaled.service, so the migrated keys and state are in place before either daemon starts.
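The SSH and Tailscale handling follow the same pattern, so it can be factored into one helper and reused for other directories. This is a sketch with a hypothetical function name, demonstrated on scratch directories standing in for / and /data; it handles whole directories only, not individual files like the SSH host keys.

```sh
#!/bin/sh
# Sketch: move a directory onto the data partition once, then keep a symlink
# to it in the rootfs. $1 = path in the rootfs, $2 = path on the data partition.
persist_dir() {
    src=$1 dst=$2
    if [ ! -d "$dst" ]; then
        mkdir -p "$dst"
        # First boot: seed the persistent copy from the rootfs, if present.
        if [ -d "$src" ] && [ ! -L "$src" ]; then
            cp -a "$src/." "$dst/"
        fi
    fi
    # Replace whatever a freshly written rootfs contains with a symlink.
    if [ ! -L "$src" ] || [ "$(readlink "$src")" != "$dst" ]; then
        rm -rf "$src"
        mkdir -p "$(dirname "$src")"
        ln -s "$dst" "$src"
    fi
}

# Demo with scratch directories standing in for / and /data.
root=$(mktemp -d)
mkdir -p "$root/var/lib/tailscale"
echo "node-key" > "$root/var/lib/tailscale/state"
persist_dir "$root/var/lib/tailscale" "$root/data/tailscale"
cat "$root/var/lib/tailscale/state"
```

Running the helper a second time is a no-op, which is what makes it safe to call on every boot.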
RAUC bundle creation
RAUC update bundles are created on the build host using the rauc bundle
command. The bundle contains the new rootfs image, a manifest, and a
cryptographic signature [13]:
```bash
# RAUC bundle syntax varies by version. Check your version with rauc bundle --help.
# This example uses the --cert/--key flags (RAUC 1.x).
# The input is a content directory holding the rootfs image and manifest.raucm,
# not the image file itself.
rauc bundle \
    --cert=/path/to/development-1.cert.pem \
    --key=/path/to/development-1.key.pem \
    /path/to/bundle-content/ \
    update-bundle.raucb
```

Signature verification is what keeps untrusted bundles off a device that
has a production keyring configured. Options that skip it, such as
--no-verify on rauc info, are for inspecting bundles on a workstation;
never build them into tooling that runs on production devices.
The manifest describes the compatibility string and the image:

```ini
[update]
compatible=My Distro
version=2.0.4

[image.rootfs]
filename=rootfs.ext4
```

In production, tie compatible to your hardware revision rather than just
the software distro name (e.g., compatible=myco-appliance-1.0). This
prevents installing a bundle built for a different hardware SKU [14].
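The input to rauc bundle is a directory that holds the image together with a manifest named manifest.raucm. A small helper can assemble that directory so the compatible string and version come from one place. This is a minimal sketch: the function name, the scratch paths, and the dummy image are assumptions for illustration.

```sh
#!/bin/sh
# Sketch: assemble a bundle content directory (image + manifest.raucm).
# $1 = content dir, $2 = compatible string, $3 = version, $4 = rootfs image.
make_bundle_dir() {
    dir=$1 compatible=$2 version=$3 image=$4
    mkdir -p "$dir"
    cp "$image" "$dir/rootfs.ext4"
    cat > "$dir/manifest.raucm" <<EOF
[update]
compatible=$compatible
version=$version

[image.rootfs]
filename=rootfs.ext4
EOF
}

# Demo with an empty dummy file in place of a real ext4 image.
work=$(mktemp -d)
: > "$work/dummy.ext4"
make_bundle_dir "$work/content" "myco-appliance-1.0" "2.0.4" "$work/dummy.ext4"
cat "$work/content/manifest.raucm"

# On the build host the directory would then be signed, for example:
#   rauc bundle --cert=/path/to/development-1.cert.pem \
#       --key=/path/to/development-1.key.pem "$work/content" update-bundle.raucb
```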
RAUC verifies the bundle signature on the device before installation. Production bundles are signed with a production key pair that is never distributed with the development toolchain. Test devices carry a keyring that also trusts the development signing certificate; production units ship a keyring that trusts only the production certificate, so development bundles install on test devices but are rejected on production units [14].
The full update flow
Putting it all together:
- Build: Yocto produces a production rootfs image and a RAUC update bundle (.raucb).
- Deploy: The bundle is uploaded to the device (via SCP, HTTP streaming, or a deployment server like hawkBit [16]).
- Install: rauc install update-bundle.raucb writes the new rootfs to the inactive slot, copies the kernel to the ESP, and switches the systemd-boot default entry.
- Reboot: systemd-boot boots the new kernel and rootfs. The post-boot verification service checks kernel integrity.
- Success: The slot is marked good. The update is complete.
- Failure: If the kernel does not boot, or the verification service marks the slot as bad, RAUC falls back to the previous slot on the next reboot. The device is not bricked.
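After the reboot step, whatever drives the update needs to tell success from rollback. One simple signal is which root partition the kernel command line names. The sketch below uses hypothetical function names and made-up PARTUUIDs, and reads the command line from a file argument so it can be tried off-device; on a device that file would be /proc/cmdline.

```sh
#!/bin/sh
# Sketch: map the root= PARTUUID on a kernel command line to a slot letter.
# $1 = file holding the command line (normally /proc/cmdline),
# $2 = PARTUUID of rootfs-a, $3 = PARTUUID of rootfs-b.
booted_slot() {
    uuid=$(tr ' ' '\n' < "$1" | sed -n 's/^root=PARTUUID=//p' | head -n 1)
    case "$uuid" in
        "$2") echo A ;;
        "$3") echo B ;;
        *)    echo unknown; return 1 ;;
    esac
}

# Returns 0 when the slot that was just installed ($4) is the one now running.
update_took_effect() {
    [ "$(booted_slot "$1" "$2" "$3")" = "$4" ]
}

# Demo with a canned command line and made-up PARTUUIDs.
cmdline=$(mktemp)
echo 'root=PARTUUID=2222-b rootwait console=ttyS0,115200' > "$cmdline"
if update_took_effect "$cmdline" 1111-a 2222-b B; then
    echo "running the new slot"
else
    echo "still on the old slot: the update was rolled back or never switched"
fi
```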
Summary
RAUC gave me the framework, but getting it working on real hardware took work beyond the defaults. The Silicom BIOS ignored EFI boot variables entirely, so I wrote a custom systemd-boot handler script. Storing per-slot kernels on the ESP prevented kernel-rootfs drift. Atomic ESP writes (write to .tmp, then rename) prevented corruption from power loss mid-update. A post-boot verification service caught kernel mismatches. Moving SSH keys and VPN state to a persistent data partition kept them working across rootfs swaps.
The update system I ended up with survives power loss at any point in the cycle and recovers without me needing to intervene. That was the goal.
References
[1] RAUC, "Updating Your Embedded Device," https://rauc.readthedocs.io/en/latest/updating.html, accessed June 2026.
[2] RAUC, "RAUC Basics," https://rauc.readthedocs.io/en/latest/basic.html, accessed June 2026.
[3] RAUC, "Scenarios — Symmetric A/B Setup," https://rauc.readthedocs.io/en/latest/scenarios.html, accessed June 2026.
[4] RAUC, "Advanced Topics — Data Storage," https://rauc.readthedocs.io/en/latest/advanced.html#sec-data-storage, accessed June 2026.
[5] RAUC, "Integration — Bootloader Interface," https://rauc.readthedocs.io/en/latest/integration.html, accessed June 2026.
[6] Freedesktop.org, "systemd-boot — A Simple UEFI Boot Manager," https://www.freedesktop.org/software/systemd/man/systemd-boot.html, accessed June 2026.
[7] RAUC, "Reference — System Configuration File," https://rauc.readthedocs.io/en/latest/reference.html#sec-ref-slot-config, accessed June 2026.
[8] Yocto Project, "Working with the Linux Kernel," https://docs.yoctoproject.org/kernel-dev/common.html, accessed June 2026.
[9] RAUC, "Integration — Kernel Configuration," https://rauc.readthedocs.io/en/latest/integration.html#kernel-configuration, accessed June 2026.
[10] RAUC, "Using RAUC — Boot Slot Selection," https://rauc.readthedocs.io/en/latest/using.html#boot-slot-selection, accessed June 2026.
[11] RAUC, "Design Checklist," https://rauc.readthedocs.io/en/latest/checklist.html, accessed June 2026.
[12] Freedesktop.org, "systemd.service — Service Unit Configuration," https://www.freedesktop.org/software/systemd/man/systemd.service.html, accessed June 2026.
[13] RAUC, "Using RAUC — Creating Bundles," https://rauc.readthedocs.io/en/latest/using.html#creating-bundles, accessed June 2026.
[14] RAUC, "Advanced Topics — Security," https://rauc.readthedocs.io/en/latest/advanced.html#security, accessed June 2026.
[15] RAUC, "Frequently Asked Questions," https://rauc.readthedocs.io/en/latest/faq.html, accessed June 2026.
[16] Eclipse Foundation, "hawkBit — IoT Update Server," https://eclipse.dev/hawkbit/, accessed June 2026.