Guest-to-host escape via QEMU CXL Type 3 mailbox overflows

April 2026 · Reported to qemu-security · Classified as non-security (CXL outside policy scope)

Summary

Three bugs in QEMU’s CXL Type 3 mailbox emulation (hw/cxl/cxl-mailbox-utils.c) chain into a deterministic guest-to-host escape with full ASLR bypass. The escape gets you arbitrary code execution on the host from a guest VM in four mailbox commands, no brute force needed. Bug 1 is a heap buffer overflow that writes attacker-controlled content past a 2048-byte payload buffer. Bug 2 is a write-what-where through unchecked offset fields across six Set Feature branches. Bug 3 is an out-of-bounds read caused by type confusion that leaks QEMU code pointers, defeating PIE and ASLR. Bugs 1 and 3 chain into the full escape; Bug 2 is an independent primitive.

All three are reachable from a guest with root access to an emulated CXL Type 3 device via mailbox MMIO writes to PCI BAR2, on the q35 machine type with KVM. Tested against QEMU v11.0.0-rc2, commit b6a7d06213e5d2f7d124d16418bc289c4a8a4b82.

CXL emulation is not part of QEMU’s security-supported virtualization use case, so these won’t receive CVEs. This is consistent with the published policy: CXL emulation exists as developer infrastructure for the Linux CXL stack, not as a production-hardened device model. The bugs are real, the chain works, and the technical detail is below.

The CXL mailbox interface

CXL Type 3 memory devices expose a mailbox interface through PCI BAR2 registers. The guest writes a command opcode, a payload length, and up to 2048 bytes of payload data into MMIO-mapped registers, then sets a doorbell bit in the control register. QEMU’s emulation dispatches the command to a handler function based on the opcode’s command set and command ID. The handler reads from the payload input buffer, does its work, and writes results back into the same 2048-byte payload output buffer.

The dispatch works by indexing into a two-dimensional array of cxl_cmd structs (cci->cxl_cmd_set[set][cmd]), each holding a function pointer (handler), a name string, and input/output size constraints. This structure becomes relevant later because it’s the overwrite target in the exploit chain.
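As a rough illustration of that dispatch shape, here is a minimal sketch. Everything in it is hypothetical: the mock_* names, the field order, the return codes, and the assumption that the high byte of the opcode selects the command set. It is not QEMU's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's types; names and field order are assumptions. */
typedef struct mock_cmd mock_cmd;
typedef int (*mock_handler)(const mock_cmd *cmd, const uint8_t *in, size_t len_in);

struct mock_cmd {
    const char *name;      /* human-readable command name */
    mock_handler handler;  /* NULL means "not implemented" */
    long in_len;           /* expected input length, or -1 for variable */
};

enum { MOCK_OK = 0, MOCK_UNSUPPORTED = 1, MOCK_BAD_LENGTH = 2 };

static int mock_identify(const mock_cmd *cmd, const uint8_t *in, size_t len_in)
{
    (void)cmd; (void)in; (void)len_in;
    return MOCK_OK;
}

/* Table indexed by [command set][command id]; unset entries are all zero. */
static mock_cmd mock_cmd_set[256][256] = {
    [0x00][0x01] = { "IDENTIFY", mock_identify, 0 },
};

/* Split the 16-bit opcode into (set, cmd) and validate before the indirect call. */
static int mock_dispatch(uint16_t opcode, const uint8_t *in, size_t len_in)
{
    uint8_t set = (uint8_t)(opcode >> 8);
    uint8_t cmd = (uint8_t)(opcode & 0xff);
    const mock_cmd *entry = &mock_cmd_set[set][cmd];

    if (entry->handler == NULL) {
        return MOCK_UNSUPPORTED;
    }
    if (entry->in_len >= 0 && (size_t)entry->in_len != len_in) {
        return MOCK_BAD_LENGTH;
    }
    /* The call goes through whatever pointer the table currently holds. */
    return entry->handler(entry, in, len_in);
}
```

The point of the sketch is the last line: the table lives in ordinary writable device state, so the integrity of every dispatch depends on nothing else in the object writing over it.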

The payload buffer is embedded inside CXLDeviceState, which is itself embedded inside the heap-allocated CXLType3Dev QOM object (roughly 7MB total). This embedding matters for two reasons: the buffer’s neighbours in memory are other device state fields rather than heap metadata, and overflows within the same QOM allocation won’t trigger AddressSanitizer.

The relevant memory layout within CXLType3Dev, derived via GDB’s ptype /o:

payload buffer (2048 bytes)    @ CXLDeviceState + 1496
...
cci.cxl_cmd_set[0][0] (32 bytes) @ offset 2648 from payload buffer start

The delta between the payload buffer and the first command table entry is 2648 bytes, which is 600 bytes past the end of the 2048-byte buffer.
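The arithmetic can be reproduced with offsetof on a mock layout. The struct below is a hypothetical stand-in that collapses the 600 bytes of intervening state into a single array; the real layout comes from ptype /o on the actual build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAYLOAD_SIZE 2048

/* Hypothetical 32-byte table entry; only its size matters here. */
typedef struct { uint8_t raw[32]; } mock_entry;

/* Mock layout: all intervening device state is collapsed into one array. */
typedef struct {
    uint8_t payload[MOCK_PAYLOAD_SIZE];
    uint8_t other_state[600];
    mock_entry first_entry;
} mock_device;

static_assert(sizeof(mock_entry) == 32, "entry size assumed to be 32 bytes");

/* Distance from the start of the payload buffer to the first table entry. */
static size_t entry_delta(void)
{
    return offsetof(mock_device, first_entry) - offsetof(mock_device, payload);
}

/* How far past the end of the payload buffer the entry begins. */
static size_t bytes_past_buffer(void)
{
    return entry_delta() - MOCK_PAYLOAD_SIZE;
}
```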

Bug 1: Heap buffer overflow in cmd_ccls_get_lsa (the corruption primitive)

The Get LSA command (opcode 0x4102) reads data from the device’s Label Storage Area, a persistent metadata region backed by a host file. The guest supplies a 32-bit offset and a 32-bit length. The handler at line 2273 validates the requested range against the LSA backing store size (user-configured at VM creation; 1MB in the PoC) but never checks it against the 2048-byte payload output buffer. The underlying cvc->get_lsa() call copies length bytes from the LSA into payload_out.

If the LSA backing store is larger than 2048 bytes (any size above that is exploitable), the guest can request a read of up to the full LSA size. The copy writes straight past the end of the 2048-byte payload buffer into whatever follows it in CXLDeviceState.

The LSA content is attacker-controlled. The Set LSA command (opcode 0x4103) lets the guest write arbitrary data into the LSA at arbitrary offsets before triggering the overflow. So the attacker pre-stages the overflow content, then issues the oversized Get LSA read to copy it past the buffer boundary.

With length=4096, the command returns rc=0 and len_out=4096. The 2048 bytes past the payload buffer overwrite adjacent fields. You can observe this directly: the “memory device caps” register, which normally reads as 0x14, comes back as 0xAAAAAAAAAAAAAAAA (the LSA fill pattern) after the overflow.

A note on sanitizer coverage: this overflow does not trigger AddressSanitizer. The payload buffer is embedded inside CXLDeviceState, which sits inside the roughly 7MB CXLType3Dev QOM allocation. For heap objects, ASan places redzones around each allocation, not between the fields inside it. The overflow stays within the same allocation, so every write lands in what ASan considers valid memory. The corruption is proven by reading back adjacent register values, not by a sanitizer alert.

This is worth calling out because it’s a real blind spot. If you’re fuzzing QEMU device models, structure-internal overflows in large composite QOM objects won’t be caught by ASan without custom memory poisoning annotations (ASAN_POISON_MEMORY_REGION) between fields. The CXLType3Dev object is 7MB of densely packed state, and there’s substantial room for exploitable intra-object corruption that no standard sanitizer configuration will detect.
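A minimal sketch of what such an annotation could look like follows. The mock_state struct, the guard size, and the init/destroy helpers are all hypothetical. ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION are the real macros from sanitizer/asan_interface.h, and the fallback definitions only exist so the sketch builds where that header is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include(<sanitizer/asan_interface.h>)
#    include <sanitizer/asan_interface.h>
#  endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION
/* Fallback no-ops so the sketch still builds without the ASan header. */
#  define ASAN_POISON_MEMORY_REGION(addr, size)   ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

#define MOCK_PAYLOAD_SIZE 2048
#define MOCK_GUARD_SIZE   64   /* multiple of 8, ASan's shadow granularity */

/* Hypothetical composite object with a guard region after the buffer. */
typedef struct {
    _Alignas(8) uint8_t payload[MOCK_PAYLOAD_SIZE];
    _Alignas(8) uint8_t guard[MOCK_GUARD_SIZE];
    uint64_t caps;   /* stands in for adjacent device state */
} mock_state;

static void mock_state_init(mock_state *s)
{
    memset(s, 0, sizeof(*s));
    s->caps = 0x14;
    assert(((uintptr_t)s->guard % 8) == 0);
    /* Under ASan, any later access to guard[] is reported as a poisoned access. */
    ASAN_POISON_MEMORY_REGION(s->guard, sizeof(s->guard));
}

static void mock_state_destroy(mock_state *s)
{
    /* Unpoison before the memory is freed or reused. */
    ASAN_UNPOISON_MEMORY_REGION(s->guard, sizeof(s->guard));
}
```

With a guard like this in place, a copy that runs off the end of payload would hit poisoned bytes before reaching caps, and an ASan build would report it even though the write never leaves the allocation. The cost is extra padding in the struct and the need to unpoison before teardown.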

The fix adds a check against the mailbox payload buffer size (CXL_MAILBOX_MAX_PAYLOAD_SIZE, 2048) before the copy, capping the output length to the buffer capacity.
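The shape of that check can be sketched as follows. The function name and MOCK_MAX_PAYLOAD are hypothetical, and this version rejects an oversized request outright; whether the actual patch rejects or clamps is not something this sketch claims to mirror.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_PAYLOAD 2048u   /* mirrors CXL_MAILBOX_MAX_PAYLOAD_SIZE */

/*
 * Returns true only if [offset, offset + length) lies inside the LSA
 * and the result also fits in the mailbox output buffer.
 */
static bool get_lsa_request_ok(uint32_t offset, uint32_t length, uint64_t lsa_size)
{
    uint64_t end = (uint64_t)offset + length;   /* 64-bit sum cannot wrap */

    if (end > lsa_size) {
        return false;   /* range check against the backing store */
    }
    if (length > MOCK_MAX_PAYLOAD) {
        return false;   /* range check against the output buffer */
    }
    return true;
}
```

The two conditions guard different things: the first protects the source of the copy, the second protects its destination. Bug 1 is what happens when only the first one exists.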

Bug 2: Write-what-where in cmd_features_set_feature (the independent primitive)

The Set Feature command writes configuration data to device feature registers. It dispatches on UUID to identify which feature is being configured. Six branches (soft_ppr, hard_ppr, cacheline_sparing, row_sparing, bank_sparing, and rank_sparing at lines 1816, 1835, 1854, 1872, 1890, 1908) perform a memcpy like this:

memcpy(&ct3d->X_wr_attrs + hdr->offset, data, bytes_to_copy)

hdr->offset is a guest-controlled uint16_t with range 0-65535. bytes_to_copy is derived from the payload size and can reach 2016 bytes. The target structs (X_wr_attrs for each feature) are 2-3 bytes each. There’s no bounds check on either offset or bytes_to_copy relative to the target struct’s actual size.

This gives the attacker a write-what-where primitive: controlled content, at a controlled offset from the target struct (up to 65535 bytes forward), repeatable across all six branches independently.

The inconsistency is visible in the same function. Two sibling branches, patrol_scrub (line 1763) and ecs (line 1790), do have a correct bounds check, returning CXL_MBOX_INVALID_PAYLOAD_LENGTH if the payload exceeds the target struct size. The six vulnerable branches just omit this validation. The same PoC payload that patrol_scrub correctly rejects (rc=0x16, Invalid Payload Length) is silently accepted by all six vulnerable branches (rc=0x00, Success).

With hdr->offset=0x1000 (4096), the write destination gets pushed 4096 bytes past the target struct, past the end of the 7MB CXLType3Dev allocation, into the heap redzone. ASan confirms it:

ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 512 at 0x... cxl-mailbox-utils.c:1816

The fix applies the same bounds check already present in patrol_scrub and ecs to all six vulnerable branches. No new logic, just consistent application of an existing pattern.
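A minimal sketch of a bounded write into a small attribute struct is below. The mock_wr_attrs type and mock_set_attrs helper are hypothetical. The sketch also bounds the offset explicitly, which is an assumption about what a complete check should cover rather than a copy of the patrol_scrub code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 3-byte writable-attributes struct (the real ones are 2-3 bytes). */
typedef struct {
    uint8_t bytes[3];
} mock_wr_attrs;

/*
 * Copy len bytes to byte offset `offset` inside dst, refusing any write
 * that would extend past the end of the struct.
 */
static bool mock_set_attrs(mock_wr_attrs *dst, uint16_t offset,
                           const uint8_t *data, size_t len)
{
    /* Test offset first so the subtraction below cannot underflow. */
    if (offset > sizeof(*dst) || len > sizeof(*dst) - offset) {
        return false;   /* a caller would map this to an invalid-length error */
    }
    memcpy((uint8_t *)dst + offset, data, len);
    return true;
}
```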

Bug 3: Out-of-bounds read in cmd_logs_get_log (the ASLR bypass)

The Get Log command (opcode 0x0401) retrieves log data from the device. The guest supplies a 16-byte UUID identifying which log to read, a 32-bit offset into that log, and a 32-bit length. The handler for the CEL (Command Effects Log) case is at lines 1213-1220.

The bounds check on line 1213 compares (offset + length) against sizeof(cci->cel_log) using byte semantics. cel_log is declared as an array of 65536 struct cel_log entries, each 4 bytes wide, so sizeof(cci->cel_log) evaluates to 262144:

if ((uint64_t)get_log->offset + get_log->length >= sizeof(cci->cel_log)) {
    return CXL_MBOX_INVALID_INPUT;
}

But the memmove on line 1220 uses pointer arithmetic on cci->cel_log, which is typed as a pointer to struct cel_log. In C, adding an integer to a typed pointer advances by that many elements, not bytes. So cci->cel_log + get_log->offset advances by offset * sizeof(struct cel_log) = offset * 4 bytes:

memmove(payload_out, cci->cel_log + get_log->offset, get_log->length);

The check thinks in bytes. The pointer thinks in 4-byte elements. The attacker controls offset. The result is a 4x range amplification: the actual read position ends up four times further from the base than the bounds check allows.

Concrete example with offset=65588, length=8:

  • Bounds check: 65588 + 8 = 65596 < 262144. Passes.
  • Actual byte position: 65588 * 4 = 262352. That’s 208 bytes past the end of cel_log.
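The two calculations above can be written out side by side. This is a sketch with hypothetical names (mock_cel_entry, byte_check_passes, and so on). It computes the offsets arithmetically and does not perform any out-of-bounds pointer arithmetic itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-byte entry and 65536-entry log, matching the sizes quoted above. */
typedef struct { uint8_t raw[4]; } mock_cel_entry;
#define MOCK_CEL_ENTRIES 65536u
#define MOCK_CEL_BYTES   ((uint64_t)MOCK_CEL_ENTRIES * sizeof(mock_cel_entry))

/* The check as quoted: offset and length are treated as bytes. */
static bool byte_check_passes(uint32_t offset, uint32_t length)
{
    return (uint64_t)offset + length < MOCK_CEL_BYTES;
}

/* Where `log + offset` lands, in bytes from the base, when log is a typed pointer. */
static uint64_t typed_pointer_byte_offset(uint32_t offset)
{
    return (uint64_t)offset * sizeof(mock_cel_entry);
}

/* How many bytes past the end of the log the read starts (0 if it starts inside). */
static uint64_t start_overshoot(uint32_t offset)
{
    uint64_t pos = typed_pointer_byte_offset(offset);
    return pos > MOCK_CEL_BYTES ? pos - MOCK_CEL_BYTES : 0;
}
```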

This lands on the handler field of vdm_fm_owned_ld_mctp_cci.cxl_cmd_set[0][1], which is a function pointer holding the address of cmd_infostat_identify in QEMU’s .text section. Reading 8 bytes here gives you the full 64-bit pointer. Subtract the known symbol offset of cmd_infostat_identify (from objdump -t on the QEMU binary) and you have the PIE base address. ASLR defeated in a single deterministic mailbox command. No brute force, no /proc/self/maps, no side channels.

The read range scales with offset. With offset=200000 and length=2048, the bounds check still passes (200000 + 2048 = 202048 < 262144), but the actual byte position is 800000 from the start of cel_log, roughly 538KB past its end. This reaches into adjacent CXLCCI structs and leaks DeviceState pointers, QEMUTimer callback pointers, and more handler function pointers. Fully controlled, repeatable, and the data comes back in the mailbox payload output buffer where the guest reads it via MMIO.

The root cause is type confusion between byte offsets and element offsets. The bounds check and the pointer arithmetic disagree on units. The fix is a one-line cast:

- memmove(payload_out, cci->cel_log + get_log->offset, get_log->length);
+ memmove(payload_out, (uint8_t *)cci->cel_log + get_log->offset, get_log->length);

Casting to uint8_t* forces byte-granularity arithmetic so the pointer advance matches the bounds check.
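A small self-contained sketch of a read helper where the check and the pointer advance share the same unit follows. The names are hypothetical, and it uses > rather than the quoted >= so that a read ending exactly at the end of the array is allowed; that is a choice made for this sketch, not a statement about the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t raw[4]; } mock_cel_entry;   /* hypothetical 4-byte entry */

/*
 * Copy `length` bytes starting `offset` BYTES into log.
 * log_bytes is the size of the whole array in bytes (sizeof at the call site).
 */
static bool mock_cel_read(uint8_t *dst, const mock_cel_entry *log, size_t log_bytes,
                          uint32_t offset, uint32_t length)
{
    if ((uint64_t)offset + length > log_bytes) {
        return false;
    }
    /* Cast first, then add: the advance is in bytes, the same unit as the check. */
    memcpy(dst, (const uint8_t *)log + offset, length);
    return true;
}
```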

The escape chain

Bugs 3 and 1 chain into a complete guest-to-host escape in four mailbox commands. Deterministic, no brute force, no information leaks beyond the mailbox interface itself. Validated against a non-ASan build of QEMU v11.0.0-rc2 on Ubuntu 24.04 with full ASLR (randomize_va_space=2).

Step 1: Leak PIE base (Bug 3, Get Log OOB read)

Send Get Log with the CEL UUID, offset=65588, length=8. Read back 8 bytes of payload. The response contains the address of cmd_infostat_identify. Subtract IDENTIFY_HANDLER_OFFSET (0x48d122 in this build) to get the PIE base. Verify page alignment as a sanity check. One command, one leak, ASLR defeated.

Step 2: Plant fake command table entry (Set LSA, no bug required)

Compute system@plt = PIE base + SYSTEM_PLT_OFFSET (0x348390 in this build). Construct a 32-byte fake cxl_cmd entry:

  • Bytes 0-7: “/tmp/x\0”, which occupies the name field. The dispatcher passes this as the first argument to the handler.
  • Bytes 8-15: address of system@plt, occupying the handler field.
  • Bytes 16-23: 0xFFFFFFFFFFFFFFFF, occupying the input size field so it accepts any payload length.

Write this entry into the LSA at offset 2648 via Set LSA. This offset is chosen precisely: it’s the distance from the start of the payload buffer to cci->cxl_cmd_set[0][0]. When the overflow in Step 3 copies LSA content through the payload buffer, bytes at LSA offset 2648 land exactly on the first command table entry.

Step 3: Overflow into command table (Bug 1, Get LSA overflow)

Send Get LSA with offset=0, length=2680. The handler copies 2680 bytes from the LSA into the 2048-byte payload buffer. The first 2048 bytes (0-2047) fill the buffer normally. Bytes 2048-2679 overflow past the buffer boundary. Of those, bytes 2648-2679 overwrite cci->cxl_cmd_set[0][0] with the fake entry from Step 2. The dispatch table now maps command (set=0, cmd=0) to system@plt with argument “/tmp/x”.

Step 4: Trigger

Send any mailbox command with set=0, cmd=0, empty payload. The dispatcher indexes cxl_cmd_set[0][0], finds the overwritten handler pointer, and calls system(“/tmp/x”). The script executes on the host as the QEMU process user. Escape complete.

The PoC writes a proof file to /tmp/pwned-by-cxl-guest containing the QEMU process’s username, PID, and timestamp.

Reproduction

Environment:

  • Host: Ubuntu 24.04.4 LTS, kernel 6.17.0-19-generic
  • GCC: 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04.1)
  • QEMU: commit b6a7d06213 (v11.0.0-rc2)
  • ASLR: randomize_va_space=2

For the individual bugs (with ASan, to confirm Bug 2’s OOB write):

../configure --enable-asan --target-list=x86_64-softmmu --enable-debug
ninja -j$(nproc)
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/cxl-mbox-test --verbose

Expected: test 1 passes (Get LSA overflow, verified by register corruption), test 2 passes (Set Feature OOB, ASan crash), test 3 passes (patrol_scrub correctly rejects).

For the full escape chain (without ASan, since the chain requires stable binary offsets):

../configure --target-list=x86_64-softmmu
ninja -j$(nproc)
rm -f /tmp/pwned-by-cxl-guest /tmp/x
QTEST_QEMU_BINARY=./qemu-system-x86_64 tests/qtest/cxl-escape-poc
cat /tmp/pwned-by-cxl-guest

The escape PoC contains hardcoded offsets (SYSTEM_PLT_OFFSET, IDENTIFY_HANDLER_OFFSET, overflow geometry) specific to this build. These will differ across compiler versions, optimization levels, and QEMU commits. Comments in the source document how each offset was derived: objdump -t for symbol offsets, GDB ptype /o for structure layout.

Full qtest PoC source: https://github.com/0xCyberstan/cxl-mailbox-overflow

QEMU command line used by the PoC:

-machine q35,cxl=on -m 2G
-object memory-backend-file,id=cxl-lsa,mem-path=<tmpfile>,size=1M
-object memory-backend-ram,id=cxl-mem,size=256M
-device pxb-cxl,bus_nr=0x34,bus=pcie.0,id=cxl.0
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
-device cxl-rp,bus=cxl.0,id=cxl-rp0,chassis=0,slot=0
-device cxl-type3,bus=cxl-rp0,persistent-memdev=cxl-mem,lsa=cxl-lsa,id=cxl-pmem0

A broader pattern

The CXL mailbox spec defines a complex command dispatch surface: variable-length payloads, UUID-dispatched feature branches, offset-indexed reads and writes into heterogeneous backing structures. This kind of design systematically produces bounds-checking omissions.

The QEMU bugs here aren’t isolated. CVE-2026-23327 is a high-severity vulnerability in the Linux kernel’s own CXL driver (drivers/cxl/core/mbox.c) where the kernel fails to validate payload sizes before accessing mailbox contents. It is the same root-cause class, just on the kernel side of the interface rather than the emulation side. That bug was found independently by a different researcher; the mailbox spec’s complexity produced the same category of failure in both implementations.

On the QEMU side, the pattern is visible within cxl-mailbox-utils.c itself. patrol_scrub and ecs have correct bounds checks; six sibling branches doing the same thing don’t. The Get LSA handler checks the length against the backing store but not against the output buffer. The Get Log handler’s bounds check and its pointer arithmetic disagree on units. Three different omission patterns in the same file, all exploitable, all straightforward to fix.

Mailbox interfaces, whether CXL, NVMe, or any other hardware spec that defines a command-payload-response protocol with variable-length fields and offset-indexed access, deserve a systematic audit of every handler’s bounds checking, not just spot checks on the handlers that happen to get fuzzed first.