A use-after-free in CPython’s perf_trampoline
January 2026 · CPython Issue #143228 · Fix PR #143233
Summary
This post describes a use-after-free in CPython’s perf_trampoline implementation, triggered when sys.deactivate_stack_trampoline() is called concurrently with active bytecode execution. The root cause is free_code_arenas, which munmaps executable memory pages without checking whether worker threads are currently executing code or unwinding through those regions. The bug manifests as an immediate SIGSEGV on Python 3.12 and as SystemError: error return without exception set on 3.13+.
This is a reliability / memory safety bug, not a security vulnerability in the standard threat model: the triggering APIs (sys.activate_stack_trampoline / sys.deactivate_stack_trampoline) require in-process code execution, so an attacker able to call them can already run arbitrary Python. It’s included here because the root cause — missing synchronization between munmap and active execution on the page being unmapped — is an interesting example of the class of bug that shows up when runtimes manage their own executable memory.
Discovery
The investigation started from a flag in a static analyzer I’ve been writing, which looks for patterns where executable memory is freed without a clear synchronization discipline protecting concurrent execution on it. The flag pointed at free_code_arenas in Python/perf_trampoline.c: it iterates a global list of arenas and munmaps each one, with no reference counting or barrier against threads currently running code inside those arenas. From there I wrote a repro to confirm the race actually fires in practice.
The bug
CPython’s perf profiler integration generates small assembly trampolines on the fly that let tools like Linux perf map Python frames to C stack frames. These trampolines live in arenas allocated via mmap with execute permissions.
The cleanup path, free_code_arenas, is called from sys.deactivate_stack_trampoline():
// Python/perf_trampoline.c
static void
free_code_arenas(void)
{
    code_arena_t *cur = perf_code_arena;
    code_arena_t *prev;
    perf_code_arena = NULL;
    while (cur) {
        // No check for threads currently executing on this page
        // or unwinding through it.
        munmap(cur->start_addr, cur->size);
        prev = cur->prev;
        PyMem_RawFree(cur);
        cur = prev;
    }
}
Meanwhile, worker threads execute through py_trampoline_evaluator, which jumps into the generated assembly sitting inside those arenas:
// Python/perf_trampoline.c
static PyObject *
py_trampoline_evaluator(PyThreadState *ts, _PyInterpreterFrame *frame, int throw)
{
    // ...
    // 'f' points into the mmap'd arena.
    return f(ts, frame, throw, _PyEval_EvalFrameDefault);
}
The crashing interleaving:
- A worker thread enters a trampoline frame. Its program counter is now inside the mapped arena.
- The main thread calls munmap on that region. The kernel tears down the page table entries.
- The worker thread returns or hits an exception path. The unwinder (libgcc) tries to read the stack frame to find the return address.
- Page fault: the PC points into memory that’s no longer mapped.
Reproducing it
Race conditions are timing-dependent by definition. I found this one reproduces most consistently when the process is pinned to a single core: forcing the OS to context-switch between the deactivator thread and the worker threads on the same physical CPU drastically increases the chance of the scheduler yanking the CPU away from a worker at exactly the wrong moment.
taskset -c 0 python3 poc.py
import sys
import threading
import os

def heavy_workload():
    while True:
        _ = sum(i * i for i in range(500))

def trigger_race():
    print(f"[+] PID: {os.getpid()}")
    for _ in range(8):
        t = threading.Thread(target=heavy_workload, daemon=True)
        t.start()
    while True:
        sys.activate_stack_trampoline("perf")
        sys.deactivate_stack_trampoline()

if __name__ == "__main__":
    trigger_race()
GDB output
The deactivating thread, mid-munmap:
Thread 9 (Thread 0x725ad0b00b80 (LWP 12791)):
#0 0x0000725ad0125d7b in __GI_munmap () at ../sysdeps/unix/syscall-template.S:117
#1 0x0000725ad071b9f4 in free_code_arenas () at Python/perf_trampoline.c:315
#2 _PyPerfTrampoline_FreeArenas () at Python/perf_trampoline.c:421
And the victim worker, trying to unwind through the freed region:
Thread 1 (Thread 0x725ace5fd6c0 (LWP 12846) (Exiting)):
#0 x86_64_fallback_frame_state ... at ./md-unwind-support.h:63
pc = 0x725ab661e00a <error: Cannot access memory at address 0x725ab661e00a>
#1 uw_frame_state_for ...
#2 0x0000725ab6c86c8a in _Unwind_ForcedUnwind_Phase2 ...
0x725ab661e00a is where the trampoline used to live. Thread 9 unmapped it; Thread 1 faults trying to read it.
The fix
The fix merged upstream (PR #143233) adds reference counting to the arenas and ties arena lifetime to code-object lifetime via PyCode_AddWatcher. Instead of unmapping immediately on deactivation, arenas are marked for deletion and only unmapped when the last code object referencing them is destroyed — guaranteeing no thread can still be executing on the page.
3.13 and 3.14 will pick up the fix. 3.12 is in security-fix-only mode and the CPython maintainers decided not to backport, since the bug isn’t a security issue and the backport complexity is high relative to the impact.

