Copy Fail CVE-2026-31431: how a tiny Linux bug breaks privilege boundaries

Last updated: 05/02/2026
  • Copy Fail (CVE-2026-31431) is a high‑severity Linux kernel flaw that lets any local user become root using a tiny Python script.
  • The bug sits in the algif_aead/authencesn crypto path, corrupting 4 bytes in the page cache of any readable file without touching disk.
  • Almost all major Linux distributions since 2017 are or have been affected, with a major impact on multi‑tenant, cloud and container workloads.
  • Vendors have released kernel patches; until they’re applied, admins should disable algif_aead / AF_ALG and harden monitoring for AF_ALG + splice() misuse.

The foundations of Linux security have taken a hit with the disclosure of CVE-2026-31431, nicknamed “Copy Fail”, a kernel vulnerability that turns almost any local account into a fast track to root. The issue, quietly present since 2017, affects the cryptographic AF_ALG subsystem and has forced administrators across the world to review kernels, containers and cloud nodes in record time.

What makes this bug stand out is the mix of trivial exploitation, broad distribution coverage and forensic stealth. A proof‑of‑concept in Python weighing roughly 732 bytes, using only standard library modules, is enough to escalate from an unprivileged shell to full system control on a wide range of modern Linux installations. No complex race windows, no kernel‑specific offsets and no third‑party payloads are required.

What Copy Fail (CVE-2026-31431) actually is

At its core, Copy Fail is a local privilege escalation (LPE) flaw in the Linux kernel’s crypto code, tracked as CVE-2026-31431 and rated with high severity (CVSS around 7.8). Any user capable of running code on a vulnerable system – a normal account, a CI job, a container process, or a compromised web application – can leverage it to become root in seconds.

The vulnerability resides in the algif_aead interface of the AF_ALG subsystem, which exposes authenticated encryption (AEAD) operations to user space through sockets. The bug hits a specific template, authencesn, which combines HMAC‑SHA256 with AES‑CBC and is widely enabled by default on mainstream distributions.

From the attacker’s point of view, the primitive is deceptively simple: an unprivileged user can write four fully controlled bytes into the page cache of any readable file, then have that corrupted in‑memory copy executed with elevated privileges. Because the kernel serves binaries from the page cache when possible, altering that cached data effectively changes what will run, without touching the on‑disk file.

This behaviour sets Copy Fail apart from many conventional bugs: the filesystem view remains pristine, checksums on disk still match, and file integrity scanners see nothing suspicious, even though the code actually being executed has been tampered with in RAM.

How a 2017 optimisation turned into a critical bug

Technically, Copy Fail originates from a logic error introduced in 2017 when developers added an optimisation to allow certain AEAD operations to run “in‑place” in algif_aead.c (commit often referenced as 72548b093ee3). The change was intended purely as a performance win, but it subtly altered how buffers and page cache pages were wired together.

For decryption, the kernel code began copying additional authenticated data (AAD) and ciphertext from the transmit scatterlist into a receive buffer, and then chained tag pages by reference rather than fully separating them. In practice, this turned page‑cache pages – meant to be read‑only – into writable segments of the destination scatterlist.

The authencesn template then performed a scratch write just past the end of the user‑visible buffer, walking over those now‑writable page‑cache pages. On paper, each individual change looked harmless, but the combination of in‑place optimisation, page‑cache reuse and the template’s scratch behaviour produced the four‑byte out‑of‑bounds write that underpins Copy Fail.

Unlike race‑condition classics such as Dirty COW (CVE-2016-5195) or Dirty Pipe (CVE-2022-0847), this vulnerability does not rely on timing windows or repeated attempts. The exploit is deterministic and robust across kernels, which is one reason security teams are treating it as one of the most serious Linux LPEs in recent years.

How the exploit works: AF_ALG, splice() and four dangerous bytes

The publicly described proof‑of‑concept is a tiny Python 3 script (about 732 bytes) that uses only standard modules such as os, socket and zlib. It requires Python 3.10+ for os.splice(), but otherwise avoids any compiled payloads, external dependencies or kernel‑specific tuning.

In broad strokes, the exploit opens a socket with domain AF_ALG configured for AEAD mode and binds it to the vulnerable authencesn template. It then arranges for pages from the kernel’s page cache of a target binary – typically a setuid program like /usr/bin/su or sudo – to be used as part of the AEAD output buffer.

To bridge those pieces, the script makes use of the splice() system call, which can connect file descriptors and pipe buffers without copying data to user space. By splicing page‑cache pages for a chosen binary into the AF_ALG data path, the exploit ensures that any overflow will land directly on that cached file content.

During the AEAD operation, due to the underlying bug, the kernel writes four attacker‑controlled bytes just beyond the intended buffer boundary. The exploit controls both the offset and the value, using carefully crafted AAD and ciphertext. By repeating this primitive multiple times, it can patch critical instructions in the cached copy of the setuid binary, for example bypassing password checks or invoking a root shell.

When the attacker finally launches the modified binary with execve(), the kernel executes the corrupted page‑cache copy instead of re‑reading the file from disk, and the process starts with full root privileges. The whole chain relies on valid system calls – socket(), splice(), sendmsg(), execve() – which makes the activity hard to distinguish from legitimate workload patterns in many environments.

Why it is so hard to spot: memory‑only corruption

Perhaps the most unsettling aspect of Copy Fail is that the attack never writes the modified data back to disk. The page‑cache entry holding the corrupted binary code is not marked as “dirty”, so the kernel has no reason to flush it back to storage. As far as the on‑disk filesystem is concerned, everything looks normal.

This means that hash‑based integrity checks, file monitoring agents and standard antivirus tools will typically fail to notice anything. A forensic investigator analysing the disk image after an incident would see a perfectly clean binary, even if that program had just been executed from a tampered in‑memory copy moments earlier.

The impact is also transient in a slightly deceptive way: a reboot or cache eviction will discard the corrupted page, removing traces of the manipulated content. From an incident‑response perspective, this combination of volatility and invisibility raises the bar for reliable detection and post‑mortem analysis.

Because the exploit leverages normal kernel interfaces, even advanced EDR or SIEM solutions can struggle, unless they are configured to watch very specific patterns of AF_ALG and splice() usage tied to setuid binaries. Several vendors have already added dedicated rules with names such as possible_copy_fail_cve_2026_31431 or possible_lpe_by_python to help flag suspicious behaviour.

Scope of exposure: which kernels and distributions are at risk

According to technical write‑ups, any Linux kernel version from 4.14 (released in November 2017) up to the first patched releases is vulnerable, provided the affected crypto path is enabled. That includes a broad swath of mainstream distributions and flavours deployed over nearly a decade.
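As a rough first triage step, the lower bound mentioned above can be checked from a script. The sketch below only tells you whether a kernel predates 4.14. It cannot tell you whether a newer kernel is patched, because distributions backport fixes without changing the upstream version number, so the vendor advisory remains the authority. The function names are hypothetical.

```python
import platform
import re


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_vulnerable_change(release: str) -> bool:
    """True if the kernel is older than 4.14, the first version named as affected."""
    return parse_kernel_version(release) < (4, 14)


if __name__ == "__main__":
    release = platform.release()
    if predates_vulnerable_change(release):
        print(f"{release}: older than 4.14, predates the 2017 change")
    else:
        print(f"{release}: 4.14 or newer, check your distribution's advisory")
```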

Among the systems confirmed or strongly indicated as impacted are Ubuntu, Debian, SUSE, Red Hat Enterprise Linux (RHEL), Amazon Linux and several WSL2 kernels. Tests by researchers have shown the same unmodified Python exploit working across Ubuntu 24.04, Amazon Linux 2023, RHEL 10.1 and SUSE 16, underlining how little version‑specific tuning is required.

The severity is magnified in multi‑user and multi‑tenant environments such as shared hosting, campus servers, CI/CD runners and cloud platforms where arbitrary customer code runs on shared hardware. In those scenarios, any account or container with local code execution can be turned into a stepping stone to full host compromise.

It is worth noting that recent kernels and distributions with the official fix applied – for example Ubuntu 26.04 “Resolute” and later trees derived from patched upstream versions – are not affected. However, older LTS releases or custom kernels may remain vulnerable until explicitly updated, and unsupported installations simply will not receive a patch at all.

Extended support offerings, such as Ubuntu Pro’s ten‑year coverage window and similar enterprise programs, become more relevant in this context. They ensure that even older deployments receive urgent security fixes like the one for Copy Fail, reducing the number of “orphaned” systems still running exploitable kernels.

Cloud, containers and CI/CD: why modern stacks are especially exposed

On a single‑user laptop, a local‑only LPE might feel like a manageable risk. In modern infrastructures, where code from multiple tenants runs side by side on a shared kernel, the dynamics are very different. Copy Fail has quickly climbed the priority list precisely because it cuts across many of these isolation boundaries.

In container platforms such as Docker, LXC or Kubernetes, all workloads on a host share the same kernel, including the page cache. If that kernel is vulnerable and the AF_ALG interface is available, a process in a seemingly constrained pod can poison cached setuid binaries or other sensitive executables used by the host, then ride that foothold to root on the entire node.
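Whether AF_ALG is actually reachable from inside a given pod or job can be checked with a few lines of Python. This is a minimal sketch that only creates and immediately closes a socket. It never binds to an algorithm or performs any crypto operation, and the function name af_alg_reachable is hypothetical. Note that on some systems creating the socket may cause the kernel to load its generic AF_ALG support module as a side effect.

```python
import socket


def af_alg_reachable() -> bool:
    """Return True if this process is allowed to create an AF_ALG socket."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # This Python build has no AF_ALG constant (for example, not Linux).
        return False
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # Typical outcomes when seccomp denies the call or the kernel
        # has no AF_ALG support available.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("AF_ALG reachable:", af_alg_reachable())
```

A result of False from inside a container suggests that the sandbox policy or kernel configuration already cuts off this interface for that workload. A result of True says nothing about whether the kernel is patched.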

Once a node is compromised, attackers can often pivot sideways to other pods, manipulate control‑plane components or access cluster credentials. For large Kubernetes installations in banks, telecom operators or SaaS providers, Copy Fail effectively turns a single pod exploit into a potential cluster‑wide incident.

CI/CD systems are in a similar position. Runners for GitHub Actions, GitLab CI, Jenkins or internal build farms frequently execute untrusted or semi‑trusted code. An apparently minor bug in one repository, or a malicious pull request, could be chained with Copy Fail to seize control of the runner host, tamper with build artefacts or access secrets used by other jobs.

Shared hosting and VPS providers face their own headaches. A customer with shell access to one instance could, in configurations where guests share the host kernel (container‑based VPS offerings, for example), exploit Copy Fail to escape into the underlying host or influence neighbouring guests, depending on how tightly each layer is isolated and which interfaces are exposed.

How Copy Fail was found: AI steps into kernel auditing

One of the most striking details around CVE-2026-31431 is how it was discovered. The initial report comes from Theori’s researcher Taeyang Lee, using the company’s AI‑assisted auditing platform Xint Code. According to the published timeline, their tooling identified the issue in roughly an hour of analysis.

That timeframe is not just an anecdote: it hints at a shift in the economics of kernel vulnerabilities. Historically, high‑quality Linux LPEs with broad portability and no race conditions could command significant prices from grey‑market brokers. Bounties and private programmes have reportedly paid anywhere from tens of thousands to millions of dollars for zero‑days of this class.

If AI‑driven code analysis can now surface such bugs orders of magnitude faster and cheaper than traditional manual review, long‑held assumptions about the rarity of kernel‑grade issues become outdated. Defensive budgets and patch‑prioritisation strategies built on the belief that these bugs are scarce – because they are expensive to find – may need to be revisited.

At the same time, the discovery of Copy Fail reinforces the idea that AI is likely to become a standard part of the Linux kernel’s own review pipeline. Rather than replacing human maintainers, automated tools can act as an additional filter, flagging subtle interactions between optimisations and subsystems that would be hard to spot by eye alone.

Vendor response, patches and the official fix

The upstream Linux community has addressed the root cause through a patch identified as commit a664bf3d603d. This change effectively reverts the 2017 in‑place optimisation in algif_aead.c, re‑establishing a strict separation between the transmit scatterlist, which may include page‑cache pages, and the receive scatterlist used as user output.

By removing the code that chained both lists via sg_chain(), the patch prevents page‑cache entries from ever being repurposed as writable scratch space for authencesn operations. In other words, the optimisation is sacrificed in favour of predictable, safe buffer handling, closing the four‑byte out‑of‑bounds write at its source.

The disclosure timeline reflects a fairly standard coordinated process: Theori contacted the Linux kernel security team on 23 March 2026, fixes were merged into mainline on 1 April, the CVE identifier was assigned on 22 April, and public advisories went out around 29-30 April. Distributions then began shipping patched kernels for their supported releases.

Projects such as Debian, Ubuntu, SUSE and Amazon Linux issued security updates relatively quickly across supported branches. In some cases, specialised kernel builds – for example, variants targeting particular hardware platforms – were listed as not affected if they never included the vulnerable code path.

Red Hat initially signalled a more cautious approach, indicating that certain products might defer the fix pending impact analysis. Under pressure from customers and the wider community, and given the exploit’s simplicity, guidance was updated so that RHEL lines received the patch on roughly the same cadence as other enterprise distributions.

Temporary mitigations while you patch

In ideal conditions, administrators would simply roll out patched kernels and reboot every system. In practice, production environments with strict uptime requirements often need interim measures to reduce risk until maintenance windows are available.

Security advisories, including recommendations from CERT‑EU, describe a straightforward mitigation: disable the vulnerable algif_aead module. This can be done persistently by adding a rule such as install algif_aead /bin/false to /etc/modprobe.d/disable-algif.conf and then attempting to unload the module with rmmod algif_aead 2>/dev/null || true.
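After applying the workaround it is worth confirming two things: that the rule is really on disk, and that the module is not still loaded from before. The following sketch does both checks. It assumes algif_aead is built as a loadable module (a modprobe rule has no effect on code compiled into the kernel), and all function names are hypothetical.

```python
from pathlib import Path

RULE_TOKENS = ["install", "algif_aead", "/bin/false"]


def rule_present(conf_text: str) -> bool:
    """True if any non-comment line is exactly the disabling install rule."""
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        if line.split() == RULE_TOKENS:
            return True
    return False


def module_loaded(proc_modules_text: str, name: str = "algif_aead") -> bool:
    """True if `name` is the first field of any line in /proc/modules text."""
    return any(line.split()[:1] == [name]
               for line in proc_modules_text.splitlines())


def mitigation_status(conf_dir: str = "/etc/modprobe.d",
                      proc_modules: str = "/proc/modules") -> dict:
    """Report whether the rule exists and whether the module is still loaded."""
    directory = Path(conf_dir)
    conf_files = sorted(directory.glob("*.conf")) if directory.is_dir() else []
    has_rule = any(rule_present(f.read_text(errors="replace"))
                   for f in conf_files)
    modules = Path(proc_modules)
    loaded = module_loaded(modules.read_text()) if modules.is_file() else False
    return {"rule_present": has_rule, "module_loaded": loaded}


if __name__ == "__main__":
    print(mitigation_status())
```

The state to aim for is rule_present True and module_loaded False. If the module is still listed, the rmmod step did not succeed, usually because something still holds a reference to it.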

Crucially, this workaround does not interfere with the crypto stacks most systems rely on. Technologies like dm‑crypt/LUKS, kTLS, IPsec/XFRM, OpenSSL, GnuTLS, NSS or SSH do not depend on the algif_aead path that is being disabled, so most environments can apply this mitigation with limited side effects.

For containerised workloads, CERT‑EU and other bodies advise taking an extra step: block the creation of AF_ALG sockets via seccomp profiles or similar sandboxing policies across all pods and jobs, regardless of whether the host kernel is patched. Cutting off AF_ALG to untrusted workloads effectively removes the main exploitation channel even on kernels that have not yet been upgraded.
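One way to express such a policy is a seccomp rule that rejects socket() when its first argument is AF_ALG. The sketch below generates a small profile in the JSON layout used by Docker and OCI runtimes. Setting defaultAction to allow is an assumption made only to show the rule in isolation: used as‑is, this profile would replace a runtime's stricter default, so in practice the rule would be merged into an existing profile.

```python
import json
import socket

# AF_ALG is 38 on Linux; fall back to the literal if this Python build
# does not expose the constant.
AF_ALG = int(getattr(socket, "AF_ALG", 38))


def deny_af_alg_profile() -> dict:
    """Build a Docker/OCI-style seccomp profile that rejects socket(AF_ALG, ...).

    Illustrative only: every other syscall is allowed, which is weaker
    than the default profiles shipped by container runtimes.
    """
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                # Match only when argument 0 (the address family) is AF_ALG.
                "args": [
                    {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_af_alg_profile(), indent=2))
```

With Docker, a profile file like this can be passed via --security-opt seccomp=profile.json. One caveat: 32‑bit x86 processes can also create sockets through the multiplexed socketcall() syscall, which this single rule does not cover.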

Some vendors also recommend reducing the number of setuid binaries present on servers, especially in high‑risk environments such as shared CI runners or multi‑tenant cloud nodes. While this does not fix Copy Fail itself, it shrinks the set of high‑value targets that an attacker can manipulate via the page‑cache write primitive.
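Before trimming setuid binaries, it helps to know which ones exist. A minimal inventory sketch, assuming a plain directory walk is acceptable for the host in question (the function name is hypothetical):

```python
import os
import stat


def find_setuid_files(root: str) -> list[str]:
    """Return sorted paths of regular files under `root` with the setuid bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in find_setuid_files("/usr/bin"):
        print(path)
```

The resulting list is a starting point for review, not a removal list: some of these binaries are needed for normal operation.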

Monitoring and detection: what to watch for

Alongside patching and hardening, many organisations are investing time in instrumenting detection for potential Copy Fail exploitation. Given that the bug does not leave obvious traces on disk, monitoring has to focus on behavioural indicators at the kernel and process level.

A common recommendation is to configure auditd to log suspicious combinations of system calls. For example, teams can track reads of setuid binaries such as su, sudo, passwd, gpasswd, newgrp, chfn, chsh, mount, umount or fusermount3 when performed by interpreters like Python running from untrusted paths.

On top of that, calls to splice() by non‑root users that manipulate file descriptors pointing to those same setuid binaries can be logged and correlated. A sequence where a user script reads a setuid binary, uses splice() and then launches a shell via that binary is the sort of pattern worth flagging.
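The sequence described above can be expressed as a small state machine over syscall events. The sketch below is deliberately simplified: the Event record is a hypothetical format rather than the output of any real audit tool, the list of setuid paths is a short illustrative sample, and it tracks a single PID instead of a full process lineage.

```python
from dataclasses import dataclass

# Illustrative sample; a real rule set would cover more binaries.
SETUID_PATHS = {"/usr/bin/su", "/usr/bin/sudo", "/usr/bin/passwd"}


@dataclass(frozen=True)
class Event:
    pid: int
    syscall: str    # "openat", "splice" or "execve"
    path: str = ""  # file involved; empty for splice


def suspicious_pids(events: list[Event]) -> set[int]:
    """Return PIDs that opened a setuid binary, called splice(),
    then executed that same binary, in that order."""
    stage: dict[int, tuple[int, str]] = {}  # pid -> (step reached, path opened)
    flagged: set[int] = set()
    for ev in events:
        step, path = stage.get(ev.pid, (0, ""))
        if step == 0 and ev.syscall == "openat" and ev.path in SETUID_PATHS:
            stage[ev.pid] = (1, ev.path)
        elif step == 1 and ev.syscall == "splice":
            stage[ev.pid] = (2, path)
        elif step == 2 and ev.syscall == "execve" and ev.path == path:
            flagged.add(ev.pid)
    return flagged


if __name__ == "__main__":
    sample = [
        Event(10, "openat", "/usr/bin/su"),
        Event(10, "splice"),
        Event(10, "execve", "/usr/bin/su"),
        Event(11, "openat", "/usr/bin/su"),
        Event(11, "execve", "/usr/bin/su"),  # no splice in between
    ]
    print(suspicious_pids(sample))
```

In the sample, only PID 10 completes all three steps. A production rule would also need to follow fork() and clone() relationships, since the steps may be spread across parent and child processes.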

Another signal is creation of sockets with domain AF_ALG (numeric value 38 on Linux) from regular user accounts (e.g. UIDs ≥ 1000) on systems where AF_ALG is not expected to be part of routine application behaviour. SIEM platforms can aggregate these events and raise alerts when they occur outside of known‑good processes.
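For auditd, this signal translates into a syscall rule that matches socket() calls whose first argument is the AF_ALG family. The sketch below builds such a rule as a string. The key name af_alg_socket and the auid filter (login UID at or above 1000, excluding the unset value 4294967295) are assumptions based on common audit rule conventions. Only the 64‑bit ABI is covered, and the rule should be tested against the local auditctl version before deployment.

```python
import socket

# AF_ALG is 38 on Linux; fall back to the literal if this Python build
# does not expose the constant.
AF_ALG = int(getattr(socket, "AF_ALG", 38))


def af_alg_audit_rule(min_uid: int = 1000, key: str = "af_alg_socket") -> str:
    """Build an audit rule that logs socket(AF_ALG, ...) by regular users.

    Hypothetical helper: the returned line is meant for a file under
    /etc/audit/rules.d/ or as arguments to auditctl.
    """
    return (
        f"-a always,exit -F arch=b64 -S socket -F a0={AF_ALG} "
        f"-F auid>={min_uid} -F auid!=4294967295 -k {key}"
    )


if __name__ == "__main__":
    print(af_alg_audit_rule())
```

Events logged under the chosen key can then be forwarded to a SIEM and compared against the list of processes that are expected to use AF_ALG.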

Commercial EDR products are rapidly evolving their rulesets to include Copy Fail‑aware analytics, often under event labels such as possible_copy_fail_cve_2026_31431. These rules typically watch for combinations of AF_ALG usage, splice() sequences and unusual privilege transitions inside a single process lineage.

Copy Fail in context: a new chapter in Linux LPE history

Copy Fail inevitably invites comparisons with earlier high‑profile Linux kernel flaws. Vulnerabilities like Dirty COW and Dirty Pipe also revolved around manipulating the kernel’s handling of supposedly immutable data to achieve writes where none should be possible.

Where those earlier bugs often centred on on‑disk file modifications or pipe buffer corruption, Copy Fail takes a slightly different route: it focuses on transient, in‑memory corruption of the page cache via the crypto subsystem. For attackers, this offers a blend of portability, reliability and stealth that is attractive for chaining into larger intrusion playbooks.

The case also feeds into a broader discussion about the complexity of modern kernels and the limits of human review. Here, three perfectly reasonable design decisions – an in‑place optimisation, a particular use of page‑cache pages and a scratch write in authencesn – intersected to create a serious vulnerability that remained hidden for nearly nine years.

For enterprises and operators that rely heavily on Linux, the main takeaway is less about one specific bug and more about the importance of fast patching, conservative exposure of kernel interfaces and continuous security monitoring. Extending kernel lifecycles without a clear strategy for timely security updates significantly increases the odds that similar issues will resurface as critical operational problems later on.

Copy Fail (CVE-2026-31431) has shown how a small logic error in the Linux kernel’s crypto code can undermine privilege boundaries across containers, CI pipelines and shared servers with very little effort from an attacker. Rolling out patched kernels, disabling algif_aead and AF_ALG where they are not strictly needed, and tightening monitoring around AF_ALG and splice() usage have quickly become essential tasks for Linux administrators who want to keep local footholds from turning into full system compromise.
