From C Hacks to C++20: Evolution of Coroutines

Last updated: 12/27/2025
  • Coroutines generalize subroutines by preserving local state and resuming execution at suspension points, enabling natural expression of state machines, generators and cooperative concurrency.
  • C implementations evolved from manual stack manipulation and POSIX context APIs to macro-based approximations and portable coroutine libraries built on user-level context switching.
  • C++20 standardizes a stackless coroutine model with promises, co_await, co_yield and coroutine frames, letting libraries define high-level async and generator abstractions.
  • The standardized model, combined with awaiters and custom promise types, unifies coroutine usage across libraries while maintaining predictable performance and control.

Coroutines sit in a fascinating middle ground between classic functions and full-blown threads, and their story from low-level C tricks to standardized C++20 language support is one of the most interesting evolutions in modern systems programming. If you have ever tried to juggle callbacks, state machines and thread synchronization just to handle non-blocking I/O, you have already met the kind of pain that coroutines were designed to relieve.

In this article we walk through how coroutines evolved from hand-crafted C hacks and POSIX context APIs to the high-level, stackless C++20 coroutine model. Along the way we explain what a coroutine really is, how it differs from generators, threads and fibers, what “stackful” vs “stackless” means, and how the C++20 machinery (promise objects, coroutine handles, co_await, co_yield, co_return) actually behaves under the hood.

What is a coroutine, really?

There is no single universally accepted formal definition of coroutine, but the literature converges on two key properties that distinguish coroutines from ordinary subroutines:

  • Local state survives across suspensions: data local to a coroutine persists between activations, so each coroutine instance behaves like an object with memory.
  • Execution can pause and later resume from the same point: when control leaves a coroutine, it can be re-entered at the exact suspension point instead of always starting from the top like a normal function.

Instead of having a single, one-shot entry and exit like subroutines, coroutines support multiple entry and exit points over their lifetime, which makes them powerful for expressing producers, consumers, state machines, cooperative schedulers and asynchronous workflows in a linear, readable style.

Core dimensions of coroutine design

Real-world coroutine systems vary along three important axes that define how they behave and how expressive they are: the control-transfer model, whether coroutines are first-class values, and whether they are stackful or stackless.

First, the control-transfer mechanism separates asymmetric from symmetric coroutines. In an asymmetric design, the active coroutine can only yield back to its direct caller (using an operation conceptually similar to yield), and the caller resumes it later (with an operation akin to resume). In symmetric designs, a coroutine can explicitly transfer control to any other coroutine, rather than always returning to whoever invoked it.

Second, some languages treat coroutine instances as first-class objects that you can store, pass around and manipulate freely, while others only expose coroutines as syntactic constructs with limited ways to interact with them. First-class support dramatically increases flexibility and composability.

Third, coroutines may be stackful or stackless. A stackful coroutine can suspend deep inside a nested call stack; when it resumes, every frame in that stack continues where it left off. Stackless coroutines only suspend at the level of the coroutine function itself: regular helper functions cannot yield unless they are themselves coroutines or specially annotated.

The term “full coroutine” has been proposed for the most expressive combination: a stackful, first-class coroutine, either symmetric or asymmetric, which is powerful enough to express one-shot continuations or delimited continuations. Even though symmetric and asymmetric styles have equivalent expressive power, the asymmetric model often feels more ‘routine-like’ and familiar to most programmers.

Subroutines, coroutines, generators and threads

Subroutines can be seen as a special case of coroutines with severely restricted control flow and state behaviour. A normal function always starts at its first instruction, exits once, and throws away its local state afterwards. A coroutine, in contrast, can transfer control to other coroutines, be resumed later at the yield point, and keep its state across these transfers. Multiple coroutine instances of the same function may coexist, each with its own preserved local data.

Generators form a notable subset of coroutines, sometimes called “semicoroutines”. Like coroutines, generators may suspend execution multiple times and later continue, but they always yield back to their direct caller and have no way to redirect execution to an arbitrary third coroutine. That constraint is deliberate: generators are optimized for implementing iterators and lazy sequences where each yield simply means “produce a value for whoever is iterating me”.

In fact, you can emulate general coroutines by layering a dispatcher on top of a generator system, for example by having a top-level trampoline that receives tokens from generators and decides which generator to activate next. This technique was historically used in languages like early Python versions, which only had generators but no built-in coroutine primitives.

Coroutines are often compared to threads, but they are fundamentally about cooperative scheduling rather than preemptive parallelism. Coroutines provide concurrency in the sense of interleaving tasks without changing overall semantics, yet they do not run simultaneously on multiple cores by themselves. A coroutine only yields control at explicit suspension points, so code between those points runs without interruption from other coroutines.

This cooperative model eliminates many synchronization headaches common with threads: because only one coroutine runs at a time on a given scheduler, you often don’t need mutexes or atomic operations for ordinary shared state. On the flip side, coroutines by themselves will not exploit multiple CPU cores unless you combine them with threads or a multi-threaded executor.

Classic coroutine example: producer-consumer

A textbook demonstration of symmetric coroutines is the producer-consumer pattern with a shared queue. One coroutine generates items and pushes them into a queue until it is full, then yields to the consumer; the consumer pops items until the queue is empty, then yields back to the producer. Execution bounces back and forth as each side cooperatively yields control.

In such an implementation, the producer and consumer appear to run “in parallel” from the programmer’s perspective, although they are in fact just jumping back and forth within a single thread of execution. There is no need for OS-level threads or kernel-level context switches: the yield operation can be implemented as a user-level jump that saves the current coroutine’s registers and stack pointer and restores those of the other coroutine.

This example is frequently used to introduce multithreading, but it is important to note that coroutines alone are sufficient to express the logic, and that replacing them with threads might be unnecessary or even harmful in environments that care about real-time guarantees or minimal runtime overhead.

Why coroutines matter: state machines, actors and async workflows

Because coroutines retain both their execution point and their local variables across yields, they provide a very natural way to implement complex state machines without sprawling switch statements, flags or explicit program counters. The current suspension point literally represents the current state.

Coroutines are also well-suited to actor-style concurrency models, such as those used in many game engines. Each actor can be implemented as a coroutine that periodically yields control back to a central scheduler, which runs one actor after another on a single thread. This cooperative multitasking eliminates the need for most locking while still providing responsive behaviour.

Generators built on coroutines are ideal for working with streams and data structure traversals, particularly when you want lazy, on-demand production of values. Instead of pushing values into a consumer, a generator lets the consumer pull values one by one using a simple loop.

Coroutines also shine in communication patterns like pipelines and communicating sequential processes (CSP), where each stage is a coroutine that yields when it waits for input or output. A scheduler then resumes coroutines when their communication channels are ready, providing an elegant alternative to callback-heavy event loops.

Finally, many numerical libraries use a style sometimes called “reverse communication”, where a solver suspends itself whenever it needs the user to provide some function evaluation, then resumes once the user responds. Coroutines provide a direct and readable way to express that back-and-forth control flow.

From low-level C implementations to portable libraries

One class of implementations obtains a second call stack manually and then uses setjmp/longjmp to switch between coroutines. Platform-specific inline assembly can set up a fresh stack for each coroutine; on POSIX systems, signals combined with sigaltstack can be used to bootstrap execution on an alternate stack in pure C. Once each coroutine has its own stack, setjmp saves the CPU state and stack pointer, and longjmp restores them to resume the coroutine.

Some POSIX and UNIX-aligned C libraries historically exposed helper functions like getcontext, setcontext, makecontext and swapcontext, which directly encapsulate the idea of switching between user-level contexts. Although POSIX.1-2001 marked them obsolescent and POSIX.1-2008 removed them from the standard, they formed the backbone of several coroutine libraries and inspired later designs.
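To make the context API concrete, here is a minimal sketch of the producer-consumer pattern described earlier, built on getcontext, makecontext and swapcontext. It assumes a Linux/glibc system that still ships <ucontext.h>; the names ctx_demo, prepare, kCapacity and kItems, the 64 KiB stack size and the global state are invented for this illustration and are not taken from any library.

```cpp
#include <ucontext.h>  // POSIX-only and obsolescent; assumed available (glibc still ships it)
#include <cstddef>
#include <deque>
#include <vector>

namespace ctx_demo {

constexpr std::size_t kCapacity = 3;  // illustrative queue bound
constexpr int kItems = 7;             // illustrative number of items

ucontext_t main_ctx, producer_ctx, consumer_ctx;
std::deque<int> queue;
std::vector<int> consumed;
bool producer_done = false;

void producer() {
    for (int i = 1; i <= kItems; ++i) {
        if (queue.size() == kCapacity)
            swapcontext(&producer_ctx, &consumer_ctx);  // full: hand over to the consumer
        queue.push_back(i);
    }
    producer_done = true;
    swapcontext(&producer_ctx, &consumer_ctx);  // let the consumer drain what is left
}

void consumer() {
    for (;;) {
        while (!queue.empty()) {
            consumed.push_back(queue.front());
            queue.pop_front();
        }
        if (producer_done) return;  // uc_link sends control back to main_ctx
        swapcontext(&consumer_ctx, &producer_ctx);  // empty: hand back to the producer
    }
}

void prepare(ucontext_t& ctx, char* stack, std::size_t size, void (*entry)()) {
    getcontext(&ctx);             // initialise from the current context
    ctx.uc_stack.ss_sp = stack;   // give the coroutine its own stack
    ctx.uc_stack.ss_size = size;
    ctx.uc_link = &main_ctx;      // context to resume when entry() returns
    makecontext(&ctx, entry, 0);
}

std::vector<int> run() {
    static char producer_stack[64 * 1024];
    static char consumer_stack[64 * 1024];
    queue.clear();
    consumed.clear();
    producer_done = false;
    prepare(producer_ctx, producer_stack, sizeof producer_stack, producer);
    prepare(consumer_ctx, consumer_stack, sizeof consumer_stack, consumer);
    swapcontext(&main_ctx, &producer_ctx);  // start the producer; returns once the consumer finishes
    return consumed;
}

}  // namespace ctx_demo
```

Calling ctx_demo::run() should return the items in production order. Every swapcontext call saves the caller's registers and stack pointer into its first argument and restores the second, so both routines keep their loop position across switches without any thread being created.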

Minimal coroutine implementations bypass setjmp/longjmp and context APIs entirely, opting instead for hand-written assembly that only swaps the program counter and stack pointer, clobbering other registers. This can be dramatically faster on some ABIs because it saves exactly what is needed and nothing more, whereas setjmp has to conservatively store a larger set of registers.

To hide all this complexity from application code, multiple C libraries appeared over the years that package coroutine switching into clean APIs, such as Russ Cox’s libtask and a variety of others (libpcl, coro, lthread, libcoro, libaco, libco and more). These libraries typically provide abstractions like lightweight tasks or fibers that can be resumed and yielded without the caller needing to worry about the underlying assembly tricks.

Approximate coroutines in C using macros

Where separate stacks or context-switching APIs are not available or desirable, developers have also approximated coroutines in pure C with macros and switch statements, a technique famously documented by Simon Tatham and related to the classic “Duff’s device” trick.

The core idea is to encode the coroutine’s state as a program counter implemented with a switch and case labels, where each yield-like macro expands to code that records the current label in a static variable and then returns to the caller. On the next call, the function jumps back to that label instead of starting at the beginning.
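A minimal sketch of that idea follows. The macro names CR_BEGIN, CR_YIELD and CR_END are made up for this example (they are not Tatham's or any library's actual macros), and the code compiles as C or C++. Each yield stores the current line number as the resume point and plants a matching case label right after the return.

```cpp
// Illustrative macros in the spirit of the switch-based trick; names are hypothetical.
#define CR_BEGIN(state) switch (state) { case 0:
#define CR_YIELD(state, value)                                        \
    do {                                                              \
        (state) = __LINE__; /* remember where to resume */            \
        return (value);                                               \
        case __LINE__:; /* the next call jumps here */                \
    } while (0)
#define CR_END(state) } (state) = 0

// Yields 0, 10, 20 on successive calls, then -1, then starts over.
int next_value() {
    static int state = 0;  // resume point, kept between calls
    static int i = 0;      // a "local" that must survive, hence static

    CR_BEGIN(state);
    for (i = 0; i < 3; ++i) {
        CR_YIELD(state, i * 10);
    }
    CR_END(state);
    return -1;  // sequence exhausted; state was reset to 0
}
```

Because both the resume point and the loop counter are static, this function supports only one active instance at a time, which is exactly the kind of restriction the technique is known for.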

Libraries like Protothreads build on this pattern to provide extremely lightweight, stackless coroutines that fit in constrained embedded environments, but the approach comes with serious limitations: local variables do not naturally persist across yields unless they are stored in static or external structures, you cannot easily suspend from nested function calls, and you generally have only a single entry point.
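One common workaround for the persistence problem is to move the resume point and the "locals" into an explicit context structure that the caller owns. The sketch below illustrates this with hand-written case labels; CounterCtx and counter_next are hypothetical names and this is not the Protothreads API.

```cpp
// Per-instance state lives in a caller-owned struct instead of static variables.
struct CounterCtx {
    int state = 0;  // resume point: 0 = not started, 1 = inside the loop, 2 = finished
    int i = 0;      // loop variable that must survive across yields
    int limit = 0;  // how many values to produce
};

// Writes the next value to `out` and returns true, or returns false when done.
bool counter_next(CounterCtx& c, int& out) {
    switch (c.state) {
    case 0:
        for (c.i = 0; c.i < c.limit; ++c.i) {
            out = c.i;
            c.state = 1;
            return true;  // "yield"
    case 1:;              // resume here on the next call, then run ++c.i
        }
    }
    c.state = 2;  // no case label matches 2, so later calls skip the switch body
    return false;
}
```

With the state externalised, several counters can be interleaved freely, for example by calling counter_next on two different CounterCtx objects in turn. The cost is that every variable that crosses a yield has to be spelled as a struct member, and suspending from a nested helper function is still not possible.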

Even its proponents describe this macro-based trickery as some of the ugliest C code ever used in production, and critics point out that the resulting control flow can be hard to reason about and maintain over time. Nonetheless, it remains a useful compromise in systems where additional stacks or linker tricks are off the table.

Stepping stone: fibers, threads and related abstractions

In mainstream environments that lack native coroutines, threads (and, to a lesser extent, fibers) have become the default building block for concurrency, even when cooperative behaviour would suffice. Threads are often well-supported and well-documented, but they solve a broader and more complex problem than most coroutine use cases actually need.

Fibers, where available, present a closer match to user-level coroutines because they are cooperatively scheduled and can be switched without OS involvement, making them a natural substrate on which to implement coroutine-style APIs. However, system support for fibers is patchy compared to threads, and portability suffers.

One notable difference between threads and coroutines is scheduling behaviour. Threads are typically preempted at arbitrary points, which forces programmers to reason about race conditions and synchronization everywhere. Coroutines, by contrast, only change control at explicit suspension points, which often allows you to write simpler code without locks or atomic operations.

Languages and runtimes have explored many paths to emulate coroutines on top of existing infrastructure, from rewriting bytecode (as in some Java coroutine frameworks) to mapping coroutine-like constructs to iterators (as C# did with yield before async/await) or building them on top of green threads, continuations or fibers.

Coroutines across programming languages

Over the decades, many languages have experimented with coroutine-like constructs, each with its own flavour and trade-offs, and understanding this ecosystem helps put the C++ evolution in context.

Some languages offer first-class, stackful coroutines directly in the runtime and standard library. Lua, for instance, has supported asymmetric, stackful coroutines since version 5.0 via its standard coroutine API, with primitives to create, resume and yield. Modula-2 historically included coroutine support through procedures like NEWPROCESS and TRANSFER that set up separate stacks and switch between contexts.

Other ecosystems built coroutines on top of existing primitives like continuations or green threads. Racket (and Scheme dialects in general) can implement coroutines almost trivially because they expose continuations as first-class values. Smalltalk systems, where execution stacks are manipulable objects, can similarly host coroutine abstractions without extra VM support. In OCaml, concurrency without parallelism has been provided via modules that schedule threads preemptively on a single OS thread, while newer versions add green-thread style support.

Languages focused on asynchronous programming often started with generators before introducing full coroutines. C# initially added generators through yield and the iterator pattern, then evolved into async/await to model asynchronous operations as coroutines. JavaScript followed a similar path: ES2015 introduced generators as a special case of coroutines, and later versions added async/await built on top of promises and generators.

In the JVM world, Java itself does not offer native coroutines, but tools and languages around it fill the gap. Some libraries modify bytecode to simulate coroutine behaviour, others use JNI to access platform-specific mechanisms, and some rely on threads to emulate coroutine semantics at a higher cost. Kotlin, on the other hand, provides coroutines as a first-party library feature and can interoperate with Java code (albeit Java cannot naturally “suspend” and must instead block or use futures).

Scripting and dynamic languages have taken varied approaches. Python started with enhanced generators (PEP 342), extended them with subgenerator delegation (PEP 380), and eventually introduced explicit native coroutines with async/await (PEP 492), later reserving those keywords in Python 3.7. Ruby implements coroutine-like behaviour via fibers; Raku and Tcl offer native coroutine constructs; PHP 8.1 added fibers to support coroutine-based libraries for asynchronous I/O.

System-oriented languages also explore coroutine-like models with their own twist. Go uses goroutines — lightweight, multiplexed processes with dynamically-sized stacks. Although goroutines are not coroutines in the strict sense (they are closer to green threads, and local data does not survive multiple ‘calls’ in the coroutine sense), they occupy a similar mental space as user-level tasks managed by a runtime scheduler. D exposes coroutines via Fiber in its standard library, and some frameworks wrap them into convenient generator-style interfaces.

Enter C++: libraries before the standard

Before C++ standardized coroutines, the ecosystem relied on third-party libraries to bring coroutine semantics into the language, using a mixture of assembler context switching, platform APIs and clever template metaprogramming.

Boost.Context appeared as a low-level foundation for switching execution contexts across multiple architectures and operating systems, providing a portable way to manipulate user-space stacks. On top of that, Boost.Coroutine and later Boost.Coroutine2 offered higher-level coroutine abstractions, moving from support for both symmetric and asymmetric forms to a more modern asymmetric interface that aligns better with contemporary C++ idioms.

Other projects explored different angles, such as preprocessor-based stackless coroutines that emulate await/yield semantics, single-header libraries that wrap platform fibers, or frameworks (like Mordor or Oat++ coroutines) that focus specifically on hiding asynchronous I/O callbacks behind coroutine-like sequential code.

These ecosystems demonstrated that C++ developers were hungry for coroutine-like expressiveness, but they also revealed the pain points of ad-hoc solutions: inconsistent syntax, tricky portability concerns, awkward debugging and tooling, and non-trivial integration with the rest of the standard library.

C++20 coroutines: a standardized, stackless model

C++20 finally brought coroutines into the language as a first-class feature, but with a deliberately minimal and low-level design. Rather than shipping a specific high-level abstraction (such as “task”, “generator” or “future”) into the standard library, C++20 standardized the building blocks that allow libraries to define their own coroutine-friendly types.

A function becomes a coroutine if its body contains any of the coroutine-specific constructs: the co_await operator to suspend until some event or value is ready, the co_yield expression to produce a value and suspend (as in generators), or the co_return statement to complete the coroutine, optionally with a result.

Once the compiler detects any of these constructs, it transforms the function into a state machine whose persistent state is stored in a heap-allocated “coroutine frame”, unless optimization proves that the frame’s lifetime is strictly nested within the caller and can be embedded into the caller’s stack frame. This frame holds the promise object, copies of arguments, local variables that live across suspension points and bookkeeping metadata for resuming execution.

Crucially, C++20 coroutines are stackless: a coroutine can only suspend at explicit suspension points (co_await or co_yield) and cannot transparently yield from within arbitrary nested calls unless those functions are also coroutines or otherwise participate in the coroutine machinery. This makes the implementation simpler and more predictable, at the cost of some expressive power compared to fully stackful designs.

Restrictions and lifecycle of a C++20 coroutine

Not every function in C++20 is allowed to be a coroutine; the standard imposes several constraints to keep the model sane. Coroutines cannot be constexpr or consteval functions, they cannot be constructors, destructors, or the main function, and they may not use C-style variadic arguments or placeholder return types (auto or a concept-constrained placeholder).

When a coroutine is first called, it does not immediately behave like a normal function body. Instead, the compiler-generated prologue allocates the coroutine frame (usually via operator new), copies function parameters into that frame (by value or by reference as declared), constructs the promise object and then calls promise.get_return_object(), which typically produces a handle or wrapper object that is returned to the caller.

The promise object is a user-defined type, discovered through std::coroutine_traits based on the coroutine’s return type and parameter list, and it dictates how results, exceptions and suspension policies work. The compiler deduces a Promise type and then calls methods like initial_suspend(), final_suspend(), return_value() or return_void(), and unhandled_exception() at the appropriate phases in the coroutine lifecycle.

At the start of execution, the coroutine calls promise.initial_suspend() and co_awaits whatever that returns, which allows library authors to decide whether their coroutine type is “eager” (starts running immediately) or “lazy” (returns to the caller until explicitly resumed). When the coroutine eventually finishes via co_return or an unhandled exception, it invokes promise.final_suspend(), giving the library a last chance to schedule continuations or clean up.

When the coroutine frame is destroyed, either after completion or via an explicit destroy operation on its handle, the runtime destroys the promise object, the copies of parameters and any remaining live locals, then frees the memory with operator delete (or with a promise-specific deallocation function if provided). On the allocation side, if the promise type declares the static member get_return_object_on_allocation_failure(), the frame is allocated with the non-throwing form of operator new, and a failed allocation makes the coroutine return the result of that function to its caller instead of throwing std::bad_alloc.

co_await, awaitables and awaiters

The co_await operator is the primary suspension primitive in C++20’s coroutine system, and understanding its mechanics is crucial for designing robust asynchronous abstractions.

When you write co_await expr; inside a coroutine, the compiler first converts expr into an “awaitable” object, either by passing it through promise.await_transform(expr) if such a member exists, or by using it as-is. Then it determines the “awaiter” object either by calling a member operator co_await on the awaitable, a non-member operator co_await, or simply treating the awaitable itself as the awaiter if no such operator exists.

The awaiter must provide three key operations: await_ready(), await_suspend(handle) and await_resume(). If await_ready() returns true, the coroutine does not suspend and directly calls await_resume(), enabling fast paths for already-completed operations. If it returns false, the coroutine suspends, its state is stored in the frame, and await_suspend() is called with a handle to the current coroutine.

Inside await_suspend(), the awaiter decides what to do with the coroutine handle: schedule it for later resumption on some executor, resume another coroutine, or resume the same coroutine immediately. The return type selects the behaviour: void leaves the coroutine suspended, bool resumes it immediately when the value is false, and a returned coroutine handle is resumed in its place. When the awaited operation completes, somebody eventually calls handle.resume(), at which point control returns to just before await_resume(), and await_resume() then produces the result of the co_await expression.

The standard library ships two trivial awaitables: std::suspend_always and std::suspend_never, which are often used in initial_suspend() and final_suspend() implementations to indicate lazy or eager start and how to behave at the end. More sophisticated awaiters can hold per-operation state, for example to tie coroutines into asynchronous I/O APIs, and that state lives inside the coroutine frame across the suspension point.

co_yield and generator-style coroutines

The co_yield expression builds on top of co_await to support generator-like behaviour, where the coroutine repeatedly produces values for a caller that iterates over them.

Conceptually, co_yield value; is equivalent to co_await promise.yield_value(value): the promise stores the value and returns an awaitable, typically std::suspend_always, that suspends the coroutine. The promise implementation is responsible for storing the yielded value in some accessible place (by copying, moving or referencing it) so that the consumer can retrieve it before the coroutine resumes.

Library code that implements generators commonly defines a promise type that exposes methods for accessing the currently-yielded value and for integrating with standard iteration protocols, such as providing begin()/end() on the handle wrapper and advancing the underlying coroutine on each increment.

Error handling, dangling references and subtle details

C++20 coroutines integrate with C++ exception handling through the promise’s unhandled_exception() method, which the compiler calls if an exception escapes the coroutine body. The coroutine then proceeds to its final suspension, and the promise is expected to arrange for the error to be communicated to whoever owns the coroutine’s result type.

Because parameters are copied or referenced into the coroutine frame at creation time, care must be taken with reference parameters: if they refer to objects whose lifetime ends before the coroutine is resumed, the coroutine may dereference dangling references. This is not a coroutine-specific problem, but the persistent nature of the frame makes it easier to accidentally outlive referenced objects.

The standard has also evolved to clarify edge cases via defect reports, such as making certain invalid return_void setups ill-formed instead of producing undefined behaviour when falling off the end of a coroutine, and allowing co_await in more contexts like lambda bodies.

Together, these rules and refinements shape a relatively low-level but predictable model on which higher-level coroutine libraries can safely build, ranging from simple generators to full async/await task systems integrated with executors and schedulers.

Seen from above, the journey from ad-hoc assembly tricks in C to the structured, stackless, promise-driven coroutines of C++20 reflects a steady push towards safer, more composable abstractions for expressing complex control flow, asynchronous operations and stateful computations, without giving up the performance and control that systems programmers rely on.
