Futures Explained in 200 Lines of Rust

This book aims to explain Futures in Rust using an example driven approach, exploring why they're designed the way they are, and how they work. We'll also take a look at some of the alternatives we have when dealing with concurrency in programming.

Going into the level of detail I do in this book is not needed to use futures or async/await in Rust. It's for the curious out there that want to know how it all works.

What this book covers

This book will try to explain everything you might wonder about up until the topic of different types of executors and runtimes. We'll only implement a very simple runtime in this book to introduce some concepts, but it's enough to get started.

Stjepan Glavina has made an excellent series of articles about async runtimes and executors.

The way you should go about it is to read this book first, then continue reading Stjepan's articles to learn more about runtimes and how they work, especially:

  1. Build your own block_on()
  2. Build your own executor

You should also check out the smol runtime as it's a real runtime made by the same author. It's well commented and made to be easy to learn from.

I've limited myself to a 200 line main example (hence the title) to limit the scope and introduce an example that can easily be explored further.

However, there is a lot to digest and it's not what I would call easy, but we'll take everything step by step so get a cup of tea and relax.

I hope you enjoy the ride.

This book is developed in the open, and contributions are welcome. You'll find the repository for the book itself here. The final example which you can clone, fork or copy can be found here. Any suggestions or improvements can be filed as a PR or in the issue tracker for the book.

As always, all kinds of feedback are welcome.

Reader exercises and further reading

In the last chapter I've taken the liberty to suggest some small exercises if you want to explore a little further.

This book is also the fourth book I have written about concurrent programming in Rust. If you like it, you might want to check out the others as well:

Credits and thanks

I'd like to take this chance to thank the people behind mio, tokio, async_std, futures, libc, and crossbeam, which underpin so much of the async ecosystem and rarely get enough praise in my eyes.

A special thanks to jonhoo who was kind enough to give me some valuable feedback on a very early draft of this book. He has not read the finished product, but a big thanks is definitely due.

Thanks to @ckaran for suggestions and contributions in chapter 2 and 3.

Translations

This book has been translated to Chinese by nkbai.

Some Background Information

Before we go into the details about Futures in Rust, let's take a quick look at the alternatives for handling concurrent programming in general and some pros and cons for each of them.

While we do that we'll also explain some aspects when it comes to concurrency which will make it easier for us when we dive into Futures specifically.

For fun, I've added a small snippet of runnable code with most of the examples. If you're like me, things get way more interesting then and maybe you'll see some things you haven't seen before along the way.

Threads provided by the operating system

Now, one way of accomplishing concurrent programming is letting the OS take care of everything for us. We do this by simply spawning a new OS thread for each task we want to accomplish and writing code like we normally would.

The runtime we use to handle concurrency for us is the operating system itself.

Advantages:

  • Simple
  • Easy to use
  • Switching between tasks is reasonably fast
  • You get parallelism for free

Drawbacks:

  • OS level threads come with a rather large stack. If you have many tasks waiting simultaneously (like you would in a web server under heavy load) you'll run out of memory pretty fast.
  • There are a lot of syscalls involved. This can be pretty costly when the number of tasks is high.
  • The OS has many things it needs to handle. It might not switch back to your thread as fast as you'd wish.
  • Might not be an option on some systems

Using OS threads in Rust looks like this:

use std::thread;

fn main() {
    println!("So we start the program here!");
    let t1 = thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(200));
        println!("We create tasks which gets run when they're finished!");
    });

    let t2 = thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(100));
        println!("We can even chain callbacks...");
        let t3 = thread::spawn(move || {
            thread::sleep(std::time::Duration::from_millis(50));
            println!("...like this!");
        });
        t3.join().unwrap();
    });
    println!("While our tasks are executing we can do other stuff here.");

    t1.join().unwrap();
    t2.join().unwrap();
}
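
The stack size drawback mentioned above is something you can tune to a degree. The sketch below assumes the standard library's thread::Builder and shows how to request a smaller stack per thread (at the time of writing, the standard library defaults to 2 MiB for spawned threads). The function name run_small_stack_threads and the 64 KiB figure are just my picks for illustration, and the OS may round the size up to its own minimum:

```rust
use std::thread;

/// Spawns `n` OS threads with an explicit stack size and returns the sum of
/// their results. Each thread simply returns its index times two.
fn run_small_stack_threads(n: usize, stack_size: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::Builder::new()
                .name(format!("worker-{i}"))
                // Ask for a smaller stack than the default
                .stack_size(stack_size)
                .spawn(move || i * 2)
                .expect("failed to spawn thread")
        })
        .collect();

    // Wait for every thread and add up what they returned
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling run_small_stack_threads(8, 64 * 1024) spawns eight threads with 64 KiB stacks each. This lowers the memory cost per task, but it doesn't remove the syscall and scheduling overhead listed above.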

OS threads sure have some pretty big advantages. So why all this talk about "async" and concurrency in the first place?

First, for computers to be efficient they need to multitask. Once you start to look under the covers (like how an operating system works) you'll see concurrency everywhere. It's very fundamental in everything we do.

Secondly, we have the web.

Web servers are all about I/O and handling small tasks (requests). When the number of small tasks is large it's not a good fit for OS threads as of today because of the memory they require and the overhead involved when creating new threads.

This gets even more problematic when the load is variable which means the current number of tasks a program has at any point in time is unpredictable. That's why you'll see so many async web frameworks and database drivers today.

However, for a huge number of problems, the standard OS threads will often be the right solution. So, just think twice about your problem before you reach for an async library.

Now, let's look at some other options for multitasking. They all have in common that they implement a way to do multitasking by having a "userland" runtime.

Green threads/stackful coroutines

In this book I'll use the term "green threads" to mean stackful coroutines to differentiate them from the other continuation mechanisms described in this chapter. You can, however, see the term "green threads" be used to describe a broader set of continuation mechanisms in different literature or discussions on the internet.

Green threads use the same mechanism as an OS: creating a thread for each task, setting up a stack, saving the CPU's state, and jumping from one task (thread) to another by doing a "context switch".

We yield control to the scheduler (which is a central part of the runtime in such a system) which then continues running a different task.

Rust had green threads once, but they were removed before it hit 1.0. The state of execution is stored in each stack so in such a solution there would be no need for async, await, Future or Pin. In many ways, green threads mimic how an operating system facilitates concurrency, and implementing them is a great learning experience.

The typical flow looks like this:

  1. Run some non-blocking code.
  2. Make a blocking call to some external resource.
  3. CPU "jumps" to the "main" thread which schedules a different thread to run and "jumps" to that stack.
  4. Run some non-blocking code on the new thread until a new blocking call or the task is finished.
  5. CPU "jumps" back to the "main" thread, schedules a new thread which is ready to make progress, and "jumps" to that thread.

These "jumps" are known as context switches. Your OS is doing it many times each second as you read this.

Advantages:

  1. Simple to use. The code will look like it does when using OS threads.
  2. A "context switch" is reasonably fast.
  3. Each stack only gets a little memory to start with so you can have hundreds of thousands of green threads running.
  4. It's easy to incorporate preemption which puts a lot of control in the hands of the runtime implementors.

Drawbacks:

  1. The stacks might need to grow. Solving this is not easy and will have a cost.
  2. You need to save the CPU state on every switch.
  3. It's not a zero cost abstraction (Rust had green threads early on and this was one of the reasons they were removed).
  4. Complicated to implement correctly if you want to support many different platforms.

A green threads example could look something like this:

The example presented below is an adapted example from an earlier gitbook I wrote about green threads called Green Threads Explained in 200 lines of Rust. If you want to know what's going on you'll find everything explained in detail in that book. The code below is wildly unsafe and it's just to show a real example. It's not in any way meant to showcase "best practice". Just so we're on the same page.

#![feature(naked_functions)]
 use std::arch::asm;

 const DEFAULT_STACK_SIZE: usize = 1024 * 1024 * 2;
 const MAX_THREADS: usize = 4;
 static mut RUNTIME: usize = 0;

 pub struct Runtime {
     threads: Vec<Thread>,
     current: usize,
 }

 #[derive(PartialEq, Eq, Debug)]
 enum State {
     Available,
     Running,
     Ready,
 }

 struct Thread {
     id: usize,
     stack: Vec<u8>,
     ctx: ThreadContext,
     state: State,
     task: Option<Box<dyn Fn()>>,
 }

 #[derive(Debug, Default)]
 #[repr(C)]
 struct ThreadContext {
     rsp: u64,
     r15: u64,
     r14: u64,
     r13: u64,
     r12: u64,
     rbx: u64,
     rbp: u64,
     thread_ptr: u64,
 }

 impl Thread {
     fn new(id: usize) -> Self {
         Thread {
             id,
             stack: vec![0_u8; DEFAULT_STACK_SIZE],
             ctx: ThreadContext::default(),
             state: State::Available,
             task: None,
         }
     }
 }

 impl Runtime {
     pub fn new() -> Self {
         let base_thread = Thread {
             id: 0,
             stack: vec![0_u8; DEFAULT_STACK_SIZE],
             ctx: ThreadContext::default(),
             state: State::Running,
             task: None,
         };

         let mut threads = vec![base_thread];
         threads[0].ctx.thread_ptr = &threads[0] as *const Thread as u64;
         let mut available_threads: Vec<Thread> = (1..MAX_THREADS).map(|i| Thread::new(i)).collect();
         threads.append(&mut available_threads);

         Runtime {
             threads,
             current: 0,
         }
     }

     pub fn init(&self) {
         unsafe {
             let r_ptr: *const Runtime = self;
             RUNTIME = r_ptr as usize;
         }
     }

     pub fn run(&mut self) -> ! {
         while self.t_yield() {}
         std::process::exit(0);
     }

     fn t_return(&mut self) {
         if self.current != 0 {
             self.threads[self.current].state = State::Available;
             self.t_yield();
         }
     }

     #[inline(never)]
     fn t_yield(&mut self) -> bool {
         let mut pos = self.current;
         while self.threads[pos].state != State::Ready {
             pos += 1;
             if pos == self.threads.len() {
                 pos = 0;
             }
             if pos == self.current {
                 return false;
             }
         }

         if self.threads[self.current].state != State::Available {
             self.threads[self.current].state = State::Ready;
         }

         self.threads[pos].state = State::Running;
         let old_pos = self.current;
         self.current = pos;

         unsafe {
             let old: *mut ThreadContext = &mut self.threads[old_pos].ctx;
             let new: *const ThreadContext = &self.threads[pos].ctx;
             asm!("call switch", in("rdi") old, in("rsi") new, clobber_abi("C"));
         }
         self.threads.len() > 0
     }

     pub fn spawn<F: Fn() + 'static>(f: F){
         unsafe {
             let rt_ptr = RUNTIME as *mut Runtime;
             let available = (*rt_ptr)
                 .threads
                 .iter_mut()
                 .find(|t| t.state == State::Available)
                 .expect("no available thread.");

             let size = available.stack.len();
             let s_ptr = available.stack.as_mut_ptr().offset(size as isize);
             let s_ptr = (s_ptr as usize & !15) as *mut u8;
             available.task = Some(Box::new(f));
             available.ctx.thread_ptr = available as *const Thread as u64;
             //ptr::write(s_ptr.offset((size - 8) as isize) as *mut u64, guard as u64);
             std::ptr::write(s_ptr.offset(-16) as *mut u64, guard as u64);
             std::ptr::write(s_ptr.offset(-24) as *mut u64, skip as u64);
             std::ptr::write(s_ptr.offset(-32) as *mut u64, call as u64);
             available.ctx.rsp = s_ptr.offset(-32) as u64;
             available.state = State::Ready;
         }
     }
 }

 fn call(thread: u64) {
     let thread = unsafe { &*(thread as *const Thread) };
     if let Some(f) = &thread.task {
         f();
     }
 }

 #[naked]
 unsafe extern "C" fn skip() {
     asm!("ret", options(noreturn))
 }

 fn guard() {
     unsafe {
         let rt_ptr = RUNTIME as *mut Runtime;
         let rt = &mut *rt_ptr;
         println!("THREAD {} FINISHED.", rt.threads[rt.current].id);
         rt.t_return();
     };
 }

 pub fn yield_thread() {
     unsafe {
         let rt_ptr = RUNTIME as *mut Runtime;
         (*rt_ptr).t_yield();
     };
 }
#[naked]
#[no_mangle]
unsafe extern "C" fn switch() {
    asm!(
        "mov 0x00[rdi], rsp",
        "mov 0x08[rdi], r15",
        "mov 0x10[rdi], r14",
        "mov 0x18[rdi], r13",
        "mov 0x20[rdi], r12",
        "mov 0x28[rdi], rbx",
        "mov 0x30[rdi], rbp",
        "mov rsp, 0x00[rsi]",
        "mov r15, 0x08[rsi]",
        "mov r14, 0x10[rsi]",
        "mov r13, 0x18[rsi]",
        "mov r12, 0x20[rsi]",
        "mov rbx, 0x28[rsi]",
        "mov rbp, 0x30[rsi]",
        "mov rdi, 0x38[rsi]",
        "ret", options(noreturn)
    );
}
#[cfg(not(windows))]
pub fn main() {
    let mut runtime = Runtime::new();
    runtime.init();
    Runtime::spawn(|| {
        println!("I haven't implemented a timer in this example.");
        yield_thread();
        println!("Finally, notice how the tasks are executed concurrently.");
    });
    Runtime::spawn(|| {
        println!("But we can still nest tasks...");
        Runtime::spawn(|| {
            println!("...like this!");
        })
    });
    runtime.run();
}
#[cfg(windows)]
fn main() { }

Still hanging in there? Good. Don't get frustrated if the code above is difficult to understand. If I hadn't written it myself I would probably feel the same. You can always go back and read the book which explains it later.

Callback based approaches

You probably already know what we're going to talk about in the next paragraphs from JavaScript, which I assume most readers know.

If your exposure to JavaScript callbacks has given you any sorts of PTSD earlier in life, close your eyes now and scroll down for 2-3 seconds. You'll find a link there that takes you to safety.

The whole idea behind a callback based approach is to save a pointer to a set of instructions we want to run later together with whatever state is needed. In Rust this would be a closure. In the example below, we save this information in a HashMap but it's not the only option.

The basic idea of not involving threads as a primary way to achieve concurrency is the common denominator for the rest of the approaches. Including the one Rust uses today which we'll soon get to.

Advantages:

  • Easy to implement in most languages
  • No context switching
  • Relatively low memory overhead (in most cases)

Drawbacks:

  • Since each task must save the state it needs for later, the memory usage will grow linearly with the number of callbacks in a chain of computations.
  • Can be hard to reason about. Many people already know this as "callback hell".
  • It's a very different way of writing a program, and will require a substantial rewrite to go from a "normal" program flow to one that uses a "callback based" flow.
  • Sharing state between tasks is a hard problem in Rust using this approach due to its ownership model.

An extremely simplified example of how a callback based approach could look is:

fn program_main() {
    println!("So we start the program here!");
    set_timeout(200, || {
        println!("We create tasks with a callback that runs once the task finished!");
    });
    set_timeout(100, || {
        println!("We can even chain sub-tasks...");
        set_timeout(50, || {
            println!("...like this!");
        })
    });
    println!("While our tasks are executing we can do other stuff instead of waiting.");
}

fn main() {
    RT.with(|rt| rt.run(program_main));
}

use std::sync::mpsc::{channel, Receiver, Sender};
use std::{cell::RefCell, collections::HashMap, thread};

thread_local! {
    static RT: Runtime = Runtime::new();
}

struct Runtime {
    callbacks: RefCell<HashMap<usize, Box<dyn FnOnce() -> ()>>>,
    next_id: RefCell<usize>,
    evt_sender: Sender<usize>,
    evt_receiver: Receiver<usize>,
}

fn set_timeout(ms: u64, cb: impl FnOnce() + 'static) {
    RT.with(|rt| {
        let id = *rt.next_id.borrow();
        *rt.next_id.borrow_mut() += 1;
        rt.callbacks.borrow_mut().insert(id, Box::new(cb));
        let evt_sender = rt.evt_sender.clone();
        thread::spawn(move || {
            thread::sleep(std::time::Duration::from_millis(ms));
            evt_sender.send(id).unwrap();
        });
    });
}

impl Runtime {
    fn new() -> Self {
        let (evt_sender, evt_receiver) = channel();
        Runtime {
            callbacks: RefCell::new(HashMap::new()),
            next_id: RefCell::new(1),
            evt_sender,
            evt_receiver,
        }
    }

    fn run(&self, program: fn()) {
        program();
        for evt_id in &self.evt_receiver {
            let cb = self.callbacks.borrow_mut().remove(&evt_id).unwrap();
            cb();
            if self.callbacks.borrow().is_empty() {
                break;
            }
        }
    }
}

We're keeping this super simple, and you might wonder what's the difference between this approach and the one using OS threads and passing in the callbacks to the OS threads directly.

The difference is that the callbacks are run on the same thread using this example. The OS threads we create are basically just used as timers but could represent any kind of resource that we'll have to wait for.

From callbacks to promises

You might start to wonder by now, when are we going to talk about Futures?

Well, we're getting there. You see, Promises, Futures, and other names for deferred computations are often used interchangeably.

There are formal differences between them, but we won't cover those here. It's worth explaining promises a bit since they're widely known due to their use in JavaScript. Promises also have a lot in common with Rust's Futures.

First of all, many languages have a concept of promises, but I'll use the one from JavaScript in the examples below.

Promises are one way to deal with the complexity which comes with a callback based approach.

Instead of:

setTimer(200, () => {
  setTimer(100, () => {
    setTimer(50, () => {
      console.log("I'm the last one");
    });
  });
});

We can do this:

function timer(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

timer(200)
.then(() => timer(100))
.then(() => timer(50))
.then(() => console.log("I'm the last one"));

The change is even more substantial under the hood. You see, promises return a state machine which can be in one of three states: pending, fulfilled or rejected.

When we call timer(200) in the sample above, we get back a promise in the state pending.

Since promises are re-written as state machines, they also enable an even better syntax which allows us to write our last example like this:

async function run() {
    await timer(200);
    await timer(100);
    await timer(50);
    console.log("I'm the last one");
}

You can consider the run function as a pausable task consisting of several sub-tasks. On each "await" point it yields control to the scheduler (in this case it's the well-known JavaScript event loop).

Once one of the sub-tasks changes state to either fulfilled or rejected, the task is scheduled to continue to the next step.

Syntactically, Rust's Futures 0.1 was a lot like the promises example above, and Rust's Futures 0.3 is a lot like async/await in our last example.

Now this is also where the similarities between JavaScript promises and Rust's Futures stop. The reason we go through all this is to get an introduction and get into the right mindset for exploring Rust's Futures.

To avoid confusion later on: There's one difference you should know. JavaScript promises are eagerly evaluated. That means that once a promise is created, it starts running its task. Rust's Futures, on the other hand, are lazily evaluated. They need to be polled once before they do any work.
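
If you want to see the lazy evaluation for yourself, here is a minimal sketch using only the standard library. NoopWaker and lazy_demo are names I made up for this illustration. We poll the future by hand with a waker that does nothing, which is enough here since the future never needs to wait for anything:

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing. Enough for polling a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns (did the body run after creation?, did it run after one poll?).
fn lazy_demo() -> (bool, bool) {
    let ran = Arc::new(AtomicBool::new(false));
    let flag = ran.clone();

    // Creating the future does not run any of its body
    let mut fut = Box::pin(async move {
        flag.store(true, Ordering::SeqCst);
    });
    let ran_after_creation = ran.load(Ordering::SeqCst);

    // Only polling makes the future do work
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let poll = fut.as_mut().poll(&mut cx);
    assert!(matches!(poll, Poll::Ready(())));

    (ran_after_creation, ran.load(Ordering::SeqCst))
}
```

Calling lazy_demo() gives (false, true): nothing happened when the async block was created, and the body ran on the first poll.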


PANIC BUTTON (next chapter)

Futures in Rust

Overview:

  • Get a high level introduction to concurrency in Rust
  • Know what Rust provides, and what it doesn't, when working with async code
  • Get to know why we need a runtime-library in Rust
  • Understand the difference between a "leaf-future" and a "non-leaf-future"
  • Get insight on how to handle CPU intensive tasks

Futures

So what is a future?

A future is a representation of some operation which will complete in the future.

Async in Rust uses a Poll based approach, in which an asynchronous task will have three phases.

  1. The Poll phase. A Future is polled which results in the task progressing until a point where it can no longer make progress. We often refer to the part of the runtime which polls a Future as an executor.
  2. The Wait phase. An event source, most often referred to as a reactor, registers that a Future is waiting for an event to happen and makes sure that it will wake the Future when that event is ready.
  3. The Wake phase. The event happens and the Future is woken up. It's now up to the executor which polled the Future in step 1 to schedule the future to be polled again and make further progress until it completes or reaches a new point where it can't make further progress and the cycle repeats.
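
To make the three phases a bit more concrete, here is a condensed sketch using only the standard library. It is not the example we build later in this book, and the names Timer, Shared, ThreadWaker and block_on are assumptions made for this illustration. A spawned thread plays the role of the reactor, and a tiny block_on function plays the role of the executor:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the future and its "reactor" (a timer thread here).
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// A hypothetical future which completes after roughly `ms` milliseconds.
struct Timer {
    shared: Arc<Mutex<Shared>>,
}

impl Timer {
    fn new(ms: u64) -> Self {
        let shared = Arc::new(Mutex::new(Shared { done: false, waker: None }));
        let reactor_side = shared.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(ms));
            let mut s = reactor_side.lock().unwrap();
            s.done = true;
            // Wake phase: the event happened, so wake the waiting task
            if let Some(waker) = s.waker.take() {
                waker.wake();
            }
        });
        Timer { shared }
    }
}

impl Future for Timer {
    type Output = &'static str;

    // Poll phase: the executor calls this to let the task make progress
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut s = self.shared.lock().unwrap();
        if s.done {
            Poll::Ready("timer fired")
        } else {
            // Wait phase: hand a Waker to the event source and yield
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Waking a task means unparking the thread that runs `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// A very small executor: polls one future until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until some waker unparks us, then poll again
            Poll::Pending => thread::park(),
        }
    }
}
```

Calling block_on(Timer::new(50)) goes through all three phases before it returns "timer fired", unless the timer thread happens to finish before the first poll, in which case the first poll is already Ready.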

Now, when we talk about futures I find it useful to make a distinction between non-leaf futures and leaf futures early on because in practice they're pretty different from one another.

Leaf futures

Runtimes create leaf futures which represent a resource like a socket.

// stream is a leaf-future
let mut stream = tokio::net::TcpStream::connect("127.0.0.1:3000");

Operations on these resources, like a Read on a socket, will be non-blocking and return a future which we call a leaf future since it's the future which we're actually waiting on.

It's unlikely that you'll implement a leaf future yourself unless you're writing a runtime, but we'll go through how they're constructed in this book as well.

It's also unlikely that you'll pass a leaf-future to a runtime and run it to completion alone as you'll understand by reading the next paragraph.

Non-leaf-futures

Non-leaf-futures are the kind of futures we as users of a runtime write ourselves using the async keyword to create a task which can be run on the executor.

The bulk of an async program will consist of non-leaf-futures, which are a kind of pause-able computation. This is an important distinction since these futures represent a set of operations. Often, such a task will await a leaf future as one of many operations to complete the task.

// Non-leaf-future
let non_leaf = async {
    let mut stream = TcpStream::connect("127.0.0.1:3000").await.unwrap();// <- yield
    println!("connected!");
    let result = stream.write(b"hello world\n").await; // <- yield
    println!("message sent!");
    ...
};

The key to these tasks is that they're able to yield control to the runtime's scheduler and then resume execution again where it left off at a later point.

In contrast to leaf futures, these kinds of futures do not themselves represent an I/O resource. When we poll them they will run until they get to a leaf-future which returns Pending and then yield control to the scheduler (which is a part of what we call the runtime).
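
Here is a small sketch of that behavior which you can run without any runtime. YieldOnce is a made-up stand-in for a leaf future (it returns Pending on the first poll and Ready on the second), and trace_polls and NoopWaker are also names I invented for the illustration. We poll the non-leaf future by hand and record what ran during each poll:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical leaf future: Pending on the first poll, Ready on the second.
struct YieldOnce {
    polled: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            // Ask to be polled again right away
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing, since we drive the polling by hand below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a non-leaf future twice and returns a log of what happened.
fn trace_polls() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let task_log = log.clone();

    // Non-leaf future: ordinary code around an awaited leaf future
    let mut non_leaf = Box::pin(async move {
        task_log.lock().unwrap().push("before await".to_string());
        YieldOnce { polled: false }.await; // <- yield
        task_log.lock().unwrap().push("after await".to_string());
    });

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let first = non_leaf.as_mut().poll(&mut cx);
    log.lock().unwrap().push(format!("first poll pending: {}", first.is_pending()));
    let second = non_leaf.as_mut().poll(&mut cx);
    log.lock().unwrap().push(format!("second poll ready: {}", second.is_ready()));

    let entries = log.lock().unwrap().clone();
    entries
}
```

The log shows that the first poll runs the code before the await and then stops at the leaf future, and the second poll resumes right after the await and finishes the task.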

Runtimes

Languages like C#, JavaScript, Java, Go, and many others come with a runtime for handling concurrency. So if you come from one of those languages this will seem a bit strange to you.

Rust is different from these languages in the sense that Rust doesn't come with a runtime for handling concurrency, so you need to use a library which provides this for you.

Quite a bit of complexity attributed to Futures is actually complexity rooted in runtimes; creating an efficient runtime is hard.

Learning how to use one correctly requires quite a bit of effort as well, but you'll see that there are several similarities between these kinds of runtimes, so learning one makes learning the next much easier.

The difference between Rust and other languages is that you have to make an active choice when it comes to picking a runtime. Most often in other languages, you'll just use the one provided for you.

A useful mental model of an async runtime

I find it easier to reason about how Futures work by creating a high level mental model we can use. To do that I have to introduce the concept of a runtime which will drive our Futures to completion.

Please note that the mental model I create here is not the only way to drive Futures to completion and that Rust’s Futures do not impose any restrictions on how you actually accomplish this task.

A fully working async system in Rust can be divided into three parts:

  1. Reactor
  2. Executor
  3. Future

So, how do these three parts work together? They do that through an object called the Waker. The Waker is how the reactor tells the executor that a specific Future is ready to run. Once you understand the life cycle and ownership of a Waker, you'll understand how futures work from a user's perspective. Here is the life cycle:

  • A Waker is created by the executor.
  • When a future is polled the first time by the executor, it’s given a clone of the Waker object created by the executor. Since this is a shared object (e.g. an Arc<T>), all clones actually point to the same underlying object. Thus, anything that calls any clone of the original Waker will wake the particular Future that was registered to it.
  • The future clones the Waker and passes it to the reactor, which stores it to use later.

You could think of a "future" like a channel for the Waker: The channel starts with the future that's polled the first time by the executor and is passed a handle to a Waker. It ends in a leaf-future which passes that handle to the reactor.

Note that the Waker is wrapped in a rather uninteresting Context struct which we will learn more about later. The interesting part is the Waker that is passed on.

At some point in the future, the reactor will decide that the future is ready to run. It will wake the future via the Waker that it stored. This action will do what is necessary to get the executor in a position to poll the future. We'll go into more detail on Wakers in the Waker and Context chapter.
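
The life cycle described above can be sketched with the standard library alone. TaskHandle and wake_via_clones are hypothetical names. A real executor would put the task back in its run queue when woken, while this stand-in only counts the wake-ups so we can observe them:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread;

/// Stand-in for an executor's per-task state. It only counts wake-ups.
struct TaskHandle {
    wake_count: AtomicUsize,
}

impl Wake for TaskHandle {
    fn wake(self: Arc<Self>) {
        self.wake_count.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns how many times the task was woken.
fn wake_via_clones() -> usize {
    let task = Arc::new(TaskHandle { wake_count: AtomicUsize::new(0) });

    // 1. The executor creates the Waker from its shared task state
    let waker = Waker::from(task.clone());

    // 2. The future clones the Waker and passes the clone to the reactor
    let stored_by_reactor = waker.clone();

    // 3. Later, possibly on another thread, the reactor wakes the task
    thread::spawn(move || stored_by_reactor.wake())
        .join()
        .unwrap();

    // Waking through the original reaches the very same task
    waker.wake_by_ref();

    task.wake_count.load(Ordering::SeqCst)
}
```

Both the clone held by the "reactor" and the original Waker end up in the same TaskHandle, so wake_via_clones() returns 2. Notice that the reactor side never needs to know what kind of executor is behind the Waker.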

Since the interface is the same across all executors, reactors can in theory be completely oblivious to the type of the executor, and vice-versa. Executors and reactors never need to communicate with one another directly.

This design is what gives the futures framework its power and flexibility and allows the Rust standard library to provide an ergonomic, zero-cost abstraction for us to use.

In an effort to try to visualize how these parts work together I put together a set of slides in the next chapter that I hope will help.

The two most popular runtimes for Futures at the time of writing are:

What Rust's standard library takes care of

  1. A common interface representing an operation which will be completed in the future through the Future trait.
  2. An ergonomic way of creating tasks which can be suspended and resumed through the async and await keywords.
  3. A defined interface to wake up a suspended task through the Waker type.

That's really what Rust's standard library does. As you see there is no definition of non-blocking I/O, how these tasks are created, or how they're run.

I/O vs CPU intensive tasks

As you know now, what you normally write are called non-leaf futures. Let's take a look at this async block, using pseudo-Rust as an example:

let non_leaf = async {
    let mut stream = TcpStream::connect("127.0.0.1:3000").await.unwrap(); // <-- yield

    // request a large dataset
    let result = stream.write(get_dataset_request).await.unwrap(); // <-- yield

    // wait for the dataset
    let mut response = vec![];
    stream.read(&mut response).await.unwrap(); // <-- yield

    // do some CPU-intensive analysis on the dataset
    let report = analyzer::analyze_data(response).unwrap();

    // send the results back
    stream.write(report).await.unwrap(); // <-- yield
};

Now, as you'll see when we go through how Futures work, the code we write between the yield points is run on the same thread as our executor.

That means that while our analyzer is working on the dataset, the executor is busy doing calculations instead of handling new requests.

Fortunately there are a few ways to handle this, and it's not difficult, but it's something you must be aware of:

  1. We could create a new leaf future which sends our task to another thread and resolves when the task is finished. We could await this leaf-future like any other future.

  2. The runtime could have some kind of supervisor that monitors how much time different tasks take, and move the executor itself to a different thread so it can continue to run even though our analyzer task is blocking the original executor thread.

  3. You can create a reactor yourself which is compatible with the runtime which does the analysis any way you see fit, and returns a Future which can be awaited.

Now, #1 is the usual way of handling this, but some executors implement #2 as well. The problem with #2 is that if you switch runtime you need to make sure that it supports this kind of supervision as well or else you will end up blocking the executor.

And #3 is more of theoretical importance. Normally you'd be happy to send the task to the thread-pool most runtimes provide.

Most executors have a way to accomplish #1 using methods like spawn_blocking.

These methods send the task to a thread-pool created by the runtime where you can either perform CPU-intensive tasks or "blocking" tasks which are not supported by the runtime.
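
To give you an idea of what approach #1 involves, here is a rough sketch using only the standard library. The names offload, Offloaded, Slot and analyze_report are made up for this illustration, and this is not how any particular runtime implements spawn_blocking. In particular, it spawns a fresh thread for every call instead of using a thread-pool. It also includes a tiny block_on so the example can be driven without a runtime:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Shared between the worker thread and the future waiting for its result.
struct Slot<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Leaf future which resolves once the offloaded work is done.
struct Offloaded<T> {
    slot: Arc<Mutex<Slot<T>>>,
}

/// Runs `work` on a new thread and returns a future for its result.
fn offload<T, F>(work: F) -> Offloaded<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let slot = Arc::new(Mutex::new(Slot { result: None, waker: None }));
    let worker_slot = slot.clone();
    thread::spawn(move || {
        // The heavy lifting happens here, away from the executor thread
        let value = work();
        let mut s = worker_slot.lock().unwrap();
        s.result = Some(value);
        if let Some(waker) = s.waker.take() {
            waker.wake();
        }
    });
    Offloaded { slot }
}

impl<T> Future for Offloaded<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.slot.lock().unwrap();
        match s.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not finished yet: leave a Waker for the worker thread
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Non-leaf future which awaits the CPU-intensive part instead of running it.
async fn analyze_report() -> u64 {
    offload(|| (1..=1_000_000u64).sum::<u64>()).await // <-- yield
}

/// Wakes the thread which is parked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, just enough to drive the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

With this in place, block_on(analyze_report()) returns the sum, and the thread running the executor is parked instead of busy while the calculation runs. In a real runtime that thread would be free to poll other tasks in the meantime.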

Now, armed with this knowledge you are already well on your way to understanding Futures, but we're not gonna stop yet. There are lots of details to cover.

Take a break or a cup of coffee and get ready as we go for a deep dive in the next chapters.

Want to learn more about concurrency and async?

If you find the concepts of concurrency and async programming confusing in general, I know where you're coming from and I have written some resources to try to give a high-level overview that will make it easier to learn Rust's Futures afterwards:

Learning these concepts by studying futures is making it much harder than it needs to be, so go on and read these chapters if you feel a bit unsure.

I'll be right here when you're back.

However, if you feel that you have the basics covered, then let's get moving!

Bonus section - additional notes on Futures and Wakers

In this section we take a deeper look at some advantages of having a loose coupling between the Executor-part and Reactor-part of an async runtime.

Earlier in this chapter, I mentioned that it is common for the executor to create a new Waker for each Future that is registered with the executor, but that the Waker is a shared object similar to an Arc<T>. One of the reasons for this design is that it allows different Reactors to wake a Future.

As an example of how this can be used, consider how you could create a new type of Future that has the ability to be canceled:

One way to achieve this would be to add an AtomicBool to the instance of the future, and an extra method called cancel(). The cancel() method will first set the AtomicBool to signal that the future is now canceled, and then immediately call the instance's own copy of the Waker.

Once the executor starts executing the Future, the Future will know that it was canceled, and will do the appropriate cleanup actions to terminate itself.

The main reason for designing the Future in this manner is because we don't have to modify either the Executor or the other Reactors; they are all oblivious to the change.

The only possible issue is with the design of the Future itself; a Future that is canceled still needs to terminate correctly according to the rules outlined in the docs for Future. That means that it can't just delete its resources and then sit there; it needs to return a value. It is up to you to decide if a canceled future will return Pending forever, or if it will return a value in Ready. Just be aware that if other Futures are awaiting it, they won't be able to start until Ready is returned.

A common technique for cancelable Futures is to have them return a Result with an error that signals the Future was canceled; that will permit any Futures that are awaiting the canceled Future a chance to progress, with the knowledge that the Future they depended on was canceled. There are additional concerns as well, but beyond the scope of this book. Read the documentation and code for the futures crate for a better understanding of what the concerns are.
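Here is a minimal sketch of such a cancelable Future, using only the standard library. All names (Cancelable, CancelHandle, Canceled, CountingWaker) are invented for this illustration. As a small deviation from the description above, cancel() lives on a separate handle that shares state with the future, since the executor normally owns the future itself. The CountingWaker merely stands in for an executor so we can poll by hand.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

#[derive(Debug, PartialEq)]
struct Canceled;

// State shared between the future and whoever wants to cancel it.
#[derive(Default)]
struct Shared {
    canceled: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

struct Cancelable(Arc<Shared>);
struct CancelHandle(Arc<Shared>);

fn cancelable() -> (Cancelable, CancelHandle) {
    let shared = Arc::new(Shared::default());
    (Cancelable(shared.clone()), CancelHandle(shared))
}

impl CancelHandle {
    fn cancel(&self) {
        self.0.canceled.store(true, Ordering::SeqCst);
        // Wake the task so the executor polls the future again.
        if let Some(waker) = self.0.waker.lock().unwrap().take() {
            waker.wake();
        }
    }
}

impl Future for Cancelable {
    type Output = Result<(), Canceled>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Store the waker before checking the flag, so a concurrent `cancel`
        // cannot slip in between the two steps unnoticed.
        *self.0.waker.lock().unwrap() = Some(cx.waker().clone());
        if self.0.canceled.load(Ordering::SeqCst) {
            Poll::Ready(Err(Canceled))
        } else {
            Poll::Pending // this toy future never completes on its own
        }
    }
}

// A waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

fn run_demo() -> (bool, usize, Poll<Result<(), Canceled>>) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let (mut fut, handle) = cancelable();
    let pending_first = Pin::new(&mut fut).poll(&mut cx).is_pending();
    handle.cancel();
    let wakes = counter.0.load(Ordering::SeqCst);
    let second = Pin::new(&mut fut).poll(&mut cx);
    (pending_first, wakes, second)
}

fn main() {
    println!("{:?}", run_demo());
}
```

Neither the waker nor the polling code knows anything about cancellation; all of it lives in the future and its handle.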

Thanks to @ckaran for contributing this bonus segment.

A mental model of how Futures and runtimes work

The main goal in this part is to build a high level mental model of how the different pieces we read about in the previous chapter work together. I hope this will make it easier to understand the high level concepts before we take a deep dive into topics like trait objects and generators in the next few chapters.

This is not the only way to create a model of an async system since we're making assumptions on runtime specifics that can vary a great deal. It's the way I found it easiest to build upon and it's relevant for understanding a lot of real implementations you'll find in the async ecosystem.

Finally, please note that the code itself is "pseudo-rust" due to the need for brevity and clarity.

[Slides 1 to 19: the mental model, illustrated step by step]

Waker and Context

Overview:

  • Understand how the Waker object is constructed
  • Learn how the runtime knows when a leaf-future can resume
  • Learn the basics of dynamic dispatch and trait objects

The Waker type is described as part of RFC#2592.

The Waker

The Waker type allows for a loose coupling between the reactor-part and the executor-part of a runtime.

By having a wake up mechanism that is not tied to the thing that executes the future, runtime-implementors can come up with interesting new wake-up mechanisms. An example of this can be spawning a thread to do some work that eventually notifies the future, completely independent of the current runtime.

Without a waker, the executor would be the only way to notify a running task, whereas with the waker, we get a loose coupling where it's easy to extend the ecosystem with new leaf-level tasks.
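To make this concrete, here is a small sketch in which a plain thread plays the role of the event source and wakes the waiting side through a Waker. It relies on the standard library's Wake trait. ThreadWaker, spawn_worker and wait_for_worker are names made up for this illustration, and the "executor" is nothing more than a parked thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

// Waking means "unpark the thread that is waiting for the task".
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Plays the role of a leaf-level event source: it does some work on its own
// thread and then notifies whoever is behind the `Waker`.
fn spawn_worker(done: Arc<AtomicBool>, waker: Waker) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        done.store(true, Ordering::SeqCst);
        waker.wake();
    })
}

fn wait_for_worker() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let worker = spawn_worker(done.clone(), waker);

    // The "executor" side: sleep until woken, and re-check the flag since
    // `park` may also return spuriously.
    while !done.load(Ordering::SeqCst) {
        thread::park();
    }
    worker.join().unwrap();
    done.load(Ordering::SeqCst)
}

fn main() {
    println!("worker finished: {}", wait_for_worker());
}
```

The worker only sees a Waker. It has no idea what kind of executor, if any, sits on the other side.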

If you want to read more about the reasoning behind the Waker type, I can recommend Withoutboats' article series about it.

The Context type

As the docs state, as of now this type only wraps a Waker, but it gives some flexibility for future evolution of the API in Rust. The context could, for example, hold task-local storage and provide space for debugging hooks in later iterations.

Understanding the Waker

One of the most confusing things we encounter when implementing our own Futures is how we implement a Waker. Creating a Waker involves creating a vtable which allows us to use dynamic dispatch to call methods on a type-erased trait object we construct ourselves.

The Waker implementation is specific to the type of executor in use, but all Wakers share a similar interface. It's useful to think of it as a Trait. It's not implemented as one, since that would require us to treat it as a trait object like &dyn Waker or Arc<dyn Waker>. The first restricts the API by requiring a &dyn Waker trait object, and the second requires a heap allocation, which a lot of embedded-like systems can't do.

Having the Waker implemented the way it is lets users create statically-allocated wakers, and even more exotic mechanisms, on platforms where that makes sense.

If you want to know more about dynamic dispatch in Rust I can recommend an article written by Adam Schwalm called Exploring Dynamic Dispatch in Rust.

Let's explain this a bit more in detail.

Fat pointers in Rust

To get a better understanding of how we implement the Waker in Rust, we need to take a step back and talk about some fundamentals. Let's start by taking a look at the size of some different pointer types in Rust.

Run the following code:

use std::mem::size_of;
trait SomeTrait { }

fn main() {
    println!("======== The size of different pointers in Rust: ========");
    println!("&dyn Trait:------{}", size_of::<&dyn SomeTrait>());
    println!("&[&dyn Trait]:---{}", size_of::<&[&dyn SomeTrait]>());
    println!("Box<dyn Trait>:--{}", size_of::<Box<dyn SomeTrait>>());
    println!("Box<Box<dyn Trait>>: {}", size_of::<Box<Box<dyn SomeTrait>>>());
    println!("&i32:------------{}", size_of::<&i32>());
    println!("&[i32]:----------{}", size_of::<&[i32]>());
    println!("Box<i32>:--------{}", size_of::<Box<i32>>());
    println!("&Box<i32>:-------{}", size_of::<&Box<i32>>());
    println!("[&dyn Trait;4]:--{}", size_of::<[&dyn SomeTrait; 4]>());
    println!("[i32;4]:---------{}", size_of::<[i32; 4]>());
}

As you see from the output after running this, the sizes of the references vary. Many are 8 bytes (which is a pointer size on 64 bit systems), but some are 16 bytes.

The 16 byte sized pointers are called "fat pointers" since they carry extra information.

Example &[i32] :

  • The first 8 bytes are the actual pointer to the first element in the array (or the part of an array the slice refers to)
  • The second 8 bytes are the length of the slice.
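The two halves of a slice reference can be inspected directly. The sketch below (the helper name parts is just for illustration) splits a slice into its pointer and length, and then rebuilds an equivalent slice from them:

```rust
use std::mem::size_of;

// Splits a slice reference into the two parts it carries with it.
fn parts(slice: &[i32]) -> (*const i32, usize) {
    (slice.as_ptr(), slice.len())
}

fn main() {
    let array = [10, 20, 30, 40, 50];
    let slice = &array[1..4];
    let (ptr, len) = parts(slice);

    // A slice reference is exactly two pointer-sized values wide.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());

    // SAFETY: `ptr` and `len` come from a live slice that outlives `rebuilt`.
    let rebuilt: &[i32] = unsafe { std::slice::from_raw_parts(ptr, len) };
    println!("len = {}, rebuilt = {:?}", len, rebuilt);
}
```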

Example &dyn SomeTrait:

This is the type of fat pointer we'll concern ourselves with going forward. &dyn SomeTrait is a reference to a trait, or what Rust calls a trait object.

The layout for a pointer to a trait object looks like this:

  • The first 8 bytes point to the data for the trait object
  • The second 8 bytes point to the vtable for the trait object

The reason for this is to allow us to refer to an object we know nothing about except that it implements the methods defined by our trait. To accomplish this we use dynamic dispatch.

Let's explain this in code instead of words by implementing our own trait object from these parts:

use std::mem::{align_of, size_of};
// A reference to a trait object is a fat pointer: (data_ptr, vtable_ptr)
trait Test {
    fn add(&self) -> i32;
    fn sub(&self) -> i32;
    fn mul(&self) -> i32;
}

// This will represent our home-brewed fat pointer to a trait object
#[repr(C)]
struct FatPointer<'a> {
    /// A reference is a pointer to an instantiated `Data` instance
    data: &'a mut Data,
    /// Since we need to pass in literal values like length and alignment it's
    /// easiest for us to convert pointers to usize-integers instead of the other way around.
    vtable: *const usize,
}

// This is the data in our trait object. It's just two numbers we want to operate on.
struct Data {
    a: i32,
    b: i32,
}

// ====== function definitions ======
fn add(s: &Data) -> i32 {
    s.a + s.b
}
fn sub(s: &Data) -> i32 {
    s.a - s.b
}
fn mul(s: &Data) -> i32 {
    s.a * s.b
}

fn main() {
    let mut data = Data {a: 3, b: 2};
    // The vtable is like a special-purpose array of pointer-length values with a fixed
    // format, where the first three values contain some general information: a pointer
    // to drop, and the size and alignment of `data`. NB! This layout is an
    // implementation detail of the compiler, not something Rust guarantees.
    let vtable = vec![
        0,                  // pointer to `Drop` (which we're not implementing here)
        size_of::<Data>(),  // length of data
        align_of::<Data>(), // alignment of data

        // we need to make sure we add these in the same order as defined in the Trait.
        add as usize, // function pointer - try changing the order of `add`
        sub as usize, // function pointer - and `sub` to see what happens
        mul as usize, // function pointer
    ];

    let fat_pointer = FatPointer { data: &mut data, vtable: vtable.as_ptr()};
    let test = unsafe { std::mem::transmute::<FatPointer, &dyn Test>(fat_pointer) };

    // And voilà, it's now a trait object we can call methods on
    println!("Add: 3 + 2 = {}", test.add());
    println!("Sub: 3 - 2 = {}", test.sub());
    println!("Mul: 3 * 2 = {}", test.mul());
}

Later on, when we implement our own Waker we'll actually set up a vtable like we do here. The way we create it is slightly different, but now that you know how regular trait objects work you will probably recognize what we're doing which makes it much less mysterious.

Bonus section

You might wonder why the Waker was implemented like this and not just as a normal trait?

The reason is flexibility. Implementing the Waker the way we do here leaves a lot of freedom in choosing which memory management scheme to use.

The "normal" way is to use an Arc and let its reference count keep track of when a Waker object can be dropped. However, this is not the only way; you could also use purely global functions and state, or any other way you wish.

This leaves a lot of options on the table for runtime implementors.
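As an illustration of the "purely global functions and state" option, here is a sketch of a Waker built by hand from a RawWaker and a static RawWakerVTable, with no Arc and no heap allocation. The function names and the WAKE_COUNT counter are assumptions made for this example; a real implementation would notify an executor instead of bumping a counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Global state instead of reference-counted state.
static WAKE_COUNT: AtomicUsize = AtomicUsize::new(0);

// The four functions every waker vtable needs. The data pointer is unused
// here since all our state is global.
unsafe fn clone_raw(_data: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}
unsafe fn wake_raw(_data: *const ()) {
    WAKE_COUNT.fetch_add(1, Ordering::SeqCst);
}
unsafe fn wake_by_ref_raw(_data: *const ()) {
    WAKE_COUNT.fetch_add(1, Ordering::SeqCst);
}
unsafe fn drop_raw(_data: *const ()) {
    // Nothing to free: there is no per-waker allocation.
}

static VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

fn global_waker() -> Waker {
    // SAFETY: the vtable functions are thread-safe and never dereference the
    // (null) data pointer, so the `RawWaker` contract is upheld.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Wakes once by reference and once by value, and reports how many wake-ups
// were recorded.
fn wake_twice() -> usize {
    let before = WAKE_COUNT.load(Ordering::SeqCst);
    let waker = global_waker();
    waker.wake_by_ref();
    waker.clone().wake();
    WAKE_COUNT.load(Ordering::SeqCst) - before
}

fn main() {
    println!("wake-ups recorded: {}", wake_twice());
}
```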

Generators and async/await

Overview:

  • Understand how the async/await syntax works under the hood
  • See first hand why we need Pin
  • Understand what makes Rust's async model very memory efficient

The motivation for Generators can be found in RFC#2033. It's very well written and I can recommend reading through it (it talks as much about async/await as it does about generators).

Why learn about generators?

Generators/yield and async/await are so similar that once you understand one you should be able to understand the other.

It's much easier for me to provide runnable and short examples using Generators instead of Futures which require us to introduce a lot of concepts now that we'll cover later just to show an example.

Async/await works like generators but instead of returning a generator it returns a special object implementing the Future trait.

A small bonus is that you'll have a pretty good introduction to both Generators and Async/Await by the end of this chapter.

Basically, there were three main options discussed when designing how Rust would handle concurrency:

  1. Stackful coroutines, better known as green threads.
  2. Using combinators.
  3. Stackless coroutines, better known as generators.

We covered green threads in the background information so we won't repeat that here. We'll concentrate on the variants of stackless coroutines which Rust uses today.

Combinators

Futures 0.1 used combinators. If you've worked with Promises in JavaScript, you already know combinators. In Rust they look like this:

let future = Connection::connect(conn_str).and_then(|conn| {
    conn.query("somerequest").map(|row|{
        SomeStruct::from(row)
    }).collect::<Vec<SomeStruct>>()
});

let rows: Result<Vec<SomeStruct>, SomeLibraryError> = block_on(future);

I'll focus on three main downsides of this technique:

  1. The error messages produced can be extremely long and arcane.
  2. Memory usage is not optimal.
  3. It does not allow borrowing across combinator steps.

Point #3 is actually a major drawback with Futures 0.1.

Not allowing borrows across suspension points ends up being very un-ergonomic and to accomplish some tasks it requires extra allocations or copying which is inefficient.

The reason for the higher than optimal memory usage is that this is basically a callback-based approach, where each closure stores all the data it needs for computation. This means that as we chain these, the memory required to store the needed state increases with each added step.

Stackless coroutines/generators

This is the model used in Rust today. It has a few notable advantages:

  1. It's easy to convert normal Rust code to a stackless coroutine using async/await as keywords (it can even be done using a macro).
  2. No need for context switching and saving/restoring CPU state
  3. No need to handle dynamic stack allocation
  4. Very memory efficient
  5. Allows us to borrow across suspension points

The last point is in contrast to Futures 0.1. With async/await we can do this:

async fn myfn() {
    let text = String::from("Hello world");
    let borrowed = &text[0..5];
    somefuture.await;
    println!("{}", borrowed);
}

Async in Rust is implemented using Generators. So to understand how async really works we need to understand generators first. Generators in Rust are implemented as state machines.

The memory footprint of a chain of computations is defined by the largest footprint that a single step requires.

That means that adding steps to a chain of computations might not require any increased memory at all, and it's one of the reasons why Futures and Async in Rust have very little overhead.
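The same effect can be seen with an ordinary enum, which is roughly what a state machine boils down to. In this sketch the variants of the made-up Chain type stand in for the state that must survive each step. The sizes are arbitrary, and the futures the compiler actually generates may be laid out differently.

```rust
use std::mem::size_of;

// Each variant holds only what must survive that particular step.
#[allow(dead_code)]
enum Chain {
    Step1([u8; 64]),
    Step2([u8; 16]),
    Step3([u8; 32]),
    Done,
}

fn main() {
    let largest = size_of::<[u8; 64]>();
    let sum = 64 + 16 + 32;
    println!(
        "largest step: {}, sum of steps: {}, state machine: {}",
        largest,
        sum,
        size_of::<Chain>()
    );
    // The enum needs room for its largest variant (plus a tag),
    // not for all variants at once.
    assert!(size_of::<Chain>() >= largest);
    assert!(size_of::<Chain>() < sum);
}
```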

How generators work

In Nightly Rust today you can use the yield keyword. Basically, using this keyword in a closure converts it to a generator. A generator could look like this before we had a concept of Pin:

#![feature(generators, generator_trait)]
use std::ops::{Generator, GeneratorState};

fn main() {
    let a: i32 = 4;
    let mut gen = move || {
        println!("Hello");
        yield a * 2;
        println!("world!");
    };

    if let GeneratorState::Yielded(n) = gen.resume() {
        println!("Got value {}", n);
    }

    if let GeneratorState::Complete(()) = gen.resume() {
        ()
    };
}

Early on, before there was a consensus about the design of Pin, this compiled to something looking similar to this:

fn main() {
    let mut gen = GeneratorA::start(4);

    if let GeneratorState::Yielded(n) = gen.resume() {
        println!("Got value {}", n);
    }

    if let GeneratorState::Complete(()) = gen.resume() {
        ()
    };
}

// If you've ever wondered why the parameters are called Y and R the naming from
// the original rfc most likely holds the answer
enum GeneratorState<Y, R> {
    Yielded(Y),  // originally called `Yield(Y)`
    Complete(R), // originally called `Return(R)`
}

trait Generator {
    type Yield;
    type Return;
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

enum GeneratorA {
    Enter(i32),
    Yield1(i32),
    Exit,
}

impl GeneratorA {
    fn start(a1: i32) -> Self {
        GeneratorA::Enter(a1)
    }
}

impl Generator for GeneratorA {
    type Yield = i32;
    type Return = ();
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
        // lets us get ownership over current state
        match std::mem::replace(self, GeneratorA::Exit) {
            GeneratorA::Enter(a1) => {

          /*----code before yield----*/
                println!("Hello");
                let a = a1 * 2;

                *self = GeneratorA::Yield1(a);
                GeneratorState::Yielded(a)
            }

            GeneratorA::Yield1(_) => {
          /*-----code after yield-----*/
                println!("world!");

                *self = GeneratorA::Exit;
                GeneratorState::Complete(())
            }
            GeneratorA::Exit => panic!("Can't advance an exited generator!"),
        }
    }
}

The yield keyword was discussed first in RFC#1823 and in RFC#1832.

Now that you know that the yield keyword in reality rewrites your code to become a state machine, you'll also know the basics of how await works. It's very similar.

Now, there are some limitations in our naive state machine above. What happens when you have a borrow across a yield point?

We could forbid that, but one of the major design goals for the async/await syntax has been to allow this. These kinds of borrows were not possible using Futures 0.1 so we can't let this limitation just slip and call it a day yet.

Instead of discussing it in theory, let's look at some code.

We'll use the optimized version of the state machines which is used in Rust today. For a more in depth explanation see Tyler Mandry's excellent article: How Rust optimizes async/await

let mut generator = move || {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        yield borrowed.len();
        println!("{} world!", borrowed);
    };

We'll be hand-coding a few versions of a state machine representing the generator defined above.

We step through each step "manually" in every example, so it looks pretty unfamiliar. We could add some syntactic sugar like implementing the Iterator trait for our generators which would let us do this:

while let Some(val) = generator.next() {
    println!("{}", val);
}

It's a pretty trivial change to make, but this chapter is already getting long. Just keep this in the back of your head as we move forward.
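For reference, such an adapter could look roughly like the sketch below. It uses the same hand-written GeneratorState and Generator definitions as the other examples in this chapter. Counter and GenIter are hypothetical names, and Counter is a deliberately simple stand-in for a real generator.

```rust
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

// A tiny hand-written generator that yields 1, 2, 3 and then completes.
struct Counter {
    next: u32,
}

impl Generator for Counter {
    type Yield = u32;
    type Return = ();
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
        if self.next <= 3 {
            let value = self.next;
            self.next += 1;
            GeneratorState::Yielded(value)
        } else {
            GeneratorState::Complete(())
        }
    }
}

// Hypothetical adapter: every yielded value becomes an item, and completing
// the generator ends the iteration (the return value is discarded).
struct GenIter<G>(G);

impl<G: Generator> Iterator for GenIter<G> {
    type Item = G::Yield;
    fn next(&mut self) -> Option<Self::Item> {
        match self.0.resume() {
            GeneratorState::Yielded(value) => Some(value),
            GeneratorState::Complete(_) => None,
        }
    }
}

fn main() {
    let mut generator = GenIter(Counter { next: 1 });
    while let Some(val) = generator.next() {
        println!("{}", val);
    }
}
```

One caveat: calling next() again after it returned None resumes a finished generator. Counter tolerates that, but a generator that panics when resumed after completion would need a guard.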

Now what does our rewritten state machine look like with this example?

#![allow(unused)]
fn main() {
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

enum GeneratorA {
    Enter,
    Yield1 {
        to_borrow: String,
        borrowed: &String, // uh, what lifetime should this have?
    },
    Exit,
}

impl GeneratorA {
    fn start() -> Self {
        GeneratorA::Enter
    }
}

impl Generator for GeneratorA {
    type Yield = usize;
    type Return = ();
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
        // lets us get ownership over current state
        match std::mem::replace(self, GeneratorA::Exit) {
            GeneratorA::Enter => {
                let to_borrow = String::from("Hello");
                let borrowed = &to_borrow; // <--- NB!
                let res = borrowed.len();

                *self = GeneratorA::Yield1 {to_borrow, borrowed};
                GeneratorState::Yielded(res)
            }

            GeneratorA::Yield1 {to_borrow, borrowed} => {
                println!("{} world!", borrowed);
                *self = GeneratorA::Exit;
                GeneratorState::Complete(())
            }
            GeneratorA::Exit => panic!("Can't advance an exited generator!"),
        }
    }
}
}

If you try to compile this you'll get an error.

What is the lifetime of &String? It's not the same as the lifetime of Self. It's not static. It turns out that Rust's syntax gives us no way to describe this lifetime, which means that to make this work, we'll have to let the compiler know that we handle this correctly ourselves.

That means turning to unsafe.

Let's try to write an implementation that will compile using unsafe. As you'll see, we end up with a self-referential struct: a struct which holds references into itself.

As you'll notice, this compiles just fine!

#![allow(unused)]
fn main() {
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

enum GeneratorA {
    Enter,
    Yield1 {
        to_borrow: String,
        borrowed: *const String, // NB! This is now a raw pointer!
    },
    Exit,
}

impl GeneratorA {
    fn start() -> Self {
        GeneratorA::Enter
    }
}
impl Generator for GeneratorA {
    type Yield = usize;
    type Return = ();
    fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return> {
        match self {
            GeneratorA::Enter => {
                let to_borrow = String::from("Hello");
                let borrowed = &to_borrow;
                let res = borrowed.len();
                *self = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};

                // NB! And we set the pointer to reference the to_borrow string here
                if let GeneratorA::Yield1 {to_borrow, borrowed} = self {
                    *borrowed = to_borrow;
                }

                GeneratorState::Yielded(res)
            }

            GeneratorA::Yield1 {borrowed, ..} => {
                let borrowed: &String = unsafe {&**borrowed};
                println!("{} world", borrowed);
                *self = GeneratorA::Exit;
                GeneratorState::Complete(())
            }
            GeneratorA::Exit => panic!("Can't advance an exited generator!"),
        }
    }
}
}

Remember that our example is the generator we created which looked like this:

let mut gen = move || {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        yield borrowed.len();
        println!("{} world!", borrowed);
    };

Below is an example of how we could run this state-machine and as you see it does what we'd expect. But there is still one huge problem with this:

pub fn main() {
    let mut gen = GeneratorA::start();
    let mut gen2 = GeneratorA::start();

    if let GeneratorState::Yielded(n) = gen.resume() {
        println!("Got value {}", n);
    }

    if let GeneratorState::Yielded(n) = gen2.resume() {
        println!("Got value {}", n);
    }

    if let GeneratorState::Complete(()) = gen.resume() {
        ()
    };
}

The problem is that in safe Rust we can still do this:

Run the code and compare the results. Do you see the problem?

#![feature(never_type)] // Force nightly compiler to be used in playground
// (betting that this type really is named after its stabilization date...)
pub fn main() {
    let mut gen = GeneratorA::start();
    let mut gen2 = GeneratorA::start();

    if let GeneratorState::Yielded(n) = gen.resume() {
        println!("Got value {}", n);
    }

    std::mem::swap(&mut gen, &mut gen2); // <--- Big problem!

    if let GeneratorState::Yielded(n) = gen2.resume() {
        println!("Got value {}", n);
    }

    // This would now start gen2 since we swapped them.
    if let GeneratorState::Complete(()) = gen.resume() {
        ()
    };
}

Wait? What happened to "Hello"? And why did our code segfault?

Turns out that while the example above compiles just fine, we expose consumers of this API to both possible undefined behavior and other memory errors while using just safe Rust. This is a big problem!

I've actually forced the code above to use the nightly version of the compiler. If you run the example above on the playground, you'll see that it runs without panicking on the current stable (1.42.0) but panics on the current nightly (1.44.0). Scary!

We'll explain exactly what happened here using a slightly simpler example in the next chapter, and we'll fix our generator using Pin. So don't worry: you'll soon see exactly what goes wrong and how Pin helps us deal with self-referential types safely.

Before we go and explain the problem in detail, let's finish off this chapter by looking at how generators and the async keyword are related.

Async and generators

Futures in Rust are implemented as state machines much the same way Generators are state machines.

You might have noticed the similarities in the syntax used in async blocks and the syntax used in generators:

let mut gen = move || {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        yield borrowed.len();
        println!("{} world!", borrowed);
    };

Compare that with a similar example using async blocks:

let mut fut = async {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        SomeResource::some_task().await;
        println!("{} world!", borrowed);
    };

The difference is that Futures have different states than what a Generator would have.

An async block will return a Future instead of a Generator. However, the way a Future works and the way a Generator works internally are similar.

Instead of calling Generator::resume we call Future::poll, and instead of returning Yielded or Complete it returns Pending or Ready. Each await point in a future is like a yield point in a generator.
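To make the parallel concrete, here is a hand-written Future shaped like our earlier hand-written generators, polled by hand. HelloFuture, NoopWaker and poll_to_completion are names invented for this sketch. It deliberately contains no self-reference, so it sidesteps the problem Pin exists to solve, and the busy loop is only a stand-in for a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Mirrors the generator states: one state per await point.
enum HelloFuture {
    Enter,
    Await1 { greeting: String },
    Exit,
}

impl Future for HelloFuture {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Take ownership of the current state, just like `resume` did.
        match std::mem::replace(&mut *self, HelloFuture::Exit) {
            HelloFuture::Enter => {
                let greeting = String::from("Hello");
                *self = HelloFuture::Await1 { greeting };
                // Pretend the awaited resource isn't ready yet, and ask to be
                // polled again right away.
                cx.waker().wake_by_ref();
                Poll::Pending // plays the role of `Yielded`
            }
            HelloFuture::Await1 { greeting } => {
                Poll::Ready(format!("{} world!", greeting)) // plays the role of `Complete`
            }
            HelloFuture::Exit => panic!("Can't poll a completed future!"),
        }
    }
}

// A waker that does nothing; enough for polling in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls the future until it's ready and returns the number of polls needed.
fn poll_to_completion() -> (usize, String) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = HelloFuture::Enter;
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, output);
        }
    }
}

fn main() {
    let (polls, output) = poll_to_completion();
    println!("{} (after {} polls)", output, polls);
}
```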

Do you see how they're connected now?

That's why knowing how generators work and the challenges they pose also teaches you how futures work and the challenges we need to tackle when working with them.

The same goes for the challenges of borrowing across yield/await points.

Bonus section - self referential generators in Rust today

Thanks to PR#45337 you can actually run code like the one in our example in Rust today using the static keyword on nightly. Try it for yourself:

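Beware that the API is changing rapidly. As I was writing this book, generators had an API change adding support for a "resume" argument to get passed into the generator closure. On more recent nightlies the feature has also been renamed from generators to coroutines, so the code below may need adjusting.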

Follow the progress on the tracking issue #43122 for RFC#2033.

#![feature(generators, generator_trait)]
use std::ops::{Generator, GeneratorState};


pub fn main() {
    let gen1 = static || {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        yield borrowed.len();
        println!("{} world!", borrowed);
    };

    let gen2 = static || {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow;
        yield borrowed.len();
        println!("{} world!", borrowed);
    };

    let mut pinned1 = Box::pin(gen1);
    let mut pinned2 = Box::pin(gen2);

    if let GeneratorState::Yielded(n) = pinned1.as_mut().resume(()) {
        println!("Gen1 got value {}", n);
    }

    if let GeneratorState::Yielded(n) = pinned2.as_mut().resume(()) {
        println!("Gen2 got value {}", n);
    };

    let _ = pinned1.as_mut().resume(());
    let _ = pinned2.as_mut().resume(());
}

Pin

Overview

  1. Learn how to use Pin and why it's required when implementing your own Future
  2. Understand how to make self-referential types safe to use in Rust
  3. Learn how borrowing across await points is accomplished
  4. Get a set of practical rules to help you work with Pin

Pin was suggested in RFC#2349

Let's jump straight to it. Pinning is one of those subjects which is hard to wrap your head around at first, but once you unlock a mental model for it, it gets significantly easier to reason about.

Definitions

Pin wraps a pointer. A reference to an object is a pointer. Pin gives some guarantees about the pointee (the data it points to) which we'll explore further in this chapter.

Pin consists of the Pin type and the Unpin marker. Pin's purpose in life is to govern the rules that need to apply for types which implement !Unpin.

Yep, you're right, that's double negation right there. !Unpin means "not-un-pin".

This naming scheme is one of Rust's safety features where it deliberately tests if you're too tired to safely implement a type with this marker. If you're starting to get confused, or even angry, by !Unpin it's a good sign that it's time to lay down the work and start over tomorrow with a fresh mind.

On a more serious note, I feel obliged to mention that there are valid reasons for the names that were chosen. Naming is not easy, and I considered renaming Unpin and !Unpin in this book to make them easier to reason about.

However, an experienced member of the Rust community convinced me that there are just too many nuances and edge-cases to consider which are easily overlooked when naively giving these markers different names, and I'm convinced that we'll just have to get used to them and use them as is.

If you want to, you can read a bit of the discussion from the internals thread.
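A small sketch may help make the difference between Unpin and !Unpin tangible. It uses PhantomPinned from the standard library, which is the usual way to opt a type out of Unpin. The names requires_unpin, set_through_pin, Pinned and pinned_on_heap are made up for this example.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Compiles only for types that are `Unpin`.
fn requires_unpin<T: Unpin>(_value: &T) {}

// For `Unpin` types pinning has no effect: `Pin::new` is safe and the
// `Pin<&mut T>` still lets us mutate the value freely.
fn set_through_pin(value: &mut i32, new: i32) {
    let mut pinned = Pin::new(value);
    *pinned = new;
}

// `PhantomPinned` opts this type out of `Unpin`, making it `!Unpin`.
struct Pinned {
    data: String,
    _marker: PhantomPinned,
}

// `Pin::new` is not available for `!Unpin` types, but `Box::pin` is: it
// moves the value to the heap and promises it stays at that address.
fn pinned_on_heap(text: &str) -> Pin<Box<Pinned>> {
    Box::pin(Pinned {
        data: text.to_string(),
        _marker: PhantomPinned,
    })
}

fn main() {
    let mut number = 5;
    requires_unpin(&number);
    set_through_pin(&mut number, 6);
    println!("number = {}", number);

    let pinned = pinned_on_heap("pinned");
    // requires_unpin(&*pinned); // would not compile: `Pinned` is `!Unpin`
    println!("data = {}", pinned.data); // reading through the pin is fine
}
```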

Pinning and self-referential structs

Let's start where we left off in the last chapter. We'll make the problem we saw with self-references in our generator a lot simpler by using some self-referential structs that are easier to reason about than our state machines.

For now our example will look like this:

use std::pin::Pin;

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            a: String::from(txt),
            b: std::ptr::null(),
        }
    }

    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
    }

    fn a(&self) -> &str {
        &self.a
    }

    fn b(&self) -> &String {
        unsafe {&*(self.b)}
    }
}

Let's walk through this example since we'll be using it the rest of this chapter.

We have a self-referential struct Test. Test needs an init method to be called before it's fully set up, which is strange, but we need that to keep this example as short as possible.

Test provides two methods to get a reference to the value of the fields a and b. Since b is a reference to a, we store it as a pointer, since the borrowing rules of Rust don't allow us to define this lifetime.

Now, let's use this example to explain the problem we encounter in detail. As you see, this works as expected:

fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b());
    println!("a: {}, b: {}", test2.a(), test2.b());

}

In our main function we first create two instances of Test and print out the values of the fields of both test1 and test2. We get what we'd expect:

a: test1, b: test1
a: test2, b: test2

Let's see what happens if we swap the data stored at the memory location test1 with the data stored at the memory location test2 and vice versa.

fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b());
    std::mem::swap(&mut test1, &mut test2);
    println!("a: {}, b: {}", test2.a(), test2.b());

}

Naively, we might think that we should get the contents of test1 printed twice, like this:

a: test1, b: test1
a: test1, b: test1

But instead we get:

a: test1, b: test1
a: test1, b: test2

The pointer test2.b still points to the old location, which is now inside test1. The struct is not self-referential anymore; it holds a pointer to a field in a different object. That means we can't rely on the lifetime of test2.b being tied to the lifetime of test2 anymore.
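One way to check this claim without reading the printed strings is to compare addresses. The sketch below is a stripped-down copy of our Test struct with one extra method, is_self_referential, which is a hypothetical helper I've added for illustration and not part of the example we use in the rest of this chapter:

```rust
#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            a: String::from(txt),
            b: std::ptr::null(),
        }
    }

    fn init(&mut self) {
        self.b = &self.a;
    }

    // Hypothetical helper: true if `b` points at our own `a` field
    fn is_self_referential(&self) -> bool {
        std::ptr::eq(self.b, &self.a)
    }
}

// Returns (both self-referential before swap, any self-referential after swap)
fn swap_demo() -> (bool, bool) {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    let before = test1.is_self_referential() && test2.is_self_referential();
    // The bytes of the two structs trade places, but the raw pointers inside
    // them still hold the old addresses
    std::mem::swap(&mut test1, &mut test2);
    let after = test1.is_self_referential() || test2.is_self_referential();
    (before, after)
}

fn main() {
    let (before, after) = swap_demo();
    println!("self-referential before swap: {}", before);
    println!("self-referential after swap: {}", after);
}
```

Comparing raw pointers like this is safe since we never dereference them.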

If you're still not convinced, this should at least convince you:

fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b());
    std::mem::swap(&mut test1, &mut test2);
    test1.a = "I've totally changed now!".to_string();
    println!("a: {}, b: {}", test2.a(), test2.b());

}

The second line now prints "I've totally changed now!" for b, even though we never touched test2. That shouldn't happen. There is no serious error yet, but as you can imagine it's easy to create serious bugs using this code.

I created a diagram to help visualize what's going on:

Fig 2: Before and after swap

As you can see this results in unwanted behavior. It's easy to get this to segfault, show UB and fail in other spectacular ways as well.

Pinning to the stack

Now, we can solve this problem by using Pin instead. Let's take a look at what our example would look like then:

use std::pin::Pin;
use std::marker::PhantomPinned;

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}


impl Test {
    fn new(txt: &str) -> Self {
        Test {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned, // This makes our type `!Unpin`
        }
    }
    fn init<'a>(self: Pin<&'a mut Self>) {
        let self_ptr: *const String = &self.a;
        let this = unsafe { self.get_unchecked_mut() };
        this.b = self_ptr;
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        unsafe { &*(self.b) }
    }
}

Now, what we're doing here is pinning an object to the stack. Doing that by hand with Pin::new_unchecked will always be unsafe if our type is !Unpin.

We use the same tricks here, including requiring an init. If we want to fix that and let users avoid unsafe, we need to pin our data on the heap instead, which we'll show in a second.

Let's see what happens if we run our example now:

pub fn main() {
    // test1 is safe to move before we initialize it
    let mut test1 = Test::new("test1");
    // Notice how we shadow `test1` to prevent it from being accessed again
    let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1.as_mut());

    let mut test2 = Test::new("test2");
    let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
    Test::init(test2.as_mut());

    println!("a: {}, b: {}", Test::a(test1.as_ref()), Test::b(test1.as_ref()));
    println!("a: {}, b: {}", Test::a(test2.as_ref()), Test::b(test2.as_ref()));
}

Now, if we try to pull the same trick that got us into trouble the last time, we'll get a compilation error.

pub fn main() {
    let mut test1 = Test::new("test1");
    let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1.as_mut());

    let mut test2 = Test::new("test2");
    let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
    Test::init(test2.as_mut());

    println!("a: {}, b: {}", Test::a(test1.as_ref()), Test::b(test1.as_ref()));
    std::mem::swap(test1.get_mut(), test2.get_mut());
    println!("a: {}, b: {}", Test::a(test2.as_ref()), Test::b(test2.as_ref()));
}

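As you can see from the error you get when running the code, the type system prevents us from moving the pinned data: Pin::get_mut requires the target type to be Unpin, so we can't get hold of a &mut Test to pass to swap.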

It's important to note that stack pinning will always depend on the current stack frame we're in, so we can't create a self referential object in one stack frame and return it since any pointers we take to "self" are invalidated.

It also puts a lot of responsibility in your hands if you pin an object to the stack. A mistake that is easy to make is forgetting to shadow the original variable, since you could then drop the Pin and access the old value after it's initialized, like this:

fn main() {
    let mut test1 = Test::new("test1");
    let mut test1_pin = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1_pin.as_mut());
    drop(test1_pin);

    let mut test2 = Test::new("test2");
    std::mem::swap(&mut test1, &mut test2);
    println!("Not self referential anymore: {:?}", test1.b);
}

Pinning to the heap

For completeness let's remove some unsafe and the need for an init method at the cost of a heap allocation. Pinning to the heap is safe so the user doesn't need to implement any unsafe code:

use std::pin::Pin;
use std::marker::PhantomPinned;

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let t = Test {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned,
        };
        let mut boxed = Box::pin(t);
        let self_ptr: *const String = &boxed.as_ref().a;
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };

        boxed
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        unsafe { &*(self.b) }
    }
}

pub fn main() {
    let test1 = Test::new("test1");
    let test2 = Test::new("test2");

    println!("a: {}, b: {}", test1.as_ref().a(), test1.as_ref().b());
    println!("a: {}, b: {}", test2.as_ref().a(), test2.as_ref().b());
}

The fact that it's safe to pin heap allocated data even if it is !Unpin makes sense. Once the data is allocated on the heap it will have a stable address.

There is no need for us as users of the API to take special care and ensure that the self-referential pointer stays valid.
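To make the stable-address point concrete, here is a small sketch based on the heap-pinned Test above. Swapping two Pin<Box<Test>> values only swaps the pointers to the heap allocations, so the data itself never moves. The is_self_referential method and the swap_boxes function are hypothetical helpers added for this illustration:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Test {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned,
        });
        let self_ptr: *const String = &boxed.as_ref().a;
        // Safety: we only write a field, we never move the value out
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };
        boxed
    }

    // Hypothetical helper: true if `b` points at our own `a` field
    fn is_self_referential(&self) -> bool {
        std::ptr::eq(self.b, &self.a)
    }
}

fn swap_boxes() -> (bool, String, String) {
    let mut test1 = Test::new("test1");
    let mut test2 = Test::new("test2");

    // This swaps the two boxes (the pointers), not the pinned data
    std::mem::swap(&mut test1, &mut test2);

    let still_ok = test1.is_self_referential() && test2.is_self_referential();
    (still_ok, test1.a.clone(), test2.a.clone())
}

fn main() {
    let (still_ok, a1, a2) = swap_boxes();
    println!("still self-referential: {}", still_ok);
    println!("test1.a: {}, test2.a: {}", a1, a2);
}
```

The variable names trade contents, but each struct stays where it was allocated, so both self-references remain valid.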

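There are ways to safely give some guarantees on stack pinning as well, but to do that you need a macro that wraps the unsafe part for you, such as the one in the pin_utils crate or, on newer toolchains, std::pin::pin! from the standard library.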
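As a sketch of what that can look like: on Rust 1.68 and later the standard library ships a std::pin::pin! macro that pins a value to the current stack frame without the caller writing unsafe. This is an addition on my part and assumes a recent toolchain; the pinned_on_stack function is a hypothetical name, and the Test struct is the same stack-pinned one as before:

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned,
        }
    }

    fn init<'a>(self: Pin<&'a mut Self>) {
        let self_ptr: *const String = &self.a;
        // Safety: we only write a field, we never move the value out
        let this = unsafe { self.get_unchecked_mut() };
        this.b = self_ptr;
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        // Safety: `init` pointed `b` at `a`, and the value can't move while pinned
        unsafe { &*(self.b) }
    }
}

fn pinned_on_stack() -> (String, String) {
    // `pin!` takes ownership of the value, so there is no unpinned variable
    // left behind that we could forget to shadow
    let mut test = pin!(Test::new("test1"));
    test.as_mut().init();
    (test.as_ref().a().to_owned(), test.as_ref().b().clone())
}

fn main() {
    let (a, b) = pinned_on_stack();
    println!("a: {}, b: {}", a, b);
}
```

The implementation of Test still uses unsafe internally, but the user of the type no longer has to.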

Practical rules for Pinning

  1. If T: Unpin (which is the default), then Pin<&'a mut T> is entirely equivalent to &'a mut T. In other words: Unpin means it's OK for this type to be moved even when pinned, so Pin will have no effect on such a type.

  2. Getting a &mut T to a pinned T requires unsafe if T: !Unpin. In other words: requiring a pinned pointer to a type which is !Unpin prevents the user of that API from moving that value unless they choose to write unsafe code.

  3. Pinning does nothing special with memory allocation like putting it into some "read only" memory or anything fancy. It only uses the type system to prevent certain operations on this value.

  4. Most standard library types implement Unpin. The same goes for most "normal" types you encounter in Rust. Futures and Generators are two exceptions.

  5. The main use case for Pin is to allow self-referential types. The whole justification for stabilizing it was to allow that.

  6. The implementation behind objects that are !Unpin is most likely unsafe. Moving such a type after it has been pinned can cause the universe to crash. As of the time of writing this book, creating and reading fields of a self referential struct still requires unsafe (the only way to do it is to create a struct containing raw pointers to itself).

  7. You can implement !Unpin for a type on nightly with a feature flag, or get the same effect on stable by adding a std::marker::PhantomPinned field to your type.

  8. You can either pin an object to the stack or to the heap.

  9. Pinning a !Unpin object to the stack requires unsafe.

  10. Pinning a !Unpin object to the heap does not require unsafe. There is a shortcut for doing this using Box::pin.
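Rule 1 is easy to try out. The following sketch uses String, which is Unpin like most types, to show that pinning such a type restricts nothing: both the safe constructor Pin::new and the safe accessor Pin::get_mut are available, so we can swap the "pinned" values. The function name swap_pinned_strings is made up for this illustration:

```rust
use std::pin::Pin;

fn swap_pinned_strings() -> (String, String) {
    let mut first = String::from("first");
    let mut second = String::from("second");

    // `Pin::new` is safe, but only exists for pointers to `Unpin` types
    let pinned_first = Pin::new(&mut first);
    let pinned_second = Pin::new(&mut second);

    // `Pin::get_mut` is also safe for `Unpin` types, so nothing stops us
    // from moving the data
    std::mem::swap(pinned_first.get_mut(), pinned_second.get_mut());

    (first, second)
}

fn main() {
    let (first, second) = swap_pinned_strings();
    println!("first: {}, second: {}", first, second);
}
```

If you try the same with a type that contains PhantomPinned, neither Pin::new nor get_mut will compile, which is rule 2 in action.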

Unsafe code does not mean it's literally "unsafe"; it only lifts some of the guarantees you normally get from the compiler. An unsafe implementation can be perfectly safe to do, but you have no safety net.

Projection/structural pinning

In short, projection is a programming language term. mystruct.field1 is a projection. Structural pinning is using Pin on fields. This has several caveats and is not something you'll normally see so I refer to the documentation for that.

Pin and Drop

The Pin guarantee exists from the moment the value is pinned until it's dropped. In the Drop implementation you take a mutable reference to self, which means extra care must be taken when implementing Drop for pinned types.
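One way to take that care is the pattern suggested in the std::pin documentation: immediately re-wrap self in a Pin inside drop and do all the work in an inner function that only sees the pinned reference. Below is a minimal sketch of that pattern. The Pinned type and its dropped flag are hypothetical and only there so we can observe that the destructor ran:

```rust
use std::cell::Cell;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::rc::Rc;

struct Pinned {
    // Hypothetical flag so the example can observe the drop
    dropped: Rc<Cell<bool>>,
    _marker: PhantomPinned,
}

impl Drop for Pinned {
    fn drop(&mut self) {
        // Safety: the value is being dropped and is never used again after
        // this call, so treating it as pinned here can't lead to a later move
        inner_drop(unsafe { Pin::new_unchecked(self) });

        fn inner_drop(this: Pin<&mut Pinned>) {
            // The actual drop logic goes here. It can only reach the value
            // through `Pin`, so the compiler helps us avoid moving out of it.
            this.dropped.set(true);
        }
    }
}

fn drop_demo() -> bool {
    let flag = Rc::new(Cell::new(false));
    {
        let _pinned = Box::pin(Pinned {
            dropped: flag.clone(),
            _marker: PhantomPinned,
        });
        // `_pinned` goes out of scope here and `drop` runs
    }
    flag.get()
}

fn main() {
    println!("drop ran: {}", drop_demo());
}
```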

Putting it all together

This is exactly what we'll do when we implement our own Future, so stay tuned, we're soon finished.

Bonus section: Fixing our self-referential generator and learning more about Pin

But now, let's prevent this problem using Pin. I've commented along the way to make it easier to spot and understand the changes we need to make.

#![feature(auto_traits, negative_impls)] // needed to implement `!Unpin`
use std::pin::Pin;

pub fn main() {
    let gen1 = GeneratorA::start();
    let gen2 = GeneratorA::start();
    // Before we pin the data, this is safe to do (if `gen1` and `gen2` are
    // declared as `mut`)
    // std::mem::swap(&mut gen1, &mut gen2);

    // `Pin::new()` is only available for types that implement `Unpin`. For a
    // type that is `!Unpin` we'd have to use the unsafe `Pin::new_unchecked()`
    // to pin it to the stack. An object pinned to the heap can be constructed
    // while staying in safe Rust so we can use that to avoid unsafe. You can
    // also use crates like `pin_utils` to pin to the stack safely, just
    // remember that they use unsafe under the hood so it's like using an
    // already-reviewed unsafe implementation.

    let mut pinned1 = Box::pin(gen1);
    let mut pinned2 = Box::pin(gen2);

    // Uncomment these if you think it's safe to pin the values to the stack instead
    // (it is in this case). Remember to comment out the two previous lines first
    // and to declare `gen1` and `gen2` as `mut`.
    //let mut pinned1 = unsafe { Pin::new_unchecked(&mut gen1) };
    //let mut pinned2 = unsafe { Pin::new_unchecked(&mut gen2) };

    if let GeneratorState::Yielded(n) = pinned1.as_mut().resume() {
        println!("Gen1 got value {}", n);
    }

    if let GeneratorState::Yielded(n) = pinned2.as_mut().resume() {
        println!("Gen2 got value {}", n);
    };

    // This won't work:
    // std::mem::swap(&mut gen1, &mut gen2);
    // This will work but will just swap the pointers so nothing bad happens here:
    // std::mem::swap(&mut pinned1, &mut pinned2);

    let _ = pinned1.as_mut().resume();
    let _ = pinned2.as_mut().resume();
}

enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

trait Generator {
    type Yield;
    type Return;
    fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return>;
}

enum GeneratorA {
    Enter,
    Yield1 {
        to_borrow: String,
        borrowed: *const String,
    },
    Exit,
}

impl GeneratorA {
    fn start() -> Self {
        GeneratorA::Enter
    }
}

// This tells us that this object is not safe to move after pinning.
// In this case, only we as implementors "feel" this, however, if someone is
// relying on our Pinned data this will prevent them from moving it. You need
// to enable the feature flags shown at the top of this example
// (`#![feature(auto_traits, negative_impls)]`) and use the nightly compiler
// to implement `!Unpin`. Normally, you would use
// `std::marker::PhantomPinned` to indicate that the struct is `!Unpin`.
impl !Unpin for GeneratorA { }

impl Generator for GeneratorA {
    type Yield = usize;
    type Return = ();
    fn resume(self: Pin<&mut Self>) -> GeneratorState<Self::Yield, Self::Return> {
        // gives us a mutable reference to the current state so we can replace it
        let this = unsafe { self.get_unchecked_mut() };
        match this {
            GeneratorA::Enter => {
                let to_borrow = String::from("Hello");
                let borrowed = &to_borrow;
                let res = borrowed.len();
                *this = GeneratorA::Yield1 {to_borrow, borrowed: std::ptr::null()};

                // Trick to actually get a self reference. We can't reference
                // the `String` earlier since these references will point to the
                // location in this stack frame which will not be valid anymore
                // when this function returns.
                if let GeneratorA::Yield1 {to_borrow, borrowed} = this {
                    *borrowed = to_borrow;
                }

                GeneratorState::Yielded(res)
            }

            GeneratorA::Yield1 {borrowed, ..} => {
                let borrowed: &String = unsafe {&**borrowed};
                println!("{} world", borrowed);
                *this = GeneratorA::Exit;
                GeneratorState::Complete(())
            }
            GeneratorA::Exit => panic!("Can't advance an exited generator!"),
        }
    }
}

Now, as you see, the consumer of this API must either:

  1. Box the value, thereby allocating it on the heap
  2. Use unsafe and pin the value to the stack. The user knows that if they move the value afterwards it will violate the guarantee they promise to uphold when they did their unsafe implementation.

Hopefully, after this you'll have an idea of what happens when you use the yield or await keywords inside an async function, and why we need Pin if we want to be able to safely borrow across yield/await points.

Implementing Futures - main example

We'll create our own Futures together with a fake reactor and a simple executor which allows you to edit, run and play around with the code right here in your browser.

I'll walk you through the example, but if you want to check it out closer, you can always clone the repository and play around with the code yourself or just copy it from the next chapter.

There are several branches explained in the readme, but two are relevant for this chapter. The main branch is the example we go through here, and the basic_example_commented branch is this example with extensive comments.

If you want to follow along as we go through, initialize a new cargo project by creating a new folder and run cargo init inside it. Everything we write here will be in main.rs

Implementing our own Futures

Let's start off by getting all our imports right away so you can follow along:

use std::{
    future::Future, pin::Pin, sync::{ mpsc::{channel, Sender}, Arc, Mutex,},
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker}, mem,
    thread::{self, JoinHandle}, time::{Duration, Instant}, collections::HashMap
};

The Executor

The executor's responsibility is to take one or more futures and run them to completion.

The first thing an executor does when it gets a Future is polling it.

When polled one of three things can happen:

  • The future returns Ready and we schedule whatever chained operations to run
  • The future hasn't been polled before so we pass it a Waker and suspend it
  • The future has been polled before but is not ready and returns Pending
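To see these outcomes in isolation, here is a small sketch that polls a future by hand, without any executor or reactor. Everything in it is hypothetical illustration code: YieldOnce is a toy future that returns Pending on the first poll and Ready on the second, and CountingWaker just counts how many times it was woken. It builds the Waker with the standard library's std::task::Wake trait (Rust 1.51 or later) to keep the example short:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Toy future: not ready on the first poll, ready on the second
struct YieldOnce {
    polled: bool,
}

impl Future for YieldOnce {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polled {
            Poll::Ready("done")
        } else {
            self.polled = true;
            // Tell whoever polled us that we want to be polled again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Toy waker: counts the number of wake-ups instead of waking a thread
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

fn poll_twice() -> (bool, Poll<&'static str>, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    // `YieldOnce` is `Unpin`, so the safe `Pin::new` is enough here
    let mut future = YieldOnce { polled: false };
    let mut future = Pin::new(&mut future);

    let first = future.as_mut().poll(&mut cx);
    let second = future.as_mut().poll(&mut cx);
    (first.is_pending(), second, counter.0.load(Ordering::SeqCst))
}

fn main() {
    let (first_pending, second, wakeups) = poll_twice();
    println!("first poll pending: {}", first_pending);
    println!("second poll: {:?}", second);
    println!("wake-ups: {}", wakeups);
}
```

A real executor would not poll again blindly like this. It would wait until the Waker is actually called.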

Rust provides a way for the Reactor and Executor to communicate through the Waker. The reactor stores this Waker and calls Waker::wake() on it once a Future has resolved and should be polled again.

Notice that this chapter has a bonus section called A Proper Way to Park our Thread which shows how to avoid thread::park.

Our Executor will look like this:

// Our executor takes any object which implements the `Future` trait
fn block_on<F: Future>(mut future: F) -> F::Output {

    // the first thing we do is to construct a `Waker` which we'll pass on to
    // the `reactor` so it can wake us up when an event is ready.
    let mywaker = Arc::new(MyWaker{ thread: thread::current() });
    let waker = mywaker_into_waker(Arc::into_raw(mywaker));

    // The context struct is just a wrapper for a `Waker` object. Maybe in the
    // future this will do more, but right now it's just a wrapper.
    let mut cx = Context::from_waker(&waker);

    // So, since we run this on one thread and run one future to completion
    // we can pin the `Future` to the stack. This is unsafe, but saves an
    // allocation. We could `Box::pin` it too if we wanted. This is however
    // safe since we shadow `future` so it can't be accessed again and will
    // not move until it's dropped.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };

    // We poll in a loop, but it's not a busy loop. It will only run when
    // an event occurs, or a thread has a "spurious wakeup" (an unexpected wakeup
    // that can happen for no good reason).
    let val = loop {
        match Future::poll(future.as_mut(), &mut cx) {

            // when the Future is ready we're finished
            Poll::Ready(val) => break val,

            // If we get a `pending` future we just go to sleep...
            Poll::Pending => thread::park(),
        };
    };
    val
}

In all the examples you'll see in this chapter I've chosen to comment the code extensively. I find it easier to follow along that way so I'll not repeat myself here and focus only on some important aspects that might need further explanation.

It's worth noting that simply calling thread::park as we do here can lead to both deadlocks and errors. We'll explain a bit more later and fix this if you read all the way to the Bonus Section at the end of this chapter.

For now, we keep it as simple and easy to understand as we can by just going to sleep.

Now that you've read so much about Generators and Pin already this should be rather easy to understand. A Future is a state machine, and every await point is a yield point. We could borrow data across await points and we meet the exact same challenges as we do when borrowing across yield points.

Context is just a wrapper around the Waker. At the time of writing this book it's nothing more. In the future it might be possible that the Context object will do more than just wrapping a Waker so having this extra abstraction gives some flexibility.

As explained in the chapter about Pin, we use Pin and the guarantees it gives us to allow Futures to have self-references.

The Future implementation

Futures have a well-defined interface, which means they can be used across the entire ecosystem.

We can chain these Futures so that once a leaf-future is ready we'll perform a set of operations until either the task is finished or we reach yet another leaf-future which we'll wait for and yield control to the scheduler.

Our Future implementation looks like this:

// This is the definition of our `Waker`. We use a regular thread-handle here.
// It works but it's not a good solution. It's easy to fix though, I'll explain
// after this code snippet.
#[derive(Clone)]
struct MyWaker {
    thread: thread::Thread,
}

// This is the definition of our `Future`. It keeps all the information we
// need. This one holds a reference to our `reactor`, that's just to make
// this example as easy as possible. It doesn't need to hold a reference to
// the whole reactor, but it needs to be able to register itself with the
// reactor.
#[derive(Clone)]
pub struct Task {
    id: usize,
    reactor: Arc<Mutex<Box<Reactor>>>,
    data: u64,
}

// These are function definitions we'll use for our waker. Remember the
// "Trait Objects" chapter earlier.
fn mywaker_wake(s: &MyWaker) {
    let waker_ptr: *const MyWaker = s;
    let waker_arc = unsafe {Arc::from_raw(waker_ptr)};
    waker_arc.thread.unpark();
}

// Since we use an `Arc` cloning is just increasing the refcount on the smart
// pointer.
fn mywaker_clone(s: &MyWaker) -> RawWaker {
    let arc = unsafe { Arc::from_raw(s) };
    std::mem::forget(arc.clone()); // increase ref count
    RawWaker::new(Arc::into_raw(arc) as *const (), &VTABLE)
}

// This is actually a "helper function" to create a `Waker` vtable. In contrast
// to when we created a `Trait Object` from scratch we don't need to concern
// ourselves with the actual layout of the `vtable` and only provide a fixed
// set of functions
const VTABLE: RawWakerVTable = unsafe {
    RawWakerVTable::new(
        |s| mywaker_clone(&*(s as *const MyWaker)),   // clone
        |s| mywaker_wake(&*(s as *const MyWaker)),    // wake
        |s| (*(s as *const MyWaker)).thread.unpark(), // wake by ref (don't decrease refcount)
        |s| drop(Arc::from_raw(s as *const MyWaker)), // decrease refcount
    )
};

// Instead of implementing this on the `MyWaker` object in `impl MyWaker...` we
// just use this pattern instead since it saves us some lines of code.
fn mywaker_into_waker(s: *const MyWaker) -> Waker {
    let raw_waker = RawWaker::new(s as *const (), &VTABLE);
    unsafe { Waker::from_raw(raw_waker) }
}

impl Task {
    fn new(reactor: Arc<Mutex<Box<Reactor>>>, data: u64, id: usize) -> Self {
        Task { id, reactor, data }
    }
}

// This is our `Future` implementation
impl Future for Task {
    type Output = usize;

    // `poll` is what drives the state machine forward and it's the only
    // method we'll need to call to drive futures to completion.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {

        // We need to get access to the reactor in our `poll` method so we
        // acquire a lock on that.
        let mut r = self.reactor.lock().unwrap();

        // First we check if the task is marked as ready
        if r.is_ready(self.id) {

            // If it's ready we set its state to `Finished`
            *r.tasks.get_mut(&self.id).unwrap() = TaskState::Finished;
            Poll::Ready(self.id)

        // If it isn't finished we check the map we have stored in our Reactor
        // over id's we have registered and see if it's there
        } else if r.tasks.contains_key(&self.id) {

            // This is important. The docs say that on multiple calls to poll,
            // only the Waker from the Context passed to the most recent call
            // should be scheduled to receive a wakeup. That's why we insert
            // this waker into the map (which will return the old one which will
            // get dropped) before we return `Pending`.
            r.tasks.insert(self.id, TaskState::NotReady(cx.waker().clone()));
            Poll::Pending
        } else {

            // If it's not ready, and not in the map it's a new task so we
            // register that with the Reactor and return `Pending`
            r.register(self.data, cx.waker().clone(), self.id);
            Poll::Pending
        }

        // Note that we're holding a lock on the `Mutex` which protects the
        // Reactor all the way until the end of this scope. This means that
        // even if our task were to complete immediately, it will not be
        // able to call `wake` while we're in our `Poll` method.

        // Since we can make this guarantee, it's now the Executors job to
        // handle this possible race condition where `Wake` is called after
        // `poll` but before our thread goes to sleep.
    }
}

This is mostly pretty straightforward. The confusing part is the strange way we need to construct the Waker, but since we've already created our own trait objects from raw parts, this looks pretty familiar. Actually, it's even a bit easier.

We use an Arc here to pass out a ref-counted borrow of our MyWaker. This is pretty normal, and makes this easy and safe to work with. Cloning a Waker is just increasing the refcount in this case.

Dropping a Waker is as easy as decreasing the refcount. Now, in special cases we could choose to not use an Arc. So this low-level method is there to allow such cases.
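A small example of such a case is a Waker that does nothing at all, which can be handy when you want to poll a future by hand in a test. There is no data to refcount, so there is no Arc, only a vtable of functions that ignore the data pointer. This is a sketch for illustration: noop_raw_waker and poll_ready_with_noop_waker are names I made up, and the code is not part of our main example:

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_raw_waker() -> RawWaker {
    // Cloning just builds another no-op waker, there is nothing to refcount
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    // Used for wake, wake_by_ref and drop: nothing to do in any of them
    fn noop(_: *const ()) {}

    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // The data pointer is never read, so a null pointer is fine
    RawWaker::new(std::ptr::null(), &VTABLE)
}

fn poll_ready_with_noop_waker() -> Poll<u32> {
    // Safety: none of the vtable functions touch the data pointer, so the
    // contract of `RawWaker` is trivially upheld
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);

    // `std::future::ready` gives us a future that is ready on the first poll
    let mut future = ready(7u32);
    Pin::new(&mut future).poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_ready_with_noop_waker());
}
```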

Indeed, if we only used Arc there is no reason for us to go through all the trouble of creating our own vtable and a RawWaker. We could just implement a normal trait.

Fortunately, this is now possible in the standard library as well. When this book was first written the trait lived in the nursery, but since Rust 1.51 the std::task::Wake trait lets us implement wake on a type held in an Arc and convert it to a Waker without writing a vtable by hand.
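For comparison, here is roughly what our waker and executor could look like when written against the standard library's std::task::Wake trait, available since Rust 1.51. Treat it as a sketch and not as a replacement for the main example: ThreadWaker, WakeSelfOnce and block_on_with_wake are hypothetical names, it also assumes Rust 1.68 or later for the std::pin::pin! macro, and it has the same thread::park weaknesses we discuss in this chapter:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Same idea as `MyWaker`: waking means unparking the executor thread
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy future that asks to be woken once before it completes
struct WakeSelfOnce(bool);

impl Future for WakeSelfOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 {
            return Poll::Ready(42);
        }
        self.0 = true;
        // Unparks our own thread, so the next `park` returns right away
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn block_on_with_wake<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    // No vtable and no raw pointers: `Waker` implements `From<Arc<W>>`
    // for any `W: Wake + Send + Sync + 'static`
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);

    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    println!("Got: {}", block_on_with_wake(WakeSelfOnce(false)));
}
```

Cloning and dropping the Waker is handled by the Arc, which is exactly the refcounting we wrote by hand in our vtable.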

We choose to pass in a reference to the whole Reactor here. This isn't normal. The reactor will often be a global resource which lets us register interests without passing around a reference.

Why using thread park/unpark is a bad idea for a library

It could deadlock easily since anyone could get a handle to the executor thread and call park/unpark on our thread. I've made an example with comments on the playground that showcases how such an error could occur. You can also read a bit more about this in issue 2010 in the futures crate.

The Reactor

This is the home stretch, and not strictly Future related, but we need one to have an example to run.

Since concurrency mostly makes sense when interacting with the outside world (or at least some peripheral), we need something to actually abstract over this interaction in an asynchronous way.

This is the Reactor's job. Most often you'll see reactors in Rust use a library called Mio, which provides non-blocking APIs and event notification for several platforms.

The reactor will typically give you something like a TcpStream (or any other resource) which you'll use to create an I/O request. What you get in return is a Future.

If our reactor did some real I/O work, our Task would instead represent a non-blocking TcpStream which registers interest with the global Reactor. Passing around a reference to the Reactor itself is pretty uncommon, but I find it makes reasoning about what's happening easier.

Our example task is a timer that only spawns a thread and puts it to sleep for the number of seconds we specify. The reactor we create here will create a leaf-future representing each timer. In return the Reactor receives a waker which it will call once the task is finished.

Since there is not much real I/O we can do when running the code here in the browser, just pretend that this actually represents some useful I/O operation for the sake of this example.

Our Reactor will look like this:

// The different states a task can have in this Reactor
enum TaskState {
    Ready,
    NotReady(Waker),
    Finished,
}

// This is a "fake" reactor. It does no real I/O, but that also makes our
// code possible to run in the book and in the playground
struct Reactor {

    // we need some way of registering a Task with the reactor. Normally this
    // would be an "interest" in an I/O event
    dispatcher: Sender<Event>,
    handle: Option<JoinHandle<()>>,

    // This is a list of tasks
    tasks: HashMap<usize, TaskState>,
}

// This represents the Events we can send to our reactor thread. In this
// example it's only a Timeout or a Close event.
#[derive(Debug)]
enum Event {
    Close,
    Timeout(u64, usize),
}

impl Reactor {

    // We choose to return an atomic reference counted, mutex protected, heap
    // allocated `Reactor`. Just to make it easy to explain... No, the reason
    // we do this is:
    //
    // 1. We know that only thread-safe reactors will be created.
    // 2. By heap allocating it we can obtain a reference to a stable address
    // that's not dependent on the stack frame of the function that called `new`
    fn new() -> Arc<Mutex<Box<Self>>> {
        let (tx, rx) = channel::<Event>();
        let reactor = Arc::new(Mutex::new(Box::new(Reactor {
            dispatcher: tx,
            handle: None,
            tasks: HashMap::new(),
        })));

        // Notice that we'll need to use a `Weak` reference here. If we don't,
        // our `Reactor` will not get `dropped` when our main thread is finished
        // since we're holding internal references to it.

        // Since we're collecting all `JoinHandles` from the threads we spawn
        // and make sure to join them we know that `Reactor` will be alive
        // longer than any reference held by the threads we spawn here.
        let reactor_clone = Arc::downgrade(&reactor);

        // This will be our Reactor-thread. The Reactor-thread will in our case
        // just spawn new threads which will serve as timers for us.
        let handle = thread::spawn(move || {
            let mut handles = vec![];

            // This simulates some I/O resource
            for event in rx {
                println!("REACTOR: {:?}", event);
                let reactor = reactor_clone.clone();
                match event {
                    Event::Close => break,
                    Event::Timeout(duration, id) => {

                        // We spawn a new thread that will serve as a timer
                        // and will call `wake` on the correct `Waker` once
                        // it's done.
                        let event_handle = thread::spawn(move || {
                            thread::sleep(Duration::from_secs(duration));
                            let reactor = reactor.upgrade().unwrap();
                            reactor.lock().map(|mut r| r.wake(id)).unwrap();
                        });
                        handles.push(event_handle);
                    }
                }
            }

            // This is important for us since we need to know that these
            // threads don't live longer than our Reactor-thread. Our
            // Reactor-thread will be joined when `Reactor` gets dropped.
            handles.into_iter().for_each(|handle| handle.join().unwrap());
        });
        reactor.lock().map(|mut r| r.handle = Some(handle)).unwrap();
        reactor
    }

    // The wake function will call wake on the waker for the task with the
    // corresponding id.
    fn wake(&mut self, id: usize) {
        self.tasks.get_mut(&id).map(|state| {

            // No matter what state the task was in we can safely set it
            // to ready at this point. This lets us get ownership over the
            // data that was there before we replaced it.
            match mem::replace(state, TaskState::Ready) {
                TaskState::NotReady(waker) => waker.wake(),
                TaskState::Finished => panic!("Called 'wake' twice on task: {}", id),
                _ => unreachable!()
            }
        }).unwrap();
    }

    // Register a new task with the reactor. In this particular example
    // we panic if a task with the same id gets registered twice
    fn register(&mut self, duration: u64, waker: Waker, id: usize) {
        if self.tasks.insert(id, TaskState::NotReady(waker)).is_some() {
            panic!("Tried to insert a task with id: '{}', twice!", id);
        }
        self.dispatcher.send(Event::Timeout(duration, id)).unwrap();
    }

    // We simply check if a task with this id is in the state `TaskState::Ready`
    fn is_ready(&self, id: usize) -> bool {
        self.tasks.get(&id).map(|state| match state {
            TaskState::Ready => true,
            _ => false,
        }).unwrap_or(false)
    }
}

impl Drop for Reactor {
    fn drop(&mut self) {
        // We send a close event to the reactor so it closes down our reactor-thread.
        // If we don't do that we'll end up waiting forever for new events.
        self.dispatcher.send(Event::Close).unwrap();
        self.handle.take().map(|h| h.join().unwrap()).unwrap();
    }
}

It's a lot of code, but essentially we just spawn off a new thread and make it sleep for some time which we specify when we create a Task.
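
If you strip away the bookkeeping, the heart of the reactor can be sketched in a few lines. This is only an illustration and not part of our 200 lines: `Flag`, `spawn_timer` and `timer_calls_waker` are names I made up here, and I use the safe `std::task::Wake` trait to get a `Waker` instead of the vtable we build by hand in this chapter.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical stand-in for the executor's waker: it only records that
// `wake` was called.
struct Flag(AtomicBool);

impl Wake for Flag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

// The core idea of our reactor: sleep on a separate thread, then call
// the waker we were handed.
fn spawn_timer(millis: u64, waker: Waker) -> JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(millis));
        waker.wake();
    })
}

// Starts a 50 ms timer, waits for the timer thread to finish and reports
// whether `wake` was called.
fn timer_calls_waker() -> bool {
    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let handle = spawn_timer(50, Waker::from(flag.clone()));
    handle.join().unwrap();
    flag.0.load(Ordering::SeqCst)
}
```

Calling `timer_calls_waker()` from `main` should give you `true`. Everything else in our Reactor is there to keep track of which waker belongs to which task and to shut the threads down cleanly.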

Now, let's test our code and see if it works. Since we're sleeping for a couple of seconds here, just give it some time to run.

In the last chapter we have the whole 200 lines in an editable window which you can edit and change the way you like.

use std::{
    future::Future, pin::Pin, sync::{ mpsc::{channel, Sender}, Arc, Mutex,},
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker}, mem,
    thread::{self, JoinHandle}, time::{Duration, Instant}, collections::HashMap
};

fn main() {
    // This is just to make it easier for us to see when our Future was resolved
    let start = Instant::now();

    // Many runtimes create a global `reactor`; we pass it as an argument instead
    let reactor = Reactor::new();

    // We create two tasks:
    // - first parameter is the `reactor`
    // - the second is a timeout in seconds
    // - the third is an `id` to identify the task
    let future1 = Task::new(reactor.clone(), 1, 1);
    let future2 = Task::new(reactor.clone(), 2, 2);

    // an `async` block works the same way as an `async fn` in that it compiles
    // our code into a state machine, `yielding` at every `await` point.
    let fut1 = async {
        let val = future1.await;
        println!("Got {} at time: {:.2}.", val, start.elapsed().as_secs_f32());
    };

    let fut2 = async {
        let val = future2.await;
        println!("Got {} at time: {:.2}.", val, start.elapsed().as_secs_f32());
    };

    // Our executor can only run one future at a time. This is pretty normal
    // though: you have a set of operations containing many futures that
    // end up as a single future that drives them all to completion.
    let mainfut = async {
        fut1.await;
        fut2.await;
    };

    // This executor will block the main thread until the futures are resolved
    block_on(mainfut);
}
// ============================= EXECUTOR ====================================
fn block_on<F: Future>(mut future: F) -> F::Output {
    let mywaker = Arc::new(MyWaker {
        thread: thread::current(),
    });
    let waker = mywaker_into_waker(Arc::into_raw(mywaker));
    let mut cx = Context::from_waker(&waker);

    // SAFETY: we shadow `future` so it can't be accessed again.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };
    let val = loop {
        match Future::poll(future.as_mut(), &mut cx) {
            Poll::Ready(val) => break val,
            Poll::Pending => thread::park(),
        };
    };
    val
}

// ====================== FUTURE IMPLEMENTATION ==============================
#[derive(Clone)]
struct MyWaker {
    thread: thread::Thread,
}

#[derive(Clone)]
pub struct Task {
    id: usize,
    reactor: Arc<Mutex<Box<Reactor>>>,
    data: u64,
}

fn mywaker_wake(s: &MyWaker) {
    let waker_ptr: *const MyWaker = s;
    let waker_arc = unsafe { Arc::from_raw(waker_ptr) };
    waker_arc.thread.unpark();
}

fn mywaker_clone(s: &MyWaker) -> RawWaker {
    let arc = unsafe { Arc::from_raw(s) };
    std::mem::forget(arc.clone()); // increase ref count
    RawWaker::new(Arc::into_raw(arc) as *const (), &VTABLE)
}

const VTABLE: RawWakerVTable = unsafe {
    RawWakerVTable::new(
        |s| mywaker_clone(&*(s as *const MyWaker)),   // clone
        |s| mywaker_wake(&*(s as *const MyWaker)),    // wake
        |s| (*(s as *const MyWaker)).thread.unpark(), // wake by ref (don't decrease refcount)
        |s| drop(Arc::from_raw(s as *const MyWaker)), // decrease refcount
    )
};

fn mywaker_into_waker(s: *const MyWaker) -> Waker {
    let raw_waker = RawWaker::new(s as *const (), &VTABLE);
    unsafe { Waker::from_raw(raw_waker) }
}

impl Task {
    fn new(reactor: Arc<Mutex<Box<Reactor>>>, data: u64, id: usize) -> Self {
        Task { id, reactor, data }
    }
}

impl Future for Task {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut r = self.reactor.lock().unwrap();
        if r.is_ready(self.id) {
            println!("POLL: TASK {} IS READY", self.id);
            *r.tasks.get_mut(&self.id).unwrap() = TaskState::Finished;
            Poll::Ready(self.id)
        } else if r.tasks.contains_key(&self.id) {
            println!("POLL: REPLACED WAKER FOR TASK: {}", self.id);
            r.tasks.insert(self.id, TaskState::NotReady(cx.waker().clone()));
            Poll::Pending
        } else {
            println!("POLL: REGISTERED TASK: {}, WAKER: {:?}", self.id, cx.waker());
            r.register(self.data, cx.waker().clone(), self.id);
            Poll::Pending
        }
    }
}

// =============================== REACTOR ===================================
enum TaskState {
    Ready,
    NotReady(Waker),
    Finished,
}
struct Reactor {
    dispatcher: Sender<Event>,
    handle: Option<JoinHandle<()>>,
    tasks: HashMap<usize, TaskState>,
}

#[derive(Debug)]
enum Event {
    Close,
    Timeout(u64, usize),
}

impl Reactor {
    fn new() -> Arc<Mutex<Box<Self>>> {
        let (tx, rx) = channel::<Event>();
        let reactor = Arc::new(Mutex::new(Box::new(Reactor {
            dispatcher: tx,
            handle: None,
            tasks: HashMap::new(),
        })));

        let reactor_clone = Arc::downgrade(&reactor);
        let handle = thread::spawn(move || {
            let mut handles = vec![];
            // This simulates some I/O resource
            for event in rx {
                println!("REACTOR: {:?}", event);
                let reactor = reactor_clone.clone();
                match event {
                    Event::Close => break,
                    Event::Timeout(duration, id) => {
                        let event_handle = thread::spawn(move || {
                            thread::sleep(Duration::from_secs(duration));
                            let reactor = reactor.upgrade().unwrap();
                            reactor.lock().map(|mut r| r.wake(id)).unwrap();
                        });
                        handles.push(event_handle);
                    }
                }
            }
            handles.into_iter().for_each(|handle| handle.join().unwrap());
        });
        reactor.lock().map(|mut r| r.handle = Some(handle)).unwrap();
        reactor
    }

    fn wake(&mut self, id: usize) {
        self.tasks.get_mut(&id).map(|state| {
            match mem::replace(state, TaskState::Ready) {
                TaskState::NotReady(waker) => waker.wake(),
                TaskState::Finished => panic!("Called 'wake' twice on task: {}", id),
                _ => unreachable!()
            }
        }).unwrap();
    }

    fn register(&mut self, duration: u64, waker: Waker, id: usize) {
        if self.tasks.insert(id, TaskState::NotReady(waker)).is_some() {
            panic!("Tried to insert a task with id: '{}', twice!", id);
        }
        self.dispatcher.send(Event::Timeout(duration, id)).unwrap();
    }

    fn is_ready(&self, id: usize) -> bool {
        self.tasks.get(&id).map(|state| match state {
            TaskState::Ready => true,
            _ => false,
        }).unwrap_or(false)
    }
}

impl Drop for Reactor {
    fn drop(&mut self) {
        self.dispatcher.send(Event::Close).unwrap();
        self.handle.take().map(|h| h.join().unwrap()).unwrap();
    }
}

I added some debug printouts so we can observe a couple of things:

  1. How the Waker object looks just like the trait object we talked about in an earlier chapter
  2. The program flow from start to finish

The last point is relevant when we move on to the last paragraph.

There is one subtle thing to note about our example. What happens if we pass in the same id for both events?

let future1 = Task::new(reactor.clone(), 1, 1);
let future2 = Task::new(reactor.clone(), 2, 1);

We'll discuss this a bit more under exercises in the last chapter where we also look at ways to fix it. For now, just make a note of it so you're aware of the problem.

Async/Await and concurrency

The async keyword can be used on functions as in async fn(...) or on a block as in async { ... }. Both will turn your function, or block, into a Future.
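
Here is a small sketch of the two forms side by side. The names `double`, `double_block` and `poll_once` are made up for this illustration, and `NoopWaker` uses the safe `std::task::Wake` trait instead of our hand-written vtable just to keep it short.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// `async fn`: calling it only creates a future, the body has not run yet.
async fn double(x: u32) -> u32 {
    x * 2
}

// A regular function returning an `async` block gives us the same thing.
fn double_block(x: u32) -> impl Future<Output = u32> {
    async move { x * 2 }
}

// A waker that does nothing. That's enough here since neither future
// ever returns `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls a future exactly once and returns whatever `poll` returned.
fn poll_once<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    future.as_mut().poll(&mut cx)
}
```

Both `poll_once(double(21))` and `poll_once(double_block(21))` give `Poll::Ready(42)`. Since there is no `await` point in either body, a single call to `poll` runs them to completion.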

These Futures are rather simple. Imagine our generator from a few chapters back. Every await point is like a yield point.

Instead of yielding a value we pass in, we yield the result of calling poll on the next Future we're awaiting.

Our mainfut contains two non-leaf futures which it will call poll on. Non-leaf futures have a poll method that simply polls their inner futures, and these state machines are polled until some "leaf future" in the end returns either Ready or Pending.
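
To make this a bit more concrete, here is a hand-written sketch of roughly what a block like `async { first.await; second.await; }` could turn into. It's a simplification and not what the compiler actually generates: `CountDown`, `Sequence` and `count_polls` are names invented for this illustration, and the inner futures are `Unpin` so we can avoid `unsafe`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical leaf future: reports `Pending` a fixed number of times
// before it resolves. It stands in for the `Task` in our example.
struct CountDown(u32);

impl Future for CountDown {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

#[derive(Clone, Copy)]
enum State {
    First,
    Second,
    Done,
}

// The non-leaf future: one state per `await` point.
struct Sequence {
    state: State,
    first: CountDown,
    second: CountDown,
    polls: u32,
}

impl Future for Sequence {
    type Output = u32; // how many times we were polled
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = &mut *self; // fine here since `Sequence` is `Unpin`
        this.polls += 1;
        loop {
            match this.state {
                // Each state just forwards the poll to the current inner future.
                State::First => match Pin::new(&mut this.first).poll(cx) {
                    Poll::Pending => return Poll::Pending,
                    Poll::Ready(()) => this.state = State::Second,
                },
                State::Second => match Pin::new(&mut this.second).poll(cx) {
                    Poll::Pending => return Poll::Pending,
                    Poll::Ready(()) => {
                        this.state = State::Done;
                        return Poll::Ready(this.polls);
                    }
                },
                State::Done => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Busy-polls the state machine to completion and returns the poll count.
fn count_polls(first: u32, second: u32) -> u32 {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut seq = Sequence {
        state: State::First,
        first: CountDown(first),
        second: CountDown(second),
        polls: 0,
    };
    loop {
        if let Poll::Ready(polls) = Pin::new(&mut seq).poll(&mut cx) {
            return polls;
        }
    }
}
```

`count_polls(1, 2)` returns 4: one poll where the first leaf is pending, two where the second leaf is pending, and a final one where everything is ready. Notice that the second leaf isn't polled at all until the first one has resolved.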

The way our example is right now, it's not much better than regular synchronous code. For us to actually await multiple futures at the same time we somehow need to spawn them so the executor starts running them concurrently.

Our example as it stands now prints this (leaving out the debug printouts):

Got 1 at time: 1.00.
Got 2 at time: 3.00.

If these Futures were executed asynchronously we would expect to see:

Got 1 at time: 1.00.
Got 2 at time: 2.00.

Note that this doesn't mean they need to run in parallel. They can run in parallel but there is no requirement. Remember that we're waiting for some external resource so we can fire off many such calls on a single thread and handle each event as it resolves.
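
One way to get there without spawning is to poll both futures from the same parent future, which is what the `join` combinators in most async libraries do. Below is a minimal sketch under a few assumptions: `Sleep`, `timer`, `Join` and `join_two_timers` are invented for this illustration, each timer spawns its own thread instead of going through our Reactor, the waker is built with the safe `std::task::Wake` trait, and I use plain `thread::park` for brevity.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::{Duration, Instant};

// Unparks the executor thread; built on the safe `std::task::Wake` trait.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Hypothetical leaf future: a timer thread sets `done`, then wakes us.
struct Sleep {
    id: usize,
    millis: u64,
    started: bool,
    done: Arc<AtomicBool>,
}

fn timer(id: usize, millis: u64) -> Sleep {
    Sleep { id, millis, started: false, done: Arc::new(AtomicBool::new(false)) }
}

impl Future for Sleep {
    type Output = usize;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        if self.done.load(Ordering::SeqCst) {
            return Poll::Ready(self.id);
        }
        if !self.started {
            // Simplification: only the waker from the first poll is kept.
            self.started = true;
            let (done, waker, millis) = (self.done.clone(), cx.waker().clone(), self.millis);
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(millis));
                done.store(true, Ordering::SeqCst);
                waker.wake();
            });
        }
        Poll::Pending
    }
}

// Polls both timers on every wake-up, so both are in progress at once.
struct Join {
    a: Sleep,
    b: Sleep,
    a_out: Option<usize>,
    b_out: Option<usize>,
}

impl Future for Join {
    type Output = (usize, usize);
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self; // fine here since `Join` is `Unpin`
        if this.a_out.is_none() {
            if let Poll::Ready(id) = Pin::new(&mut this.a).poll(cx) {
                this.a_out = Some(id);
            }
        }
        if this.b_out.is_none() {
            if let Poll::Ready(id) = Pin::new(&mut this.b).poll(cx) {
                this.b_out = Some(id);
            }
        }
        match (this.a_out, this.b_out) {
            (Some(a), Some(b)) => Poll::Ready((a, b)),
            _ => Poll::Pending,
        }
    }
}

fn block_on<F: Future + Unpin>(mut future: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match Pin::new(&mut future).poll(&mut cx) {
            Poll::Ready(val) => return val,
            Poll::Pending => thread::park(),
        }
    }
}

// Waits for a 100 ms and a 200 ms timer at the same time.
fn join_two_timers() -> ((usize, usize), Duration) {
    let start = Instant::now();
    let join = Join { a: timer(1, 100), b: timer(2, 200), a_out: None, b_out: None };
    let ids = block_on(join);
    (ids, start.elapsed())
}
```

`join_two_timers()` returns the ids `(1, 2)`, and the elapsed time should land close to 200 ms rather than the 300 ms we would get by awaiting the timers one after the other. Everything on the executor side still happens on a single thread.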

Now, this is the point where I'll refer you to some better resources for implementing a better executor. You should have a pretty good understanding of the concept of Futures by now, which will help you along the way.

The next step should be getting to know how more advanced runtimes work and how they implement different ways of running Futures to completion.

If I were you I would read this next, and try to implement it for our example.

That's actually it for now. There is probably much more to learn, but this is enough for today.

I hope exploring Futures and async in general gets easier after this read and I do really hope that you do continue to explore further.

Don't forget the exercises in the last chapter 😊.

Bonus Section - a Proper Way to Park our Thread

As we explained earlier in our chapter, simply calling thread::park is not really sufficient to implement a proper executor. You can also reach for a tool like the Parker in crossbeam: crossbeam::sync::Parker

Since it doesn't require many lines of code to create a working solution ourselves we'll show how we can solve that by using a Condvar and a Mutex instead.

Start by implementing our own Parker like this:

#[derive(Default)]
struct Parker(Mutex<bool>, Condvar);

impl Parker {
    fn park(&self) {

        // We acquire a lock to the Mutex which protects our flag indicating if we
        // should resume execution or not.
        let mut resumable = self.0.lock().unwrap();

        // We put this in a loop since there is a chance we'll get woken, but
        // our flag hasn't changed. If that happens, we simply go back to sleep.
        while !*resumable {

            // We sleep until someone notifies us
            resumable = self.1.wait(resumable).unwrap();
        }

        // We immediately set the condition to false, so that next time we call `park` we'll
        // go right to sleep.
        *resumable = false;
    }

    fn unpark(&self) {
        // We simply acquire a lock to our flag and set it to `true` when we
        // get it.
        *self.0.lock().unwrap() = true;

        // We notify our `Condvar` so it wakes up and resumes.
        self.1.notify_one();
    }
}

The Condvar in Rust is designed to work together with a Mutex. You might think that we never release the mutex lock we acquire in self.0.lock().unwrap(); before we go to sleep, which would mean that our unpark function could never acquire the lock to our flag and we would deadlock.

We avoid this since the Condvar consumes our lock guard when we call wait, so the lock is released at the moment we go to sleep.

When we resume again, our Condvar returns our lock so we can continue to operate on it.

This means we need to make some very slight changes to our executor like this:

fn block_on<F: Future>(mut future: F) -> F::Output {
    let parker = Arc::new(Parker::default()); // <--- NB!
    let mywaker = Arc::new(MyWaker { parker: parker.clone() }); // <--- NB!
    let waker = mywaker_into_waker(Arc::into_raw(mywaker));
    let mut cx = Context::from_waker(&waker);

    // SAFETY: we shadow `future` so it can't be accessed again.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };
    loop {
        match Future::poll(future.as_mut(), &mut cx) {
            Poll::Ready(val) => break val,
            Poll::Pending => parker.park(), // <--- NB!
        };
    }
}

And we need to change our Waker like this:

#[derive(Clone)]
struct MyWaker {
    parker: Arc<Parker>,
}

fn mywaker_wake(s: &MyWaker) {
    let waker_arc = unsafe { Arc::from_raw(s) };
    waker_arc.parker.unpark();
}

And that's really all there is to it.
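
If you want to convince yourself that the Parker behaves the way we described, here is a small sketch that exercises it on its own. The Parker is the same as above; the two helper functions are made up for this illustration.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Default)]
struct Parker(Mutex<bool>, Condvar);

impl Parker {
    fn park(&self) {
        let mut resumable = self.0.lock().unwrap();
        while !*resumable {
            resumable = self.1.wait(resumable).unwrap();
        }
        *resumable = false;
    }

    fn unpark(&self) {
        *self.0.lock().unwrap() = true;
        self.1.notify_one();
    }
}

// A wake-up that arrives before we park is remembered in the flag, so
// `park` returns right away. Returns `true` if the flag was reset afterwards.
fn unpark_before_park_is_not_lost() -> bool {
    let parker = Parker::default();
    parker.unpark();
    parker.park();
    let flag_after = *parker.0.lock().unwrap();
    !flag_after
}

// Parks the current thread until another thread calls `unpark` about
// 50 ms later. Returns how long we were parked.
fn parked_until_unparked() -> Duration {
    let parker = Arc::new(Parker::default());
    let remote = parker.clone();
    let start = Instant::now();
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        remote.unpark();
    });
    parker.park();
    handle.join().unwrap();
    start.elapsed()
}
```

The first function returns `true`, and the second one returns a duration of at least 50 ms. Since the flag lives inside our own Parker, no other code calling `thread::park` or `unpark` on the same thread can interfere with it.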

If you checked out the playground link that showcased how park/unpark could cause subtle problems you can check out this example which shows how our final version avoids this problem.

The next chapter shows our finished code with this improvement which you can explore further if you wish.

Our finished code

Here is the whole example. You can edit it right here in your browser and run it yourself. Have fun!

fn main() {
    let start = Instant::now();
    let reactor = Reactor::new();

    let fut1 = async {
        let val = Task::new(reactor.clone(), 1, 1).await;
        println!("Got {} at time: {:.2}.", val, start.elapsed().as_secs_f32());
    };

    let fut2 = async {
        let val = Task::new(reactor.clone(), 2, 2).await;
        println!("Got {} at time: {:.2}.", val, start.elapsed().as_secs_f32());
    };

    let mainfut = async {
        fut1.await;
        fut2.await;
    };

    block_on(mainfut);
}

use std::{
    collections::HashMap,
    future::Future,
    mem,
    pin::Pin,
    sync::{
        mpsc::{channel, Sender},
        Arc, Condvar, Mutex,
    },
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
    thread::{self, JoinHandle},
    time::{Duration, Instant},
};
// ============================= EXECUTOR ====================================
#[derive(Default)]
struct Parker(Mutex<bool>, Condvar);

impl Parker {
    fn park(&self) {
        let mut resumable = self.0.lock().unwrap();
        while !*resumable {
            resumable = self.1.wait(resumable).unwrap();
        }
        *resumable = false;
    }

    fn unpark(&self) {
        *self.0.lock().unwrap() = true;
        self.1.notify_one();
    }
}

fn block_on<F: Future>(mut future: F) -> F::Output {
    let parker = Arc::new(Parker::default());
    let mywaker = Arc::new(MyWaker {
        parker: parker.clone(),
    });
    let waker = mywaker_into_waker(Arc::into_raw(mywaker));
    let mut cx = Context::from_waker(&waker);

    // SAFETY: we shadow `future` so it can't be accessed again.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };
    loop {
        match Future::poll(future.as_mut(), &mut cx) {
            Poll::Ready(val) => break val,
            Poll::Pending => parker.park(),
        };
    }
}
// ====================== FUTURE IMPLEMENTATION ==============================
#[derive(Clone)]
struct MyWaker {
    parker: Arc<Parker>,
}
#[derive(Clone)]
pub struct Task {
    id: usize,
    reactor: Arc<Mutex<Box<Reactor>>>,
    data: u64,
}

fn mywaker_wake(s: &MyWaker) {
    let waker_arc = unsafe { Arc::from_raw(s) };
    waker_arc.parker.unpark();
}

fn mywaker_clone(s: &MyWaker) -> RawWaker {
    let arc = unsafe { Arc::from_raw(s) };
    std::mem::forget(arc.clone()); // increase ref count
    RawWaker::new(Arc::into_raw(arc) as *const (), &VTABLE)
}

const VTABLE: RawWakerVTable = unsafe {
    RawWakerVTable::new(
        |s| mywaker_clone(&*(s as *const MyWaker)),   // clone
        |s| mywaker_wake(&*(s as *const MyWaker)),    // wake
        |s| (*(s as *const MyWaker)).parker.unpark(), // wake by ref (don't decrease refcount)
        |s| drop(Arc::from_raw(s as *const MyWaker)), // decrease refcount
    )
};

fn mywaker_into_waker(s: *const MyWaker) -> Waker {
    let raw_waker = RawWaker::new(s as *const (), &VTABLE);
    unsafe { Waker::from_raw(raw_waker) }
}

impl Task {
    fn new(reactor: Arc<Mutex<Box<Reactor>>>, data: u64, id: usize) -> Self {
        Task { id, reactor, data }
    }
}

impl Future for Task {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut r = self.reactor.lock().unwrap();
        if r.is_ready(self.id) {
            *r.tasks.get_mut(&self.id).unwrap() = TaskState::Finished;
            Poll::Ready(self.id)
        } else if let std::collections::hash_map::Entry::Occupied(mut e) = r.tasks.entry(self.id) {
            e.insert(TaskState::NotReady(cx.waker().clone()));
            Poll::Pending
        } else {
            r.register(self.data, cx.waker().clone(), self.id);
            Poll::Pending
        }
    }
}
// =============================== REACTOR ===================================
enum TaskState {
    Ready,
    NotReady(Waker),
    Finished,
}
struct Reactor {
    dispatcher: Sender<Event>,
    handle: Option<JoinHandle<()>>,
    tasks: HashMap<usize, TaskState>,
}

#[derive(Debug)]
enum Event {
    Close,
    Timeout(u64, usize),
}

impl Reactor {
    fn new() -> Arc<Mutex<Box<Self>>> {
        let (tx, rx) = channel::<Event>();
        let reactor = Arc::new(Mutex::new(Box::new(Reactor {
            dispatcher: tx,
            handle: None,
            tasks: HashMap::new(),
        })));

        let reactor_clone = Arc::downgrade(&reactor);
        let handle = thread::spawn(move || {
            let mut handles = vec![];
            for event in rx {
                let reactor = reactor_clone.clone();
                match event {
                    Event::Close => break,
                    Event::Timeout(duration, id) => {
                        let event_handle = thread::spawn(move || {
                            thread::sleep(Duration::from_secs(duration));
                            let reactor = reactor.upgrade().unwrap();
                            reactor.lock().map(|mut r| r.wake(id)).unwrap();
                        });
                        handles.push(event_handle);
                    }
                }
            }
            handles
                .into_iter()
                .for_each(|handle| handle.join().unwrap());
        });
        reactor.lock().map(|mut r| r.handle = Some(handle)).unwrap();
        reactor
    }

    fn wake(&mut self, id: usize) {
        let state = self.tasks.get_mut(&id).unwrap();
        match mem::replace(state, TaskState::Ready) {
            TaskState::NotReady(waker) => waker.wake(),
            TaskState::Finished => panic!("Called 'wake' twice on task: {}", id),
            _ => unreachable!(),
        }
    }

    fn register(&mut self, duration: u64, waker: Waker, id: usize) {
        if self.tasks.insert(id, TaskState::NotReady(waker)).is_some() {
            panic!("Tried to insert a task with id: '{}', twice!", id);
        }
        self.dispatcher.send(Event::Timeout(duration, id)).unwrap();
    }

    fn is_ready(&self, id: usize) -> bool {
        self.tasks
            .get(&id)
            .map(|state| matches!(state, TaskState::Ready))
            .unwrap_or(false)
    }
}

impl Drop for Reactor {
    fn drop(&mut self) {
        self.dispatcher.send(Event::Close).unwrap();
        self.handle.take().map(|h| h.join().unwrap()).unwrap();
    }
}

A little side note

The comments delimiting the Executor, Future implementation and Reactor emphasize what is part of the language (Future and Waker) and what is not (runtime specifics). Therefore, the comments associate the Waker with the Future implementation, despite its strong relation with the Executor.

Conclusion and exercises

Congratulations. Good job! If you got this far, you must have stayed with me all the way. I hope you enjoyed the ride!

Remember that you can always leave feedback, suggest improvements or ask questions in the issue tracker for this book. I'll try my best to respond to each one of them.

Below, I'll leave you with some suggestions for exercises if you want to explore a little further.

Until next time!

Reader exercises

So our implementation has taken some obvious shortcuts and could use some improvement. Actually, digging into the code and trying things yourself is a good way to learn. Here are some good exercises if you want to explore more:

Avoid wrapping the whole Reactor in a mutex and pass it around

First of all, protecting the whole Reactor and passing it around is overkill. We're only interested in synchronizing some parts of the information it contains. Try to refactor that out and only synchronize access to what's really needed.

I'd encourage you to have a look at how async_std solves this with a global runtime which includes the reactor and how tokio's runtime solves the same thing in a slightly different way to get some inspiration.

  • Do you want to pass around a reference to this information using an Arc?
  • Do you want to make a global Reactor so it can be accessed from anywhere?
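
If you want a nudge in the first direction, here is one possible starting point. It's only a sketch under my own assumptions and not how async_std or tokio do it: the timer thread and the Finished state are left out, the lock moves inside the Reactor so we can pass around a plain Arc<Reactor>, and `CountingWaker` and `register_and_wake` exist only to exercise the code.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

// Simplified: the `Finished` state from our example is left out.
enum TaskState {
    Ready,
    NotReady(Waker),
}

struct Reactor {
    // Only the task table is shared mutable state, so only it gets a lock.
    tasks: Mutex<HashMap<usize, TaskState>>,
}

impl Reactor {
    fn new() -> Arc<Self> {
        Arc::new(Reactor { tasks: Mutex::new(HashMap::new()) })
    }

    fn register(&self, id: usize, waker: Waker) {
        let prev = self.tasks.lock().unwrap().insert(id, TaskState::NotReady(waker));
        assert!(prev.is_none(), "Tried to insert a task with id: '{}', twice!", id);
    }

    fn wake(&self, id: usize) {
        // The lock is released at the end of this statement, so we don't
        // hold it while calling into the waker.
        let prev = self.tasks.lock().unwrap().insert(id, TaskState::Ready);
        if let Some(TaskState::NotReady(waker)) = prev {
            waker.wake();
        }
    }

    fn is_ready(&self, id: usize) -> bool {
        matches!(self.tasks.lock().unwrap().get(&id), Some(TaskState::Ready))
    }
}

// Hypothetical waker that counts how many times it was called.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns (ready before wake, ready after wake, number of wake calls).
fn register_and_wake() -> (bool, bool, usize) {
    let reactor = Reactor::new();
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    reactor.register(1, Waker::from(counter.clone()));
    let before = reactor.is_ready(1);
    reactor.wake(1);
    (before, reactor.is_ready(1), counter.0.load(Ordering::SeqCst))
}
```

`register_and_wake()` returns `(false, true, 1)`. All the methods take `&self` now, which means a Task only needs to hold an Arc<Reactor> and never has to lock the whole thing.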

Building a better executor

Right now, we can only run one future at a time. Most runtimes have a spawn function which lets you start off a future and await it later so you can run multiple futures concurrently.

As I suggested at the start of this book, visiting @stjepan's blog series about implementing your own executors is the place I would start and take it from there. You could further examine the source code of smol - A small and fast async runtime which is a good project to learn from.
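
To give you an idea of the shape such an executor could take, here is a very small single-threaded sketch with a run queue. It's my own simplification and not taken from any of the resources above: `Spawned`, `Executor`, `YieldNow` and `run_two_tasks` are invented names, the waker uses the safe `std::task::Wake` trait, and `run` simply stops when the queue is empty instead of parking, so it only suits futures that wake themselves.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type RunQueue = Arc<Mutex<VecDeque<Arc<Spawned>>>>;

// A spawned future plus the queue it re-enters when it's woken.
struct Spawned {
    future: Mutex<Option<BoxFuture>>,
    queue: RunQueue,
}

impl Wake for Spawned {
    fn wake(self: Arc<Self>) {
        let queue = self.queue.clone();
        queue.lock().unwrap().push_back(self);
    }
}

struct Executor {
    queue: RunQueue,
}

impl Executor {
    fn new() -> Self {
        Executor { queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    fn spawn(&self, future: impl Future<Output = ()> + Send + 'static) {
        let future: BoxFuture = Box::pin(future);
        let task = Arc::new(Spawned { future: Mutex::new(Some(future)), queue: self.queue.clone() });
        self.queue.lock().unwrap().push_back(task);
    }

    // Polls queued tasks until the queue is empty.
    fn run(&self) {
        loop {
            // Hold the lock only for the pop so `wake` can push during `poll`.
            let next = self.queue.lock().unwrap().pop_front();
            let task = match next {
                Some(task) => task,
                None => break,
            };
            let waker = Waker::from(task.clone());
            let mut cx = Context::from_waker(&waker);
            let mut slot = task.future.lock().unwrap();
            if let Some(mut future) = slot.take() {
                if future.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(future); // not finished: keep it for the next wake-up
                }
            }
        }
    }
}

// Returns `Pending` once (after waking itself), then `Ready`.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Spawns two tasks that each log their name, yield once and log again.
fn run_two_tasks() -> Vec<&'static str> {
    let log: Arc<Mutex<Vec<&'static str>>> = Arc::new(Mutex::new(Vec::new()));
    let executor = Executor::new();
    for name in ["a", "b"] {
        let log = log.clone();
        executor.spawn(async move {
            log.lock().unwrap().push(name);
            YieldNow(false).await;
            log.lock().unwrap().push(name);
        });
    }
    executor.run();
    // Bind the result first so the lock guard is dropped before `log`.
    let entries = log.lock().unwrap().clone();
    entries
}
```

`run_two_tasks()` returns `["a", "b", "a", "b"]`, which shows the two tasks taking turns instead of running one after the other. To use something like this with our Reactor you would still need to park the thread when the queue is empty and some tasks are waiting, which is a good next step for the exercise.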

Create a unique Id for each task

As we discussed at the end of the main example, what happens if the user passes in the same Id for both events?

let future1 = Task::new(reactor.clone(), 1, 1);
let future2 = Task::new(reactor.clone(), 2, 1);

Right now, it will deadlock since our poll method thinks the first poll of future2 is future1 getting polled again and swaps out the waker with the one from future2. This waker never gets called since the task is never registered.

It's probably a bad idea to expose the user to this behavior, so we should have a unique Id for each task which we use internally. There are many ways to solve this. Below are two suggestions to get going:

  1. Let the reactor have a usize which is incremented on every task creation
  2. Use a crate like Guid to generate a unique Id for each task
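
The first suggestion could look something like the sketch below. The Reactor here is a stripped-down stand-in I made up for the illustration: it only stores the timeout for each task, and the `next_id` field and `register_two` helper are not part of our example.

```rust
use std::collections::HashMap;

// Stripped-down stand-in for our `Reactor`: it only tracks the timeout of
// each task, which is enough to show how ids can be handed out.
struct Reactor {
    next_id: usize,
    tasks: HashMap<usize, u64>,
}

impl Reactor {
    fn new() -> Self {
        Reactor { next_id: 0, tasks: HashMap::new() }
    }

    // The reactor picks the id and returns it, so two tasks can never
    // end up with the same one.
    fn register(&mut self, duration: u64) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.tasks.insert(id, duration);
        id
    }
}

// Registers two tasks and returns both ids and the number of stored tasks.
fn register_two() -> (usize, usize, usize) {
    let mut reactor = Reactor::new();
    let first = reactor.register(1);
    let second = reactor.register(2);
    (first, second, reactor.tasks.len())
}
```

`register_two()` returns `(0, 1, 2)`. In our real example this would mean that Task::new no longer takes an id from the user, and that the Task has to remember the id it got back from the Reactor.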

Further reading

There are many great resources. In addition to the RFCs and articles I've already linked to in the book, here are some of my suggestions:

The official Async book

Tokio tutorial

The async_std book

smol - a small and fast async runtime

Aaron Turon: Designing futures for Rust

Steve Klabnik's presentation: Rust's journey to Async/Await

The Tokio Blog

Stjepan's blog with a series where he implements an Executor

Jon Gjengset's video on The Why, What and How of Pinning in Rust

Withoutboats blog series about async/await