How concurrency works

Concurrency or parallelism means doing multiple things at once. Normally, a program executes its code sequentially, doing only one thing at a time. There are a few different ways you can make it run concurrently.

Ways to implement concurrency

Multiple processes

Let's get this one out of the way first because it only sort of halfway counts, but it's the most basic. Just spawn multiple processes. Like a command shell does when you pipe (`ls | wc` spawns `ls` and `wc` as separate processes, so they run at the same time), you can spawn new processes to do the things you want done at the same time.
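
For example, here's a minimal sketch in Go of what a shell does for `ls | wc`, using the standard library's `os/exec` package (error handling kept to a bare minimum):

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// Spawn `ls` and `wc` as two separate processes,
	// with ls's stdout connected to wc's stdin, like `ls | wc` in a shell.
	ls := exec.Command("ls")
	wc := exec.Command("wc")

	pipe, err := ls.StdoutPipe()
	if err != nil {
		panic(err)
	}
	wc.Stdin = pipe
	wc.Stdout = os.Stdout

	// Start wc, run ls to completion, then wait for wc.
	// While ls is producing output, both processes are running at the same time.
	if err := wc.Start(); err != nil {
		panic(err)
	}
	if err := ls.Run(); err != nil {
		panic(err)
	}
	if err := wc.Wait(); err != nil {
		panic(err)
	}
}
```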

Separate processes don't, by default, share any memory or anything. For the use case of `ls | wc`, this makes sense. But if you wanted to implement, say, a chat app that needs to handle the user's keystrokes at the same time as listening for incoming messages, those both need access to the app's data, and they both need to *change* the app's data (such as adding new messages to the history), so it doesn't make sense to use two separate processes there.

Also, every time a processor switches from running one process to another, it has to do a "context switch", which takes time - not individually noticeable to humans, of course, but if the computer is doing this constantly, the overhead can be significant.

Multiple threads

This is pretty similar to multiple processes but makes sense for different use cases. A thread is like a part of a process that can run at the same time as another part. Compared with separate processes:

* Threads in the same process share the process's memory, so they can all read and change the same data (like the chat app's message history).
* Switching between threads of the same process is cheaper than switching between processes.

Green threads

"Green" threads are like threads, but managed by the process itself instead of by the operating system.

To use green threads, a program sets up a small number of OS threads that actually run things, plus any number of green threads representing all the things it wants to do concurrently. It then shares the OS threads across the green threads so that every green thread gets a turn to run, just like the kernel shares a few CPU cores across all the running processes (or OS threads).

The point of doing this is that switching between green threads is cheaper than switching between OS threads, since it doesn't involve the kernel.

And before you ask: you don't have to implement this scheduling complexity yourself to use green threads. Usually if you're using green threads it's because you're using a language or library that provides them for you, such as the Go language. In Go, you just use the `go` keyword to spawn a new green thread (a "goroutine"). The runtime code that manages these green threads is built into every Go executable.
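
For instance, a minimal Go program that spawns a few goroutines and waits for them all to finish might look like this:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		// `go` spawns a new goroutine (green thread). The Go runtime
		// schedules all goroutines onto a small pool of OS threads.
		go func(n int) {
			defer wg.Done()
			fmt.Println("hello from goroutine", n)
		}(i)
	}
	// Block until every goroutine has called Done.
	wg.Wait()
}
```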

Asynchronous programming and event loops

Asynchronous programming is a style of programming meant for when all the things you want to do concurrently are *IO-bound* rather than *CPU-bound* (that is, you want to listen for incoming data on two sockets at the same time and respond to whichever comes in first, but don't need to do two math problems at the same time). Asynchronous programming doesn't necessarily rely on threads at all.

Generally, asynchronous programming involves an event loop, which is a data structure that keeps track of all the tasks that are running concurrently and runs whichever of them is not stuck waiting on IO.
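
Go doesn't use the async/await style, but the shape of an event loop can be sketched with channels standing in for sockets: the loop blocks until one of the sources has data and handles whichever comes in first. This is just a toy illustration of the idea, not how a real async runtime is built:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Two "IO sources", simulated with channels that deliver data after a delay.
	socketA := make(chan string)
	socketB := make(chan string)
	go func() { time.Sleep(100 * time.Millisecond); socketA <- "message on A" }()
	go func() { time.Sleep(50 * time.Millisecond); socketB <- "message on B" }()

	// A tiny event loop: wait until *either* source is ready,
	// handle whichever comes in first, then go around again.
	for handled := 0; handled < 2; handled++ {
		select {
		case msg := <-socketA:
			fmt.Println("got", msg)
		case msg := <-socketB:
			fmt.Println("got", msg)
		}
	}
}
```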

Asynchronous programming is a natural way to write certain kinds of applications, like network servers or GUI programs.

Synchronization

Threads generally introduce the need for synchronization. I touched on this in the chapter on inter-process communication, but now I'll explain it in more detail.

Writing to memory is a volatile thing. If you have one thread trying to read from a memory address, and another thread trying to write to it at the same time, what happens? Does the reader get the old value, the new value, or some corrupted mixture of the two? Does the write succeed at all? Does the program crash? There is no reliable answer. This situation is called a "data race", and the actual outcome can depend on the hardware, the compiler, and the exact timing. So, for software purposes, it's called "undefined behavior", because it's not defined what will happen in this situation.
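
Here's a contrived Go sketch of a data race: two goroutines increment the same unsynchronized counter, and updates get lost (Go's race detector, `go run -race`, flags this):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				// Unsynchronized read-modify-write of shared memory: a data race.
				// Both goroutines can read the same old value and each write back
				// old+1, losing increments.
				counter++
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // usually prints less than 200000
}
```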

So, when you have two threads or two processes sharing memory, you generally have to take steps to ensure that access is synchronized: that even if the threads are running at the same time, they won't both touch the same memory location at the same time. Specifically:

* Two threads must never write to the same memory location at the same time.
* One thread must never write to a location while another thread is reading it.
* Any number of threads can safely *read* the same location at the same time.

I'll talk about two programming techniques used to accomplish this synchronization:

* Atomic instructions
* Locks

Atomic instructions

Processors support instructions for *atomically* reading or writing memory, which guarantees the read or write happens all at once and not at the same time as anything else. These aren't the default ways of reading and writing memory because they have more overhead and aren't needed for most cases (anything where only one thread is accessing it at a time).

Different programming languages have various ways of offering you this functionality. For example, in Zig there's a thing called `@atomicStore` (store meaning write) which atomically stores a value in a variable, unlike the normal `variable = value` syntax which is not atomic.
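
In Go, the equivalent lives in the standard `sync/atomic` package. Here's a sketch of the counter from the earlier data-race example, fixed with an atomic increment:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				// The whole read-modify-write happens as one indivisible step,
				// so no increments are lost.
				atomic.AddInt64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&counter)) // always prints 200000
}
```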

Atomic reads/writes have a limitation though: they only work with primitive operations like reading or writing a number to memory. If you need to synchronize something more complex, you need to use locks.

Locks

Locks are a feature usually provided with help from the operating system kernel. Two threads can share a lock, and before each one tries to access the shared memory, it requests to acquire the lock, and:

* If no other thread currently holds the lock, it acquires it and goes ahead.
* If another thread already holds the lock, it waits until that thread releases it, and only then acquires it and goes ahead.

It's important for a thread that uses the lock to release it when it's done, otherwise no other thread will be able to access the memory afterward.
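
In Go, for example, `sync.Mutex` is a lock. Here's a sketch protecting a shared slice, the kind of structure atomic instructions alone can't handle:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex
	var messages []string
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Acquire the lock before touching the shared slice...
			mu.Lock()
			messages = append(messages, fmt.Sprintf("message from goroutine %d", n))
			// ...and release it when done, so the other goroutine can acquire it.
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	fmt.Println(messages)
}
```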

Locks are very difficult to work with in most languages because there are so many ways you can go wrong with them:

* You can forget to acquire the lock before touching the shared data, which brings back the data race.
* You can forget to release the lock when you're done, so every other thread that needs it waits forever.
* Two threads can each be waiting for a lock the other one holds, so both wait forever. This is called a "deadlock".

Deadlocks are hard to debug because instead of any error message, the program just freezes, leaving you to search your entire source code for the problem.
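
A classic way to end up deadlocked is two threads taking two locks in opposite orders. Here's a contrived Go sketch (the sleeps are only there to make the bad interleaving reliable):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var a, b sync.Mutex

	// Goroutine 1 locks a then b; goroutine 2 locks b then a.
	// Each grabs its first lock, then waits forever for the one
	// the other is holding: a deadlock. No error, they just hang.
	go func() {
		a.Lock()
		time.Sleep(10 * time.Millisecond)
		b.Lock()
		fmt.Println("goroutine 1 got both locks")
		b.Unlock()
		a.Unlock()
	}()
	go func() {
		b.Lock()
		time.Sleep(10 * time.Millisecond)
		a.Lock()
		fmt.Println("goroutine 2 got both locks")
		a.Unlock()
		b.Unlock()
	}()

	time.Sleep(time.Second)
	fmt.Println("main exiting; the two goroutines are still stuck")
}
```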

Data races are even nastier because they're non-deterministic, so they might for example happen on a user's machine while the program works fine on the developer's machine, or they might happen on the developer's machine but only sporadically, with no reliable way to reproduce the bug, and thus even less information about where the problem might be.

Rust is the only language where locks are actually easy to work with, because it exposes their functionality in such a way that you *can't* access lock-protected data without acquiring the lock, and also can't forget to release the lock when you're done.

There are also multiple kinds of locking: there's exclusive locking, which excludes all other threads from having the lock at the same time, and there's shared locking, which allows other threads to have it at the same time as long as none of them need to have it exclusively. If a thread is only going to be reading from the shared memory, it can just acquire a shared lock so it doesn't block other readers.

Exclusive locks are also called mutexes (MUTual EXclusion).
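
In Go, `sync.RWMutex` offers both kinds: `RLock` takes the shared lock and `Lock` takes the exclusive one. A small sketch:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.RWMutex
	data := map[string]string{"greeting": "hello"}

	var wg sync.WaitGroup
	// Several readers can hold the shared lock at the same time.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.RLock()
			fmt.Println(data["greeting"])
			mu.RUnlock()
		}()
	}
	// A writer needs the exclusive lock, which it only gets
	// once no reader (or other writer) holds the lock.
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.Lock()
		data["greeting"] = "hi"
		mu.Unlock()
	}()
	wg.Wait()
}
```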
