Using atomic operations to achieve thread safety

Atomic operations are ones which can be executed on otherwise-unprotected shared data as a single indivisible step, where no other thread will be able to compromise the consistency of either that shared data or the operation itself.

In other words, even for operation logic consisting of multiple, compound operations—steps—being atomic means that all those steps will run as an unbreakable whole on the shared data, and multiple threads will not be able to interfere with each other.

The most fundamental atomic operations are implemented at the hardware level and rely on specific CPU instructions to ensure atomicity, but others can also be implemented in higher-level code with the help of those atomic primitives.

Because hardware support is needed for the most fundamental primitives, atomic operations are limited to simple data types, commonly up to the size of a pointer. Support for atomic operations on larger data sizes (double pointers or even more) is hardware-dependent, and Delphi currently supports atomic operations only for 32-bit and 64-bit data types on all platforms.

Some atomic operations which are not directly supported by hardware can be implemented by using supported atomic primitives and a loop until the operation succeeds.

This size constraint makes it obvious that atomic operations have limited use, but they are still essential tools for achieving thread safety in multi-threaded code. On the one hand, they actually don't protect that much and might even look useless to the untrained eye; on the other hand, without them there would be no multi-threaded code as we know it.

The simplest atomic operations are atomic load and store (read and write). If the pointer-sized data is naturally aligned, then even a regular read or write of such a variable will be atomic. Because of this, you will rarely need to use dedicated atomic operations for reading and writing.

The next simple operation is an atomic exchange (swap), which unconditionally writes a new value and returns the old value.

And then there is another fundamental atomic operation: compare and exchange (compare and swap—CAS), which compares the current value with some comparand, and writes the new value only if the comparison succeeds, returning the original value of a variable.

Other Delphi-supported atomic operations are: addition and subtraction (including increment and decrement as special cases), and various bitwise operations.
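
To make that list more concrete, here is a minimal sketch (a small console program with illustrative variable names) of how those operations can be called through the Delphi atomic intrinsic functions:

program AtomicBasics;

{$APPTYPE CONSOLE}

var
  Value: Integer = 0;
  Old: Integer;

begin
  AtomicIncrement(Value);            // atomic increment: Value is now 1
  AtomicIncrement(Value, 5);         // atomic addition: Value is now 6
  AtomicDecrement(Value, 2);         // atomic subtraction: Value is now 4

  Old := AtomicExchange(Value, 10);  // unconditionally writes 10, returns the old value (4)

  // Compare and exchange (CAS): write 42 only if the current value equals 10
  Old := AtomicCmpExchange(Value, 42, 10);
  if Old = 10 then
    Writeln('CAS succeeded, Value is now 42')
  else
    Writeln('CAS failed, Value is still ', Value);
end.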

Multiple steps

The purpose of atomic operations is to combine multiple steps, which could otherwise be interrupted when multiple threads operate on the same data, into a single, indivisible one.

Looking at some examples of how multiple threads can impact shared data, in situations which may look simple from the perspective of a higher-level language, makes it easier to understand the nature of concurrency bugs in similar, non-atomic operations, as well as how we can use atomic operations and what we can achieve by using them.

Probably the simplest example to understand is incrementing an integer number declared as:

var Number: Integer := 0;

To increment a value (in this case 0) stored in such a variable we can write:

Inc(Number);

From a high-level code perspective, this may seem like a simple operation. But if we dig deeper into it and look at its logical equivalent, we can more easily see that it is not that simple at all:

Number := Number + 1;

To increment a number, we actually need to read the value from that variable (memory location), add one to that value, and then store that result back into the variable.

It is easy to see how multiple threads can interfere with each other in such a situation. If we have two threads incrementing a value, we would logically expect that the end result will be incremented by two. If we have more than two threads doing that, we would expect that it would be incremented by the number of times we called the increment function.

Even if we assume that the variable is naturally aligned, and that the reads and writes alone will be atomic with no data tearing, incrementing a value from multiple threads is not thread-safe, and it may not give us proper results.

How can that happen?

In a multi-threaded scenario, two threads can easily access and read the value from Number at the same time. That means both threads will read the same value. But then they will both increment that same value and write the incremented value back, instead of one thread incrementing and writing the value first, and the other thread incrementing the already incremented value afterwards. As a result, we will have our value incremented only once instead of twice.

Add more threads into the mix, and the results can be even more wrong.
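
If you want to see this effect for yourself, the following small console program is one possible sketch (with illustrative names and iteration counts): two threads increment the same variable a million times each, and the final value will almost always be less than two million, varying from run to run.

program LostUpdates;

{$APPTYPE CONSOLE}

uses
  System.Classes;

const
  Iterations = 1000000;

var
  Number: Integer = 0;

function MakeIncrementThread: TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    var
      i: Integer;
    begin
      for i := 1 to Iterations do
        Inc(Number); // plain, non-atomic read-increment-write
    end);
  Result.FreeOnTerminate := False; // we want to WaitFor and Free it ourselves
end;

var
  T1, T2: TThread;

begin
  T1 := MakeIncrementThread;
  T2 := MakeIncrementThread;
  T1.Start;
  T2.Start;
  T1.WaitFor;
  T2.WaitFor;
  T1.Free;
  T2.Free;
  // We would expect 2000000, but the printed value will almost always be smaller
  Writeln('Number = ', Number);
end.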

If we want to protect such shared data and have correct results in a multi-threaded scenario, we would have to create a lock, and lock all access—both reading and writing—to such a variable:

var Lock := TCriticalSection.Create;
var Number: Integer := 0;

Lock.Enter;
try
  Number := Number + 1;
finally
  Lock.Leave;
end;

Creating and maintaining a lock, as well as entering and exiting it to protect a variable, requires a lot of code and is also rather slow. This is where atomic operations come in handy. Atomically incrementing a variable does not require an additional lock, and will therefore run much faster.

To atomically increment an integer in Delphi, we can call:

AtomicIncrement(Number);

or its equivalent from the static TInterlocked class, belonging to the System.SyncObjs unit:

TInterlocked.Increment(Number);

Atomic increment will perform the whole read-increment-write sequence as one single, uninterrupted operation: while some thread is in the middle of the atomic increment, no other thread will be able to access the value in that memory location (this is a simplified explanation of what actually happens behind the scenes).

No matter how many threads you have, nor how many times you call AtomicIncrement(Number), the value stored in Number will be incremented exactly the number of times you called AtomicIncrement(Number), no more, no less.

Atomic increment, decrement, addition, and subtraction operations are actually functions which not only perform the particular operation on the target variable, but also return the result of that operation. That way you can know the actual value written to the variable at that point in time.
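
As a small sketch of how that return value can be useful (LastID and NextID are just illustrative names), the value returned by AtomicIncrement can be used to hand out unique, sequential IDs across threads:

var
  LastID: Integer = 0;

function NextID: Integer;
begin
  // Each caller gets a distinct value, no matter which thread it runs in,
  // because the read-increment-write happens as one indivisible step.
  Result := AtomicIncrement(LastID);
end;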

Some use cases

Reference counting

Atomically incrementing or decrementing a number is one of the more commonly used atomic operations. Its primary use case is maintaining a correct strong reference count of an instance in an automatic reference counting mechanism.

While automatic reference counting is also used for strings and dynamic arrays, reference counting for interfaces is delegated to the _AddRef and _Release methods. These methods are implemented in different ways depending on the actual implementing class and its purpose, but the most basic ones would be:

function TInterfacedObject._AddRef: Integer;
begin
  Result := AtomicIncrement(FRefCount);
end;

function TInterfacedObject._Release: Integer;
begin
  Result := AtomicDecrement(FRefCount);
  if Result = 0 then
    Destroy;
end;

The above implementation maintains a thread-safe reference count of an object instance. In _Release, we can see how we can use the result of the atomic decrement. If the returned value is zero, that means this particular reference was the last strong reference to that object instance.

In other words, the object instance is no longer accessible through any strong reference at that point, and we can safely destroy such an object. Not only can we destroy the object, but we must destroy it, otherwise we would have a memory leak.

Maintaining a thread-safe reference count in ARC is fundamental for sharing any reference-counted instance across multiple threads. Without it, it would be impossible to maintain a correct count of strong references, and without the correct count, the instance could get prematurely destroyed while it is still in use, or remain in memory after it is no longer used.

So once you have a strong reference to some reference-counted instance, you can safely pass that shared reference to other threads, and they can use it for creating other strong references and passing those around. Once the last strong reference goes out of scope, regardless of which thread it was in or when that happened, such an instance will be safely destroyed.

However, that is all the thread safety you will get with reference counting. Reading the reference, copying it and making another strong reference is thread-safe, but reading and writing the same shared variable from multiple threads is not a thread-safe operation.

So even though automatic reference counting has some thread safety built in, which helps a lot when working with reference-counted instances, that thread safety has its limits, and if you need write access to a shared variable, you will have to protect it with an additional lock.
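
As a sketch of that last point (IFoo, SharedFoo and the helper routines are illustrative names, and TCriticalSection comes from System.SyncObjs), both writing the shared reference variable and reading it into a local strong reference would be wrapped in a lock; once a thread holds its own local reference, it can use and copy it freely:

var
  Lock: TCriticalSection; // guards SharedFoo; created elsewhere
  SharedFoo: IFoo;        // shared, reference-counted instance

procedure SetSharedFoo(const AFoo: IFoo);
begin
  Lock.Enter;
  try
    SharedFoo := AFoo; // writing the shared variable must be protected
  finally
    Lock.Leave;
  end;
end;

function GetSharedFoo: IFoo;
begin
  Lock.Enter;
  try
    Result := SharedFoo; // so must reading it into a local strong reference
  finally
    Lock.Leave;
  end;
end;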

Lazy initialization

Another place where atomic operations are commonly used is thread-safe lazy initialization. You can find plenty of such code within the built-in Delphi frameworks.

If you have a singleton class and you want to lazily initialize the instance, your code might look like the following:

class function TFoo.GetInstance: TFoo;
begin
  if FInstance = nil then
    FInstance := TFoo.Create;
  Result := FInstance;
end;

The above initialization is not thread-safe. In other words, you would have to make sure that the singleton instance is initialized before being accessed by multiple threads.

This code can be easily converted into a thread-safe form without losing any runtime performance once the initialization is completed. Of course, this is just one of the many possible variants using CompareExchange to safely initialize a singleton instance:

class function TFoo.GetInstance: TFoo;
var
  LInstance: TFoo;
begin
  if FInstance = nil then
  begin
    LInstance := TFoo.Create;
    if TInterlocked.CompareExchange<TFoo>(FInstance, LInstance, nil) <> nil then
      LInstance.Free; // Free redundant instance
  end;
  Result := FInstance;
end;

Reading a naturally aligned pointer is thread-safe, so the initial nil check does not require any kind of protection. The worst that can happen is that multiple threads enter the if branch and create multiple TFoo instances. However, by using CompareExchange, only one thread will be able to successfully exchange and write the LInstance value into the FInstance variable.

If the CompareExchange fails, and another thread has already successfully initialized FInstance, we need to release the redundant instance to prevent memory leaks.

Spin locks and guards

Atomic operations can also be used to implement other synchronization objects. For instance, a very basic non-reentrant spin lock can be implemented with something like the following code:

type
  TSpinLock = class
  private
    [volatile] FLocked: Integer;
  public
    procedure Enter;
    procedure Leave;
    function TryEnter: Boolean;
  end;

procedure TSpinLock.Enter;
begin
  while TInterlocked.Exchange(FLocked, 1) = 1 do
    TThread.Yield;
end;

procedure TSpinLock.Leave;
begin
  TInterlocked.Exchange(FLocked, 0);
end;

function TSpinLock.TryEnter: Boolean;
begin
  Result := TInterlocked.Exchange(FLocked, 1) = 0;
end;

We use a simple integer flag to represent the lock state: the value 0 means the lock is not held, and the value 1 means that some thread holds the lock.

When entering the lock, we can use a simple atomic Exchange. If the returned (old) value is 0, that means we have successfully entered the lock. After that, and until we call Leave, any other thread which tries to enter the lock will get 1 as the result of the atomic Exchange. If that happens, the thread will yield its processor time and then try setting the flag again, until it succeeds.

This kind of spin lock is not reentrant. If the thread that already holds the lock tries to enter it again, the Exchange will keep returning 1, and without any mechanism to break the while loop, it will deadlock.
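
Typical usage looks like any other lock (SharedList and AddValue are illustrative names, and TList<Integer> requires System.Generics.Collections); just make sure the same thread never tries to enter a lock it already holds:

var
  SpinLock: TSpinLock;        // the spin lock class defined above
  SharedList: TList<Integer>; // some shared data it protects

procedure AddValue(AValue: Integer);
begin
  SpinLock.Enter;
  try
    SharedList.Add(AValue); // only one thread at a time runs this line
  finally
    SpinLock.Leave;
  end;
end;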

If we remove the Enter implementation, the above spin lock turns into a simple guard class. It can be used to protect code that needs to run only once (using TryEnter alone), or code where a thread should simply skip the protected section if some other thread is already running it (using a TryEnter/Leave combination), as shown in the sketches below.
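
Here is a minimal sketch of both guard patterns, still using the TSpinLock name for brevity; DoOneTimeSetup and DoHousekeeping are illustrative placeholders for the work being guarded:

var
  OnceGuard: TSpinLock; // used through TryEnter only, never released
  BusyGuard: TSpinLock; // used through a TryEnter/Leave combination

// Run-once: only the first thread that wins TryEnter executes the setup;
// since Leave is never called, nobody can enter this guard again.
procedure InitializeOnce;
begin
  if OnceGuard.TryEnter then
    DoOneTimeSetup;
end;

// Skip-if-busy: if another thread is already doing the work,
// this thread simply moves on instead of waiting for it.
procedure TryHousekeeping;
begin
  if BusyGuard.TryEnter then
  try
    DoHousekeeping;
  finally
    BusyGuard.Leave;
  end;
end;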

If you need to maintain multiple guards for multiple states, then instead of using multiple integer variables as guards for each state, you can also pack those into a single integer, where each bit can be used as a guard for a particular state with the help of the TInterlocked.BitTestAndSet and TInterlocked.BitTestAndClear methods.
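
A minimal sketch of such a bit guard might look like the following (GUARD_SAVING, Guards and DoSave are illustrative names, and the sketch assumes that BitTestAndSet returns the previous state of the bit, as the underlying bit-test-and-set instruction does):

const
  GUARD_SAVING = 0; // bit index guarding one particular state

var
  Guards: Integer = 0; // each bit guards one state

procedure TrySave;
begin
  // BitTestAndSet atomically sets the bit; if the previous state was False,
  // the bit was clear and this thread now owns that guard.
  if not TInterlocked.BitTestAndSet(Guards, GUARD_SAVING) then
  try
    DoSave;
  finally
    TInterlocked.BitTestAndClear(Guards, GUARD_SAVING); // release the guard bit
  end;
end;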


The above examples show just some possible use cases for atomic operations, but even from those it can be seen that, although the thread safety they provide is very narrow, there are still many situations where atomic operations are a better, simpler, and much faster choice than the more heavyweight and slower locking and synchronization mechanisms that would otherwise be required.
