Are Atomic Operations Faster and Better Than a Mutex? It Depends

Recently, while reviewing a pull request, a discussion arose about using sync/atomic versus sync.RWMutex.

It’s a question that comes up often when writing concurrent Go code, so I thought it would make a great post.

TL;DR

For those in a hurry, here are the key takeaways:

  • Atomic Operations: Can be faster when updating a single shared value across multiple threads (or goroutines).
  • Mutexes/RWMutexes: Add overhead, but are better for operations involving multiple steps or fields.
  • Atomic Operations protect one instruction: They do not guarantee correctness across a sequence of operations (read → logic → write).
  • Complexity grows: As logic grows (compare-and-swap (CAS) loops, retries, multi-field consistency), atomic operations become harder to reason about.
  • Mental model:
    • Mutex/RWMutex: Everything inside the lock is one unit.
    • Atomic Operations: Only this one operation is protected.
  • Rule of thumb: Start with a mutex (RWMutex if read-heavy); switch to atomic operations as an optimization when performance requirements and use cases justify it.

Introduction

In this article, we’ll explore atomic operations and mutexes, how they differ in both performance and behavior, and when each one is appropriate.

We’ll start by looking at concurrency controls and how atomic operations and mutexes protect shared data in fundamentally different ways. Then we’ll examine simple benchmarks to understand their performance characteristics. Finally, we’ll evolve the example (like how real‑world applications grow in complexity) to show where atomic‑based designs become fragile.

The goal is not to argue that one tool is always better than the other, but to show why atomic operations can be faster and why that speed often comes with trade-offs in complexity and correctness.

Code Examples

All code examples in this article are written in Go and available on GitHub, including the benchmarks.

https://github.com/madflojo/atomics-v-rwmutex-examples

While Go is used for the examples, the concepts apply to other programming languages as well. There are some Go-specific details, which I will call out as we go.

Concurrency Controls

Managing access to shared data is a fundamental challenge in concurrent programming. Atomic operations and mutexes are two primary tools used to do so safely.

Before diving into their differences, let’s first understand why we need concurrency controls.

Concurrent Data Access

Unprotected concurrent access to shared data in a multithreaded (or otherwise concurrent) application can lead to data races and unpredictable behavior.

A data race occurs when two or more threads (or goroutines) access the same memory location concurrently and at least one of those accesses is a write. Multiple concurrent reads are safe, but once a write is involved, the outcome becomes unpredictable.

In the best-case scenario, the read operation retrieves the most recent write. However, there is no guarantee.

In the worst case, the read operation may observe stale or partially-written data, causing bugs that are hard to reproduce and diagnose. In Go (and other languages), some data races can even lead to panics or crashes, resulting in unstable applications.
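
Here is a minimal example of a data race in Go. It is a sketch for illustration; run it with the race detector (go run -race) and the unsynchronized increments will be flagged.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var counter int // shared, unprotected
	var wg sync.WaitGroup

	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // racy read-modify-write
		}()
	}

	wg.Wait()
	// May print less than 1000 because concurrent increments
	// can overwrite each other.
	fmt.Println(counter)
}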

Atomic operations and mutexes are two mechanisms to control access to shared data. They solve the same problem but at very different levels of abstraction. Both help coordinate reads and writes across threads (or goroutines), ensuring data integrity and preventing race conditions.

These tools differ in how much coordination they provide, which has important implications for both correctness and performance. Understanding these differences in coordination is key to choosing the right tool. We’ll start with mutexes, which provide a strong and intuitive approach to concurrency control.

Mutex (Mutual Exclusion)

var mu sync.Mutex
mu.Lock()
// Perform operations on shared data
mu.Unlock()

A mutex (short for mutual exclusion) is a synchronization primitive that provides exclusive access to shared data. Only one thread (or goroutine) may hold the lock at a time.

The approach is straightforward: acquire the lock, operate on shared data, then release the lock. While Go has runtime optimizations under the hood (which we’ll discuss later), this basic behavior holds true across most programming languages.

With concurrent readers and writers, a mutex provides strong coordination by enforcing exclusive access. Once a thread (or goroutine) acquires the lock, all other threads attempting to acquire the same lock must wait until it is released.

By enforcing exclusive access, mutexes eliminate data races and make it easy to preserve data integrity. The trade-off for this integrity is some contention and overhead, as threads (or goroutines) may need to wait for access to the shared data. However, the overhead is often negligible, and the simplicity and correctness benefits usually outweigh the performance costs.
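
Applied to the racy counter from the previous section, the fix is to wrap the increment in a lock. This is a minimal sketch, not code from the benchmark repository.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)

	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // safe: only one goroutine at a time
			mu.Unlock()
		}()
	}

	wg.Wait()
	fmt.Println(counter) // always 1000
}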

RWMutex (Read/Write Mutex)

var mu sync.RWMutex

// Write lock
mu.Lock()
// Perform write operations
mu.Unlock()

// Read lock
mu.RLock()
// Perform read operations (never write within a read lock)
mu.RUnlock()

A read/write mutex (RWMutex) is a type of mutex that provides separate locks for reading and writing. While the write lock behaves like the standard mutex (exclusive access), the read lock allows multiple threads (or goroutines) to acquire a read lock simultaneously.

As mentioned previously, multiple concurrent reads are safe as long as no writes occur during those reads. By providing separate read and write locks, an RWMutex can optimize performance for read-heavy workloads by allowing readers to proceed concurrently. In Go’s implementation, once a writer is waiting, new readers will queue behind the writer to prevent writer starvation.

If we take our previous example and swap out the standard mutex for an RWMutex, we can see how the behavior changes.

While write locks still enforce exclusive access for a single thread (or goroutine), multiple read locks can be granted simultaneously. For frequently accessed data, such as configuration that is read often but updated rarely, an RWMutex can significantly reduce contention by allowing concurrent reads.
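
As a sketch of that configuration pattern (the Config type and methods here are illustrative, not from the repository), many goroutines can call Get concurrently while the rare Set takes the write lock:

type Config struct {
	mu       sync.RWMutex
	settings map[string]string
}

func (c *Config) Get(key string) string {
	c.mu.RLock() // many readers may hold this at once
	defer c.mu.RUnlock()
	return c.settings[key]
}

func (c *Config) Set(key, value string) {
	c.mu.Lock() // exclusive: blocks readers and writers
	defer c.mu.Unlock()
	c.settings[key] = value
}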

Real‑world Reference
In my 2025 Monster Scale Summit talk, “Scaling Payments Routing at American Express,” I discussed how, within our Global Transaction Router, moving some mutexes (like the one protecting our routing configuration) to RWMutexes provided a significant performance optimization.
Word of Caution

While RWMutexes can be a powerful optimization for read-heavy workloads, they are easy to misuse.

Common mistakes I’ve seen include:

  1. Mismatched lock/unlock pairs.

When using an RWMutex, it’s crucial to match RLock with RUnlock and Lock with Unlock. If you accidentally call Unlock (instead of RUnlock) after RLock, it can lead to deadlocks or panics (see the sketch after this list).

Linters can help catch these mistakes, but I’ve still spent time debugging tests that hang because of this exact issue.

  2. Writing while holding a read lock.

As requirements change, methods that start as a read‑only operation may need to perform writes. If you or the next person touching the code forgets (or doesn’t understand) that a read lock is used, this can lead to data races as described earlier.
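
Here is the first mistake in miniature, a contrived sketch where value stands in for any shared data. Calling Unlock after RLock crashes at runtime (fatal error: sync: Unlock of unlocked RWMutex).

var (
	mu    sync.RWMutex
	value int64
)

func readValue() int64 {
	mu.RLock()
	defer mu.Unlock() // BUG: should be mu.RUnlock()
	return value
}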

Read/write mutexes are powerful tools, but they require careful usage to avoid pitfalls. With that said, mutexes and RWMutexes provide a simple and effective way to coordinate access to shared data. Next, we will explore atomic operations, which tackle the same problem from a different angle.

Atomic Operations

var n atomic.Int64
// Write operation
n.Add(1)

// Read operation
y := n.Load()

Atomic operations enable safe concurrent access by leveraging low‑level CPU instructions and cache‑coherence protocols to ensure that each operation on a single value is atomic (indivisible). They differ from mutexes: atomic operations do not coordinate multiple steps; they make each individual operation atomic.

In Go, mutexes are managed by the runtime. A lock allows a thread (or goroutine) to prevent others from accessing shared data until the lock is released. Atomics work fundamentally differently. Each operation is atomic and cannot be observed (read or written) halfway through. Multiple threads (or goroutines) can operate on the same value, but only one atomic operation on that value can complete at a time.

If we use the previous example of concurrent readers and writers, we can see how atomic operations focus on ordering individual operations rather than coordinating access to shared data.

A simple way to think about atomic operations is that CPU instructions and cache‑coherence protocols ensure that operations on a shared value in memory happen one at a time. There is no concept of multiple readers or exclusive writers; each operation is ordered and atomic.

Because atomic operations map directly to low-level hardware instructions, each operation is very fast. Go even leverages atomic operations within its runtime to optimize mutex performance.

The atomicity and speed of atomic operations make them attractive. However, because they protect only a single instruction, they are easy to misuse when correctness depends on more than one operation.
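
Revisiting the racy counter one last time, an atomic counter removes the race without any locks. Again, a minimal sketch for illustration.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter atomic.Int64
	var wg sync.WaitGroup

	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1) // atomic read-modify-write
		}()
	}

	wg.Wait()
	fmt.Println(counter.Load()) // always 1000
}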

Go Runtime Optimization for Mutexes

Rather than relying solely on operating‑system‑level mutexes, Go’s runtime includes a fast‑path optimization for mutex locks that leverages atomic operations.

The optimization works by first attempting to acquire the lock with a single atomic operation. If the atomic operation succeeds, the caller holds the lock immediately, avoiding the overhead of a full OS-level mutex lock. If it fails, Go retries a limited number of times before falling back to an OS-level wait. This works well when contention is short-lived, allowing goroutines to acquire the lock without being parked by the OS.

The sketch below illustrates the sequence of operations when acquiring a mutex lock in Go. It is a simplification, not the actual runtime source, which has extra bookkeeping for spinning, starvation, and the race detector.
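
// Illustrative simplification of sync.Mutex's Lock.
func (m *Mutex) Lock() {
	// Fast path: try to grab an unlocked mutex with one CAS.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		return
	}
	// Slow path: spin a limited number of times, then park the
	// goroutine until the lock becomes available.
	m.lockSlow()
}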

Go is able to optimize mutex locks in this way because it manages goroutines within its own runtime. This means that, in many cases, mutex locks can be acquired and released quickly without the overhead of a full OS‑level mutex lock. However, it’s important to remember that this is a runtime implementation optimization, not a language guarantee, and it may change in future Go versions.

It’s best to assume that, at a per‑operation level, mutex locks are slower than atomic operations. If it’s ever just as fast, consider it a happy accident of the runtime optimizations.

Cheat Sheet: Concurrency Controls

Before moving to the implementation examples, here is a quick reference table summarizing the concurrency controls we’ve discussed.

Type              | Description
------------------|------------
Mutex             | Exclusive access to shared data. Protects multi-step operations and invariants.
RWMutex           | Allows multiple concurrent readers but exclusive access for writers. Best for read-heavy workloads.
Atomic Operations | Atomic access to a single shared value using low-level hardware instructions.

Implementations: Simple Example

Now that we’ve explored the theory behind atomic operations and mutexes, let’s implement both approaches using a simple example.

To showcase the performance differences, we will create an oversimplified example package that manages an account balance. The interface is purposefully simple to focus on the concurrency control mechanisms rather than the full complexity of a real-world application.

type Balance interface {
	Balance() int64
	Add(amount int64)
	Subtract(amount int64) error
}

The Balance interface allows us to add and subtract funds, as well as retrieve the current balance. Simple, but effective for demonstrating the differences between atomic operations and mutexes.

In this section, we will implement this interface using both a naive atomic‑operations approach and a read/write mutex approach. We will then run a set of benchmarks to show the performance differences between the two implementations.

Benchmark Test Cases

The two benchmarks we will run are:

Test Case          | Description
-------------------|------------
BalanceAdd         | Concurrently calls Add() to increment the balance
BalanceAddWithRead | Concurrently calls Balance() and Add() to simulate a read-heavy workload with writes

The goal of these tests is primarily to showcase the performance differences between atomic operations and mutexes under contention. Of course, benchmarks should be taken with a grain of salt, as CPU architecture, Go version, and system load can all impact results. But these tests should serve their purpose as a directional guide.

The benchmark tests are available in the GitHub repository linked earlier.
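
For reference, a contended benchmark for the Add path generally looks like the sketch below; the repository's actual benchmarks may be structured differently.

func BenchmarkBalanceAdd(b *testing.B) {
	var bal AtomicBalance
	// RunParallel distributes iterations across GOMAXPROCS
	// goroutines, creating the contention we want to measure.
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			bal.Add(1)
		}
	})
}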

Atomic Operations Implementation

The implementation of the Balance interface using atomic operations is straightforward at first glance.

type AtomicBalance struct {
	value atomic.Int64
}

Within the struct, we maintain the balance as an atomic.Int64. This allows us to perform atomic operations on the balance safely across multiple goroutines.

func (b *AtomicBalance) Balance() int64 {
	return b.value.Load()
}

func (b *AtomicBalance) Add(amount int64) {
	b.value.Add(amount)
}

func (b *AtomicBalance) Subtract(amount int64) error {
	// Note: nothing prevents the balance from going negative yet.
	b.value.Add(-amount)
	return nil
}

To add or subtract from the balance, we simply call the Add() method on the atomic.Int64. This implementation is simple, safe for concurrent use, and meets our oversimplified requirements to manage a balance.

RWMutex Implementation

The read/write mutex implementation of the Balance interface is also straightforward, especially for those already familiar with Go.

type RWMutexBalance struct {
	mu    sync.RWMutex
	value int64
}

Within the struct, we create an RWMutex. A read/write mutex is appropriate here because we want to enable multiple readers to check the balance, while still controlling write access when adding or subtracting funds. This is intentionally a bit of a premature optimization, but balance checks are often more frequent than balance updates in real‑world applications. So it’s worth establishing the pattern early.

With the mutex in place, when we want to read or write the balance, we need to acquire the appropriate lock.

func (b *RWMutexBalance) Balance() int64 {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return b.value
}

func (b *RWMutexBalance) Add(amount int64) {
	b.mu.Lock()
	defer b.mu.Unlock()

	b.value += amount
}

While retrieving the balance, we acquire a read lock using RLock(), which allows multiple concurrent readers, improving performance under read-heavy workloads.

While adding to the balance, we acquire a write lock using Lock(), which ensures exclusive access to the balance during the update.

With both implementations complete, we can now run our benchmarks to compare their performance.

Benchmarks

Benchmark Caveats

Benchmarks should always be taken with a grain of salt. Hardware differences, Go version changes, and system load can all impact results. If you run these benchmarks (which you can via the GitHub repository linked earlier), you may see different numbers.

This section is purely meant to illustrate the performance differences between atomic operations and mutexes under contention. Don’t assume your results will match these exactly.

The results in this article were generated using the following specifications:

  • Go version: 1.25.5
  • CPU: ARM64 Apple M2 (Total 8 cores, 4 performance cores + 4 efficiency cores)
  • System: Mac Mini (2023)

That said, don’t fixate on the absolute numbers; focus on the relative differences between atomic operations and mutexes.

The basic implementations we’ve created so far will favor atomic operations in terms of performance. But performance isn’t the only factor to consider when choosing between atomic operations and mutexes.

Also, keep in mind that these benchmarks reflect coordination overhead, and do not account for correctness or properly reflect system throughput under real-world workloads.

Reading the Results
Higher ns/op values mean each operation takes longer (slower overall throughput), so lower numbers are better.

Test Case          | Atomic (ns/op) | RWMutex (ns/op) | Improvement
-------------------|----------------|-----------------|------------
BalanceAdd         | ~34            | ~110            | ~3.2x faster
BalanceAddWithRead | ~38            | ~119            | ~3.1x faster

As expected, atomic operations outperform the RWMutex implementation. In both test cases, atomic operations are approximately 3× faster than the RWMutex implementation.

You may be asking yourself: “Why not always use atomic operations if they appear so much faster?”

I’m glad you asked.

In the next section, we’ll see how our simple atomic implementation breaks down once correctness depends on more than a single instruction.

Functional Challenges: Complexity

Our initial implementations were intentionally simple to highlight the performance differences between atomic operations and mutexes. But real-world applications often require more complex business logic, and with atomic operations, that added logic can quickly increase code complexity and reduce maintainability.

To showcase some of these challenges, we will evolve our requirements and implementations.

Requirement Changes:

  1. Prevent the balance from going negative when subtracting funds.
  2. Maintain a transaction count and last-updated timestamp alongside the balance.

These changes are simple, but they introduce new complexities that highlight the trade-offs between atomic operations and mutexes.

Avoiding Data Loss

Atomic operations are speedy, but they are also very simple. When you start adding complex logic to the mix, atomic operations can make your code more complicated.

In the oversimplified implementation above, we did not enforce any constraints on the balance. But with our new requirement, we need to ensure that the balance never goes negative. This requires reading the current balance before deciding whether to subtract funds.

If we naively implemented the logic in the same way as before, we could inadvertently allow the balance to go negative.

This Code Has Bugs
The following code has logic bugs; don’t copy and paste.
func (b *AtomicBalance) Subtract(amount int64) error {
	current := b.value.Load()
	// As time passes here, other goroutines may modify b.value
	if current-amount < 0 {
		return ErrInsufficientFunds
	}

	b.value.Add(-amount)
	return nil
}

If we naively read the balance, check if subtracting the amount would result in a negative balance, and then perform the subtraction, we introduce a race condition.

Our logic assumed that the balance would not change between reading it and writing the new value. This is wrong in a highly concurrent application.

Atomic operations ensure that each operation is atomic, which means the read is atomic and the write is atomic, but nothing ensures that no changes occur in the time between these operations (while checking if the future balance is acceptable).

If we mindlessly subtract the value, we could end up with a negative balance. Instead, we need to verify that the balance hasn’t changed since we last read it.

Using CompareAndSwap()

The Compare-And-Swap (CAS) operation is a common pattern when working with atomic operations. It allows us to read a value, compare it to an expected value, and swap it with a new value if it matches.

func (b *AtomicBalance) Subtract(amount int64) error {
    // This may retry indefinitely when there is high contention
    // There should probably be a max retry count to avoid infinite loops
    for {
        current := b.value.Load()
        next := current - amount
        if next < 0 {
            return ErrInsufficientFunds
        }

        // Check if the value is still what we expect, and swap it.
        // Other goroutines may have modified b.value in the meantime.
        // If it was modified, the CAS will be false, and the loop retries.
        if b.value.CompareAndSwap(current, next) {
            return nil
        }
    }
}

With the CompareAndSwap() operation, we can ensure that we only update the balance if it hasn’t changed since we last read it. But each time the CAS fails (because another goroutine modified the balance), we need to retry the entire operation.

The CAS operation ensures correctness, but it also introduces complexity into our code.

  • When do we stop retrying?
  • How do we handle failures after too many retries?

While using atomic operations is faster in this case, that speed comes at the cost of complexity. Once retry logic, timeouts, and failure handling are added, the code becomes harder to read and maintain, especially for those unfamiliar with atomic operations and CAS patterns.
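
For illustration, here is one way to bound the retries. This is a sketch: maxRetries and ErrContention are made-up names, and the right policy (retry count, backoff, error handling) depends on your workload.

const maxRetries = 10

var ErrContention = errors.New("balance: contention too high, try again")

func (b *AtomicBalance) Subtract(amount int64) error {
	for i := 0; i < maxRetries; i++ {
		current := b.value.Load()
		next := current - amount
		if next < 0 {
			return ErrInsufficientFunds
		}
		if b.value.CompareAndSwap(current, next) {
			return nil
		}
	}
	// Too many collisions; surface the failure to the caller.
	return ErrContention
}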

Using a Mutex Instead

In contrast, the Mutex implementation of subtraction is far less complex.

func (b *RWMutexBalance) Subtract(amount int64) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	if b.value-amount < 0 {
		return ErrInsufficientFunds
	}

	b.value -= amount
	return nil
}

Because we are going to write data, we first acquire a write lock, which prevents any other thread (or goroutine) from reading or writing the balance until we release the lock. This means we can safely read the balance, check if subtracting the amount would result in a negative balance, and then perform the subtraction.

Any other thread (or goroutine) trying to call Balance(), Add(), or Subtract() will block until we release the lock. This ensures correctness not just within the Subtract() method, but across all methods that access the balance.

No retry logic, no timeouts, far less complexity, but at the cost of some contention and overhead.

Keeping Data Synchronized

While switching to a CAS operation helped us maintain correctness when subtracting funds, we still have another requirement to address. We need to maintain a transaction count and last-updated timestamp alongside the balance.

We can easily add these fields to our AtomicBalance struct and update them as needed.

type AtomicBalance struct {
	value   atomic.Int64
	trx     atomic.Int64
	updated atomic.Int64
}

func (b *AtomicBalance) Add(amount int64) {
	b.value.Add(amount)
	b.trx.Add(1)
	b.updated.Store(time.Now().UnixNano())
}

The Add() method now updates the balance, increments the transaction count, and sets the last-updated timestamp. But, due to the nature of atomic operations, these updates occur independently. Meaning, they are not synchronized.

If we are OK with eventual consistency, this may be acceptable; after all, we are talking about nanoseconds. But for that brief window, a reader may observe a balance that has been updated while the transaction count and timestamp do not yet reflect the update. For many applications, such as systems that maintain financial records, this lack of synchronization can lead to data integrity concerns.
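
To make the window concrete, imagine a hypothetical Stats() accessor (not part of our Balance interface) that reads all three fields:

// Each Load is atomic, but the three loads together are not.
// A concurrent Add can land between them, so the returned
// trio may mix new and stale values.
func (b *AtomicBalance) Stats() (value, trx, updated int64) {
	return b.value.Load(), b.trx.Load(), b.updated.Load()
}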

Using a Mutex for Synchronization

While atomic operations could be used to keep these fields synchronized (for example, by atomically swapping a pointer to an immutable struct), doing so introduces additional complexity: allocation, copy-on-write updates, CAS retry loops, and stricter immutability guarantees.
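
For the curious, a sketch of that pointer-swap approach follows. All names are illustrative, atomic.Pointer requires Go 1.19 or later, and every update allocates a fresh snapshot.

type snapshot struct {
	value   int64
	trx     int64
	updated int64
}

type PtrBalance struct {
	snap atomic.Pointer[snapshot]
}

func NewPtrBalance() *PtrBalance {
	b := &PtrBalance{}
	b.snap.Store(&snapshot{}) // never leave the pointer nil
	return b
}

func (b *PtrBalance) Add(amount int64) {
	for {
		old := b.snap.Load()
		next := &snapshot{ // copy-on-write: build a new snapshot
			value:   old.value + amount,
			trx:     old.trx + 1,
			updated: time.Now().UnixNano(),
		}
		// Retry if another goroutine swapped in a snapshot first.
		if b.snap.CompareAndSwap(old, next) {
			return
		}
	}
}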

In contrast, the RWMutex implementation can easily keep these fields synchronized while maintaining code clarity.

func (b *RWMutexBalance) Add(amount int64) {
	b.mu.Lock()
	defer b.mu.Unlock()

	b.value += amount
	b.trx++
	b.updated = time.Now().UnixNano()
}

With the write lock held, we can safely update all three fields together without worrying about other threads (or goroutines) reading or writing to them during the update. This exclusive access ensures that the balance, transaction count, and last-updated timestamp are always in sync.

When performing single operations on individual fields, atomic operations shine. But when multiple fields need to be updated together, mutexes provide a simple and reliable way to ensure data integrity.

Which concurrency control mechanism to use depends on the specific use case and requirements of your application.

Use Case Cheat Sheet

To help summarize when to use atomic operations versus mutexes (or RWMutexes), here is a quick reference table that outlines common scenarios.

Scenario | Atomic | RWMutex | Mutex
---------|--------|---------|------
Single numeric counter hit constantly | ✅ Optimal | ⚠️ Overkill | ⚠️ Overkill
Read-heavy configuration shared across many goroutines | ⚠️ Hard to implement correctly | ✅ Optimal | ❌ High contention
Dozens of independent metrics counters | ✅ Optimal | ⚠️ Overkill | ⚠️ Overkill
Read-modify-write sequences that must stay atomic | ❌ Prone to logic bugs | ⚠️ Use caution: helps when reads dominate but still requires careful RLock/Lock usage | ✅ Optimal
Related fields that must update together (balance + timestamp + count) | ❌ Hard to keep consistent | ⚠️ Use caution: helps when reads dominate but still requires careful RLock/Lock usage | ✅ Optimal
Write-heavy shared state touched by many goroutines | ⚠️ Fast if each update is a single atomic instruction | ❌ Reader bookkeeping adds overhead while writers still block everyone | ✅ Optimal

Final Thoughts

Both atomic operations and mutexes are excellent tools for controlling concurrent data access. But it’s important to know when to use one vs. the other. Of course, the “when” depends on the problem you are trying to solve and your unique system constraints.

Looking back at our example: when we started, atomic operations made sense and were significantly faster. But as our requirements evolved (adding a transaction counter and a last-updated timestamp), the atomic implementation grew more complex, introducing eventual-consistency windows and opportunities for logic bugs.

The mutex implementation, by contrast, remained straightforward and easy to reason about, even as complexity grew. The trade-off, of course, is contention and overhead, but in many cases, the simplicity and correctness benefits outweigh the performance costs.

My personal take is that it’s better to start with a mutex (or RWMutex if read-heavy) and then optimize with atomic operations where you see opportunities (ideally after measuring performance with benchmarks or a profiler). Even if you understand atomic operations well, the next person touching your code may not. Having a bias towards code clarity and maintainability is often worth the slight performance trade-off.