In-Flight Request Tracking: Lessons from Card Payments and HTTP/2

Every time a credit card is swiped, tapped, or dipped, a correlation problem has to be solved in real time. Payment systems need low latency and high throughput, but they also need correct outcomes.

For many distributed systems, the common approach is still fairly simple.

A request is made, a response is returned, and the caller moves on. But systems that care about latency and throughput cannot afford to serialize all work behind a single in-flight request on one connection.

Instead of sending one request and waiting for the result, they keep connections busy by overlapping requests.

The tradeoff is that once multiple requests are in-flight at the same time, correlating responses becomes a first-class problem. Responses can arrive out of order, one request can time out while others on the same connection succeed, and a timeout may mean only that the request was not matched in time.

Modern multiplexed protocols and transports like HTTP/2 can hide some of that correlation work, but they don’t remove the underlying correctness problems created by timeouts, retries, and late responses.

For high-performance asynchronous systems, in-flight request tracking is a core architectural problem.

This post looks at how payment systems that process ISO8583 card transactions solve that problem, and how HTTP/2 handles much of the transport-level correlation underneath the application.

TL;DR

The core pattern is simple: track state, correlate responses, and design for uncertainty. For those in a hurry, here are the key takeaways:

  • Asynchronous systems trade simplicity for concurrency: instead of one request waiting at a time, multiple requests can be in-flight at the same time on the same connection.
  • Once requests overlap, correlation is no longer implicit: the system needs identifiers and in-flight state to match responses back to requests.
  • Timeouts become ambiguous: a timeout means “not matched in time,” not necessarily “the operation failed.”
  • Timeouts create uncertainty: if a request times out, the system may need compensating actions such as reversals, retries, or reconciliation without knowing the original outcome.
  • Correctness depends on more than a map: state transitions need to be atomic, and routing needs to preserve locality.
  • HTTP/2 uses stream IDs to solve transport-level correlation on a multiplexed connection.
  • Protocols like HTTP/2 can hide transport correlation, but they do not solve application-level retries, duplicates, reversals, or compensation logic for you.

Disclaimer
This post is a high-level introduction to in-flight request tracking in asynchronous systems. It does not describe any single payment system implementation, but instead distills common patterns and approaches used across the industry.

Why High-Performance Systems Use Asynchronous Communication

At scale, the biggest bottleneck in many systems is not computation. It is waiting.

The Hidden Cost of Synchronous Communication

In a synchronous system, each request blocks on the previous one. You send a request, wait for a response, then send the next.

In this approach, the connection often sits idle while the client queues new requests and waits for responses. If you want to keep more work moving, you need more connections, each with its own overhead to create, manage, and maintain. At small scale, that tradeoff is fine. In systems where latency matters and throughput is high, it becomes a bottleneck, and the cost compounds across service hops.

You’ve probably heard someone say, “microservices don’t work for low-latency systems; the network penalties are too high.” In many cases, the bigger problem is not the network call itself, but the synchronous communication pattern wrapped around it.

Multiplexing Work Instead of Queuing It

To reduce latency and improve throughput, high-performance systems remove waiting wherever possible. Instead of sending one request at a time, they allow multiple requests to be in-flight concurrently. A single connection can carry many active requests, responses are processed as they arrive, and throughput can increase without requiring a proportional increase in connections.

But this approach comes with a catch.

Trading Implicit Guarantees for Performance

Once you allow multiple requests to be in-flight, you lose the guarantees that synchronous communication provides.

  • Ordering is not guaranteed.
  • There is no implicit correlation between requests and responses.
  • A single request can time out while others succeed.

To gain performance, you trade off certainty. But latency-sensitive systems still need correct outcomes. Instead of relying on implicit transport guarantees, they have to compensate for uncertainty by managing request state, identifiers, and routing rules explicitly in the application layer.

In-Flight Request Tracking: The Core Challenge for Asynchronous Systems

Once those guarantees are gone, the next problem is correlation. In a synchronous flow, the response is implicitly the answer to the last request you sent.

That simplicity disappears as soon as requests overlap. Responses can arrive in any order, and failures don’t always line up cleanly with the original request.

  • If a response arrives, which operation does it complete?
  • If a timeout occurs, did the request fail, or is the response just delayed?
  • If you retry, are you issuing a new request, or duplicating work that already succeeded?

Rather than being edge cases to handle, these become core problems to solve.

That is where in-flight tracking comes in. Each request needs an identifier, active state, and a deadline timer so the system can match a late response or recognize that a response was not matched in time.

Correlation with an In-Flight Tracker

From a distance, this sounds like a simple map lookup problem. In reality, it has to stay correct while timeouts, retries, routing, and scale all work against that simplicity.

ISO8583: Payments as a Case Study in In-Flight Tracking

Payment systems have been operating asynchronously for decades, long before protocols like HTTP/2 or QUIC existed.

Many high-throughput card authorization systems process large volumes of transactions where latency is critical. When a card is swiped, the authorization request typically moves hop-by-hop through the seller’s acquirer, the card network, and the cardholder’s issuer.

Card Authorization Path (Simplified)

Every request represents a financial operation, so correctness is paramount, and every millisecond of delay reduces throughput and leaves customers waiting at checkout.

To achieve low latency and high throughput, many payment switches use asynchronous communication over persistent TCP connections.

  • Once a connection is established, messages are written to the socket as fast as they come in.
  • In these designs, nothing blocks waiting for a response before the next request is sent.
  • Once the request is handed off to the socket layer, the system can already be moving on to the next one.

TCP sockets are bidirectional, so one execution path can write requests while another handles responses as they arrive.

ISO8583 Requests and Responses on One Connection

In these designs, many requests are in-flight at the same time, and responses can arrive in any order. That makes payment systems a useful case study for in-flight tracking and correlation.

Identifiers

The first problem to solve when tracking and correlating in-flight requests is identifying individual requests.

To solve this, ISO8583 messages include fields that help identify and correlate requests. One common example is the System Trace Audit Number (STAN). It is a six-digit number included in the message and returned in the response.

ISO8583
For a more detailed explanation of ISO8583 message structure, check out this article.

The STAN is typically created by the sender when the authorization request is built (i.e., the point of sale or payment gateway). It gives the sender a per-message identifier that is useful for local tracking and correlation.

That does not make STAN an idempotency control. Correlation keys help the system match a response back to the right in-flight request, while idempotency controls are used to prevent the same business operation from being applied more than once across retries or duplicate delivery.

Caveat

STAN is a good example, but it is not a globally unique identifier, and may not be sufficient for correlation on its own. Different payment systems may instead rely on retrieval reference numbers (RRNs), network-specific reference values, message type, timestamps, terminal identifiers, or unique identifiers carried in transport headers. In practice, correlation often depends on a compound key rather than a single field.

The exact identifier set, timeout behavior, duplicate-detection window, and reversal flow vary system to system, but the correlation problem is the same.

The key is not what the identifier is, or where it lives within a request. The key is that the correlation values are unique enough within their operating context, travel with the request, and return with the response. Sometimes that is one field; sometimes it is a compound key built from several values. Having correlation data that meets these characteristics creates a consistent way to match requests and responses.
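A compound correlation key like the one described above can be sketched as a small value object. The field names here (STAN, MTI, terminal ID, transmission time) are illustrative assumptions about what a given message format carries, not a fixed standard layout:

```python
# Sketch: building a compound correlation key for an ISO8583-style message.
# Field names and the input dict shape are illustrative assumptions.
from typing import NamedTuple

class CorrelationKey(NamedTuple):
    stan: str               # System Trace Audit Number, e.g. "100001"
    mti: str                # message type indicator, e.g. "0100"
    terminal_id: str        # originating terminal identifier
    transmission_time: str  # transmission timestamp carried in the message

def correlation_key(msg: dict) -> CorrelationKey:
    # A lone six-digit STAN wraps after a million messages; combining it with
    # message type, terminal, and timestamp narrows the collision window.
    return CorrelationKey(
        msg["stan"], msg["mti"], msg["terminal_id"], msg["transmission_time"]
    )
```

Because the key is a plain tuple of values that travel with the request and return with the response, both sides of the connection can derive it independently and use it for tracker lookups.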

How In-Flight Tracking Works

In the ISO8583 world, at any given moment, there may be many transactions active on the same connection. Each one has been sent, is waiting for a response, and must be tracked independently.

That state has to live somewhere. In general, it lives in an in-flight tracker.

In payment systems, the tracker stores the request as it is sent, then uses a correlation identifier (STAN in our example) to look up the original request when a response arrives.

ISO8583 Tracking with STAN as a Correlation Identifier

At a high level, the lifecycle is simple: insert state on send, look it up on response, and transition it on timeout. Every request that is sent needs to be tracked somewhere while it is in-flight, whether that is an in-memory structure or a shared store.

When a request is sent:

  1. A unique identifier is assigned (or extracted from the request)
  2. An entry is created in the in-flight tracker
  3. Metadata about the request is also stored alongside it

That metadata might include when the request was sent, where the response should be routed, and how the request should be completed (i.e., who to route it back to).

Generic In-Flight Tracker Lifecycle

Matching and Lookup

When a response arrives, the system extracts the identifier and performs a lookup.

If a matching entry is found:

  • The response is associated with the original request
  • The operation is completed
  • The in-flight entry is removed (and timers are cancelled)

Even on the happy path, correctness depends on a few guarantees: the lookup must be fast, the lookup-plus-state-transition path must be atomic or otherwise locked to prevent concurrent access issues, and the identifier must be unique enough to avoid collisions.
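The happy-path lifecycle (insert on send, atomic lookup-and-remove on response) can be sketched with a minimal in-memory tracker. Entry fields and names here are illustrative assumptions:

```python
# Minimal in-flight tracker sketch: register on send, atomically pop on
# response. Entry shape and field names are illustrative, not canonical.
import threading
import time

class InFlightTracker:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # correlation key -> entry dict

    def register(self, key, request, timeout_s):
        entry = {
            "request": request,
            "sent_at": time.monotonic(),
            "deadline": time.monotonic() + timeout_s,
            "state": "pending",
        }
        with self._lock:
            if key in self._pending:
                raise KeyError(f"duplicate in-flight key: {key}")
            self._pending[key] = entry
        return entry

    def match(self, key):
        # Lookup and removal happen under one lock, so a concurrent timeout
        # sweep cannot transition the same entry at the same time.
        with self._lock:
            return self._pending.pop(key, None)
```

A second `match` for the same key returns `None`, which is exactly the signal the response path needs to detect an unmatched or duplicate response.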

Expiration and Cleanup

Not every request will receive a response. Responses may be delayed, lost in transit, or never emitted because the remote system is unhealthy.

To handle this, the system must actively manage the lifecycle of in-flight entries. Each request needs a timeout, and when that timeout expires the entry usually transitions into a timed-out state rather than disappearing immediately.

A timeout does not necessarily mean the request failed. It means the request was not matched before the local deadline expired.

Timeout Before Response

The tracker tells you only that the request was not matched within the expected time window. It does not tell you whether the remote system failed, is still working, or already succeeded. In many designs, the tracker keeps that timed-out state long enough for a late response, retry, reversal, or reconciliation process to inspect what happened and respond safely.
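The transition into a retained timed-out state can be sketched as a periodic sweep. The state names and grace window are illustrative assumptions:

```python
# Sketch: a timeout sweep that transitions expired entries to "timed_out"
# instead of deleting them, retaining them briefly for late responses.
import threading
import time

class ExpiringTracker:
    def __init__(self, grace_s=30.0):
        self._lock = threading.Lock()
        self._entries = {}       # key -> {"state", "deadline"}
        self._grace_s = grace_s  # how long timed_out entries are retained

    def register(self, key, timeout_s, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            self._entries[key] = {"state": "pending", "deadline": now + timeout_s}

    def sweep(self, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            for key, e in list(self._entries.items()):
                if e["state"] == "pending" and now >= e["deadline"]:
                    # Not matched in time. That is all we actually know.
                    e["state"] = "timed_out"
                    e["deadline"] = now + self._grace_s
                elif e["state"] == "timed_out" and now >= e["deadline"]:
                    del self._entries[key]  # grace window over; drop the entry

    def state(self, key):
        with self._lock:
            e = self._entries.get(key)
            return e["state"] if e else None
```

Keeping the entry in a `timed_out` state gives a late response, retry, or reversal flow something to inspect before the entry is finally discarded.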

State Transitions

The tracker usually stores more than just a request ID. It stores enough state to decide what to do when the happy path breaks.

For example, an entry might contain:

  • The original request payload to enable retries or reversals
  • The timestamp when the request was sent to calculate timeouts
  • The number of retries attempted so far to enforce retry limits
  • Routing information to know where to send the response when it arrives

In-Flight Entry State Transitions

Atomicity is critical for managing state within the tracker. Without it, timeouts and responses can both try to update the same entry at the same time, leading to race conditions and, in financial systems, serious consequences.
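The entry fields and guarded transitions above can be sketched as a small state machine. The state names and allowed transitions are illustrative assumptions, not a standard lifecycle:

```python
# Sketch of an in-flight entry and a guarded state transition. State names
# and fields mirror the bullets above but are illustrative, not canonical.
from dataclasses import dataclass

ALLOWED = {
    "pending":   {"completed", "timed_out"},
    "timed_out": {"reversing", "completed_late"},
    "reversing": {"reversed"},
}

@dataclass
class InFlightEntry:
    request: bytes       # original payload, kept for retries or reversals
    sent_at: float       # send timestamp, used to compute timeouts
    route_back_to: str   # where the eventual response should be delivered
    retries: int = 0     # attempts so far, used to enforce retry limits
    state: str = "pending"

    def transition(self, new_state: str) -> bool:
        # Reject transitions the lifecycle does not allow, such as completing
        # an entry that was already reversed. Returns whether it applied.
        if new_state in ALLOWED.get(self.state, set()):
            self.state = new_state
            return True
        return False
```

The guard alone is not enough under concurrency; the caller still has to invoke it under a lock or an atomic store operation so that two actors cannot both observe `pending` and both "win."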

Reversals and Compensation

In card payments, timeouts and late responses are commonly handled with compensating actions.

If a request times out, the system may treat the transaction as uncertain and initiate a compensating action such as a transaction reversal to try to undo the operation. The exact behavior depends on the payment network, message type, timeout point, and local processing rules. That does not guarantee the original processing is undone, but it gives the system a way to self-correct when the outcome is no longer clear.

That introduces another problem: if the original response arrives after the reversal, as it can in an asynchronous system, the system still has to handle that response correctly. Issuer systems need to be able to identify duplicate requests, unmatched reversals, and out-of-order messages.

Visualizing the issue:

  1. At 12:00:00.000, a payment switch sends authorization STAN 100001 for $67.00 and stores it as pending.
  2. At 12:00:02.000, the local timeout expires before a matching response is received, so the switch marks the request timed_out and sends a reversal.
  3. At 12:00:02.300, the issuer approves the original authorization and sends a response back to the switch.
  4. At 12:00:02.600, the issuer returns a response for the reversal.

Reversal and Late Authorization Response

Compensating actions like reversals provide a way to handle uncertainty, but they also add complexity in both business logic and technical design.
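The decision the switch faces when a response finally arrives can be sketched as a dispatch on the entry's state. The state names and action labels are illustrative; real networks define their own exception and reconciliation flows:

```python
# Sketch: routing a response based on the in-flight entry's current state.
# A late approval after a reversal must not be surfaced as a plain success.
def handle_response(entry_state, response):
    if entry_state == "pending":
        return "complete"        # happy path: match and finish the request
    if entry_state in ("timed_out", "reversing", "reversed"):
        # The switch already acted on a timeout, so this late outcome goes
        # to an exception/reconciliation flow instead of completing normally.
        return "reconcile"
    # No entry at all: the grace window expired or this is duplicate traffic.
    return "log_unmatched"
```

In the timeline above, the 12:00:02.300 approval would hit the `timed_out` branch and be handed to reconciliation rather than reported to the point of sale as a success.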

Duplicate and Out-of-Order Messages

In the previous example, a timeout might be handled with a retry instead of a reversal. These retries and reversals introduce the possibility of duplicate and out-of-order messages.

Since retries are blind in asynchronous systems, you can end up with multiple requests for the same operation. In payment processing, this can lead to multiple charges for the same transaction.

This is where identifiers like STAN and other fields still matter, but for a different reason. They enable correlation and can contribute to duplicate detection when combined with other message attributes. That is not the same thing as an explicit idempotency control, which is designed to prevent the same business operation from being applied twice across retries.

If a reversal is issued and later a request arrives with the same correlation values, the system can recognize that it may be related or potentially duplicate traffic and handle it accordingly.

Scaling In-Flight Tracking

Tracking in-flight requests is straightforward on a single instance, where an in-memory tracker is fast and simple within one failure domain. But real systems scale across multiple machines, data centers, and geographic regions. At that point, the problem shifts from managing local lookups to managing state across a distributed system. This section covers three scaling challenges: keeping state local, sharing it regionally, and preventing race conditions.

A Local Tracker That Scales

Distributing request tracking across multiple instances or regions introduces latency, consistency, and network-failure concerns. One common way to avoid those challenges is to keep the in-flight tracker local to each application instance.

If a request is created on one instance, the simplest design is usually to make that instance the owner of the in-flight state and preserve affinity so the response returns to the same place. That keeps tracking fast, avoids extra hops, and reduces the number of systems involved in correctness.

Of course, this is easier said than done, but there are a few common techniques to preserve locality:

  • Connection affinity, where connectivity is structured so that requests and responses stay tied to the same instance, region, or availability zone.
  • Session or shard ownership, where a set of nodes owns the in-flight state for a given slice of traffic.
  • Deterministic routing based on identifiers, where consistent hashing, routing keys, or other techniques help requests reach the owning node, shard, or region and may help return traffic find the same owner.

If your use case allows it and you can preserve locality with routing, you can often keep the in-flight tracker in memory, avoiding the challenges of distributed state management. That is not the only viable design, but it is often the simplest one for latency-sensitive systems.

Local Tracker with Deterministic Routing

In the above design, ownership and affinity keep the request, response, and in-flight state inside the same region and the same application instance. Identifier-based routing is one way to help traffic find that owner consistently. You still have a distributed system, but you avoid turning the tracker into a globally shared dependency on the critical path.
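Identifier-based deterministic routing can be sketched as a pure function from correlation key to owning instance. This uses plain hash-mod routing for clarity; a production system would more likely use consistent hashing so membership changes remap fewer keys. The node names are illustrative:

```python
# Sketch: deterministic routing from a correlation key to an owning instance.
# Every node computes the same answer for the same key, so request and
# response traffic can converge on the same owner without coordination.
import hashlib

def owner_for(key: str, instances: list[str]) -> str:
    # Use a stable hash (not Python's per-process randomized hash()) so that
    # different processes and machines all agree on the owner.
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

Both the send path and the response path call `owner_for` with the same correlation key, which is what keeps the response landing on the instance that holds the in-flight entry.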

Locality Limitations

In-memory and single-instance trackers work well when infrastructure can preserve locality and tolerate the loss of in-flight state on one instance. But modern applications often run within environments that are meant to be immutable and elastic. Instances come and go as load changes or failures happen.

If the in-flight tracker is in memory on a single instance, a single instance failure can cause all in-flight requests on that instance to be lost. The system therefore needs to be designed to handle that possibility. In practice, recovery usually means retry, reversal, reconciliation, or completion from shared state rather than another instance seamlessly continuing the same in-flight work.

If any instance can send a request and any instance can receive the response, the in-flight tracker often needs to be shared across instances or otherwise made recoverable through some form of owned or durable state. One common option is moving the in-flight tracker to a distributed store.

With a distributed store, requests can be tracked regionally, and responses can be matched regardless of which instance in that region receives them. That lets any local instance complete the request, but it also shifts correctness from per-instance locality into regional routing and shared-state coordination.

Regional Tracker with Shared State

In this design, deterministic routing is still doing critical work. It no longer has to return the response to the exact same instance, but it still has to return it to the same owning region so the response lands near the shared state it depends on.

Using a shared regional store is a common compromise for highly distributed, low-latency systems: share the tracker within one region so any local instance can access it, but avoid putting global coordination on the hot path. Other designs can work too, but they all still have to solve the same locality, latency, and atomicity problems in one way or another.

Avoiding Global Datastores

Modern distributed databases are impressive, and some offer global replication and multi-region consistency, which makes them tempting choices for a shared in-flight tracker. But this is a common trap for low-latency systems, as these databases put cross-region consistency and latency tradeoffs on the critical path of every request.

If replication is asynchronous, a globally replicated tracker store will lead to challenges with correctness. A request might change state (timeout) in one region, but a response might arrive in another region that has not yet seen that change.

If the database coordinates reads and writes synchronously across regions, each database call may pay that cross-region latency tax. Network packets are still bound by the speed of light.

Real World Latency
For a rough sense of the distance involved, WonderNetwork’s public ping data shows that a packet round trip between Los Angeles and New York can be roughly 50-80 ms, while Los Angeles to Sydney can be roughly 150-200 ms, before any application or database work is added: WonderNetwork ping statistics.

A low-latency system waiting for database reads and writes to cross the country or world on every request is often not acceptable.

Even if a system can tolerate the latency tax, cross-region communications and dependencies introduce more failure points that can degrade performance and potentially reduce availability.

Concurrent Updates and Race Conditions

The challenges with atomicity and concurrency are not unique to cross-region designs. Even within the same region, a distributed tracker store can reintroduce race conditions that a single-instance tracker avoided.

If a response arrives on one instance while another instance is already processing a timeout for the same request, you can end up with a request that is marked as failed even though it actually succeeded. For a financial system, that might mean the point of sale gets a timeout and tells the customer to try again, but the card was actually charged successfully. Or it might mean the point of sale receives a success response even though the transaction actually failed.

Timeout/Response Race Without Atomicity

In-flight trackers require atomicity. Without it, late responses and timeouts can create conflicting processing paths that lead to incorrect outcomes. Selecting a distributed tracker store with support for atomic compare-and-set operations is critical to ensure that only one actor can change state at a time.

Timeout/Response Race With Atomicity

At small scale, in-flight tracking is mostly about speed and timeout handling on one instance. At large scale, it becomes a routing and consistency problem: responses need to arrive where their state lives, and only one actor can be allowed to win each state transition.
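The compare-and-set guarantee can be sketched with an in-process store standing in for a distributed one that offers an atomic CAS primitive:

```python
# Sketch: compare-and-set so that only one actor wins a state transition.
# The dict-plus-lock store here is an illustrative stand-in for a
# distributed store with a native atomic CAS operation.
import threading

class CasStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def compare_and_set(self, key, expected, new):
        # Atomic check-then-update: of two racing callers, exactly one succeeds.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True
```

The timeout path attempts `pending -> timed_out` and the response path attempts `pending -> completed`; whichever loses sees `False` and takes the exception path instead of double-processing the same transaction.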

HTTP/2: Correlation at the Transport Layer

We have seen how payment systems solve correlation in application code and operational workflows. Now it is useful to look at how a modern transport protocol solves the same core problem differently, and where that abstraction stops helping.

The problem of asynchronous communication and in-flight tracking is not unique to payment systems. HTTP/2 solves the same core problem at a different layer, with different failure semantics. It enables multiplexed communication, but it handles correlation at the transport layer rather than in application logic.

Comparing ISO8583 and HTTP/2 is useful because the mechanics of correlation are similar, even though the layers of the stack are different.

  • ISO8583 generally solves correlation in the application layer.
  • HTTP/2 hides correlation inside the client or runtime.
  • ISO8583 timeouts often lead to reversals, reconciliation, or duplicate-detection flows.
  • HTTP/2 can cancel a stream, but that does not guarantee the remote system has not already processed the request or will change the business outcome.

| Dimension | ISO8583 / payment systems | HTTP/2 |
| --- | --- | --- |
| Correlation layer | Application/message layer | Transport/runtime layer |
| Identifier | STAN, RRN, or compound business/message keys | Stream ID |
| In-flight state owner | Application-owned tracker or shared tracker store | Client/runtime stream tracker |
| Timeout meaning | Outcome may be uncertain and may require reconciliation | Local wait expired; business outcome may still be unknown |
| Compensation model | Retries, reversals, reconciliation, duplicate handling | Stream cancel/reset at transport layer; business compensation stays in the application |
| Locality requirement | Often explicit: routing and ownership must keep state reachable | Usually local to one connection, hidden by the client/runtime |

The contrast is useful because it shows how the same underlying problem of correlation and compensation exists across different layers of the stack, and how different protocols choose to expose or hide that complexity.

HTTP/1 and Ordered Responses

HTTP/1 feels simple because, in its most common usage, work is effectively handled one request at a time per connection.

HTTP/1
This reflects the dominant mental model of HTTP/1, not every legal optimization; connection reuse and pipelining exist, but pipelining is not universally supported.

HTTP/1 Ordered Responses

Multiplexing

HTTP/2 changed this by allowing multiple concurrent requests on a single connection.

With HTTP/2:

  • Multiple requests can be in-flight at the same time
  • Responses can arrive in any order
  • Work can be overlapped instead of queued

From a transport perspective, this is a major improvement. But HTTP/2 also needs to solve the same correlation problem that payment systems solve with in-flight tracking.

HTTP/2 Stream Multiplexing

Local In-Flight Tracking

Just like ISO8583 uses identifiers such as STAN, HTTP/2 uses stream identifiers. A stream ID is assigned when the request is created, and the HTTP/2 implementation uses that ID to match frames and responses back to the waiting caller.

The difference is where the tracker lives. With HTTP/2, most of the transport-level in-flight tracking is handled inside the client or runtime rather than in application code. The caller usually sees a simple client API that feels synchronous even though the underlying implementation is asynchronous and multiplexed.

HTTP/2 Local Stream Tracking

Even though the layer is different, the mechanics of correlation are the same: assign an identifier, track the request while it is active, and use that identifier to match the response back to the original request.
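The mechanics inside an HTTP/2 client can be sketched in a few lines. The odd, monotonically increasing stream IDs for client-initiated streams come from RFC 9113 §5.1.1; the pending map is an illustrative stand-in for what a real client library maintains internally:

```python
# Sketch: how an HTTP/2 client runtime correlates frames by stream ID.
# Client-initiated streams use odd IDs (1, 3, 5, ...) per RFC 9113.
class Http2ClientStreams:
    def __init__(self):
        self._next_stream_id = 1  # odd IDs, increasing monotonically
        self._pending = {}        # stream id -> waiting caller

    def open_stream(self, caller):
        stream_id = self._next_stream_id
        self._next_stream_id += 2
        self._pending[stream_id] = caller
        return stream_id

    def on_frame(self, stream_id, end_stream: bool):
        # Match the frame to its waiting caller; when END_STREAM is set,
        # the response is complete and the stream entry is removed.
        caller = self._pending.get(stream_id)
        if caller is not None and end_stream:
            del self._pending[stream_id]
        return caller
```

This is the same insert-on-send, look-up-on-response pattern as the payment tracker; it just lives inside the client library instead of application code.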

What the Protocol Doesn’t Solve

HTTP/2 solves transport-level correlation, but transport-level correlation is not the same as business-level correctness.

  • Timeouts still happen.
  • Retries still happen.
  • Responses can still arrive late and out of order.
  • Problems like duplicate messages, reversals, and compensation still need to be handled by the application.

If a request times out, the client may attempt to cancel the stream, but that does not mean the server has not already processed the request and committed changes. That cancellation is also best-effort: if connectivity is degraded or the timeout is local, the reset may never reach the server. A cancelled stream doesn’t necessarily undo the work.

If the request was a read-only operation like GET /user/123, retrying it is usually safe. But if the request alters state, like POST /charge, retrying it blindly can lead to duplicate charges or other unintended consequences.

HTTP/2 Cancel vs. Server-Side Completion

HTTP/2 makes asynchronous, multiplexed communication feel synchronous to the caller, but the underlying challenges around correctness still exist. It solves correlation inside one multiplexed connection; it does not solve business-level correctness when retries, cancellations, and late outcomes enter the picture.
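One common application-level answer to unsafe retries is an idempotency key checked before the operation is applied. The key name, store, and result shape here are illustrative assumptions:

```python
# Sketch: an application-level idempotency guard for a state-changing call
# like POST /charge. HTTP/2's stream handling cannot make this decision.
_results = {}  # idempotency key -> previously recorded result

def charge(idempotency_key: str, amount_cents: int):
    # If this key was already applied, return the recorded result instead
    # of charging again; a blind retry then becomes a safe no-op.
    if idempotency_key in _results:
        return _results[idempotency_key]
    result = {"charged": amount_cents, "status": "approved"}  # stand-in for real processing
    _results[idempotency_key] = result
    return result
```

The caller generates the key once per business operation and reuses it on every retry, so a timeout followed by a retry cannot produce a second charge.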

Designing Asynchronous Systems

Asynchronous systems are performant, but they require careful design to ensure correctness. To achieve better throughput and lower latency, you are usually accepting complexity somewhere else in the system. It’s important to understand those tradeoffs and design for them explicitly.

Correlation Must Be Explicit

Every request needs a correlation key that travels across systems and boundaries. If you cannot reliably match a request to a response, the system cannot reliably determine what happened.

In-Flight State Must Be Reliable & Atomic

The in-flight tracker is not just a performance optimization. It is the authoritative correlation state while the request is active. That is different from the durable business record, ledger, or system of record that ultimately determines the business outcome. If the tracker is lost, corrupted, or inconsistent, the system can no longer reason about the request safely in real time.

Routing Must Preserve Locality

If request state is stored locally, responses must return to the same location. At scale, correctness depends as much on routing as it does on the data structure holding the state.

Design for Uncertainty

Asynchronous systems operate without perfect information. Timeouts do not necessarily mean failure, retries can create duplicates, and responses can arrive after the system has already taken compensating action. Correctness depends on how safely the system handles that uncertainty.

Design Checklist

If you are working on an asynchronous system, here are some questions to ask yourself:

  • What correlation key identifies the request, and is it unique enough?
  • Where does the in-flight state live while the request is active, and how does it survive instance failure?
  • What guarantees mutual exclusion for in-flight state, and how are race conditions prevented?
  • How do responses get routed back to the node, shard, or region that owns the state?
  • What level of in-flight state loss is acceptable, and how does the system recover from instance or regional failure?
  • If the outcome becomes uncertain, what compensating action, reconciliation flow, or idempotency control keeps the business operation safe?

Designing distributed systems that balance correctness, performance, and scale is hard, but understanding correlation and in-flight tracking is a critical part of handling the realities of asynchronous communication.

Further Reading