Ehi! What's up? Feedback to users' requests in distributed systems

One of the questions I get asked most often during my presentations and training sessions on distributed systems is something like:

How can we handle feedback to users’ requests if the processing of these requests is asynchronous?

In essence, the problem is the following:

- A user sends a request from the application user interface, let’s say a web UI
- The request is dispatched by the receiving web server to a queue as a message
- A backend service dequeues the request message and processes it
- Once the backend has completed processing the request message, it publishes a response message to the queuing system

In this scenario, how do we correlate the original incoming web request back to the message published later by the backend service? And once we can do that, how can we send this information back to the user interface to notify the user that initiated the task? Let’s dive into both problems; they are related, and they hide many nuances.

When moving from node to node, from endpoint to endpoint, or from service to service, asynchronous requests lose their context. For example, when a web application user sends a request, the receiving web server keeps the user context until it completes the request processing. Once completed, e.g., by dispatching the same request as a message onto a queue, the incoming request HTTP context is lost. To reply to the originating user, the system needs to track the context when messages and data move across endpoints.

Tracking context can be done using a technique called message correlation. The first endpoint in the system that generates a message, the web server in our sample, appends a new correlation ID to the outgoing message. The promise is that each processing endpoint copies the same correlation ID to messages sent or published in the context of the incoming message. Using the sample at the beginning of this article, the flow would be:

the web server
- receives the user request
- generates a new correlation ID, e.g., new GUID
- converts the user request into a message
- stores the newly generated correlation ID into a message header
- sends the message
- sends the generated correlation ID back to the user to track the request
the backend service
- receives the message
- processes the request
- creates the response message
- copies the correlation ID from the incoming message headers to the outgoing message headers
- sends the response message
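
The two lists above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real messaging setup: the `Message` class, the `correlation-id` header name, and the plain lists standing in for queues are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field

CORRELATION_HEADER = "correlation-id"  # hypothetical header name


@dataclass
class Message:
    body: dict
    headers: dict = field(default_factory=dict)


def web_server_accept(user_request: dict, outgoing_queue: list) -> str:
    """Turn a user request into a message and return the ID the user will track."""
    correlation_id = str(uuid.uuid4())
    message = Message(body=user_request, headers={CORRELATION_HEADER: correlation_id})
    outgoing_queue.append(message)
    return correlation_id


def backend_handle(incoming: Message, outgoing_queue: list) -> None:
    """Process a request and publish a response carrying the same correlation ID."""
    result = {"status": "completed", "echo": incoming.body}
    response = Message(
        body=result,
        # Copy the ID from the incoming headers to the outgoing headers.
        headers={CORRELATION_HEADER: incoming.headers[CORRELATION_HEADER]},
    )
    outgoing_queue.append(response)


# Plain lists stand in for the request and response queues.
requests_queue: list = []
responses_queue: list = []

tracking_id = web_server_accept({"action": "export-report"}, requests_queue)
backend_handle(requests_queue.pop(0), responses_queue)
response = responses_queue.pop(0)
```

The response ends up carrying the same ID the web server handed to the user, which is all the correlation technique promises. In a real system the copy step would typically live in shared infrastructure code rather than in each handler.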

Now, suppose this process of copying the correlation ID to the outgoing messages is automatically performed by every endpoint handling messages in the system. In that case, it doesn’t matter how many endpoints process requests before the web server handles a final response. The web server will always be able to correlate a response message to the user request that originated that flow of messages.

Web servers fail all the time and cannot rely on information stored in memory. For this reason, the correlation ID generated by the web server needs to be persisted. At the same time, the storage used for that needs to be shared across all the scaled-out web server instances; any instance could be the one processing the response message.
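
As a rough sketch of what "shared and persistent" could look like, here is a tiny store built on SQLite from the Python standard library. SQLite is only a stand-in chosen to keep the example self-contained; the `PendingRequestStore` class, the table name, and the user IDs are hypothetical.

```python
import sqlite3


class PendingRequestStore:
    """Tracks which user is waiting on which correlation ID."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending_requests ("
            "correlation_id TEXT PRIMARY KEY, user_id TEXT NOT NULL)"
        )
        self._db.commit()

    def remember(self, correlation_id: str, user_id: str) -> None:
        # INSERT OR IGNORE makes storing the same ID twice harmless.
        self._db.execute(
            "INSERT OR IGNORE INTO pending_requests VALUES (?, ?)",
            (correlation_id, user_id),
        )
        self._db.commit()

    def find_user(self, correlation_id: str):
        row = self._db.execute(
            "SELECT user_id FROM pending_requests WHERE correlation_id = ?",
            (correlation_id,),
        ).fetchone()
        return row[0] if row else None


# Two "web server instances" sharing one database. It is in-memory here for
# brevity; a real deployment would point every instance at the same durable store.
shared_db = sqlite3.connect(":memory:")
instance_a = PendingRequestStore(shared_db)
instance_b = PendingRequestStore(shared_db)

instance_a.remember("3f2c-example-id", "user-42")
found_by_b = instance_b.find_user("3f2c-example-id")
```

Instance B can resolve an ID that instance A stored, which is the property the response handling relies on.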

Q: Is it enough to persist the correlation ID at the web server level? A: Yes, it is.

What’s riskier is to generate the correlation ID at the web server. We want to avoid the situation where a user request cannot be correlated back to a user. How can that happen? It might be an edge case; the user sends the first request and never receives an ACK response with the generated correlation ID. The system can process the request, but users have no correlation ID to understand if and when the processing completes. The correlation ID generation needs to move to the client. For example, in a web application, the correlation ID generation must happen in the browser or the web application client. By generating and storing the correlation ID in the client application, the user can always correlate any response to issued requests. In any case, it’s essential to store the correlation ID at the web server level too, because web servers can be scaled out, and users cannot.
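
To make the lost-ACK edge case concrete, here is a small simulation. The `WebServer` and `LossyNetwork` classes are hypothetical, and the retry loop is an assumption that goes beyond what the text describes: it only shows that a client-generated ID also makes retries safe, because the server can recognize an ID it has already accepted.

```python
import uuid


class WebServer:
    """Accepts requests; `accepted` stands in for the shared persistent store."""

    def __init__(self) -> None:
        self.accepted: dict = {}
        self.dispatched: list = []

    def submit(self, correlation_id: str, payload: dict) -> str:
        # A retry with an already known ID is acknowledged but not dispatched again.
        if correlation_id not in self.accepted:
            self.accepted[correlation_id] = payload
            self.dispatched.append(correlation_id)
        return "202 Accepted"


class LossyNetwork:
    """Delivers every request to the server but drops the first `drop_acks` ACKs."""

    def __init__(self, server: WebServer, drop_acks: int) -> None:
        self.server = server
        self.drop_acks = drop_acks

    def send(self, correlation_id: str, payload: dict) -> str:
        ack = self.server.submit(correlation_id, payload)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            raise TimeoutError("ACK lost on the way back")
        return ack


def client_send(network: LossyNetwork, payload: dict, max_attempts: int = 3) -> str:
    correlation_id = str(uuid.uuid4())  # generated and kept on the client
    for _ in range(max_attempts):
        try:
            network.send(correlation_id, payload)
            break
        except TimeoutError:
            continue  # safe to retry: the ID stays the same
    return correlation_id  # known to the client even if every ACK was lost


server = WebServer()
network = LossyNetwork(server, drop_acks=1)
known_id = client_send(network, {"action": "export-report"})
```

Even though the first ACK never arrives, the client still holds the ID, and the server dispatches the request only once.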

Whatchu talkin’ ‘bout?

Let’s move our point of view to the user. The user issued a request and is now waiting for a response. The user holds the correlation ID and can use it to gather information about the ongoing backend processing. The simplest thing the user can do is continuously poll the web servers asking for the status of the request identified by the correlation ID they initially generated and stored. Polling might be inefficient but let’s assume for now it is the only option. The polling request can be handled by any web server instance, not necessarily the one that accepted the first request; for this reason, web servers need to keep track of the request correlation ID by storing it in shared persistence storage. On the other hand, users are not scaled out into user instances. A user is a single user, in which case it’s enough to store the correlation ID as a cookie or in the browser’s local storage. It doesn’t need to be replicated. In a more advanced scenario where the user can start a request on device A and finish it on device B, a physically different device, the correlation ID can be stored in the user’s profile on the server. Still, it’s a single user.
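
A polling client can be sketched as a simple loop. The function below is illustrative only: `fetch_status` is a hypothetical callable standing in for an HTTP call to any web server instance, and the default interval and attempt limit are arbitrary values.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    correlation_id: str,
    fetch_status: Callable[[str], Optional[dict]],
    interval_seconds: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Ask for the status of `correlation_id` until a result exists or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch_status(correlation_id)
        if result is not None:
            return {"attempts": attempt, **result}
        sleep(interval_seconds)
    return None


# A fake status endpoint whose result only becomes available on the fourth call.
calls = {"count": 0}


def fake_fetch_status(correlation_id: str) -> Optional[dict]:
    calls["count"] += 1
    return {"status": "completed"} if calls["count"] >= 4 else None


# The sleep function is replaced with a no-op so the example runs instantly.
outcome = poll_until_done("req-123", fake_fetch_status, sleep=lambda seconds: None)
```

Every call before the last one is wasted work for the server, which is the inefficiency that makes polling unattractive for long-running requests.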

This being said, the fact that web servers can be scaled out while users are single instances opens up another interesting problem. In most cases, we don’t want to use polling to check the status of a request because polling is inefficient. Imagine a request that takes 10 minutes to be fulfilled and a polling interval of 10 seconds: 600 seconds divided by 10 is 60 polls, so we’re going to “annoy” the backend infrastructure 59 times before the 60th poll gets the answer we’re looking for.

These days the de facto solution is to use WebSockets. It’s the web server’s job to call you back with information, instead of you continually pinging it for information. In our sample scenario, this means that when the backend system completes the request processing, a notification is sent to the user’s device, e.g., the browser, using a WebSocket message. Web servers are scaled out into instances, and users are not. WebSockets are linked to users, not to web servers. Let’s see what the implications of this relationship are:

- The web server accepts the user request and, after performing all its ceremonies, sends a message to the backend service
- The backend service completes the request processing and sends a response message back to the web server

Q: To which web server instance?

If web servers are scaled out and use a queue to communicate with backend services, there is no guarantee that the backend service response message goes to the same web server instance that sent out the initial request. If no user and no WebSockets were involved, we would have no problem; scaled-out instances can be designed to be stateless, and thus, the particular instance that handles a message is not important. But a device using WebSockets connects directly to a specific web server instance. In this scenario, if the user sends the initial request to web server instance A, and web server instance B is handling the response message from the backend service, the user’s WebSocket is not connected to instance B. The status update is lost because web server instance B is missing the context it needs to talk back to the user’s device.

Backplanes to the rescue.

The solution to this second set of issues is to use a backplane. A backplane is a side communication channel that the instances of a scaled-out service can use to communicate among themselves.

From the perspective of the main communication channel, the one used to share information across the entire system, a set of scaled-out instances wants to appear as a single logical unit. For this reason, they are usually referred to as a logical service. From a sender’s perspective, it doesn’t really matter which instance will handle a request; the sender routes the request to the logical destination. The same happens from a recipient’s perspective. It doesn’t matter which instance sent the request; most of the time, only the logical sender matters. There are cases, such as the WebSockets one, in which the physical instances matter a lot. A failure in addressing the correct instance breaks the whole communication chain.

A seemingly simple solution would be to address the response to the web server instance that initially handled the user request. However, it’s not that simple. The web server instance might not exist anymore; the user’s WebSocket connection might have dropped and reconnected to a different web server instance. Many things can happen to make routing the response to a specific instance less effective than expected. Not to mention that allowing senders to route to a specific instance falls into one of the fallacies of distributed computing: assuming that topology doesn’t change. Topology changes all the time, and there is no easy and cheap way to guarantee that recipients will be where senders expect them to be.

Backplanes hide the recipients’ topology from senders. When one of the scaled-out instances receives a message from the system, it notifies all the other instances through the backplane. The backplane could be using the exact same mechanism the system uses to communicate to peers in the logical service; it could be using a message/queue-based approach or a completely different one, for example, Redis pub/sub.
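
The idea can be sketched with an in-memory publish/subscribe object standing in for a real backplane such as Redis pub/sub. Everything here is hypothetical and simplified: `InMemoryBackplane`, `WebServerInstance`, and the lists used as fake WebSocket connections exist only for this example.

```python
from typing import Callable, Dict, List


class InMemoryBackplane:
    """Stand-in for a real side channel such as Redis pub/sub."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, notification: dict) -> None:
        # Every subscribed instance gets a copy of the notification.
        for handler in list(self._subscribers):
            handler(notification)


class WebServerInstance:
    def __init__(self, name: str, backplane: InMemoryBackplane) -> None:
        self.name = name
        self.backplane = backplane
        # client ID -> messages "sent" over that client's WebSocket
        self.sockets: Dict[str, List[dict]] = {}
        backplane.subscribe(self.on_backplane_notification)

    def connect(self, client_id: str) -> None:
        self.sockets[client_id] = []

    def on_system_message(self, notification: dict) -> None:
        """Called when this instance dequeues a response from the main channel."""
        self.backplane.publish(notification)

    def on_backplane_notification(self, notification: dict) -> None:
        """Every instance sees the notification; only the socket owner delivers it."""
        outbox = self.sockets.get(notification["client_id"])
        if outbox is not None:
            outbox.append(notification)


backplane = InMemoryBackplane()
instance_a = WebServerInstance("A", backplane)
instance_b = WebServerInstance("B", backplane)

instance_a.connect("client-1")  # the user's WebSocket lives on instance A
instance_b.on_system_message(   # but instance B dequeues the backend response
    {"client_id": "client-1", "correlation_id": "req-123", "status": "completed"}
)
```

The backend never needs to know that instance A holds the connection; it addresses the logical web server, and the backplane takes care of reaching the right physical instance.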

Once the backplane is in place, the full communication flow is this:

the client
- generates a unique identifier to use as a correlation ID for a request
- creates the request payload and appends the identifier
- stores the identifier locally
- sends the request to the web server
the web server instance
- receives the user request
- converts the user request into a message
- stores the received correlation ID into message headers alongside the connected WebSocket client ID
- sends the message to its respective queue/handler
- sends an ACK message back to the user acknowledging the request (e.g., 202 Accepted)
the backend service instance
- receives the message
- processes the request
- creates the response message
- copies the correlation ID from the incoming message headers to the outgoing message headers
- sends the response message back to the web server
the (possibly different) web server instance
- gets the correlation ID from the incoming message
- looks for the stored correlation ID and client WebSocket ID
  - if it cannot find it, someone else has already dealt with the response, and this attempt is probably a duplicate message and can be discarded
- checks if the client is connected:
  - if it is, uses the WebSocket connection to inform the client about the status
  - if it’s not, uses the backplane to propagate the received information to other web server instances.
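
The last part of the flow, the web server instance handling the response, could look roughly like the sketch below. It is a simplified illustration under several assumptions: a shared dictionary stands in for the shared persistent storage, a list of peer instances stands in for the backplane, and the returned strings are invented labels. Removing the stored entry only after a successful delivery is also an assumption, not something the flow above prescribes.

```python
from typing import Dict, List


class WebServerInstance:
    def __init__(
        self, name: str, pending: Dict[str, str], peers: List["WebServerInstance"]
    ) -> None:
        self.name = name
        self.pending = pending  # shared: correlation ID -> client WebSocket ID
        self.peers = peers      # shared: every instance reachable via the backplane
        self.sockets: Dict[str, List[dict]] = {}  # locally connected clients
        peers.append(self)

    def deliver_locally(self, client_id: str, payload: dict) -> bool:
        outbox = self.sockets.get(client_id)
        if outbox is None:
            return False
        outbox.append(payload)
        return True

    def handle_response(self, correlation_id: str, payload: dict) -> str:
        client_id = self.pending.get(correlation_id)
        if client_id is None:
            # Someone already dealt with this response: treat it as a duplicate.
            return "discarded-duplicate"
        if self.deliver_locally(client_id, payload):
            outcome = "delivered-locally"
        elif any(
            peer.deliver_locally(client_id, payload)
            for peer in self.peers
            if peer is not self
        ):
            outcome = "delivered-via-backplane"
        else:
            # Keep the entry so the client can still ask for the status later.
            return "client-not-connected"
        del self.pending[correlation_id]
        return outcome


pending = {"req-123": "client-1"}
peers: List[WebServerInstance] = []
instance_a = WebServerInstance("A", pending, peers)
instance_b = WebServerInstance("B", pending, peers)
instance_a.sockets["client-1"] = []  # the client's WebSocket is on instance A

first = instance_b.handle_response("req-123", {"status": "completed"})
second = instance_b.handle_response("req-123", {"status": "completed"})
```

Here the first call is delivered through the stand-in backplane and the second is discarded as a duplicate. A real shared store would need the lookup and removal to be atomic, which this sketch does not attempt.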

The elephant in the room

One aspect that we haven’t touched on is how to handle eventually consistent results at the user interface level. This probably deserves its own set of articles, but my colleague Dennis Van Der Stelt does an excellent job of covering it in a presentation he gave at Oredev.

Wrap up

As you’ve probably noticed by now, a lot is going on. Handling a request asynchronously and, at the same time, providing feedback to the user brings a lot of complexity into the system. It’s essential to evaluate if all this complexity is required; all the moving pieces need to be governed and maintained over time. One of the biggest lessons I’ve learned in my career is that using the same technique, architecture, or even technology in the entire system is the worst thing we can do. We’ll end up with complexity where we don’t need it and simplicity where too simple is harmful. Even if it might seem more complicated initially, it’s worth evaluating each scenario and deciding how to design that specific scenario in isolation. I’m sure there will be a few scenarios where asynchronicity is required and many others where a much simpler approach is a better fit.

Photo by Celpax on Unsplash