Idempotency
Intent
Idempotency, or idempotence, is a mesmerising word the first time you come across it. The concept is rooted in abstract algebra. Although the mathematical perspective may help understanding, idempotency can be put simply as:
Applying the same method multiple times produces the same net result as applying it once.
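As a minimal illustration of this definition, the sketch below contrasts an operation that sets a value with one that appends to a list. The function names `set_status` and `append_trade` are hypothetical and chosen only for this example.

```python
def set_status(order: dict, status: str) -> dict:
    """Idempotent: repeating the call leaves the same end state."""
    order["status"] = status
    return order


def append_trade(trades: list, trade: str) -> list:
    """Not idempotent: every call adds another entry."""
    trades.append(trade)
    return trades


order = {"status": "new"}
set_status(order, "paid")
set_status(order, "paid")  # the second call changes nothing further

trades = []
append_trade(trades, "T1")
append_trade(trades, "T1")  # the second call adds a duplicate entry
```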
Why is idempotency important, or relevant at all, for RESTful APIs? Let us start by inspecting a real-world example.
Imagine an online payment scenario: the website suddenly becomes unresponsive and you have no idea whether the payment you just made was successful. The temptation is to keep pressing the payment button, despite being advised not to. Soon you will find yourself swamped with deliveries and facing a long bill.
Because networks are fundamentally unreliable in a distributed world, disruptions to connections will happen. When an API consumer sends a request but does not receive a response, it is impossible for the consumer to know which of the following happened:
The initial connection failed to connect to the server, or
The request failed halfway while the server is fulfilling the operation, or
The request is completed successfully, but the connection breaks down before a server response can reach back to the consumer.
In theory, “exactly-once” delivery is impossible, a distributed systems challenge illustrated by the Two Generals' Problem. Therefore, to make sure the request is processed, the consumer has to settle for the second-best option, “at-least-once” delivery, by resending the request.
For an API service to function correctly and consistently in a distributed environment, it needs to be idempotent: it must handle retried requests safely while guaranteeing that the result is the same.
Idempotent APIs
Idempotent APIs can be called any number of times while guaranteeing that the effect occurs only once. This allows API consumers to converge their own state with the server's by retrying whenever they receive an error signal, and gives clients the confidence to retry safely.
There are a number of measures to ensure the idempotency of an API: 1) use of an idempotency key, 2) consistent implementation of idempotent and safe methods, and 3) relevant and definitive response codes.
Figure 1. API Request Response Flow with Idempotency-Key
Idempotency Key
As proposed to the IETF, implementing idempotency involves a few key elements, starting with the idempotency key. This is a unique ID generated by the client and sent in a request header to identify a particular operation. By keeping track of the keys sent by clients, the server can tell whether a request is a retry of an operation it has already seen.
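On the client side, this could look like the minimal sketch below. It assumes a UUID is an acceptable key format and uses a hypothetical helper name, `new_idempotency_headers`. The important detail is that the key is generated once per logical operation and reused unchanged on every retry.

```python
import uuid


def new_idempotency_headers() -> dict:
    """Build request headers carrying a freshly generated idempotency key."""
    return {"Idempotency-Key": str(uuid.uuid4())}


# Generate the key once per logical operation...
headers = new_idempotency_headers()

# ...and reuse the very same headers for every retry of that operation.
first_attempt_headers = dict(headers)
retry_headers = dict(headers)
```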
Here is a list of key considerations to take full advantage of the idempotency key:
Use an HTTP header named Idempotency-Key
Detect repeated requests that carry the same idempotency key, and reject different requests that reuse the same key (a badly behaved client), to prevent the server from repeating operations
Expire idempotency keys after a set period of time, after which a request carrying an expired key is treated as a new request
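The considerations above can be sketched on the server side as follows. This is an illustrative in-memory model, not a production design: the class name `IdempotencyStore`, the use of a SHA-256 body fingerprint to spot key reuse, the 422 status for misuse, and the default expiry are all assumptions of this sketch. It also ignores concurrency and requests that are still in flight.

```python
import hashlib
import time
from typing import Optional


class IdempotencyStore:
    """In-memory record of processed idempotency keys (illustrative only)."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (fingerprint, response, stored_at)

    def handle(self, key: str, body: bytes, operation, now: Optional[float] = None):
        """Run `operation` once per key; replay the stored response on retries."""
        now = time.time() if now is None else now
        fingerprint = hashlib.sha256(body).hexdigest()

        entry = self._entries.get(key)
        if entry is not None and now - entry[2] >= self.ttl_seconds:
            # The key has expired, so the request is treated as new.
            del self._entries[key]
            entry = None

        if entry is not None:
            stored_fingerprint, stored_response, _ = entry
            if stored_fingerprint != fingerprint:
                # Same key, different payload: a badly behaved client.
                return 422, "idempotency key reused with a different request"
            return stored_response  # retry: replay without re-executing

        response = operation(body)
        self._entries[key] = (fingerprint, response, now)
        return response


charges = []


def charge(body: bytes):
    charges.append(body)
    return 201, "charged"


store = IdempotencyStore(ttl_seconds=60)
first = store.handle("key-1", b"amount=10", charge, now=0)
retry = store.handle("key-1", b"amount=10", charge, now=5)     # replayed
misuse = store.handle("key-1", b"amount=99", charge, now=6)    # rejected
expired = store.handle("key-1", b"amount=10", charge, now=100)  # treated as new
```

Passing `now` explicitly keeps the example deterministic; a real service would rely on the clock and a shared, persistent store.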
Idempotent and Safe Methods
Some HTTP methods, including PUT and DELETE, are idempotent by definition according to HTTP semantics, so implementations need to honour that. To put it into context, PUT signals that the target resource should be created or replaced entirely with the request content. Because the end state is fully determined by the request, repeating it must leave the resource unchanged. It is also a potentially expensive operation that should not be executed repeatedly.
A safe method, by contrast, is one that should not alter the state of the server: typically a read-only operation such as GET. All safe methods are idempotent, but not all idempotent methods are safe.
You can find a comprehensive list of HTTP request methods, with their safe and idempotent definitions, in the Mozilla (MDN) documentation.
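For quick reference, the standard classification can be captured in a small lookup table. The flags follow the HTTP specifications (RFC 9110, and RFC 5789 for PATCH). The helper names are hypothetical.

```python
# (safe, idempotent) flags per HTTP method.
METHOD_PROPERTIES = {
    "GET": (True, True),
    "HEAD": (True, True),
    "OPTIONS": (True, True),
    "TRACE": (True, True),
    "PUT": (False, True),
    "DELETE": (False, True),
    "POST": (False, False),
    "PATCH": (False, False),
    "CONNECT": (False, False),
}


def is_safe(method: str) -> bool:
    """True if the method is defined as read-only."""
    return METHOD_PROPERTIES[method.upper()][0]


def is_idempotent(method: str) -> bool:
    """True if repeating the method is defined to have the same effect as one call."""
    return METHOD_PROPERTIES[method.upper()][1]
```

Methods that are not idempotent by definition, such as POST, are the ones that benefit most from an idempotency key.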
Response Code
To make an API's idempotency complete, it needs to be complemented with correct response codes that give clients the right signal. For example, the initial call of a DELETE will likely return 200, while subsequent ones will likely return 404.
Idempotent RESTful API Interactions
Here are a few examples of idempotent API interactions:
GET /trades HTTP/1.1 is idempotent. Calling the method several times in a row should return the same result and cause no changes to the data or the server:
GET /trades HTTP/1.1
GET /trades HTTP/1.1
GET /trades HTTP/1.1
POST /addTrade HTTP/1.1 is not idempotent; if it is called several times, it adds several trades with exactly the same detail:
POST /addTrade HTTP/1.1
POST /addTrade HTTP/1.1 -> Adds a 2nd trade
POST /addTrade HTTP/1.1 -> Adds a 3rd trade
DELETE /tradeId/delete HTTP/1.1 is idempotent, even if the returned status code may change:
DELETE /tradeId/delete HTTP/1.1 -> Returns 200 if trade ID exists
DELETE /tradeId/delete HTTP/1.1 -> Returns 404 as the entry just got deleted
DELETE /tradeId/delete HTTP/1.1 -> Returns 404
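The DELETE behaviour above can be reproduced with a small in-memory sketch. The `trades` dictionary and the `delete_trade` handler are hypothetical stand-ins for a real service. The status code changes after the first call, but the state of the server does not change again.

```python
trades = {"42": {"symbol": "ABC", "quantity": 100}}


def delete_trade(trade_id: str) -> int:
    """Delete a trade and return an HTTP-style status code."""
    if trade_id in trades:
        del trades[trade_id]
        return 200
    return 404  # already gone: nothing left to change


statuses = [delete_trade("42") for _ in range(3)]
```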
Idempotency in Event Driven Architecture
Event-driven architecture is itself a distributed system, and so inherits the fallacies of distributed computing, notably the assumption that the network is reliable. But worry not: this conundrum can be overcome with the same principle, by including a unique ID (or a combination of identifiers, such as message ID and timestamp) that lets the broker or services deduplicate retries. Different retry patterns are covered below.
Popular messaging systems, such as Kafka, have a built-in capability to eliminate duplicate messages. In Kafka, the idempotent producer sends each message with a PID (Producer ID) and a SqNo (a monotonically increasing sequence number), which the broker uses to decide whether to accept or reject the message. In principle, if the broker receives a sequence number that is not higher than the last one it accepted from that producer, the message is discarded to avoid duplication.
Figure 2. Idempotent Kafka Message with Producer-ID and Sequence Number
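The idea of deduplicating by producer ID and sequence number can be modelled in a few lines. This is a deliberately simplified sketch and not Kafka's actual implementation (which, among other things, tracks sequences per partition). The class name `SequenceDeduplicator` and the result strings are assumptions made for illustration.

```python
class SequenceDeduplicator:
    """Tracks the last accepted sequence number per producer ID."""

    def __init__(self):
        self._last_sequence = {}  # producer_id -> last accepted sequence number
        self.log = []  # payloads that were accepted, in order

    def append(self, producer_id: int, sequence: int, payload: str) -> str:
        last = self._last_sequence.get(producer_id, -1)
        if sequence <= last:
            return "duplicate"  # already written: discard the retry
        if sequence != last + 1:
            return "out_of_order"  # a gap means an earlier message is missing
        self._last_sequence[producer_id] = sequence
        self.log.append(payload)
        return "accepted"


receiver = SequenceDeduplicator()
results = [
    receiver.append(producer_id=1, sequence=0, payload="trade-A"),
    receiver.append(producer_id=1, sequence=1, payload="trade-B"),
    receiver.append(producer_id=1, sequence=1, payload="trade-B"),  # retried send
    receiver.append(producer_id=1, sequence=3, payload="trade-D"),  # gap
]
```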
Retry Patterns
We have explored how APIs should be made idempotent to ensure an “exactly once” operation on the server side. Now let us take a look at how the client can leverage that capability.
To deal with transient failures in distributed systems, the client should retry its request until it gets a successful response, without worrying about duplicated operations on the server side.
Simple Retry
The obvious approach is for the client to keep retrying at a set interval, with a fixed delay between retries. This is not necessarily the best solution: it could be seen as a badly behaved client, or even be mistaken for a DDoS attack.
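A fixed-interval retry might look like the sketch below. The helper name `retry_fixed` is hypothetical, and treating `ConnectionError` as the only retryable failure is an assumption. The `sleep` parameter is injectable so the behaviour can be checked without actually waiting.

```python
import time


def retry_fixed(operation, attempts: int = 3, interval: float = 1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, waiting `interval` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the last allowed attempt
            sleep(interval)


calls = []
delays = []


def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_fixed(flaky, attempts=5, interval=0.5, sleep=delays.append)
```

Every client using this policy retries on the same schedule, which is exactly what makes the pattern hard on a struggling server.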
Exponential Backoff
Exponential backoff has a long and interesting history in computer networking. It is a mechanism in which a client waits for a brief initial time after the first failure. If the operation continues to fail, it waits for a period proportional to 2^N (two to the power of N), where N is the number of failures.
When failures are caused by overload or connection failure, a backoff policy alone may not work as it should, because clients that failed together will also retry together, so at each retry the server still has to deal with the same burst of traffic that caused the meltdown. Adding jitter (a random variation) to each retry delay on the client side is therefore a further improvement: it spreads retries over time, so the server can cope with them gracefully without being hammered by all clients at once.
Figure 3. Exponential backoff with jitter
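Exponential backoff with jitter can be sketched as follows. This example uses the "full jitter" variant, where the delay is drawn uniformly between zero and the exponential ceiling. The function names, the `base` and `cap` defaults, and the choice of `ConnectionError` as the retryable failure are assumptions of this sketch.

```python
import random
import time


def backoff_delay(failures: int, base: float = 0.1, cap: float = 10.0, rng=random) -> float:
    """Delay before the next retry: exponential growth, capped, with full jitter."""
    ceiling = min(cap, base * (2 ** failures))
    return rng.uniform(0, ceiling)


def retry_with_backoff(operation, attempts: int = 5, sleep=time.sleep, **delay_args):
    """Retry `operation` on ConnectionError, backing off between attempts."""
    for failures in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if failures == attempts:
                raise  # retries exhausted: surface the error
            sleep(backoff_delay(failures, **delay_args))
```

The cap keeps the wait bounded after many failures, and the random draw means that clients which failed at the same moment are unlikely to retry at the same moment.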
Considerations
Now that we have explored a couple of ways to deal with failed API calls, it is time to set out some conditions and considerations on when to use them.
The retry patterns only work for transient failures, i.e.
momentary loss of connectivity to services,
temporary unavailability of the service,
or timeouts due to service being busy.
The patterns will not be as useful if:
The error is likely to be long lasting. The client will end up wasting time waiting for a response or operation to complete, which will have knock-on effects on its ability to serve upstream requests. Consider using the Circuit Breaker pattern instead in this case.
The error is caused by a logical error in the service, in which case, the error should be handled as an exception and reported.
Retry is not a mechanism to circumvent scalability issues in a system. Frequent retries are often a signal of significant resource constraints, in which case you should consider scaling up.
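One way to encode these considerations is to retry only failures that look transient, and to stop after a bounded number of attempts. The set of status codes below is an assumption for illustration and should be adjusted per service. The function name `should_retry` is hypothetical.

```python
# Status codes treated as transient in this sketch (an assumed set).
TRANSIENT_STATUS_CODES = {408, 429, 502, 503, 504}


def should_retry(status_code: int, attempt: int, max_attempts: int = 5) -> bool:
    """Retry only transient-looking failures, and only up to a limit."""
    if attempt >= max_attempts:
        return False  # likely long-lasting: stop and surface the error
    return status_code in TRANSIENT_STATUS_CODES
```

A 400-class error caused by a logical fault in the request is not retried at all, since repeating it would produce the same failure.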