Rate limiting with PostgreSQL / YugabyteDB (token buckets function)

When you provide a service to users, but want to prevent spam, attacks slowing down your system, you need to implement an API rate limiting function to throttle the usage by a user after a specified number of calls per second. This can use counters in an in-memory database, like Redis. You can find good descriptions here and here. When your service is distributed, a shared counter is an anti-pattern for scalability. In this series, I'll show solution with PostgreSQL, and have it scale-out in YugabyteDB.

Token Buckets

The simplest solution is to store a counter per user, as a number of tokens. Each call will require a token, and the counter will be decreased. Every second, tokens will be added from the defined rate of tokens per second. Storing the last call timestamp is sufficient. The maximum tokens is based on the refill rate during, lets say, one minute. The first call for a user will initialize tokens with this maximum (the rate applied to this one minute).

Here is the PostgreSQL table to store it:

create table rate_limiting_token_bucket (
 id text primary key,
 tk bigint not null,
 ts timestamptz not null);
  • id identifies the user
  • tk is the amount of available tokens for this user
  • ts records the last time a token was retrieved In YugabyteDB the table is sharded on a hash of the first column, so this is distributed per user.

When there's no existing row for a user, I'll initialize with an INSERT to set the tokens tk to window_seconds*refill_per_sec-1 where window_seconds is the maximum (I'll use 60 seconds) and refill_per_sec is the number of tokens allowed per second. -1 is the token taken for the current call.

When there's already a row with a previous amount of tokens, I'll UPDATE to set tk to tk-1+refill_per_sec*extract(epoch from clock_timestamp()-ts). tk-1 takes the token for the current call. extract(epoch from clock_timestamp()-ts) is the number of seconds since the last call for this user, which I multiply with the rate of tokens acquired per second refill_per_sec. The new value of tk will be bound between -1 (there's no debt) and the maximum window_seconds*refill_per_sec with greatest() and least() functions (there's no unlimited refill).

With INSERT and UPDATE I'll use RETURNING to get the remaining number of tokens. A value of -1 means that we have no remaining tokens. The application may refuse the next calls, or wait to get new tokens refilled. Postitive calues allows the API access for the user identified by id because the requested token was acquired.

As we can have concurrent calls for the same id this reat-then-write operation must be atomic. On a call, I will attempt the UPDATE first (the most probable) and, when row is not found, will insert it. This must run in a SERIALIZABLE transaction. Be careful if you implement this in another database, Oracle does not provide this isolation level. I've written the first post of this series to clarify this.

Here is my function:

create or replace function rate_limiting_token_bucket_request
 (rate_id text, refill_per_sec int default 2
 , window_seconds int default 60) 
 returns int as $$
declare
 new_tk int:=null; -- stores the remaining token to return
begin
    -- take 1 token and add refilled ones 
    -- (returns the remaining tokens, or NULL if id is not found)
    update rate_limiting_token_bucket
     set ts=now(), tk=greatest(least(
      tk-1+refill_per_sec*extract(epoch from clock_timestamp()-ts)
      ,window_seconds*refill_per_sec),-1)
     where rate_limiting_token_bucket.id=rate_id 
     returning tk into new_tk;
    -- fill initial window if first call if id was not found
    if new_tk is null then
     insert into rate_limiting_token_bucket (id, tk,ts) 
      values (rate_id,window_seconds*refill_per_sec-1,clock_timestamp())
      returning tk into new_tk;
    end if;
    -- return the remaining tokens
    return new_tk;
end; $$ language plpgsql;

If we got caught by the myth that stored procedures are bad, please keep in mind that this is not business logic. It is a purely technical function that encapsulates the operation that can be done on the table.

Here is an example where I take a token every 100 milliseconds, with a refill rate of one token per second, and an initial fill of 10 seconds:

yugabyte=truncate table rate_limiting_token_bucket;
TRUNCATE TABLE
yugabyte=# set default_transaction_isolation=serializable
SET
yugabyte=# select rate_limiting_token_bucket_request('user1',1,10);
\watch 0.1

 rate_limiting_token_bucket_request
------------------------------------
                                  9
(1 row)

Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  8
(1 row)

Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  7
(1 row)

Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  6
(1 row)
                                                                                                                                                    [105/9827]
Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  5
(1 row)

Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  4
(1 row)

Mon 03 Jan 2022 10:52:58 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  3
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  2
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  1
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  0
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                 -1
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                 -1
(1 row)

Mon 03 Jan 2022 10:52:59 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                 -1
(1 row)

^CCancel request sent

yugabyte=# -- waiting 4 seconds

yugabyte=# \watch 0.1
 rate_limiting_token_bucket_request

------------------------------------
                                  3
(1 row)

Mon 03 Jan 2022 10:53:05 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  2
(1 row)

Mon 03 Jan 2022 10:53:05 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  1
(1 row)

Mon 03 Jan 2022 10:53:05 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                  0
(1 row)

Mon 03 Jan 2022 10:53:05 AM GMT (every 0.1s)

 rate_limiting_token_bucket_request
------------------------------------
                                 -1
(1 row)

The initial tokens were 10 (10 seconds at 1 per second) so 9 were remaining after my first call. When they were exhausted (remaining -1) I canceled the calls, waited 4 seconds, and started again the loop. During that time, 4 tokens were refilled, so 3 remained after the first call...

Race condition in PostgreSQL

What if an attack runs multiple calls in parallel? Thanks to SQL isolation level SERIALIZABLE, and the enforced PRIMARY KEY constraint, this can scale while keeping exact logic. The calls from one user will either wait their turn to get the token, or fail if they don't want to wait.

I've been running this on PostgreSQL from 8 concurrent sessions getting a token every millisecond:

postgres=# set default_transaction_isolation=serializable;
postgres=# select rate_limiting_token_bucket_request('user1',1000,3600);
postgres=# \watch 0.001

This scales with all sessions connected to the same server. PostgreSQL uses pessimistic locking to serialize the concurrent UPDATE statements. It can wait (but this is fast operation) but will not fail. And data will always be consistent without the need for additional code or test. Of course, you must stress test concurrent transactions for performance, but the consistency of data is guaranteed.

Race condition in YugabyteDB

If you need an API rate limiter, you probably have a highly available service that you want to scale. A monolithic DB limits the write activity to one node only. And even if reads can be offloaded to standby nodes, ACID properties are not enforced across nodes. And in case of any failure on the writer node, it takes time to failover to the standby. This recovery time is not exactly the rate limiter that you wanted šŸ˜‰

This is where you will need a distributed database. The same code works on YugabyteDB.

yugabyte=# set default_transaction_isolation=serializable;
yugabyte=# select rate_limiting_token_bucket_request('user1',1000,3600);
yugabyte=# \watch 0.001

But there's a difference when executing the same from my 8 sessions:
many sessions with Transaction aborted
Only one "watch" loop remains running after a while. The others have failed at some point with Operation expired: Transaction ... expired or aborted by a conflict: 40001 or ERROR: All transparent retries exhausted. Operation failed. Try again: Value write after transaction start or Operation expired: Transaction aborted: kAborted

Those errors are expected. Here it failed because my \watch command stops on first exception but an application should be able to retry when encountering those errors. The YugabyteDB server or client driver may be able to retry itself, but some transaction conflicts must be managed by the application. The reason for that is that we don't want to wait with pessimistic locking. This cannot scale on a distributed database where the goal is to avoid single point of contention. And the probability of transaction conflict (two user calls getting the token at the exact same time) has a low probability of happening. It must be handled when encountered, but without slowing down all other calls with pessimistic locking.

This is where de design of the rate limiting service must consider the distributed SQL database. We will see that in the following posts on this series. I'm taking this example to show the general design considerations for a distributed SQL database. And to show the power of SQL, of PostgreSQL, of YugabyteDB, to solve a problem where you may think that NoSQL and in-memory databases like Redis may be more appropriate. In the next post I'll call the function from Java.

38