Serverless at Superbet

TL;DR - We've used AWS Lambda for sports modelling at Superbet for over three years now. This is the story of how we came to use Lambda, and why we're likely to extend our usage of it in the near future. AWS managed services offer incredible cost and productivity benefits, and dev teams should be aggressively exploring how they could be used in different areas.

I thought I would talk about the use of serverless computing at Superbet.

In general Superbet is an AWS + Kubernetes + Kafka shop, as befits an organisation which serves hundreds of thousands of customers - I've heard Kubernetes described as an organisational pattern which allows large companies to ship products at scale, and I think that's a useful definition. But development at Superbet isn't a monoculture - within the overarching AWS framework there are groups which use different technologies and stacks, as their particular use cases warrant. That includes the modelling group, where we've used AWS Lambda successfully since 2018.

So how did we end up using serverless technologies, and what has our experience been like?

I should talk a little about what serverless computing means, and also what it is not.

The first thing is that it doesn't mean there are no servers - that would be ridiculous! Lambda uses servers just like EC2 and other AWS compute frameworks do, in fact they probably sit alongside one another in an AWS data centre somewhere. The difference lies in the fact that whilst EC2 makes you responsible for managing your servers, AWS will manage your Lambda instances for you. Specifically, if you send a request to Lambda, AWS will spin up an instance for you, process your request, and then shut that instance down.

Think about that for a moment.

It's quite different from a standard server, which will need to exist before you send it a request, and which will continue to exist after that request has been served (as long as you don't crash it!). That small difference has quite profound consequences for application development, however. We'll get into the pros and cons of this approach shortly, but for now, simply note that a Lambda is off by default - nothing exists until you actually send the service a request.

In 2016 I co-founded a startup called ioSport, focused on providing "algorithms as a service" for the sports betting industry.

The idea was that a third party, particularly one with experience of financial derivatives and options trading, might be able to provide bookmakers with better product pricing and design than they had access to internally. We were lucky enough to be introduced to Sacha Dragic, who felt we would be a good fit for his then-fledgling online business, and in 2019 Superbet acquired the ioSport team.

When we joined, we found the tech landscape inside Superbet to be very different from what we had formerly used as an independent B2B supplier. ioSport had invested heavily in Python for core modelling, and Erlang for the engineering around those models. Superbet were starting from scratch with a Kafka, Docker Swarm (latterly Kubernetes) and Golang stack. There were a number of commonalities, but also a number of big differences, notably the scale at which we were now expected to run.

How were we going to marry the two stacks?

I should talk a bit about what a "model" means in this context.

A model, in the sports betting space, is an algorithm to which you give a relatively small amount of input information (for example, in football: recent results, recent market prices, relative team strengths) and which in return spits out a large number of "correct"(*) probabilities for different markets and selections - for each team winning/drawing/losing, for different score outcomes, for the total number of goals scored, and many more. Bookmakers take these probabilities, add margins and make them available to customers.
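
To make that concrete, here is a purely illustrative sketch in Python - a toy independent-Poisson model with made-up names, not our production code:

```python
import math

def poisson(rate, k):
    """P(k goals) under a Poisson distribution with the given scoring rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def price_match(home_rate, away_rate, max_goals=10):
    """Small inputs in (expected goals per team), a sheet of market probabilities out."""
    probs = {"home": 0.0, "draw": 0.0, "away": 0.0, "over_2.5": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson(home_rate, h) * poisson(away_rate, a)
            if h > a:
                probs["home"] += p
            elif h == a:
                probs["draw"] += p
            else:
                probs["away"] += p
            if h + a > 2.5:
                probs["over_2.5"] += p
    return probs

# e.g. price_match(1.6, 1.1) returns win/draw/loss and over-2.5-goals probabilities
```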

But having a model alone isn't sufficient to run a bookmaking operation. You need a lot of engineering around a model to make it work in production - to run models in parallel when lots of matches are on at the same time, to pass the live state of matches into models, to visualise how models are performing, to handle model errors - this "boring" code can be more than 80% of the total codebase for a full end-to-end solution.

(*) - "correct" is a tricky term here. Does it mean "probabilities that other bookmakers would agree with for this match ?" Or "the real world probabilities for this match, if it were played thousands and thousands of times ?" One for a separate blog post!

Why did ioSport choose Python and Erlang?

Python was a relatively easy choice for the core models, as in 2016 it was already the lingua franca of the data science community. It's not the fastest language, but it has perhaps the biggest sweet spot of all languages in terms of the different problem domains it can successfully tackle. Its performance is often criticised, but matrix and stats libraries such as numpy and scipy (whose cores are written in C) have taken a lot of that pain away; and if that is not sufficient, a common strategy is to prototype an algo in Python and then rewrite it in C or Go for better performance.
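
To illustrate that point (again a toy sketch rather than our actual code), the nested loops in the earlier example collapse into a handful of numpy/scipy calls, with the heavy lifting done in C:

```python
import numpy as np
from scipy.stats import poisson

def price_match_np(home_rate, away_rate, max_goals=10):
    """Vectorised version of the earlier toy model."""
    goals = np.arange(max_goals + 1)
    # Outer product of the two Poisson pmfs gives the full scoreline grid
    grid = np.outer(poisson.pmf(goals, home_rate), poisson.pmf(goals, away_rate))
    return {
        "home": np.tril(grid, -1).sum(),   # home goals > away goals
        "draw": np.trace(grid),            # diagonal = level scores
        "away": np.triu(grid, 1).sum(),    # away goals > home goals
        "over_2.5": grid[np.add.outer(goals, goals) > 2.5].sum(),
    }
```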

Erlang was a more controversial choice. We would have liked to write everything in a single language for the sake of productivity, but life is rarely that simple. And indeed Python's concurrency story - its ability to do stuff in parallel - was fairly abysmal in 2016 (it may have improved since then). Erlang, by contrast, excels at concurrency - it was designed by Ericsson to run telephone exchanges, handling tens of thousands of phone calls in parallel. So although it has a quirky syntax and a small developer community, Erlang's potential ability to run large numbers of models in parallel made it a very attractive option.

All this kit was hosted on large individual EC2 machines - which worked, but primarily because we only had to offer prices in the top leagues to our pre-Superbet clients. At Superbet there would be no such artificial limits; we would be expected to price anything and everything.

How were we going to grow beyond the limits imposed by our single-machine architecture?

The first thing we figured out was that we had a different kind of scalability problem to other areas within Superbet.

If you're working on the core betting engine, your scalability problem is customer-led - how to handle hundreds, thousands, millions of customers. The thing about this kind of traffic is that it grows relatively predictably - tens of thousands of customers don't appear overnight, and even when demand does spike (for a big event like the Champions League final, say), a container-based system works pretty well because you just configure more nodes (and partitions) to handle the extra volume.

The model scaling problem is slightly different. For one thing, there are a finite number of teams in the world playing a finite number of games, which puts a theoretical ceiling on the amount of resources you need. For another, there are well-defined "epochs" within the lifecycle of a match.

Pre-event - before the game has started - match prices don't tend to change that much. The probability of team X winning might drift up and down a couple of percentage points in response to team news, but that's about it. That translates into relatively few pre-event model calls.

Once a match goes in-play (when the referee blows the kickoff whistle), everything changes - prices whip around in response to goals being scored, cards being shown, really in response to any kind of match event information coming from the state feed. Even if there are no events coming in, prices must be recalculated constantly as they "decay" (like option prices) with the tick of the match clock. So the number of in-play model calls might be tens or even hundreds of times bigger than the equivalent pre-match number.
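
As a toy illustration of that decay (hypothetical, and ignoring the current score for brevity), you can think of the remaining goal expectancy shrinking as the clock ticks, so the same pricing function has to be re-run every few seconds even when nothing is happening:

```python
def in_play_price(home_rate, away_rate, minute):
    """Re-price a live match as goal expectancy decays with the clock."""
    time_remaining = max(90 - minute, 0) / 90
    # Even with no match events, this must be recomputed as `minute` advances -
    # which is what drives the explosion in the number of in-play model calls.
    return price_match_np(home_rate * time_remaining, away_rate * time_remaining)
```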

You might call this kind of scalability "burst" scalability. And yes, you can achieve it by aggressively auto-scaling EC2 machines, but in 2016 this was something of a black art. Even in 2021, with scaling responsibilities delegated to container management layers such as Kubernetes, it remains "not simple". But Lambda, with its "off by default" nature, offered the possibility of simple out-of-the-box auto-scaling as far back as its inception in 2014.

It was too tempting not to try.

Now at this point people tend to mention "cold starts". A Lambda process can't be spun up instantly (they say); there is always a delay whilst a machine is provisioned and configured before your request can be processed. Doesn't this impact your response time, and your customer experience in turn?

It's important to recognise that cold starts are indeed an issue - it's impossible for Lambda to compete directly with a traditional server which is ready and waiting to receive requests. But one should also acknowledge the significant steps AWS have made to reduce the scale of the cold starts problem over the last couple of years.

The first was the introduction of Provisioned Concurrency. You can now configure your Lambdas (at a cost!) so that a portion of them are "warmed up" and ready to receive requests. Lambda isn't literally one-process-per-request; processes "hang around" rather than being immediately killed, in case an initial request signals the start of a flood. Which is great, but when you think about it, provisioned concurrency is really only a band-aid over the cold starts problem - if one goes down the route of having processes constantly warmed up, how is the product any different from a regular server?
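
For what it's worth, provisioned concurrency is just configuration. A minimal sketch with boto3 (the function and alias names are made up):

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 10 execution environments warm for the published alias "live";
# you pay for them whether or not any requests arrive.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="match-pricer",
    Qualifier="live",
    ProvisionedConcurrentExecutions=10,
)
```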

The real goal should be to get the time taken to spin up a new Lambda as close to zero as possible. The Erlang virtual machine is an interesting benchmark here. Erlang uses the concept of microprocesses, which exist independently of the main spawning process and which can be spun up at a rate of thousands per second. Lambda is slightly different because you are talking about spinning up entire new OS processes across a fleet of machines, and OS processes are slower to start than microprocesses.

AWS took an interesting step in Erlang's direction in late 2018 with the introduction of Firecracker (upon which Lambda, and similar services such as Fargate, are now based). From the launch blurb -

...Firecracker (is) a new virtualization technology that makes use of KVM. You can launch lightweight micro-virtual machines (microVMs) in non-virtualized environments in a fraction of a second, taking advantage of the security and workload isolation provided by traditional VMs and the resource efficiency that comes along with containers.

Now I am in no way an expert in KVM (the Linux kernel-based virtual machine). But I have run simple Cloudformation demos in which hundreds of Lambdas are spawned in parallel to perform map-reduce style calculations. And whilst not quite at Erlang levels, the speed and scale with which Firecracker spins up Lambda processes is impressive - I've clocked hundreds in a couple of seconds. And the nice thing is that because Lambda is an AWS managed service (and because cold starts remain a high profile issue for customers), it's quite likely that Firecracker performance will continue to improve as AWS tweak the product - all of which you get for free as a Lambda user.
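
To give a flavour of those demos, here is a minimal fan-out sketch with boto3 (the function name is hypothetical and error handling is omitted):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def invoke_one(fixture_id):
    # "Event" = asynchronous fire-and-forget; "RequestResponse" would wait for the result
    return lambda_client.invoke(
        FunctionName="match-pricer",  # hypothetical function name
        InvocationType="Event",
        Payload=json.dumps({"fixture_id": fixture_id}),
    )

# Fan out a few hundred pricing jobs; Lambda provisions capacity as the requests arrive
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(invoke_one, range(300)))
```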

The Python parts of our stack could now be migrated from the older "managed directly by Erlang" pattern to Lambda. We had to migrate some of our Python dependencies to Lambda layers, to satisfy the quirks of Lambda's package management system, but otherwise the process was fairly smooth. The Erlang engineering was then able to invoke Lambda models via erlcloud, an AWS client library for Erlang. So that just left the core Erlang engineering.
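
The Lambda side of one of those models is little more than a thin handler around the existing Python code - roughly like this (the event shape and module names are illustrative):

```python
# handler.py - deployed as the Lambda function, with numpy/scipy packaged in a layer
from pricing import price_match_np  # hypothetical module wrapping the toy model above

def handler(event, context):
    """Entry point invoked by the Erlang engineering via erlcloud."""
    params = event["model_params"]
    prices = price_match_np(params["home_rate"], params["away_rate"])
    return {
        "fixture_id": event["fixture_id"],
        # cast numpy floats to plain floats so the response JSON-serialises cleanly
        "prices": {market: float(p) for market, p in prices.items()},
    }
```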

The core Erlang engineering was where most of the interesting migration work lay. In any large Erlang application you tend to have a lot of supervision trees (very similar to Kubernetes - here's an interesting article by Jose Valim, creator of Elixir, on the similarities between the two) spawning a lot of independent Erlang processes, all communicating using native Erlang messaging. This model didn't translate directly into our cloud environment, because there we had Docker Swarm/Kubernetes doing the process management and Kafka doing the messaging. And Kafka messages were being partitioned as part of the scaling process, which wasn't something we had considered before.

So effectively we had to rip out the node management and messaging parts of our applications, leaving only the core business logic. And in the process there was no longer any "glue" left to bind this logic together as a single application - instead we were left with multiple independent applications, which it now made sense to deploy as multiple instances (one per partition) and store in independent repos. But still in Erlang - one of the nice things about the modern cloud environment is that you can pretty much choose whatever runtime you wish.

And this setup has served us very well for the past two years. We have had a lot of commercial success with the launch of the Superbets product in Romania and Poland this year, and one of the things I am most proud of is that we have had little to no downtime in the service (famous last words) - Erlang in particular has gone a substantial way towards confirming its nine-nines uptime reputation.

But these two years have also given us the chance to reflect on the pros and cons of our stack; on what we might change if we could do things differently. Because no piece of software lasts forever, technical debt always accumulates, and the best medicine here is usually constant low levels of refactoring to keep things in shape.

So which bits of the stack do we intend to keep, and which are we looking at changing?

The first thing to say is that developers, particularly the quants (the modelling team), really like Lambda's ease of use. If you know a little Cloudformation and bash you can have a Lambda function up and running in a couple of minutes, complete with logging and performance metrics. It's not all perfect - testing locally remains difficult because you tend to need to mock AWS production primitives with a library like moto. But startups such as SST and Dashbird are working on the pain points of the development experience, and the outlook looks bright.
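
As an example of that mocking (a sketch only, assuming a function that publishes prices to S3, and using moto's per-service decorators - newer moto releases use mock_aws instead):

```python
import json

import boto3
from moto import mock_s3

def publish_prices(s3, bucket, prices):
    """Toy stand-in for the code under test: write a price sheet to S3."""
    s3.put_object(Bucket=bucket, Key="latest.json", Body=json.dumps(prices))

@mock_s3
def test_publish_prices():
    # moto intercepts the boto3 calls, so no real AWS resources are touched
    s3 = boto3.client("s3", region_name="eu-west-1")
    s3.create_bucket(Bucket="prices",
                     CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})
    publish_prices(s3, "prices", {"home": 0.50, "draw": 0.27, "away": 0.23})
    body = s3.get_object(Bucket="prices", Key="latest.json")["Body"].read()
    assert json.loads(body) == {"home": 0.50, "draw": 0.27, "away": 0.23}
```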

The second thing is that the "off-by-default" nature of Lambda really shines in our monthly bills. Despite millions, possibly tens of millions, of Lambda calls in a single month, the Lambda proportion of our overall AWS bill is barely into double digits, percentage-wise.

The flip side of that statement, however, is that it shines a light on other areas of our stack where we have thus far failed to implement auto-scaling as aggressively as we might like.

It is, for example, a constant frustration to see a large, expensive "floor" to our EC2 bill in days, weeks or months when there is little sporting activity in the calendar. As I mentioned earlier, auto-scaling isn't that easy, even with the introduction of Kubernetes. In our initial experiments we've found that many of our core apps don't like being aggressively auto-scaled - they fail to start properly, or can't find the data they need on startup. Like unfit bodies being asked to do yoga, they groan and complain - it's clear that a number of them weren't designed with auto-scaling in mind. So that's something we will be looking to rectify in 2022.

Another thing we're starting to consider is whether Erlang is really the right language for our engineering middleware. As mentioned earlier, although Erlang is generally considered a language, its unique virtual machine gives it more in common with container management systems such as Kubernetes than with other languages such as Python. In many ways you can consider the Erlang VM as a "cloud in a box", capable of running cloud-scale systems before the big public clouds became viable deployment options.

But equally, strip away those cloud-like features (process spawning, messaging - because those tasks are now delegated to Kubernetes and Kafka) and what are you left with? A somewhat quirky scripting language, in which you're running single-threaded code - you're not even taking advantage of Erlang's famed concurrency! And so whilst I like Erlang a lot, it may be time to consider alternative languages for our middleware logic - ones which might give us a productivity pickup, and for which there may exist deeper pools of development talent.

Allied to this thought, one of the most interesting trends (IMO) of the past two years has been the growth of the serverless ecosystem. People are used to thinking about Lambda as functions-as-a-service, operating in a stateless manner. But AWS has been quietly building and deploying direct Lambda bindings for many of its core products - S3 and DynamoDB in the storage space, SQS for queues, SNS and EventBridge for messaging. This opens up the possibility of doing more and more engineering directly in the serverless space, even of building full serverless applications comprising messaging and state management. And in many cases the scale that these managed serverless services can handle far outstrips what you might achieve with self-hosted services. If DynamoDB can sustain tens of thousands of transactions a second, why are you bothering to self-host Redis on EC2?
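
To make that concrete, here is a sketch of an event-driven Lambda - the queue, table and field names are invented - that consumes match events from SQS and persists prices to DynamoDB:

```python
import json

import boto3

from pricing import price_match  # the toy model from the earlier sketch (hypothetical module)

table = boto3.resource("dynamodb").Table("match-prices")  # hypothetical table name

def handler(event, context):
    """Triggered directly by SQS: each record is one match event to re-price."""
    for record in event["Records"]:
        match_event = json.loads(record["body"])
        prices = price_match(match_event["home_rate"], match_event["away_rate"])
        table.put_item(Item={
            "fixture_id": match_event["fixture_id"],
            "minute": match_event["minute"],
            # store prices as a JSON string to sidestep DynamoDB's Decimal-only numbers
            "prices": json.dumps(prices),
        })
```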

All in all I consider serverless to have been a big boon for the Superbet modelling team. It has allowed us to go to market and iterate our products quickly, and effectively outsource to AWS a lot of the DevOps work associated with scaling models. We kept our quant team happy and were able to manage with a small-ish engineering team as a result. Along the way it has shone a light on the cost benefits of an aggressively auto-scaled solution, and has given us a lot of ideas about how we might bring these benefits to other parts of the engineering stack. It's been interesting to see how the serverless ecosystem has evolved over the past three years, with AWS spawning new features and products at an impressive rate. Having started using Lambda as simple functions-as-a-service three years ago, we're poised to start looking at fully event-driven Lambda systems in 2022.

If you're interested in working on sports models in Python, or in distributed engineering at scale with AWS/Kafka/Kubernetes/Erlang then ping me!