Explorations in League of Legends Data

In this article, I’m going to discuss a recent classification project I worked on relating to the massively popular video game, League of Legends, as well as enhancements and other insights that could be gained from this data.

The Setup

As a part of my Data Science curriculum, I was instructed to devise a scenario in which we attempt to model some type of classification problem. Unlike other projects, I was not given a data set or starting point. Instead, the only instructions were to find something I was passionate about, as long as it resulted in classification. After searching Kaggle for inspiration, I found a data set that really excited me: League of Legends data. It was from this find that I realized it would make a perfect classification problem; a variety of metrics were “snapshotted” at the 10 minute mark, as well as the end result (win or loss) for the same game.

At this point, I created my hypothetical business problem: I have been hired by the esports organization Cloud9 as a player coach/analyst for the professional League of Legends team. They are competing at the top level and are looking to win every game they possibly can, as there is a lot of money on the line. My job is to help them determine the most important factors in winning League of Legends games. I am to investigate what I should be advising our players to focus on in the first 10 minutes of each game to provide the highest chance to win the game.

Caveat

As a fair warning, from this point I’m going to discuss some of my findings from the perspective of a player to give as much of a “deep dive” into this data as possible. If you’re not familiar with the game and some of the terminology, it will be hard to follow along with the rest of this article. If you’re in this boat, I highly recommend you check out my project on GitHub here. The notebook (technical audience) and corresponding pdf presentation (non-technical audience) gives a much higher-level look at my findings than I will discuss here.

Investigation

After taking an initial look at the data available to me, I sought out answers to three questions, which I believe could be answered effectively.

  • What is the single most important determining factor in winning a game?
  • What objectives should our players prioritize?
  • What objectives should our players ignore?

After some initial cleaning, feature engineering, and iteration over multiple models- my feature importances graph for an XGBoost model showed some interesting results.

Obviously, the single most important factor in determining game outcome in professional games is the gold differential at 10 minutes. I only say this because anything worth doing in League of Legends gives you gold. During professional game broadcasts, the total gold of each team is shown on the scoreboard (along with total kills and towers). However, any decent player knows that this is just a fundamental part of the game. Having more gold means you have more items, more items means that you have more stats on your character, more stats means you’re more likely to win fights and accrue more gold more quickly. So ultimately, this doesn’t really provide much insight to a high-level player. It is mostly just confirming a basic fundamental of the game that one learns from playing. At this point in my data science adventure, this was extremely reassuring, because I knew the context of what this meant from a technical perspective, as well as a player; my model was working properly.

From this visual, one can also see that the number of dragons taken is the next highest determining factor in predicting outcome for our model. This is where we start to get actual insights for professional players. League is a game of many choices, and as a team you have to decide what objectives to take. There’s only a limited amount of time in the game to accomplish things. Each player can really only be in one place at one time, and it takes time to move across the map. For example, if Red team decides to send four players for dragon, but blue team can only have two players there to contest it, the red team will almost assuredly secure the dragon while potentially killing the blue players in a 4vs2 scenario. In return, however, the remaining blue members will be getting waves and potential towers on the other side of the map. This model seems to place high importance on making sure to secure those dragons, even if it is at the cost of losing towers and minions. Similarly, the model places the absolute lowest value on taking Rift Heralds, as it is not a heavy contributing factor to wins in these games.

Model Weaknesses and Potential Future Work

Obviously, we were able to answer our questions laid out at the start, but I definitely question how useful this would be for players within an organization. Firstly, here are definitely some issues with the data. The data was taken from 10,000 professional games; however, this is over the course of two years’ worth of matches. Something not taken into account is the fact that the game developer, Riot Games, introduces balance patches every two weeks to keep the game/meta feeling different and “fresh”. The game itself is in a constant state of change; one week they may just suddenly decide to change the gold value that turrets provide, or the amount of stats given by a dragon type. Therefore, our model cannot easily take this into account.

Another drawback to only these features is the fact that none of the champion picks are taken into account. In my opinion, this can sometimes be more important that the actual game state itself, as some teams build around sacrificing early objectives so that their late game “scaling comp” can reach their power spikes. There are over 150 champions in League of Legends, and the team combinations are nearly endless. Although professionals can generally get a good grasp on what the strongest picks are for each role, sometimes the strongest individual picks do not synergize well with other “strongest” picks in other roles. Sadly, the data I had available does not take this into account, but I think it’s an extremely important part of the game.

If I was an actual analyst for a professional team, I think one of the most valuable questions I could answer would be: “Which champion would be best to pick in this game, for this particular patch?” Although the data in my project is not able to answer a question like this, there are some interesting resources I’ve found which help me make my own champion selections in games. For example, this site shows Draven’s win rate against every other champion over the past 30 days for the top 10% of players, still bringing in a huge sample size of nearly 300,000 games. Let’s say the enemy team already picked Draven; we can consult the win rate matrix per role to determine which one of our own picks can give us the best chance to win. In the future, expanding the project to take champion selections and win rate matrices into account for a classification model would be difficult but extremely exciting.

17