Indexing MySQL For High Performance: A High-level Overview

Introduction

If you have ever worked with MySQL, Percona Server, or MariaDB, you have probably wondered how you can improve the performance of your database instances. If you have sought out advice on this subject, you have likely heard about indexes.

Indexes in MySQL can be categorized into a few types:

  1. Balanced tree (B-Tree) indexes - the most frequently used type of index. This index type can be used together with search queries that use the =, >, >=, <, <= and BETWEEN keywords, also with LIKE queries.
  2. Spatial (R-Tree) indexes - can be used together with MySQL geometric data types to index geographical objects.
  3. Hash indexes - usually used only with queries that use the = or <=> search operators. Very fast but can be used only when the MEMORY storage engine is in use.
  4. Covering indexes - indexes that cover all of the columns required for a query to complete.
  5. Clustered indexes - such indexes store row data. Usually PRIMARY KEYs or, if they do not exist, UNIQUE indexes.
  6. Multicolumn (composite) indexes - indexes that are created on multiple columns.
  7. Prefix indexes - such indexes allow you only to index a prefix of a column. As such indexes do not index the full value of a column, they are frequently used to save space.

B-Tree Indexes and MySQL


A B-tree index is the most commonly used index in MySQL, MariaDB, and Percona Server. When such an index is in use, MySQL can make use of search queries that use the =, >, >=, <, <= or BETWEEN keywords, also LIKE queries. One of the primary reasons such an index may be added to a column is to speed up search queries - when a b-tree index is in use, the storage engine does not scan through the whole table to find relevant rows.

Here are a few examples of queries that might make use of such an index:

  1. SELECT * FROM demo_table WHERE column [ = | > | >= | < | <= ] 'value';
  2. SELECT * FROM demo_table WHERE column_a BETWEEN 100 AND 200;
  3. SELECT * FROM demo_table WHERE column LIKE 'value%';

Spatial Indexes and MySQL


Spatial indexes (also sometimes referred to as R-Tree indexes) are used to access spatial (geographical) objects - to make use of such indexes, you must use the GIS functions provided by MySQL: some of these functions include MBRContains, MBRCovers, or MBREquals. Such functions can indicate whether the minimum bounding rectangles of one parameter contain, cover, or equals the minimum bounding rectangles of another parameter.

To add such an index to a table, use the SPATIAL keyword like so:

  1. CREATE SPATIAL INDEX spatial_idx ON demo_table (demo_column); or
  2. ALTER TABLE demo_table ADD SPATIAL INDEX(spatial_idx);

Here are a few examples of queries that might make use of such type of index:

  1. SELECT MBRContains(@variable_1, @variable_2);
  2. SELECT MBRWithin(@variable_2, @variable_1);

Keep in mind that before using variables like variable_1 or variable_2 in any capacity, you should define them first (the WKT value parameter represents a well-known text-formatted value that represents geometry objects):

SET @variable_1 = ST_GeomFromText('WKT value');

Hash Indexes and MySQL


Hash indexes in MySQL have a few distinct advantages and disadvantages: the main upside of hash indexes is that they are very fast, but the downside is that they are only used for equality comparisons and they only work on the MEMORY storage engine. It also does not support range searches. To create such an index on a table, make use of the USING HASH option at the end of your query:

CREATE INDEX idx_name ON demo_table (demo_column) USING HASH;

The reason why hash indexes are so fast is that hash indexes are usually very compact and their primary key only accesses elements in a hash table. However, it also means that you cannot employ range searches - hash indexes should only be used with = or <=> operators.

Covering Indexes in MySQL


When working with MySQL, sometimes you might encounter some corner cases that might require the use of an index in a different way - here's where covering indexes come in.

Covering indexes, simply put, are indexes that cover all fields required for a query to execute successfully. Thus, when a covering index is in use, a query can retrieve results from the index itself rather than accessing the disk saving disk I/O. For example, say you have a query like so:

SELECT column_1, column_2 FROM demo_table WHERE column_3 = 'value';

If you use an index that already contains the values of the columns you want to search through, MySQL won't access the disk - instead, it will provide results directly from the index.

Clustered Indexes in MySQL


Clustered indexes aren't a separate MySQL index type per se - it's simply a different approach to storing data. Such indexes store row data: if your table has a PRIMARY KEY, the PRIMARY KEY is the clustered index. If your table does not have a PRIMARY KEY, the clustered index is the first UNIQUE INDEX with all of its key columns defined as NOT NULL.

When such an index is in use, all rows in the table are stored and sorted based on the index key values. Since the rows can only have one sort order, your tables cannot have more than one clustered index.

Clustered indexes are easy to distinguish - in MySQL, golden keys usually represent such indexes. In contrast, balanced tree indexes are represented by keys outlined in silver (in this case, a column with a "golden key" is the column with a clustered index while a column with the "silver key" is the column with a balanced tree index):

To define a clustered index when creating a table, add an AUTO_INCREMENT to your column and define it as the PRIMARY KEY. To define a balanced tree index when creating a table, add an INDEX to your column and specify the column you want to index (you can also specify the name of the index if you so desire):

CREATE TABLE arctype ( clustered_index INT(255) NOT NULL 
AUTO_INCREMENT PRIMARY KEY, demo_index VARCHAR(255) NOT 
NULL, INDEX idx_name(demo_index));


A UNIQUE INDEX with all of its key columns defined as NOT NULL can also be a clustered index - as the name suggests, in that case, all values in a column with a UNIQUE INDEX will be unique (i.e. there will be no duplicates).

Composite Indexes in MySQL


A composite index is an index that spans multiple columns. It's pretty self-explanatory: if we have an index on multiple columns, we have a composite index. Here's how adding such an index to a table might look like (in this case, c1, c2, and c3 are column names):

CREATE INDEX composite_idx ON demo_table(c1,c2,c3);

Composite indexes can be beneficial if we deal with queries that might use indexed fields for selecting, joining, or filtering operations. For example, a composite index on the columns c1 and c2 might be useful if you want to satisfy a query like so:

SELECT * FROM demo_table WHERE c1 = 5 AND c2 = 10;

In MySQL, composite indexes can consist of up to 16 columns, though bear in mind that MySQL uses indexes from left to right and not vice versa.

Prefix Indexes in MySQL


Sometimes, you might encounter situations where you might need to index lengthy columns, making your indexes very large. In those cases, you can also index the first few characters of a column instead of indexing the whole value - such indexes are called prefix indexes.

To create a prefix index on a table in MySQL, use a query like so:

CREATE INDEX prefix_idx ON demo_table(column_name(length));

In this case, prefix_idx is the name of the index, demo_table is the name of the table, column_name is the name of the column and length is the length of the index.

When using prefix indexes, it's vital to remember that you should index columns in such a way that gives decent selectivity and saves space at the same time. So let's try to do that.

First, we should find the selectivity of the column - we can issue a query like so (we divide the distinct (unique) values in the column by all of the values in the table - an ideal index selectivity is a value of 1):

SELECT COUNT(DISTINCT column) / COUNT(*) FROM demo_table;

Now we should try to make the selectivity of the prefix index as close to that value as possible. To do that, we could issue a query like so:

SELECT COUNT(DISTINCT LEFT(column, 5)) / COUNT(*) AS selectivity_5;

The value that will be displayed is the selectivity of the index that has the prefix of 5 characters (to evaluate different character lengths, adjust the query).

Summary


In this blog post, we have gone through some of the use cases related to certain indexes in MySQL. Choosing a good index for your use case might indeed be a hassle - after all, you have so many options to choose from - but once you understand the requirements of your project and the upsides and downsides of using certain index options in MySQL, choosing the right fit becomes easier.

After all, hopefully, this blog post has provided you with some valuable insight regarding indexes in MySQL - stick around for more content from the MySQL world.

Lukas is an ethical hacker, a MySQL database administrator, and a frequent conference speaker. Since 2014 Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania and abroad including advertising, gift-buying, gaming, hosting websites as well as some websites of government institutions. Lukas runs one of the biggest & fastest data breach search engines in the world — BreachDirectory.com and frequently blogs in multiple places educating people about information security and other topics. He also runs his own blog over at lukasvileikis.com

16