41
What are database indexes?
SQL queries are used to retrieve data from a database. We use the SELECT statement to retrieve data and usually a WHERE clause to filter the data.
Let's see how SELECT statements work.
Let's see how SELECT statements work.
First let's define our table. We have a
posts
table defined by the below schema:CREATE TABLE posts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER,
title TEXT,
body TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
FOREIGN KEY (user_id) REFERENCES users(id)
);
Let's say we want to retrieve all posts. We can do this by using the SELECT statement.
SELECT * FROM posts;
This will return all the posts in the
posts
table. Let's take this one level up. We want to filter all posts where the title contains the word "Java" (since its my favorite language).
SELECT * FROM posts WHERE title = "Java";
This will return all the posts where the title is "Java". The
WHERE
clause is used to filter the data. All is pretty convenient till now. Queries can be much complex but this should be enough to understand what is happening behind the scenes.
Let's examine how our query works. What does the system do in the background?
posts
table.title
column mentioned there. Now it knows that it needs to filter the results based on the title
column. If there were multiple columns in the WHERE clause, it would filter the results based on all the columns.title
column of each row and find the rows that contain the word "Java".Let's say this table is part of a huge database. Since it goes through the entire database, it will take a long time to run.
This is where the concept of indexing comes in.
This is where the concept of indexing comes in.
What if the rows were sorted alphabetically by the title column?
Would it be faster to query the database?
Would it be faster to query the database?
We can achieve this by creating an index on the title column.
Let's see how we can create an index on the
title
column.CREATE INDEX posts_title_index ON posts (title);
This will create an index on the
The index is basically a lookup table. It has two parts:
title
column. The index is basically a lookup table. It has two parts:
title
.post
Note: It is also possible to index multiple columns together. For example, if we want to find all the posts that contain the word "Java" and the word "Python", we can index the
Example:
title
column and the body
column together.Example:
CREATE INDEX posts_title_body_index ON posts (title, body);
Here the index will have two keys. The first key is the
title
column and the second key is the body
column. When both the keys match, the index will point to the row.So if we run the query again, it will look up the key in the index and return the row id. The row id can then be used to return the entire row.
In our case, the query optimizer will find the index on the
For example,
title
column. It will use the index to find the rows. Now the speed depends on how the index is implemented.For example,
There are many more ways to implement indexes and each one has its own advantages and disadvantages. To read further refer to - https://en.wikipedia.org/wiki/Index_(database)
While it is possible to create an index for each column, it is not necessary. It has a performance cost. Wait, why?
We saw that the index makes querying faster but what about writing to the database?
For optimization purposes, the database engine will automatically create indexes for some columns.
There are many factors that can influence the speed of indexing and it is important to choose the columns that are most likely to be queried.
Decision criteria:
As you can see, the decision criteria are not always obvious but the general idea is to choose the columns that are most likely to be queried. Keeping this in mind, we should decide our querying strategy and indexing strategy together. Keep as few columns in queries as possible and look at the possibility of those columns having their own index.
Thanks for reading. This should give you an idea about indexes in SQL databases and how they can be an important part of your SQL performance. If you want to connect with me, you can find me on Twitter @abh1navv
41