Many of our users, developers and database administrators, keep asking our team about EverSQL’s indexing recommendations algorithm.
So, we decided to write about it.
This tutorial won’t detail all the internals of the algorithm, but rather try to lay down the most important aspects of indexing, in simple terms.
Also, and most importantly, we’ll present practical examples for properly indexing your tables and queries by relying on a set of rules, rather than on guessing.
Our focus in this tutorial is on MySQL, MariaDB and PerconaDB databases. This information may be relevant for other database vendors as well, but in some cases may not.
Which indexes should I create for my SQL query?
As a general rule of thumb, you can follow these steps for building compound indexes, when trying to optimize your SQL query:
- Start by listing all tables used in your query. Create a separate list for each of the subqueries in your query.
So if you have one SELECT query with 2 SELECT subqueries inside, you should have 3 lists, one for each of them, containing the tables referenced in them.
At the end of this process, you will potentially add a list of columns for each of these tables, in each of the query lists.
- The left-most columns in any of your indexes should match the columns in the query’s equality comparisons (i.e, age = 25).
You can add several columns, as long as all of them are compared to a constant with an equality operator.
- Then, you should choose a single column which will be the ‘range column’. MySQL only supports one range column in each index.
Therefore, you should look at all the comparisons with a range operator (<>, >, <, IN(), BETWEEN, LIKE) and choose the one that will filter the most rows.
Add that column as the next column in your index for that table.
You can get some more information here about the reasons for adding the equality columns before range columns (the slide is written by a team member of MySQL’s optimizer team).
- If no range columns exist in the query, you can add the columns from the GROUP BY clause.
- If no range columns exist in the query and no GROUP BY clause, you can add the columns from the ORDER BY clause.
- In some cases, it makes sense to also create a separate index that holds the ORDER BY clause’s columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause, they should all be specified in the ORDER BY clause with the same order (ASC / DESC). This doesn’t guarantee that the database’s optimizer will pick this index rather than the WHERE compound index, but it’s worth a try.
- At the end, add relevant columns from the SELECT clause, which might allow the MySQL to use the index as a covering index. A covering index is an index that contains all columns in both filtering and selection clauses of the query. Such an index allows the database to run the query solely by using the index, without having the need to access the table. In many cases this approach is significantly faster.
Let’s look at an example to clarify:
SELECT id, first_name, last_name, age from employees where first_name = ‘John’ AND last_name = ‘Brack’ and age > 25 ORDER BY age ASC;
For this query, we’ll start with adding the columns first_name and last_name, which are compared with an equality operator. Then, we’ll add the age column which is compared with a range condition. No need to have the ORDER BY clause indexed here, as the age column is already in the index. Last but not least, we’ll add id from the SELECT clause to the index to have a covering index.
So to index this query properly, you should add the index:
employees (first_name, last_name, age, id).
The above is a very simplified pseudo-algorithm that will allow you to build simple indexes for rather simple SQL queries.
If you’re looking for a way to automate this process, while also adding the benefit of a proprietary indexing algorithm and query optimization, you can try out EverSQL Query Optimizer which does all the heavy lifting for you.
What not to do when indexing (or writing SQL queries)?
We gathered some of the most common mistakes we see programmers and database administrators do when writing queries and indexing their tables.
Indexing each and every column in the table separately
In most cases, MySQL won’t be able to use more than one index for each table in the query.
Therefore, when creating a separate index for each column in the table, the database is bound to perform only one of the search operations using an index, and the rest of them will be significantly slower, as the database can’t use an index to execute them.
We recommend using compound indexes (explained later in this article) rather than single-column indexes.
The OR operator in filtering conditions
Consider this query:
SELECT a, b FROM tbl WHERE a = 3 OR b = 8.
In many cases, MySQL won’t be able to use an index to apply an OR condition, and as a result, this query is not index-able.
Therefore, we recommend to avoid such OR conditions and consider splitting the query to two parts, combined with a UNION DISTINCT (or even better, UNION ALL, in case you know there won’t be any duplicate results)
The order of columns in an index is important
Let’s say I hand you my contacts phone book which is ordered by the contact’s first name and ask you to count how many people are there named “John” in the book. You’ll grab the book in both hands and say “no problem”. You will navigate to the page that holds all names starting with John, and start counting from there.
Now, let’s say I change the assignment and hand you a phone book that is ordered by the contact’s last name, but ask you to still count all contacts with the first name “John”. How would you approach that? Well, the database scratches his head in this situation as well.
Now lets look at an SQL query to demonstrate the same behavior with the MySQL optimizer:
SELECT first_name, last_name FROM contacts WHERE first_name = ‘John’;
Having the index contacts (first_name, last_name) is ideal here, because the index starts with our filtering condition and ends with another column in the SELECT clause.
But, having the reverse index contacts (last_name, first_name) is rather useless, as the database can’t use the index for filtering, as the column we need is second in the index and not first.
The conclusion from this example is that the order of columns in an index is rather important.
Adding redundant indexes
Indexes are magnificent when trying to optimize your SQL queries and they can improve performance significantly.
But, they come with a downside as well. Each index you’re creating should be kept updated and in sync when changes occur in your databases. So for each INSERT / UPDATE / DELETE in your databases, all relevant indexes should be updated. This update can take sometime, especially with large tables / indexes.
Therefore, do not create indexes unless you know you’ll need them.
Also, we highly recommend to analyze your database once in a while, searching for any redundant indexes that can be removed.