Using Hints to Analyze Queries

Hints to Analyze Queries

Hints to Analyze QueriesIn this blog post, we’ll look at using hints to analyze queries.

There are a lot of things that you can do wrong when writing a query, which means that there a lot of things that you can do to make it better. From my personal experience there are two things you should review first:

  1. The table join order
  2. Which index is being used

Why only those two? Because many other alternatives that are more expensive, and at the end query optimization is a cost-effectiveness analysis. This is why we must start with the simplest fixes. We can control this with the hints “straight_join” and “force index”. These allow us to execute the query with the plan that we would like to test.

Join Order

In a query where we use multiple tables or subqueries, we have some particular fields that we are going to use to join the tables. Those fields could be the Primary Key of the table, the first part of a secondary index, neither or both. But before we analyze possible scenarios, table structure or indexes, we need to establish what is the best order for that query to join the tables.

When we talked about join order and the several tables to join, one possible scenario is that a table is using a primary key to join a table, and another field to join to other tables. For instance:

select
  table_a.id, table_b.value1, table_c.value1
from
  table_a join
  table_b on table_a.id = table_b.id join
  table_c on table_b.id_c = table_c.id
where
  table_a.value1=10;

We get this explain:

+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                                | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+
|  1 | SIMPLE      | table_a | ref    | PRIMARY,value1 | value1  | 5       | const                              |    1 | Using index |
|  1 | SIMPLE      | table_b | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_a.id   |    1 | Using where |
|  1 | SIMPLE      | table_c | eq_ref | PRIMARY        | PRIMARY | 4       | bp_query_optimization.table_b.id_c |    1 | NULL        |
+----+-------------+---------+--------+----------------+---------+---------+------------------------------------+------+-------------+

It is filtering by value1 on table_a, which joins with table_b with the primary key, and table_c uses the value of id_c which it gets from table_b.

But we can change the table order and use straight_join:

select straight_join
  table_a.id, table_b.value1, table_c.value1
from
  table_c join
  table_b on table_b.id_c = table_c.id join
  table_a on table_a.id = table_b.id
where
  table_a.value1=10;

The query is semantically the same, but now we get this explain:

+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
| id | select_type | table   | type   | possible_keys  | key     | key_len | ref                              | rows | Extra       |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+
|  1 | SIMPLE      | table_c | ALL    | PRIMARY        | NULL    | NULL    | NULL                             |    1 | NULL        |
|  1 | SIMPLE      | table_b | ref    | PRIMARY,id_c   | id_c    | 5       | bp_query_optimization.table_c.id |    1 | NULL        |
|  1 | SIMPLE      | table_a | eq_ref | PRIMARY,value1 | PRIMARY | 4       | bp_query_optimization.table_b.id |    1 | Using where |
+----+-------------+---------+--------+----------------+---------+---------+----------------------------------+------+-------------+

In this case, we are performing a full table scan over table_c, which then joins with table_b using index over id_c to finally join table_a using the primary key.

Sometimes the optimizer chooses the incorrect join order because of bad statistics. I found myself reviewing the first query with the second explain plan, where the only thing that I did to find the query problem was to add “STRAIGHT_JOIN” to the query.

Taking into account that the optimizer could fail on this task, we found a practical way to force it to do what we want (change the join order).

It is also useful to find out when an index is missing. For example:

SELECT costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;

The explain plan shows:

+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+
| id | select_type | table            | type  | possible_keys                                       | key                                                | key_len | ref                               | rows  | Extra       |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id | index_costs_spac_types_on_cost_id_and_spac_type_id | 8       | NULL                              | 86408 | Using index |
|  1 | SIMPLE      | spac_types       | eq_ref | PRIMARY,index_spac_types_on_place_id_and_spac_type | PRIMARY                                            | 4       | pms.costs_spac_types.spac_type_id |     1 | Using where |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                            | PRIMARY                                            | 4       | pms.costs_spac_types.cost_id      |     1 | Using index |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+-----------------------------------+-------+-------------+

It is starting with costs_spac_types and then using the clustered index for the next two tables. The explain doesn’t look bad!

However, it was taking longer than this:

SELECT STRAIGHT_JOIN costs.id as cost_id, spac_types.id as spac_type_id
FROM
spac_types INNER JOIN
costs_spac_types ON costs_spac_types.spac_type_id = spac_types.id INNER JOIN
costs ON costs.id = costs_spac_types.cost_id
WHERE spac_types.place_id = 131;

0.17 sec versus 0.09 sec. This is the explain plan:

+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+
| id | select_type | table            | type   | possible_keys                                      | key                                                | key_len | ref                          | rows  | Extra                                                           |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type | index_spac_types_on_place_id_and_spac_type         | 4      | const                         |    13 | Using index                                                     |
|  1 | SIMPLE      | costs_spac_types | index  | index_costs_spac_types_on_cost_id_and_spac_type_id | index_costs_spac_types_on_cost_id_and_spac_type_id | 8      | NULL                          | 86408 | Using where; Using index; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                            | PRIMARY                                            | 4      | pms.costs_spac_types.cost_id  |     1 | Using index                                                     |
+----+-------------+------------------+--------+----------------------------------------------------+----------------------------------------------------+---------+------------------------------+-------+-----------------------------------------------------------------+

Reviewing the table structure:

CREATE TABLE costs_spac_types (
  id int(11) NOT NULL AUTO_INCREMENT,
  cost_id int(11) NOT NULL,
  spac_type_id int(11) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY index_costs_spac_types_on_cost_id_and_spac_type_id (cost_id,spac_type_id)
) ENGINE=InnoDB AUTO_INCREMENT=172742 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

I saw that the unique index was over cost_id and then spac_type_id. After adding this index:

ALTER TABLE costs_spac_types ADD UNIQUE KEY (spac_type_id,cost_id);

Now, the explain plan without STRIGHT_JOIN is:

+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+
| id | select_type | table            | type   | possible_keys                                                   | key                                        | key_len | ref                          | rows | Extra       |
+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+
|  1 | SIMPLE      | spac_types       | ref    | PRIMARY,index_spac_types_on_place_id_and_spac_type              | index_spac_types_on_place_id_and_spac_type | 4      | const                         |   13 | Using index |
|  1 | SIMPLE      | costs_spac_types | ref    | index_costs_spac_types_on_cost_id_and_spac_type_id,spac_type_id | spac_type_id                               | 4      | pms.spac_types.id             |   38 | Using index |
|  1 | SIMPLE      | costs            | eq_ref | PRIMARY                                                         | PRIMARY                                    | 4      | pms.costs_spac_types.cost_id  |    1 | Using index |
+----+-------------+------------------+--------+-----------------------------------------------------------------+--------------------------------------------+---------+------------------------------+------+-------------+

Which is much better, as it is scanning fewer rows and the query time is just 0.01 seconds.

Indexes

The optimizer has the choice of using a clustered index, a secondary index, a partial secondary index or no index at all, which means that it uses the clustered index.

Sometimes the optimizer ignores the use of an index because it thinks reading the rows directly is faster than an index lookup:

mysql> explain select * from table_c where id=1;
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | table_c | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------+
mysql> explain select * from table_c where value1=1;
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | table_c | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------+

In both cases, we are reading directly from the clustered index.

Then, we have secondary indexes that are partially used or/and that are partially useful for the query. This means that we are going to scan the index and then we are going to lookup in the clustered index. YES! TWO STRUCTURES WILL BE USED! We usually don’t realize any of this, but this is like an extra join between the secondary index and the clustered index.

Finally, the covering index, which is simple to identify as “Using index” in the extra column:

mysql> explain select value1 from table_a where value1=1;
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key    | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+
|  1 | SIMPLE      | table_a | ref  | value1        | value1 | 5       | const |    1 | Using index |
+----+-------------+---------+------+---------------+--------+---------+-------+------+-------------+

Index Analysis

As I told you before, this is a cost-effectiveness analysis from the point of view of query performance. Most of the time it is faster to use covering indexes than secondary indexes, and finally the clustered index. However, usually covering indexes are more expensive for writes, as you need more fields to cover the query needs. So we are going to use a secondary index that also uses the clustered index. If the amount of rows is not large and it is selecting most of the rows, however, it could be even faster to perform a full table scan. Another thing to take into account is that the amount of indexes affects the write rate.

Let’s do an analysis. This is a common query:

mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

It is using all the fields of each table.

This is more restrictive:

mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

But it is performing a full table scan over t2, and then is using t2.value1 to lookup on t1 using the clustered index.

Let’s add an index on table_index_analisis_2 over value1:

mysql> alter table table_index_analisis_2 add key (value1);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

The explain shows that it is not being used, not even when we force it:

mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
mysql> explain select * from table_index_analisis_1 t1, table_index_analisis_2 t2 force key (value1) where t1.id = t2.value1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                             | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+
|  1 | SIMPLE      | t2    | ALL    | value1        | NULL    | NULL    | NULL                            |   64 | Using where |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | bp_query_optimization.t2.value1 |    1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------+------+-------------+

This is because the optimizer considers performing a full table scan better than using a part of the index.

Now we are going to add an index over value1 and value2:

mysql> alter table table_index_analisis_2 add key (value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY         | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+

We can see that now it is using the index, and in the extra column says “Using index” — which means that it is not using the clustered index.

Finally, we are going to add an index over table_index_analisis_1, in the best way that it is going to be used for this query:

mysql> alter table table_index_analisis_1 add key (id,value1,value2);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1, table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index  | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY,id      | PRIMARY  | 4       | bp_query_optimization.t2.value1 |    1 | NULL                     |
+----+-------------+-------+--------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)

However, it is not selected by the optimizer. That is why we need to force it:

mysql> explain select t1.id, t1.value1, t1.value2, t2.value2 from table_index_analisis_1 t1 force index(id), table_index_analisis_2 t2 where t1.id = t2.value1;
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys   | key      | key_len | ref                             | rows | Extra                    |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
|  1 | SIMPLE      | t2    | index | value1,value1_2 | value1_2 | 10      | NULL                            |   64 | Using where; Using index |
|  1 | SIMPLE      | t1    | ref   | id              | id       | 4       | bp_query_optimization.t2.value1 |    1 | Using index              |
+----+-------------+-------+-------+-----------------+----------+---------+---------------------------------+------+--------------------------+
2 rows in set (0.00 sec)

Now, we are just using the secondary index in both cases.

Conclusions

There are many more hints to analyze queries we could review, like handler used, table design, etc. However, in my opinion, it is useful to focus on these at the beginning of the analysis.

I will also like to point out that using hints is not a long-term solution! Hints should be used just in the analysis phase.

The post Using Hints to Analyze Queries appeared first on Percona Database Performance Blog.

关注dbDao.com的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569