mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-13 22:30:37 +08:00
commit
079676ca48
@ -1,108 +0,0 @@
[#]: subject: (Tune your MySQL queries like a pro)
[#]: via: (https://opensource.com/article/21/5/mysql-query-tuning)
[#]: author: (Dave Stokes https://opensource.com/users/davidmstokes)
[#]: collector: (lujun9972)
[#]: translator: (unigeorge)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )

Tune your MySQL queries like a pro
======
Optimizing your queries isn't a dark art; it's just simple engineering.
![woman on laptop sitting at the window][1]

Many people consider tuning database queries to be some mysterious "dark art" out of a Harry Potter novel; with the wrong incantation, your data turns from a valuable resource into a pile of mush.

In reality, tuning queries for a relational database system is simple engineering and follows easy-to-understand rules or heuristics. The query optimizer translates the query you send to a [MySQL][2] instance, and then it determines the best way to get the requested data using those heuristics combined with what it knows about your data. Reread the last part of that: _"what it knows about your data_." The less the query optimizer has to guess about where your data is located, the better it can create a plan to deliver your data.

To give the optimizer better insight about the data, you can use indexes and histograms. Used properly, they can greatly increase the speed of a database query. If you follow the recipe, you will get something you will like. But if you add your own ingredients to that recipe, you may not get what you want.

### Cost-based optimizer

Most modern relational databases use a cost-based optimizer to determine how to retrieve your data out of the database. That cost is based on reducing very expensive disk reads as much as possible. The query optimizer code inside the database server keeps statistics on getting that data as it is encountered, and it builds a historical model of what it took to get the data.

But historical data can be out of date. It's like going to the store to buy your favorite snack and being shocked at a sudden price increase or that the store closed. Your server's optimization process may make a bad assumption based on old information, and that will produce a poor query plan.

A query's complexity can work against optimization. The optimizer wants to deliver the lowest-cost plan of the available options. Joining five different tables means there are five factorial (5 × 4 × 3 × 2 × 1 = 120) possible join orders to consider. Heuristics are built into the code to avoid evaluating every possible option. MySQL wants to generate a new query plan every time it sees a query, while other databases such as Oracle can have a query plan locked down. This is why giving detailed information on your data to the optimizer is vital. For consistent performance, it really helps to have up-to-date information for the query optimizer to use when making query plans.
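
One way to refresh what the optimizer knows is to recompute the table statistics by hand. The following is a minimal sketch, assuming the `world` sample database that the query examples in this article use. `ANALYZE TABLE` and `SHOW INDEX` are standard MySQL statements, but how often to run them depends on how quickly your data changes.

```sql
-- Recompute the index (key distribution) statistics for two tables.
ANALYZE TABLE world.city, world.country;

-- The Cardinality column shows the optimizer's estimate of the
-- number of distinct values in each index.
SHOW INDEX FROM world.city;
```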

Also, rules are built into the optimizer with assumptions that probably do not match the reality of your data. The query optimizer will assume all the data in a column is evenly distributed among all the rows unless it has other information. And it will default to the smaller of two possible indexes if it sees no alternative. While the cost-based model for an optimizer can make a lot of good decisions, you can smack into cases where you will not get an optimal query plan.

### A query plan?

A query plan is what the optimizer generates from your query for the server to execute. The way to see the query plan is to prepend the word `EXPLAIN` to your query. For example, the following query asks for the name of a city from the city table and the name of the corresponding country from the country table; the two tables are linked by the country's unique code. This case asks for only five cities from the United Kingdom (the query has no `ORDER BY` clause, so the five rows are not guaranteed to be in alphabetical order):

```
SELECT city.name AS 'City',
       country.name AS 'Country'
FROM city
JOIN country ON (city.countrycode = country.code)
WHERE country.code = 'GBR'
LIMIT 5;
```

Prepending `EXPLAIN` to this query gives the query plan generated by the optimizer. Skipping over all but the end of the output, it is easy to see the optimized query:

```
SELECT `world`.`city`.`Name` AS `City`,
       'United Kingdom' AS `Country`
FROM `world`.`city`
JOIN `world`.`country`
WHERE (`world`.`city`.`CountryCode` = 'GBR')
LIMIT 5;
```

The big changes are that `country.name AS 'Country'` was changed to `'United Kingdom' AS 'Country'` and the `WHERE` clause went from looking in the country table to the city table. Because `country.code` identifies exactly one country, the optimizer can read that row once and treat its name as a constant. The optimizer determined that these two changes will provide a faster result than the original query.
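
To see the rewritten statement yourself, here is a minimal sketch, assuming the `world` sample database and the `mysql` command-line client. In that client the rewritten query is not part of the `EXPLAIN` result table; it is reported as a note that `SHOW WARNINGS` displays when run immediately afterward. Other clients may print it automatically.

```sql
USE world;

-- Ask the optimizer for its plan instead of running the query.
EXPLAIN
SELECT city.name AS 'City',
       country.name AS 'Country'
FROM city
JOIN country ON (city.countrycode = country.code)
WHERE country.code = 'GBR'
LIMIT 5;

-- The Message column of the resulting note holds the rewritten query.
SHOW WARNINGS;
```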

### Indexes

You will hear indexes and keys used interchangeably in the MySQL-verse. However, indexes are made up of keys, and keys are a way to identify a record, hopefully uniquely. If a column is designated as a key, the optimizer can search a list of those keys to find the desired record without having to read the entire table. Without an index, the server has to start at the first row of the table and read through every row of data. If the column was created as a unique index, then the server can go to that one row of data and ignore the rest. The more distinct values an index has (a measure known as its cardinality), the better. Remember, we are looking for faster ways of getting to the data.

The MySQL default InnoDB storage engine wants your table to have a primary key and will store your data in a B+ tree by that key. A recently added MySQL feature is invisible columns: columns that do not return data unless the column is explicitly named in the query. For example, `SELECT * FROM foo;` doesn't return any columns that are designated as invisible. This feature provides a way to add a primary key to older tables without recoding all the queries to account for that new column.
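
As a rough sketch of that last point, the statements below add an invisible primary key to the `foo` table from the example above. The column name `id` and its type are assumptions made for illustration, the sketch assumes `foo` has no primary key yet, and invisible columns require MySQL 8.0.23 or later.

```sql
-- Add an auto-incrementing primary key that SELECT * will not return.
-- The column name and type are hypothetical.
ALTER TABLE foo
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY INVISIBLE FIRST;

-- Existing queries see the same columns as before.
SELECT * FROM foo;

-- The new column appears only when it is named explicitly.
SELECT id FROM foo;
```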

To make this even more complicated, there are many types of indexes, such as functional, spatial, and composite. There are even cases where you can create an index that will provide all the requested information for a query so that there is no need to access the data table (a covering index).

Describing the various indexes is beyond the scope of this article, so just think of an index as a shortcut to the record or records you desire. You can create an index on one or more columns or part of those columns. My physician's system can look up my records by the first three letters of my last name and birthdate. Using multiple columns works best when you put the most unique field first, then the second most unique, and so forth. An index on year-month-day works for year-month-day, year-month, and year searches, but it doesn't help day or month-day searches, and a year-day search can use only the year part. It helps to design your indexes around how you want to use your data.
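
The two index shapes described above can be sketched as follows. The table and all column and index names are hypothetical and exist only to illustrate a multi-column index and a prefix index; compare the `key` column in each `EXPLAIN` result on your own data.

```sql
-- Hypothetical table, used only to illustrate the leftmost-prefix rule.
CREATE TABLE patient_visits (
  visit_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  last_name   VARCHAR(60)  NOT NULL,
  birthdate   DATE         NOT NULL,
  visit_year  SMALLINT     NOT NULL,
  visit_month TINYINT      NOT NULL,
  visit_day   TINYINT      NOT NULL
);

-- Composite index on year, month, and day, in that order.
CREATE INDEX idx_visit_ymd
  ON patient_visits (visit_year, visit_month, visit_day);

-- Prefix index: first three letters of the last name, plus the birthdate.
CREATE INDEX idx_name3_birthdate
  ON patient_visits (last_name(3), birthdate);

-- These constrain the leading column(s), so idx_visit_ymd is a candidate.
EXPLAIN SELECT * FROM patient_visits WHERE visit_year = 2021;
EXPLAIN SELECT * FROM patient_visits
WHERE visit_year = 2021 AND visit_month = 5;

-- The leading column visit_year is not constrained here, so the
-- leftmost-prefix rule does not apply; check which key, if any, is chosen.
EXPLAIN SELECT * FROM patient_visits
WHERE visit_month = 5 AND visit_day = 17;
```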

### Histograms

A histogram describes the distribution of your data. If you were alphabetizing people by their last name, you could use a "logical bucket" for the folks with last names starting with the letters A to F, then another for G to J, and so forth. The optimizer assumes that the data is evenly distributed within the column, but this is rarely the case in practical use.

MySQL provides two types of histograms: equal height, where all the data is divided equally among the buckets, and singleton, where each bucket holds a single value. You can have up to 1,024 buckets. The number of buckets to choose for your data column depends on many factors, including how many distinct values you have, how skewed your data is, and how high your accuracy really needs to be. After a certain number of buckets, there are diminishing returns.

This command will create a histogram of 10 buckets on column c1 of table t:

```
ANALYZE TABLE t UPDATE HISTOGRAM ON c1 WITH 10 BUCKETS;
```

Imagine you sell small, medium, and large socks, and each size has its own bin for storage. To find the size you need, you go to the bin for that size. MySQL has had histograms since MySQL 8.0 was released three years ago, yet they are not as well-known as indexes. Unlike indexes, histograms add no overhead for inserting, updating, or deleting a record. To update a histogram, the `ANALYZE TABLE` command must be run again. This is a good approach when the data does not churn very much; frequent changes to the data leave the histogram out of date and reduce its usefulness.
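
As a small sketch of that maintenance cycle, the statements below rebuild, inspect, and drop the histogram on the same hypothetical table `t` and column `c1` used above. The stored histogram is a JSON document exposed through `information_schema.COLUMN_STATISTICS`; the two JSON keys read here are the ones MySQL 8.0 writes, so treat the query as illustrative if you run a different version.

```sql
-- Rebuild the histogram after the data has changed significantly.
ANALYZE TABLE t UPDATE HISTOGRAM ON c1 WITH 10 BUCKETS;

-- Inspect the stored histogram and when it was last refreshed.
SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
       HISTOGRAM->>'$."histogram-type"' AS histogram_type,
       HISTOGRAM->>'$."last-updated"'   AS last_updated
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 't' AND COLUMN_NAME = 'c1';

-- Remove the histogram if it no longer helps.
ANALYZE TABLE t DROP HISTOGRAM ON c1;
```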

### Indexes or histograms?

Use indexes for unique items where you need to access the data directly. There is overhead for updates, deletes, and inserts, but you get speedy access if your data is properly architected. Use histograms for data that does not get updated frequently, such as quarterly results for the last dozen years.

### Parting thoughts

This article grew out of a recent presentation at the [Open Source 101 conference][3]. And that presentation grew out of a workshop at a [PHP UK Conference][4]. Query tuning is a complex subject, and each time I present on indexes and histograms, I find ways to refine my presentation. But each presentation also shows that many folks in the software world are not well-versed in indexes and tend to use them incorrectly. Histograms have not been around long enough (I hope) to have been misused similarly.

--------------------------------------------------------------------------------

via: https://opensource.com/article/21/5/mysql-query-tuning

Author: [Dave Stokes][a]
Topic selection: [lujun9972][b]
Translator: [unigeorge](https://github.com/unigeorge)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally translated and compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/).

[a]: https://opensource.com/users/davidmstokes
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/lenovo-thinkpad-laptop-window-focus.png?itok=g0xPm2kD (young woman working on a laptop)
[2]: https://www.mysql.com/
[3]: https://opensource101.com/
[4]: https://www.phpconference.co.uk/
105
translated/tech/20210608 Tune your MySQL queries like a pro.md
Normal file
@ -0,0 +1,105 @@
[#]: subject: (Tune your MySQL queries like a pro)
[#]: via: (https://opensource.com/article/21/5/mysql-query-tuning)
[#]: author: (Dave Stokes https://opensource.com/users/davidmstokes)
[#]: collector: (lujun9972)
[#]: translator: (unigeorge)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )

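Handle MySQL queries like a veteran
======
Optimizing queries is just simple engineering, not some arcane dark magic.
![woman on laptop sitting at the window][1]

Many people regard tuning database queries as some mysterious "dark magic" from a Harry Potter novel; use the wrong incantation, and your data turns from a valuable resource into a pile of mush.

In fact, query tuning for a relational database system is simple engineering, and the rules or heuristics it follows are easy to understand. The query optimizer translates the query you send to a [MySQL][2] instance, then combines those heuristics with what it knows about your data to determine the best way to get the requested data. Read that last part again: _"what it knows about your data_." The less the query optimizer has to guess about where your data is located (that is, the more it already knows), the better the plan it can make for delivering your data.

To help the optimizer understand the data better, you can use indexes and histograms. Used correctly, they can greatly speed up database queries. It is like cooking from a recipe: follow it and you get something you like, but add ingredients of your own at random and the result may not be what you hoped for.
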
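### Cost-based optimizer

Most modern relational databases use a cost-based optimizer to determine how to retrieve data from the database. The cost model is built around minimizing very expensive disk reads. The query optimizer code in the database server keeps statistics about fetching data as it encounters that data, and builds a historical model of what it took to get the data.

But historical data can go stale. It is like going to the store for your favorite snack and suddenly finding that the price has gone up or the store has closed. The server's optimization process may make wrong assumptions based on old information and produce an inefficient query plan.

A query's complexity can affect optimization. The optimizer wants to deliver the lowest-cost plan available. Joining five different tables means there are 5 factorial, or 120, possible join orders. Heuristics are built into the code so that not every possible option has to be evaluated. MySQL wants to generate a new query plan each time it sees a query, whereas other databases (such as Oracle) can lock a query plan down. That is why giving the optimizer detailed information about your data is vital. For stable performance, it really helps to give the query optimizer up-to-date information when it builds query plans.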
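
In addition, the rules built into the optimizer carry assumptions that may not match the reality of your data. Without better information, the query optimizer assumes that all the data in a column is evenly distributed across all rows. When it has nothing else to go on, it defaults to the smaller of two candidate indexes. Although the cost-based optimizer model can make many good decisions, you can still end up with a query plan that is not optimal.

### What is a query plan?

A query plan is what the optimizer produces from a query statement for the server to execute. To see the query plan, prefix the query with the `EXPLAIN` keyword. For example, the following query gets city names from the city table and the corresponding country names from the country table; the two tables are joined on the country's unique code. This example returns only five cities in the United Kingdom (the query has no `ORDER BY` clause, so the five rows are not guaranteed to be in alphabetical order):
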
```
SELECT city.name AS 'City',
       country.name AS 'Country'
FROM city
JOIN country ON (city.countrycode = country.code)
WHERE country.code = 'GBR'
LIMIT 5;
```

Putting `EXPLAIN` in front of the query shows the query plan generated by the optimizer. Skipping everything except the end of the output, you can see the optimized query:

```
SELECT `world`.`city`.`Name` AS `City`,
       'United Kingdom' AS `Country`
FROM `world`.`city`
JOIN `world`.`country`
WHERE (`world`.`city`.`CountryCode` = 'GBR')
LIMIT 5;
```
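
The big changes are that `country.name AS 'Country'` became `'United Kingdom' AS 'Country'`, and the `WHERE` clause went from looking in the country table to looking in the city table. The optimizer determined that these two changes return a result faster than the original query.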
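
### Indexes

In the MySQL world you will hear the terms index and key used interchangeably. However, indexes are made up of keys, and a key is a way to identify a record, ideally uniquely. If a column is designated as a key, the optimizer can search a list of those keys to find the desired record without reading the entire table. Without an index, the server has to start at the first row of the table and read every row of data. If the column was created as a unique index, the server can go straight to that one row and ignore the rest. The more distinct values an index has (a measure known as its cardinality), the better. Remember, we are looking for faster ways to get to the data.

MySQL's default InnoDB storage engine wants your table to have a primary key and stores your data in a B+ tree by that key. Invisible columns are a recently added MySQL feature: such a column returns no data unless it is explicitly named in the query. For example, `SELECT * FROM foo;` does not return any invisible columns. This feature provides a way to add a primary key to older tables without rewriting every query to account for the new column.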
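
To complicate matters further, there are many types of indexes, such as functional, spatial, and composite indexes. In some cases you can even create an index that supplies all the information a query requests, so that the data table never has to be accessed.

This article does not describe the various index types in detail; just think of an index as a shortcut to the record or records you want. You can create an index on one or more columns, or on part of those columns. My physician's system can look up my records by the first three letters of my last name plus my birthdate. When using multiple columns, put the most unique field first, then the second most unique, and so on. An index on year-month-day works for year-month-day, year-month, and year searches, but not for day, month-day, or year-day searches. Keeping this in mind helps you design indexes around how you want to use your data.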
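
### Histograms

A histogram describes the distribution of your data. If you were sorting people alphabetically by last name, you could put those whose last names start with A to F in one "logical bucket", those starting with G to J in another, and so forth. The optimizer assumes that data is evenly distributed within a column, but in practice this is rarely the case.

MySQL provides two types of histograms: equal height, where all the data is divided equally among the buckets, and singleton, where each bucket holds a single value. You can have up to 1,024 buckets. The number of buckets to choose depends on many factors, including the number of distinct values, how skewed the data is, and how accurate the result needs to be. Beyond a certain number of buckets, the returns start to diminish.

The following command creates a histogram with 10 buckets on column c1 of table t:
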
```
ANALYZE TABLE t UPDATE HISTOGRAM ON c1 WITH 10 BUCKETS;
```
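
Imagine you sell small, medium, and large socks, and each size is kept in its own storage bin. To find a particular size, you go straight to the bin for that size. MySQL has had histograms since MySQL 8.0 was released three years ago, yet the feature is not as widely known as indexes. Unlike indexes, histograms add no overhead when records are inserted, updated, or deleted. To update a histogram, the `ANALYZE TABLE` command must be run again. Histograms are a good approach when the data does not change much; frequent changes to the data leave the histogram out of date and reduce its usefulness.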
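
### Indexes or histograms?

Use indexes for unique items that you need to access directly. Updates, deletes, and inserts carry extra overhead, but if your data is properly architected, indexes give you fast access. Use histograms for data that is not updated frequently, such as quarterly results for the past dozen years.

### Parting thoughts

This article grew out of a recent presentation at the [Open Source 101 conference][3], and that presentation grew out of a workshop at a [PHP UK Conference][4]. Query tuning is a complex subject, and every time I present on indexes and histograms I find something to improve. But the feedback from each presentation also shows that many people in the software world are not well-versed in indexes and often use them incorrectly. I hope histograms have not been around long enough to be misused in the same way.
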
--------------------------------------------------------------------------------

via: https://opensource.com/article/21/5/mysql-query-tuning

Author: [Dave Stokes][a]
Topic selection: [lujun9972][b]
Translator: [unigeorge](https://github.com/unigeorge)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally translated and compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/).

[a]: https://opensource.com/users/davidmstokes
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/lenovo-thinkpad-laptop-window-focus.png?itok=g0xPm2kD (young woman working on a laptop)
[2]: https://www.mysql.com/
[3]: https://opensource101.com/
[4]: https://www.phpconference.co.uk/