mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-03-12 01:40:10 +08:00
commit
4eaaea8969
@ -1,3 +1,4 @@
|
||||
Translating by qhwdw
|
||||
Scaling the GitLab database
|
||||
============================================================
|
||||
|
||||
|
@ -1,3 +1,4 @@
|
||||
Translating by qhwdw
|
||||
How to set up a Postgres database on a Raspberry Pi
|
||||
============================================================
|
||||
|
||||
|
@ -1,124 +0,0 @@
|
||||
Translating by qhwdw
|
||||
# A tour of Postgres Index Types
|
||||
|
||||
At Citus we spend a lot of time working with customers on data modeling, optimizing queries, and adding [indexes][3] to make things snappy. My goal is to be as available for our customers as we need to be in order to make you successful. Part of that is keeping your Citus cluster well tuned and [performant][4], which [we take care][5] of for you. Another part is helping you with everything you need to know about Postgres and Citus. After all, a healthy and performant database means a fast-performing app, and who wouldn’t want that? Today we’re going to condense some of the information we’ve shared directly with customers about Postgres indexes.
|
||||
|
||||
Postgres has a number of index types, and each new release seems to come with another one. Each of these indexes can be useful, but which one to use depends on (1) the data type, sometimes (2) the underlying data within the table, and (3) the types of lookups performed. In what follows we’ll take a quick survey of the index types available to you in Postgres and when you should leverage each. Before we dig in, here’s a quick glimpse of the indexes we’ll walk you through:
|
||||
|
||||
* B-Tree
|
||||
|
||||
* Generalized Inverted Index (GIN)
|
||||
|
||||
* Generalized Search Tree (GiST)
|
||||
|
||||
* Space partitioned GiST (SP-GiST)
|
||||
|
||||
* Block Range Indexes (BRIN)
|
||||
|
||||
* Hash
|
||||
|
||||
Now on to the indexing.
|
||||
|
||||
### In Postgres, a B-Tree index is what you most commonly want
|
||||
|
||||
If you have a degree in Computer Science, then a B-tree index was likely the first one you learned about. A [B-tree index][6] creates a tree that will keep itself balanced and even. When it goes to look something up based on that index it will traverse down the tree to find the key the tree is split on and then return you the data you’re looking for. Using an index is much faster than a sequential scan because it may only have to read a few [pages][7] as opposed to sequentially scanning thousands of them (when you’re returning only a few records).
|
||||
|
||||
If you run a standard `CREATE INDEX` it creates a B-tree for you. B-tree indexes are valuable on the most common data types such as text, numbers, and timestamps. If you’re just getting started indexing your database and aren’t leveraging too many advanced Postgres features within your database, using standard B-Tree indexes is likely the path you want to take.
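
To make the "few pages versus thousands" point concrete, here is a minimal Python sketch. It is a toy model, not how Postgres stores a B-tree: a sorted list searched by binary search stands in for walking a balanced tree, and the names `seq_scan` and `index_lookup` are made up for this illustration.

```python
def seq_scan(rows, key):
    """Check every row in order; return (value, rows_examined)."""
    examined = 0
    for k, value in rows:
        examined += 1
        if k == key:
            return value, examined
    return None, examined


def index_lookup(sorted_keys, values, key):
    """Binary search over sorted keys, a stand-in for descending a balanced tree.

    Returns (value, steps), where steps counts how many keys were compared.
    """
    low, high, steps = 0, len(sorted_keys), 0
    while low < high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] < key:
            low = mid + 1
        else:
            high = mid
    if low < len(sorted_keys) and sorted_keys[low] == key:
        return values[low], steps
    return None, steps


rows = [(i, f"user-{i}") for i in range(100_000)]
keys = [k for k, _ in rows]
values = [v for _, v in rows]

print(seq_scan(rows, 99_999))            # examines all 100,000 rows
print(index_lookup(keys, values, 99_999))  # a handful of comparisons
```

The sequential scan touches every row to find the last key, while the ordered lookup needs only a number of steps that grows with the logarithm of the table size.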
|
||||
|
||||
### GIN indexes, for columns with multiple values
|
||||
|
||||
Generalized Inverted Indexes, commonly referred to as [GIN][8], are most useful when you have data types that contain multiple values in a single column.
|
||||
|
||||
From the Postgres docs: _“GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”_
|
||||
|
||||
The most common data types that fall into this bucket are:
|
||||
|
||||
* [hStore][1]
|
||||
|
||||
* Arrays
|
||||
|
||||
* Range types
|
||||
|
||||
* [JSONB][2]
|
||||
|
||||
One of the beautiful things about GIN indexes is that they are aware of the data within composite values. But because a GIN index needs specific knowledge about the data structure, support has to be added for each individual type; as a result, not all data types are supported.
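
The "inverted" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the GIN implementation: each row holds an array of tags, and the hypothetical helpers `build_inverted_index` and `rows_containing` map each element back to the rows that contain it.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each element to the set of row ids whose array contains it."""
    index = defaultdict(set)
    for row_id, tags in rows.items():
        for tag in tags:
            index[tag].add(row_id)
    return index


def rows_containing(index, *tags):
    """Row ids whose array contains every requested tag."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


rows = {
    1: ["postgres", "index"],
    2: ["postgres", "citus"],
    3: ["index"],
}
index = build_inverted_index(rows)

print(rows_containing(index, "postgres"))           # {1, 2}
print(rows_containing(index, "postgres", "index"))  # {1}
```

A lookup goes from element to rows without reading the arrays themselves, which is why this layout suits "does this column contain X" queries. It also hints at why support is per type: the index has to know how to break a value into elements.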
|
||||
|
||||
### GiST indexes, for rows that overlap values
|
||||
|
||||
GiST indexes are most useful when you have data that can in some way overlap with the value of that same column in another row. A good example is a geometry data type where you want to see whether two polygons contain some point: one point may be contained within both, while another point exists within only one polygon. The most common data types where you want to leverage GiST indexes are:
|
||||
|
||||
* Geometry types
|
||||
|
||||
* Text when dealing with full-text search
|
||||
|
||||
GiST indexes have some more fixed constraints around size, whereas GIN indexes can become quite large. As a result, GiST indexes are lossy. From the docs: _“A GiST index is lossy, meaning that the index might produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.)”_ This doesn’t mean you’ll get wrong results; it just means Postgres has to do a little extra work to filter out those false positives before giving your data back to you.
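
What "lossy, then recheck" means can be sketched with a toy model in Python. The assumption here, made only for illustration, is that the "index" keeps just a bounding box per circle; the box test can produce false matches, and a second pass against the real shape removes them. All function names are hypothetical.

```python
import math


def bounding_box(circle):
    """Smallest axis-aligned box around a circle given as (cx, cy, r)."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)


def box_contains(box, point):
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def circle_contains(circle, point):
    cx, cy, r = circle
    return math.hypot(point[0] - cx, point[1] - cy) <= r


def search(circles, point):
    """Return (candidates from the lossy index, matches after the recheck)."""
    boxes = {cid: bounding_box(c) for cid, c in circles.items()}  # the "index"
    candidates = [cid for cid, box in boxes.items() if box_contains(box, point)]
    matches = [cid for cid in candidates if circle_contains(circles[cid], point)]
    return candidates, matches


circles = {"a": (0, 0, 1), "b": (5, 5, 1)}

print(search(circles, (0.5, 0.0)))  # (['a'], ['a'])
print(search(circles, (0.9, 0.9)))  # (['a'], []): inside the box, outside the circle
```

The second query is the false match the docs describe: the compact summary says "maybe", and only the check against the actual row settles it.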
|
||||
|
||||
_Special note: GIN and GiST indexes can often be beneficial on the same column types. GIN often offers better performance at the cost of a larger disk footprint, and vice versa for GiST. When it comes to GIN vs. GiST there isn’t a perfect one-size-fits-all answer, but the broad rules above apply._
|
||||
|
||||
### SP-GiST indexes, for larger data
|
||||
|
||||
Space partitioned GiST indexes leverage space partitioning trees that came out of some research from [Purdue][9]. SP-GiST indexes are most useful when your data has a natural clustering element to it, and is also not an equally balanced tree. A great example of this is phone numbers (at least US ones). They follow a format of:
|
||||
|
||||
* 3 digits for area code
|
||||
|
||||
* 3 digits for prefix (historically related to a phone carrier’s switch)
|
||||
|
||||
* 4 digits for line number
|
||||
|
||||
This means that you have some natural clustering around the first set of 3 digits, then around the second set of 3 digits, after which numbers may fan out in a more even distribution. But with phone numbers, some area codes have a much higher saturation than others, so the tree may end up very unbalanced. Because of that natural clustering up front and the unequal distribution of data, data like phone numbers could make a good case for SP-GiST.
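
One of the unbalanced structures SP-GiST can support is a radix tree (trie). The Python sketch below is a toy trie over phone numbers, not the SP-GiST implementation; the sample numbers and the helper names `build_trie` and `with_prefix` are invented for this illustration.

```python
def build_trie(numbers):
    """Build a nested-dict trie keyed digit by digit; "$" marks a stored number."""
    root = {}
    for number in numbers:
        node = root
        for digit in number:
            node = node.setdefault(digit, {})
        node["$"] = number
    return root


def with_prefix(trie, prefix):
    """Return every stored number that starts with the given prefix, sorted."""
    node = trie
    for digit in prefix:
        if digit not in node:
            return []
        node = node[digit]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


numbers = ["4155550101", "4155550199", "4155557000", "2125550123"]
trie = build_trie(numbers)

print(with_prefix(trie, "415"))     # the three 415 numbers
print(with_prefix(trie, "212555"))  # ['2125550123']
```

The 415 branch is much bushier than the 212 branch, which is the uneven shape described above: a structure that partitions by prefix does not need to stay balanced to answer prefix lookups quickly.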
|
||||
|
||||
### BRIN indexes, for larger data
|
||||
|
||||
Block range indexes can focus on some similar use cases to SP-GiST in that they’re best when there is some natural ordering to the data, and the data tends to be very large. Have a billion-record table, especially if it’s time series data? BRIN may be able to help. If you’re querying against a large set of data that is naturally grouped together, such as data for several zip codes (which then roll up to some city), BRIN works well when similar zip codes are already located near each other on disk.
|
||||
|
||||
When you have very large datasets that are ordered, such as by date or zip code, BRIN indexes allow you to skip or exclude a lot of unnecessary data very quickly. BRIN indexes are also small relative to the overall data size, making them a big win when you have a large dataset.
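
The skipping behavior can be sketched with a toy model in Python. It is an illustration under simplifying assumptions, not the BRIN implementation: rows are grouped into fixed-size blocks, the "index" stores only each block's minimum and maximum, and the helper names are hypothetical.

```python
def build_block_ranges(values, block_size):
    """Summarize each block of consecutive rows as (start, min, max)."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((start, min(block), max(block)))
    return summaries


def range_query(values, summaries, block_size, low, high):
    """Read only blocks whose [min, max] overlaps [low, high]."""
    hits, blocks_read = [], 0
    for start, block_min, block_max in summaries:
        if block_max < low or block_min > high:
            continue  # the summary alone rules this block out
        blocks_read += 1
        block = values[start:start + block_size]
        hits.extend(v for v in block if low <= v <= high)
    return hits, blocks_read


values = list(range(1000))  # naturally ordered, like an append-only timestamp
summaries = build_block_ranges(values, 100)

hits, blocks_read = range_query(values, summaries, 100, 250, 260)
print(len(summaries), blocks_read, hits[0], hits[-1])  # 10 1 250 260
```

Ten small summaries stand in for a thousand rows, and the query reads one block out of ten. If the same values were stored in random order, every block's range would span nearly everything and almost no block could be skipped, which is why the physical ordering matters.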
|
||||
|
||||
### Hash indexes, finally crash safe
|
||||
|
||||
Hash indexes have been around for years within Postgres, but until Postgres 10 they came with a giant warning that they were not WAL-logged. This meant that if your server crashed and you failed over to a standby or recovered from archives using something like [wal-g][10], you’d lose that index until you recreated it. With Postgres 10 they’re now WAL-logged, so you can start to consider using them again, but the real question is: should you?
|
||||
|
||||
Hash indexes at times can provide faster lookups than B-Tree indexes, and can boast faster creation times as well. The big issue with them is they’re limited to only equality operators so you need to be looking for exact matches. This makes hash indexes far less flexible than the more commonly used B-Tree indexes and something you won’t want to consider as a drop-in replacement but rather a special case.
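
Why a hash index handles only equality can be seen in a small Python sketch. This is a toy bucket table, not Postgres's hash index; the class `ToyHashIndex` and its methods are invented for this illustration.

```python
class ToyHashIndex:
    """Fixed number of buckets; a key's bucket is chosen by its hash alone."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, key, row_id):
        self.buckets[hash(key) % len(self.buckets)].append((key, row_id))

    def equals(self, key):
        """Exact match: only one bucket needs to be inspected."""
        bucket = self.buckets[hash(key) % len(self.buckets)]
        return [row_id for k, row_id in bucket if k == key]

    def between(self, low, high):
        """Range match: hashing destroys ordering, so every bucket is scanned."""
        return sorted(
            row_id
            for bucket in self.buckets
            for k, row_id in bucket
            if low <= k <= high
        )


index = ToyHashIndex()
for row_id, key in enumerate([10, 20, 30], start=1):
    index.insert(key, row_id)

print(index.equals(20))       # [2]
print(index.between(15, 35))  # [2, 3], but only after visiting every bucket
```

Neighboring keys land in unrelated buckets, so the structure gives no shortcut for `<`, `>`, or sorting. A B-tree keeps keys in order and can serve both kinds of query.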
|
||||
|
||||
### Which do you use?
|
||||
|
||||
We just covered a lot, and if you’re a bit overwhelmed you’re not alone. If all you knew before was `CREATE INDEX`, you’ve been using B-Tree indexes all along, and the good news is you’re still performing as well as or better than most databases that aren’t Postgres :) As you start to use more Postgres features, consider this a cheat sheet for when to use the other Postgres index types:
|
||||
|
||||
* B-Tree - For most datatypes and queries
|
||||
|
||||
* GIN - For JSONB/hstore/arrays
|
||||
|
||||
* GiST - For full text search and geospatial datatypes
|
||||
|
||||
* SP-GiST - For larger datasets with natural but uneven clustering
|
||||
|
||||
* BRIN - For really large datasets that line up sequentially
|
||||
|
||||
* Hash - For equality operations, though B-Tree is generally still what you want here
|
||||
|
||||
If you have any questions or feedback about the post feel free to join us in our [slack channel][11].
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|
||||
|
||||
Author: [Craig Kerstiens][a]
|
||||
Translator: [译者ID](https://github.com/译者ID)
|
||||
Proofreader: [校对者ID](https://github.com/校对者ID)
|
||||
|
||||
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
|
||||
|
||||
[a]:https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|
||||
[1]:https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
|
||||
[2]:https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
|
||||
[3]:https://www.citusdata.com/blog/2017/10/11/index-all-the-things-in-postgres/
|
||||
[4]:https://www.citusdata.com/blog/2017/09/29/what-performance-can-you-expect-from-postgres/
|
||||
[5]:https://www.citusdata.com/product/cloud
|
||||
[6]:https://en.wikipedia.org/wiki/B-tree
|
||||
[7]:https://www.8kdata.com/blog/postgresql-page-layout/
|
||||
[8]:https://www.postgresql.org/docs/10/static/gin.html
|
||||
[9]:https://www.cs.purdue.edu/spgist/papers/W87R36P214137510.pdf
|
||||
[10]:https://www.citusdata.com/blog/2017/08/18/introducing-wal-g-faster-restores-for-postgres/
|
||||
[11]:https://slack.citusdata.com/
|
||||
[12]:https://twitter.com/share?url=https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/&text=A%20tour%20of%20Postgres%20Index%20Types&via=citusdata
|
||||
[13]:https://www.linkedin.com/shareArticle?mini=true&url=https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|
123
translated/tech/20171017 A tour of Postgres Index Types.md
Normal file
@ -0,0 +1,123 @@
|
||||
# A tour of Postgres index types
|
||||
|
||||
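At Citus, we spend a lot of time working with customers on data modeling, query optimization, and adding [indexes][3] to make things run faster. My goal is to serve our customers’ needs well and make you successful. Part of that is keeping your Citus cluster well tuned and [performant][4], which [we take care of][5] for you. Another part is helping you learn everything you need to know about Postgres and Citus. After all, a healthy, high-performance database means a faster app, and who wouldn’t want that? Today we condense some of the information about Postgres indexes that we have shared directly with customers.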
|
||||
|
||||
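Postgres has several index types, and each new release seems to add another. Every index type is useful, but which one to use depends on (1) the data type, sometimes (2) the underlying data in the table, and (3) the kind of lookups performed. Below we survey the index types available in Postgres and when to use each. Before we start, here is the list of index types we will walk through: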
|
||||
|
||||
* B-Tree
|
||||
|
||||
* Generalized Inverted Index (GIN)
|
||||
|
||||
* Generalized Search Tree (GiST)
|
||||
|
||||
* Space partitioned GiST (SP-GiST)
|
||||
|
||||
* Block Range Indexes (BRIN)
|
||||
|
||||
* Hash
|
||||
|
||||
Now on to the indexes.
|
||||
|
||||
### In Postgres, a B-Tree index is the one you will use most often
|
||||
|
||||
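If you have a computer science degree, a B-Tree index was probably the first index you learned about. A [B-tree index][6] builds a tree that keeps itself balanced. When looking something up through the index, Postgres walks down the tree to find the key and then returns the data you asked for. Using an index is much faster than a sequential scan because it may need to read only a few [pages][7], rather than scanning thousands of them sequentially (when you are returning only a few records).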
|
||||
|
||||
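If you run a standard `CREATE INDEX`, it creates a B-tree index for you. B-tree indexes are valuable on most common data types, such as text, numbers, and timestamps. If you are just getting started with indexing your database and do not use many advanced Postgres features, standard B-Tree indexes are probably your best choice.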
|
||||
|
||||
### GIN indexes, for multi-valued columns
|
||||
|
||||
Generalized Inverted Indexes, commonly called [GIN][8], are most useful for data types that hold multiple values in a single column.
|
||||
|
||||
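From the Postgres docs: _“GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”_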
|
||||
|
||||
The most common data types in this category are:
|
||||
|
||||
* [hStore][1]
|
||||
|
||||
* Arrays
|
||||
|
||||
* Range types
|
||||
|
||||
* [JSONB][2]
|
||||
|
||||
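One of the nicest things about GIN indexes is that they are aware of the data inside composite values. But because a GIN index needs specific knowledge of the data structure, support has to be added for each individual type, so not every data type is supported.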
|
||||
|
||||
### GiST indexes, for rows with overlapping values
|
||||
|
||||
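GiST indexes are most useful when your data can overlap with the value of the same column in another row. A good example: with a geometry data type, you may want to see whether two polygons contain some point. One point may be contained within both, while another exists within only one polygon. The most common data types where you want GiST indexes are: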
|
||||
|
||||
* Geometry types
|
||||
|
||||
* Text, when dealing with full-text search
|
||||
|
||||
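GiST indexes have more fixed constraints on size, whereas GIN indexes can become quite large. As a result, GiST indexes are lossy. From the docs: _“A GiST index is lossy, meaning that the index might produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.)”_ This does not mean you will get wrong results; it just means Postgres does a little extra work to filter out those false positives before returning your data.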
|
||||
|
||||
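_Special note: GIN and GiST indexes can often both be beneficial on the same column types. GIN often offers better performance at the cost of a larger disk footprint, and vice versa for GiST. When it comes to GIN vs. GiST, there is no perfect one-size-fits-all answer, but the broad rules above cover most common cases._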
|
||||
|
||||
### SP-GiST indexes, for larger data
|
||||
|
||||
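Space-partitioned GiST indexes use space-partitioning trees that came out of research at [Purdue][9]. SP-GiST indexes are most useful when your data has a natural clustering element and does not form an evenly balanced tree. Phone numbers are a great example (at least US ones). They follow this format: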
|
||||
|
||||
* 3 digits for the area code
|
||||
|
||||
* 3 digits for the prefix (historically related to a phone carrier’s switch)
|
||||
|
||||
* 4 digits for the line number
|
||||
|
||||
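This means there is natural clustering around the first set of three digits, then around the second set of three digits, after which the numbers fan out more evenly. With phone numbers, though, some area codes are far more saturated than others, so the tree may end up very unbalanced. Because of that natural clustering up front and the uneven distribution, data like phone numbers can make a good case for SP-GiST.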
|
||||
|
||||
### BRIN indexes, for larger data
|
||||
|
||||
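Block range indexes target some use cases similar to SP-GiST: they work best when the data has some natural ordering and tends to be very large. Have a table with a billion records, especially time-series data? BRIN may be able to help. If you are querying a large set of data that is naturally grouped together, such as data for several zip codes, BRIN works well when similar zip codes are already located near each other on disk.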
|
||||
|
||||
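When you have a very large dataset that is ordered, for example by date or zip code, a BRIN index lets you skip or exclude a lot of unnecessary data very quickly. BRIN indexes are also small relative to the overall data size, which makes them a big win for large datasets.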
|
||||
|
||||
### Hash indexes, finally crash safe
|
||||
|
||||
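Hash indexes have existed in Postgres for years, but until Postgres 10 they came with a giant warning that they were not WAL-logged. This meant that if your server crashed and you failed over to a standby or recovered from archives using something like [wal-g][10], you would lose that index until you rebuilt it. As of Postgres 10 they are WAL-logged, so you can start to consider using them again. The real question is: should you?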
|
||||
|
||||
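Hash indexes can sometimes provide faster lookups than B-Tree indexes, and they can be faster to create as well. The big limitation is that they support only equality operators, so you can use them only for exact-match lookups. This makes hash indexes far less flexible than the commonly used B-Tree indexes: treat them as a special case rather than a drop-in replacement.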
|
||||
|
||||
### Which do you use?
|
||||
|
||||
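We just covered a lot, and if you feel a bit overwhelmed, that is normal. If all you knew before was `CREATE INDEX`, you have been using B-Tree indexes all along, and the good news is that you are still performing as well as or better than most databases that are not Postgres :) As you start to use more Postgres features, treat this as a cheat sheet for when to use the other Postgres index types: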
|
||||
|
||||
* B-Tree - For most data types and queries
|
||||
|
||||
* GIN - For JSONB/hstore/arrays
|
||||
|
||||
* GiST - For full-text search and geometric data types
|
||||
|
||||
* SP-GiST - For larger datasets with natural but uneven clustering
|
||||
|
||||
* BRIN - For really large datasets that line up sequentially
|
||||
|
||||
* Hash - For equality operations, though B-Tree is generally still what you want here
|
||||
|
||||
If you have any questions or feedback about this post, feel free to join us in our [slack channel][11].
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|
||||
|
||||
Author: [Craig Kerstiens][a]
|
||||
Translator: [qhwdw](https://github.com/qhwdw)
|
||||
Proofreader: [校对者ID](https://github.com/校对者ID)
|
||||
|
||||
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
|
||||
|
||||
[a]:https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|
||||
[1]:https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
|
||||
[2]:https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
|
||||
[3]:https://www.citusdata.com/blog/2017/10/11/index-all-the-things-in-postgres/
|
||||
[4]:https://www.citusdata.com/blog/2017/09/29/what-performance-can-you-expect-from-postgres/
|
||||
[5]:https://www.citusdata.com/product/cloud
|
||||
[6]:https://en.wikipedia.org/wiki/B-tree
|
||||
[7]:https://www.8kdata.com/blog/postgresql-page-layout/
|
||||
[8]:https://www.postgresql.org/docs/10/static/gin.html
|
||||
[9]:https://www.cs.purdue.edu/spgist/papers/W87R36P214137510.pdf
|
||||
[10]:https://www.citusdata.com/blog/2017/08/18/introducing-wal-g-faster-restores-for-postgres/
|
||||
[11]:https://slack.citusdata.com/
|
||||
[12]:https://twitter.com/share?url=https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/&text=A%20tour%20of%20Postgres%20Index%20Types&via=citusdata
|
||||
[13]:https://www.linkedin.com/shareArticle?mini=true&url=https://www.citusdata.com/blog/2017/10/17/tour-of-postgres-index-types/
|