mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-03-27 02:30:10 +08:00
Request to translate 20200610 Concise data plotting in Python with Altair.md (#19223)
* Update 20200610 Concise data plotting in Python with Altair.md
* Update 20200610 Concise data plotting in Python with Altair.md
* Update 20200610 Concise data plotting in Python with Altair.md
* Update 20200610 Concise data plotting in Python with Altair.md
* Rename sources/tech/20200610 Concise data plotting in Python with Altair.md to translated/tech/20200610 Concise data plotting in Python with Altair.md
This commit is contained in:
parent
4d9f775e60
commit
d8c32b5f27
@ -1,193 +0,0 @@
|
||||
[#]: collector: (lujun9972)
|
||||
[#]: translator: ( )
|
||||
[#]: reviewer: ( )
|
||||
[#]: publisher: ( )
|
||||
[#]: url: ( )
|
||||
[#]: subject: (Concise data plotting in Python with Altair)
|
||||
[#]: via: (https://opensource.com/article/20/6/altair-python)
|
||||
[#]: author: (Shaun Taylor-Morgan https://opensource.com/users/shaun-taylor-morgan)
|
||||
|
||||
Concise data plotting in Python with Altair
|
||||
======
|
||||
Altair offers an elegant interface and its own plotting language in a Python library.
|
||||
![metrics and data shown on a computer screen][1]
|
||||
|
||||
The [plotting libraries][2] available in Python offer multiple ways to present data according to your preferences, whether you prize flexibility, design, ease-of-use, or a particular style.
|
||||
|
||||
Altair takes a different approach from the other libraries, one that I find stays simpler as the amount of plotting grows. It has an elegant interface thanks to [Vega][3], the declarative plotting language that lets you define what your plots should look like, rather than writing loops and conditionals to construct them.
|
||||
|
||||
### Fluent in plotting
|
||||
|
||||
I compare each Python plotting library by plotting the same multi-bar plot. Before we go further, note that you may need to tune your Python environment to get this code to run, including the following.
|
||||
|
||||
* Installing a recent version of Python (instructions for [Linux][4], [Mac][5], and [Windows][6])
* Verifying that your version of Python works with the libraries used in the tutorial
|
||||
|
||||
|
||||
|
||||
The data is available online and can be imported using pandas:
|
||||
|
||||
|
||||
```
import pandas as pd

# Read the election results CSV straight from its URL
df = pd.read_csv('https://anvil.works/blog/img/plotting-in-python/uk-election-results.csv')
```
|
||||
|
||||
Now we're ready to go. As a point of comparison, this is the plot we made in [Matplotlib][7]:
|
||||
|
||||
![Matplotlib UK election results][8]
|
||||
|
||||
The Matplotlib plot took 16 lines of code to create, including manually calculating the positions of each bar.
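
To give a sense of what "manually calculating the positions of each bar" involves, here is a minimal sketch of that kind of arithmetic. It is not the article's Matplotlib code; the function name `grouped_bar_positions` and the `bar_width` default are assumptions made for illustration.

```python
# Hypothetical sketch of grouped-bar positioning; not the original Matplotlib code.
def grouped_bar_positions(n_groups, n_series, bar_width=0.2):
    """Return, for each series, the x position of its bar in every group."""
    positions = []
    for series_index in range(n_series):
        # Shift each series so the cluster of bars is centred on the group index.
        offset = (series_index - (n_series - 1) / 2) * bar_width
        positions.append([group + offset for group in range(n_groups)])
    return positions


# Four parties across three elections gives four lists of three x positions.
positions = grouped_bar_positions(n_groups=3, n_series=4)
```

Each inner list would then be handed to a separate bar-drawing call, which is the bookkeeping a declarative library takes care of for you.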
|
||||
|
||||
Here's how to make a similar plot in Altair:
|
||||
|
||||
|
||||
```
import altair as alt

# Map each DataFrame column onto a channel of the bar chart
chart = alt.Chart(df).mark_bar().encode(
    x='party',
    y='seats',
    column='year',
    color='party',
)

chart.save('altair-elections.html')
```
|
||||
|
||||
Much more concise! Just like [Seaborn][9], Altair works with data that has one column per variable ([Long Form][10]). This allows you to map each variable onto an aspect of the plot—Altair calls these aspects "channels." In our case, we want one bar per `party` on the `x`-axis, we want the `seats` each party won on the `y`-axis, and we want to group the bars into `column`s by `year`. We also want to `color` the bars by `party`. That's how you would describe it in words, and it's exactly what the code says!
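
If your data starts out in wide form (one column per party), it needs reshaping before the channel mapping above applies. The following is a minimal sketch using `pandas.melt`; the `wide` DataFrame and its numbers are made up for illustration and are not the election data.

```python
import pandas as pd

# Hypothetical wide-form sample; the values are invented for illustration.
wide = pd.DataFrame({
    'year': ['1966', '1970'],
    'conservative': [10, 20],
    'labour': [30, 40],
})

# Reshape to long form: one row per (year, party) pair, one column per variable.
long = pd.melt(wide, id_vars=['year'], var_name='party', value_name='seats')
```

The result has exactly the three columns (`year`, `party`, `seats`) that the `encode()` call refers to.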
|
||||
|
||||
Here's what the plot looks like:
|
||||
|
||||
![Altair plot with default styling][11]
|
||||
|
||||
### Tweaking the style
|
||||
|
||||
That's not too far from what we want. The main difference from the Matplotlib plot is that each Altair `year` group is displayed with a little white space in between—this is just a feature of Altair's multi-bar plots, and it's not a problem.
|
||||
|
||||
However, there are a few other little style improvements that we do want to make.
|
||||
|
||||
#### Non-integer values
|
||||
|
||||
The two non-integer year names (`Feb 1974` and `Oct 1974`) are displayed as `NaN`. We can fix these by casting our `year` values to strings:
|
||||
|
||||
|
||||
```
df['year'] = df['year'].astype(str)
```
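
As a small illustration of what the cast does, the sketch below applies `astype(str)` to a column that mixes integers and text. The sample `years` Series is an assumption for demonstration, not the actual contents of the dataset.

```python
import pandas as pd

# Hypothetical sample mixing integer years with text labels.
years = pd.Series([1922, 'Feb 1974', 'Oct 1974', 1979])

# After the cast every value is a string, so all years are treated uniformly.
as_text = years.astype(str)
```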
|
||||
|
||||
#### Specifying sort order
|
||||
|
||||
We then also need to tell Altair how to sort the data. Altair allows us to specify more details about the `column` channel by passing it a `Column` object. So we tell it to sort in the order that the data appears in the DataFrame:
|
||||
|
||||
|
||||
```
chart = alt.Chart(df).mark_bar().encode(
    # ...
    column=alt.Column('year', sort=list(df['year']), title=None),
    # ...
)
```
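
In long-form data each year appears once per party, so `list(df['year'])` contains repeats. The code above passes that list as is. If you would rather build a list with one entry per year, a minimal sketch (an assumption on my part, not something the article does) is to de-duplicate while keeping first-appearance order:

```python
# Hypothetical sample of the 'year' column in long form (repeated per party).
years = ['1922', '1922', 'Feb 1974', 'Feb 1974', 'Oct 1974', 'Oct 1974']

# dict.fromkeys keeps the first occurrence of each key in insertion order.
unique_in_order = list(dict.fromkeys(years))
```

The resulting list keeps the chronological order of the DataFrame rather than falling back to alphabetical sorting of the strings.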
|
||||
|
||||
#### Removing axis titles
|
||||
|
||||
We have removed the "year" label from the top of the plot by setting `title=None`. Let's also remove the "party" labels from each column:
|
||||
|
||||
|
||||
```
chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('party', title=None),
    # ...
)
```
|
||||
|
||||
#### Specifying a colormap
|
||||
|
||||
Finally, we want to specify our own colors to use for the bars. Altair lets us specify a mapping between values in a `domain` and colors in a `range`, which is exactly what we need:
|
||||
|
||||
|
||||
```
cmap = {
    'Conservative': '#0343df',
    'Labour': '#e50000',
    'Liberal': '#ffff14',
    'Others': '#929591',
}

chart = alt.Chart(df).mark_bar().encode(
    # ...
    color=alt.Color('party', scale=alt.Scale(domain=list(cmap.keys()), range=list(cmap.values())))
)
```
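
The `domain` and `range` lists are matched by position, so this pattern relies on `cmap.keys()` and `cmap.values()` coming out in the same order. The sketch below checks that pairing in plain Python; the variable names `domain`, `colors`, and `pairs` are only illustrative.

```python
cmap = {
    'Conservative': '#0343df',
    'Labour': '#e50000',
    'Liberal': '#ffff14',
    'Others': '#929591',
}

# Python dicts preserve insertion order, so keys and values stay aligned.
domain = list(cmap.keys())
colors = list(cmap.values())

# Each party sits at the same index as its colour.
pairs = list(zip(domain, colors))
```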
|
||||
|
||||
#### Final code with style tweaks
|
||||
|
||||
After applying these styling tweaks, our code is a little less pleasing to the eye, but it still uses the declarative approach that makes Altair so scalable. We're still assigning independent variables from our data to separate aspects of our plot, rather than performing complex data manipulations as we often need to do in Matplotlib. The only difference is that our variable names are now wrapped in objects such as `alt.X()` that let us tweak how they appear:
|
||||
|
||||
|
||||
```
import altair as alt
# `votes` appears to be the author's own helper module (not shown here)
# that exposes the long-form election DataFrame as `long`
from votes import long as df

cmap = {
    'Conservative': '#0343df',
    'Labour': '#e50000',
    'Liberal': '#ffff14',
    'Others': '#929591',
}

df['year'] = df['year'].astype(str)

# We're still assigning, e.g. 'party' to x, but now we've wrapped it
# in alt.X in order to specify its styling
chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('party', title=None),
    y='seats',
    column=alt.Column('year', sort=list(df['year']), title=None),
    color=alt.Color('party', scale=alt.Scale(domain=list(cmap.keys()), range=list(cmap.values())))
)

chart.save('altair-elections.html')
```
|
||||
|
||||
In fairness to Matplotlib, we've now reached the same number of lines of code (16) as we used there!
|
||||
|
||||
Here's the Altair plot with our styling tweaks applied:
|
||||
|
||||
![The Altair plot with our custom styling][12]
|
||||
|
||||
### Conclusion
|
||||
|
||||
While the amount of code it takes to plot using Altair is similar to other libraries, its declarative plotting language adds a layer of sophistication I have grown to appreciate. Altair also offers a cleanly separate way to tune and tweak the style, which keeps that syntax out of the code blocks meant for plotting. Altair is another great library when considering a plotting solution for Python.
|
||||
|
||||
\---
|
||||
|
||||
_This article was first shared [here][13] and is edited and republished with permission._
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://opensource.com/article/20/6/altair-python
|
||||
|
||||
Author: [Shaun Taylor-Morgan][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
|
||||
|
||||
[a]: https://opensource.com/users/shaun-taylor-morgan
|
||||
[b]: https://github.com/lujun9972
|
||||
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/metrics_data_dashboard_system_computer_analytics.png?itok=oxAeIEI- (metrics and data shown on a computer screen)
|
||||
[2]: https://opensource.com/article/20/4/plot-data-python
|
||||
[3]: https://vega.github.io/vega/
|
||||
[4]: https://opensource.com/article/20/4/install-python-linux
|
||||
[5]: https://opensource.com/article/19/5/python-3-default-mac
|
||||
[6]: https://opensource.com/article/19/8/how-install-python-windows
|
||||
[7]: https://opensource.com/article/20/5/matplotlib-python
|
||||
[8]: https://opensource.com/sites/default/files/uploads/matplotlib_1_1.png (Matplotlib UK election results)
|
||||
[9]: https://anvil.works/blog/plotting-in-seaborn
|
||||
[10]: https://anvil.works/blog/tidy-data
|
||||
[11]: https://opensource.com/sites/default/files/uploads/altair-first-try.png (Altair plot with default styling)
|
||||
[12]: https://opensource.com/sites/default/files/uploads/altair_3.png (The Altair plot with our custom styling.)
|
||||
[13]: https://anvil.works/blog/plotting-in-altair
|
@ -0,0 +1,191 @@
|
||||
[#]: collector: (lujun9972)
|
||||
[#]: translator: ()
|
||||
[#]: reviewer: ( )
|
||||
[#]: publisher: ( )
|
||||
[#]: url: ( )
|
||||
[#]: subject: (Concise data plotting in Python with Altair)
|
||||
[#]: via: (https://opensource.com/article/20/6/altair-python)
|
||||
[#]: author: (Shaun Taylor-Morgan https://opensource.com/users/shaun-taylor-morgan)
|
||||
|
||||
A concise tutorial on data plotting with Altair in Python
|
||||
======
|
||||
Altair, a Python data plotting library, offers an elegant interface and its own plotting language.
|
||||
![metrics and data shown on a computer screen][1]
|
||||
|
||||
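The [plotting libraries][2] in Python offer many ways to present data and can suit different preferences, such as flexibility, design, ease of use, or a particular style.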
|
||||
|
||||
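Compared with the others, I find that Altair offers a different solution, one that is simpler to use overall. Thanks to the declarative plotting language [Vega][3], Altair has an elegant interface that lets you define directly what a plot should look like, rather than building it step by step with piles of loops and conditionals.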
|
||||
|
||||
### The plotting workflow
|
||||
|
||||
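I compare Python plotting libraries by drawing the same multi-bar plot with each of them. Before we start, you need to get your Python environment into a state that can run the code below. Specifically: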
|
||||
|
||||
* Install a recent version of Python (instructions for [Linux][4], [Mac][5], and [Windows][6])
|
||||
* Confirm that this version of Python can run the libraries used in this tutorial
|
||||
|
||||
|
||||
|
||||
The demo data is available online and can be imported directly with pandas:
|
||||
|
||||
|
||||
```
import pandas as pd

# Read the election results CSV straight from its URL
df = pd.read_csv('https://anvil.works/blog/img/plotting-in-python/uk-election-results.csv')
```
|
||||
|
||||
Let's get started. For comparison, first look at this plot made with [Matplotlib][7]:
|
||||
|
||||
![Matplotlib UK election results][8]
|
||||
|
||||
The Matplotlib version takes 16 lines of code, and the bar positions have to be calculated by hand.
|
||||
|
||||
The code to draw a similar plot with Altair is:
|
||||
|
||||
|
||||
```
import altair as alt

# Map each DataFrame column onto a channel of the bar chart
chart = alt.Chart(df).mark_bar().encode(
    x='party',
    y='seats',
    column='year',
    color='party',
)

chart.save('altair-elections.html')
```
|
||||
|
||||
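Much more concise! Like [Seaborn][9], Altair works with data organized as one column per variable ([long form][10]). This lets you map each variable onto an attribute of the plot, which Altair calls a "channel". In this example, we want one bar per `party` on the `x`-axis, the `seats` each party won on the `y`-axis, and the bars split into `column`s by `year`. We also want to `color` the bars by `party`. That is how you would describe the requirement in words, and it is exactly what the code says!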
|
||||
|
||||
Here is the resulting plot:
|
||||
|
||||
![Altair plot with default styling][11]
|
||||
|
||||
### Tweaking the style
|
||||
|
||||
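This is close to what we want. Compared with the Matplotlib version, the main difference is that the Altair version displays each `year` group with a little white space in between. This is not a problem, just a feature of Altair's multi-bar plots.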
|
||||
|
||||
Still, there are a few more improvements we want to make to the style.
|
||||
|
||||
#### Non-integer values
|
||||
|
||||
The two non-integer year names (`Feb 1974` and `Oct 1974`) are displayed as `NaN`. This can be fixed by converting the `year` values to strings:
|
||||
|
||||
|
||||
```
df['year'] = df['year'].astype(str)
```
|
||||
|
||||
#### Specifying the sort order
|
||||
|
||||
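We also need to tell Altair how to sort the data. Altair lets us set more details of the `column` channel by passing it a `Column` object. Here we tell Altair to sort in the order in which the data appears in the DataFrame: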
|
||||
|
||||
```
chart = alt.Chart(df).mark_bar().encode(
    # ...
    column=alt.Column('year', sort=list(df['year']), title=None),
    # ...
)
```
|
||||
|
||||
#### Removing axis titles
|
||||
|
||||
We removed the "year" label from the top of the plot by setting `title=None`. Next, let's remove the "party" label from each column:
|
||||
|
||||
|
||||
```
chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('party', title=None),
    # ...
)
```
|
||||
|
||||
#### Specifying a colormap
|
||||
|
||||
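Finally, we want to specify the bar colors ourselves. Altair lets us set up a mapping between values in a `domain` and colors in a `range`, which is exactly what we need: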
|
||||
|
||||
|
||||
```
cmap = {
    'Conservative': '#0343df',
    'Labour': '#e50000',
    'Liberal': '#ffff14',
    'Others': '#929591',
}

chart = alt.Chart(df).mark_bar().encode(
    # ...
    color=alt.Color('party', scale=alt.Scale(domain=list(cmap.keys()), range=list(cmap.values())))
)
```
|
||||
|
||||
#### Final code with style tweaks
|
||||
|
||||
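After applying these style tweaks, the code is a little less pleasing to the eye, but it still uses the declarative approach, which is what makes Altair so scalable. We are still assigning independent variables from the data to separate attributes of the plot, rather than performing complex manipulations on the data as is often needed in Matplotlib. The only difference is that the variable names are now wrapped in objects such as `alt.X()`, which let us control how they appear: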
|
||||
|
||||
|
||||
```
import altair as alt
# `votes` appears to be the author's own helper module (not shown here)
# that exposes the long-form election DataFrame as `long`
from votes import long as df

cmap = {
    'Conservative': '#0343df',
    'Labour': '#e50000',
    'Liberal': '#ffff14',
    'Others': '#929591',
}

df['year'] = df['year'].astype(str)

# We're still assigning, e.g. 'party' to x, but now we've wrapped it
# in alt.X in order to specify its styling
chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('party', title=None),
    y='seats',
    column=alt.Column('year', sort=list(df['year']), title=None),
    color=alt.Color('party', scale=alt.Scale(domain=list(cmap.keys()), range=list(cmap.values())))
)

chart.save('altair-elections.html')
```
|
||||
|
||||
This is now level with the Matplotlib version: the code has reached 16 lines!
|
||||
|
||||
Here is the Altair plot with our style tweaks applied:
|
||||
|
||||
![The Altair plot with our custom styling][12]
|
||||
|
||||
### Conclusion
|
||||
|
||||
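Although Altair shows no advantage in the amount of code, its declarative plotting language adds a layer of sophistication that I have come to appreciate. Altair also offers a clean, separate way to tune the style, which keeps that code apart from the code blocks that do the plotting. Altair is another great library to consider for plotting in Python.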
|
||||
|
||||
\---
|
||||
|
||||
_This article was first published [here][13] and is republished after editing, with permission._
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://opensource.com/article/20/6/altair-python
|
||||
|
||||
Author: [Shaun Taylor-Morgan][a]
Topic selection: [lujun9972][b]
Translator: [silentdawn-zz](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
|
||||
|
||||
[a]: https://opensource.com/users/shaun-taylor-morgan
|
||||
[b]: https://github.com/lujun9972
|
||||
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/metrics_data_dashboard_system_computer_analytics.png?itok=oxAeIEI- (metrics and data shown on a computer screen)
|
||||
[2]: https://opensource.com/article/20/4/plot-data-python
|
||||
[3]: https://vega.github.io/vega/
|
||||
[4]: https://opensource.com/article/20/4/install-python-linux
|
||||
[5]: https://opensource.com/article/19/5/python-3-default-mac
|
||||
[6]: https://opensource.com/article/19/8/how-install-python-windows
|
||||
[7]: https://opensource.com/article/20/5/matplotlib-python
|
||||
[8]: https://opensource.com/sites/default/files/uploads/matplotlib_1_1.png (Matplotlib UK election results)
|
||||
[9]: https://anvil.works/blog/plotting-in-seaborn
|
||||
[10]: https://anvil.works/blog/tidy-data
|
||||
[11]: https://opensource.com/sites/default/files/uploads/altair-first-try.png (Altair plot with default styling)
|
||||
[12]: https://opensource.com/sites/default/files/uploads/altair_3.png (The Altair plot with our custom styling.)
|
||||
[13]: https://anvil.works/blog/plotting-in-altair
|