Article has been translated

Article has been translated
This commit is contained in:
zxp 2020-12-14 22:13:08 +08:00
parent d8673b7c66
commit ddff56ea75
2 changed files with 154 additions and 174 deletions

View File

@ -1,174 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (zhangxiangping)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How I use Python to map the global spread of COVID-19)
[#]: via: (https://opensource.com/article/20/4/python-map-covid-19)
[#]: author: (AnuragGupta https://opensource.com/users/999anuraggupta)
我如何使用Python绘制COVID-19的全球扩散图
======
使用这些开源框架创建一个彩色地图,显示病毒的可能的传播路径。![Globe up in the clouds][1]
Create a color coded geographic map of the potential spread of the virus
using these open source scripts.
![Globe up in the clouds][1]
The spread of disease is a real concern for a world in which global travel is commonplace. A few organizations track significant epidemics (and any pandemic), and fortunately, they publish their work as open data. The raw data can be difficult for humans to process, though, and that's why data science is so vital. For instance, it could be useful to visualize the worldwide spread of COVID-19 with Python and Pandas.
It can be hard to know where to start when you're faced with large amounts of raw data. The more you do it, however, the more patterns begin to emerge. Here's a common scenario, applied to COVID-19 data:
1. Download COVID-19 country spread daily data into a Pandas DataFrame object from GitHub. For this, you need the Python Pandas library.
2. Process and clean the downloaded data and make it suitable for visualizing. The downloaded data (as you will see for yourself) is in quite good condition. The one problem with this data is that it uses the names of countries, but it's better to use three-digit ISO 3 codes. To generate the three-digit ISO 3 codes, use a small Python library called pycountry. Having generated these codes, you can add an extra column to our DataFrame and populate it with these codes.
3. Finally, for the visualization, use the **express** module of a library called Plotly. This article uses what are called choropleth maps (available in Plotly) to visualize the worldwide spread of the disease.
### Step 1: Corona data
We will download the latest corona data from:
<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>
We will load the data directly into a Pandas DataFrame. Pandas provides a function, **read_csv()**, which can take a URL and return a DataFrame object as shown below:
```
import pycountry
import plotly.express as px
import pandas as pd
URL_DATASET = r'<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>'
df1 = pd.read_csv(URL_DATASET)
print(df1.head(3))  # Get first 3 entries in the dataframe
print(df1.tail(3))  # Get last 3 entries in the dataframe
```
The screenshot of output (on Jupyter) is:
![Jupyter screenshot][2]
From output, you can see that the DataFrame (df1) has the following columns:
1. Date
2. Country
3. Confirmed
4. Recovered
5. Dead
Further, you can see that the **Date** column has entries starting from January 22 to March 31. This database is updated daily, so you will get the current values.
### Step 2: Cleaning and modifying the data frame
We need to add another column to this DataFrame, which has the three-letter ISO alpha-3 codes. To do this, I followed these steps:
1. Create a list of all countries in the database. This was required because in the **df**, in the column **Country**, each country was figuring for each date. So in effect, the **Country** column had multiple entries for each country. To do this, I used the **unique().tolist()** functions.
2. Then I took a dictionary **d_country_code** (initially empty) and populated it with keys consisting of country names and values consisting of their three-letter ISO codes.
3. To generate the three-letter ISO code for a country, I used the function **pycountry.countries.search_fuzzy(country)**. You need to understand that the return value of this function is a "list of **Country** objects." I passed the return value of this function to a name country_data. Further, in this list of objects, the first object i.e., at index 0, is the best fit. Further, this **\** object has an attribute **alpha_3**. So, I can "access" the 3 letter ISO code by using **country_data[0].alpha_3**. However, it is possible that some country names in the DataFrame may not have a corresponding ISO code (For example, disputed territories). So, for such countries, I gave an ISO code of "i.e. a blank string. Further, you need to wrap this code in a try-except block. The statement: **print(_could not add ISO 3 code for -&gt;'_, country)** will give a printout of those countries for which the ISO 3 codes could not be found. In fact, you will find such countries as shown with white color in the final output.
4. Having got the three-letter ISO code for each country (or an empty string for some), I added the country name (as key) and its corresponding ISO code (as value) to the dictionary **d_country_code**. For adding these, I used the **update()** method of the Python dictionary object.
5. Having created a dictionary of country names and their codes, I added them to the DataFrame using a simple for loop.
### Step 3: Visualizing the spread using Plotly
A choropleth map is a map composed of colored polygons. It is used to represent spatial variations of a quantity. We will use the express module of Plotly conventionally called **px**. Here we show you how to create a choropleth map using the function: **px.choropleth**.
The signature of this function is:
```
`plotly.express.choropleth(data_frame=None, lat=None, lon=None, locations=None, locationmode=None, geojson=None, featureidkey=None, color=None, hover_name=None, hover_data=None, custom_data=None, animation_frame=None, animation_group=None, category_orders={}, labels={}, color_discrete_sequence=None, color_discrete_map={}, color_continuous_scale=None, range_color=None, color_continuous_midpoint=None, projection=None, scope=None, center=None, title=None, template=None, width=None, height=None)`
```
The noteworthy points are that the **choropleth()** function needs the following things:
1. A geometry in the form of a **geojson** object. This is where things are a bit confusing and not clearly mentioned in its documentation. You may or may not provide a **geojson** object. If you provide a **geojson** object, then that object will be used to plot the earth features, but if you don't provide a **geojson** object, then the function will, by default, use one of the built-in geometries. (In our example here, we will use a built-in geometry, so we won't provide any value for the **geojson** argument)
2. A pandas DataFrame object for the attribute **data_frame**. Here we provide our DataFrame ie **df1** we created earlier.
3. We will use the data of **Confirmed** column to decide the color of each country polygon.
4. Further, we will use the **Date** column to create the **animation_frame**. Thus as we slide across the dates, the colors of the countries will change as per the values in the **Confirmed** column.
The complete code is given below:
```
import pycountry
import plotly.express as px
import pandas as pd
# ----------- Step 1 ------------
URL_DATASET = r'<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>'
df1 = pd.read_csv(URL_DATASET)
# print(df1.head) # Uncomment to see what the dataframe is like
# ----------- Step 2 ------------
list_countries = df1['Country'].unique().tolist()
# print(list_countries) # Uncomment to see list of countries
d_country_code = {}  # To hold the country names and their ISO
for country in list_countries:
    try:
        country_data = pycountry.countries.search_fuzzy(country)
        # country_data is a list of objects of class pycountry.db.Country
        # The first item  ie at index 0 of list is best fit
        # object of class Country have an alpha_3 attribute
        country_code = country_data[0].alpha_3
        d_country_code.update({country: country_code})
    except:
        print('could not add ISO 3 code for -&gt;', country)
        # If could not find country, make ISO code ' '
        d_country_code.update({country: ' '})
# print(d_country_code) # Uncomment to check dictionary  
# create a new column iso_alpha in the df
# and fill it with appropriate iso 3 code
for k, v in d_country_code.items():
    df1.loc[(df1.Country == k), 'iso_alpha'] = v
# print(df1.head)  # Uncomment to confirm that ISO codes added
# ----------- Step 3 ------------
fig = px.choropleth(data_frame = df1,
                    locations= "iso_alpha",
                    color= "Confirmed",  # value in column 'Confirmed' determines color
                    hover_name= "Country",
                    color_continuous_scale= 'RdYlGn',  #  color scale red, yellow green
                    animation_frame= "Date")
fig.show()
```
The output is something like the following:
![Map][3]
You can download and run the [complete code][4].
To wrap up, here are some excellent resources on choropleth in Plotly:
* <https://github.com/plotly/plotly.py/blob/master/doc/python/choropleth-maps.md>
* [https://plotly.com/python/reference/#choropleth][5]
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/4/python-map-covid-19
作者:[AnuragGupta][a]
选题:[lujun9972][b]
译者:[zhangxiangping](https://github.com/zxp93)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/999anuraggupta
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/cloud-globe.png?itok=_drXt4Tn (Globe up in the clouds)
[2]: https://opensource.com/sites/default/files/uploads/jupyter_screenshot.png (Jupyter screenshot)
[3]: https://opensource.com/sites/default/files/uploads/map_2.png (Map)
[4]: https://github.com/ag999git/jupyter_notebooks/blob/master/corona_spread_visualization
[5]: tmp.azs72dmHFd#choropleth

View File

@ -0,0 +1,154 @@
[#]: collector: (lujun9972)
[#]: translator: (zhangxiangping)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How I use Python to map the global spread of COVID-19)
[#]: via: (https://opensource.com/article/20/4/python-map-covid-19)
[#]: author: (AnuragGupta https://opensource.com/users/999anuraggupta)
如何使用Python绘制COVID-19的全球扩散图
======
使用这些开源框架创建一个彩色地图,显示病毒的可能的传播路径。![Globe up in the clouds][1]
在全球范围内的旅行中疾病的传播是一个令人担忧的问题。一些组织会跟踪重大的流行病还有所有普遍的流行病并将他们的跟踪工作获得的数据公开出来。这些原生的数据对人来说可能很难处理这就是为什么数据科学如此重要的原因。比如将COVID-19在全球范围内的传播路径用Python和Pandas进行可视化可能对这些数据的分析有所帮助。
最开始当面对如此大数量的原始数据时可能难以下手。但当你开始处理数据之后慢慢地就会发现一些处理数据的方式。下面是用于处理COVID-19数据的一些常见的情况
1. 从Github上下载COVID-19的国际间每日传播数据保存为一个Pandas中的DataFrame对象。这时你需要使用Python中的Pandas库。
2. 处理并清理下载好的数据使其满足可视化数据的输入格式。所下载的数据的情况很好数据规整。这个数据有一个问题是它用国家的名字来标识国家但最好是使用ISO 3码国家代码表来标识国家。为了生成三位ISO 3码可是使用pycountry这个Python库。生成了这些代码之后可以在原有的DataFrame上增加一列然后用这些代码填充进去。
3. 最后为了可视化使用Plotly库中的**express**模块。这篇文章是使用choropleth maps在Plotly库中来对疾病在全球的传播进行可视化的。
### 第一步Corona数据
从下面这个网站上下载最新的corona数据译者注2020-12-14仍可访问有墙
<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>
我们之间将这个下载好的数据载入为Pandas的DataFrame。Pandas提供了一个函数 **read_csv()**可以直接使用URL读取数据并返回一个DataFrame对象具体如下所示
```
import pycountry
import plotly.express as px
import pandas as pd
URL_DATASET = r'<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>'
df1 = pd.read_csv(URL_DATASET)
print(df1.head(3))  # Get first 3 entries in the dataframe
print(df1.tail(3))  # Get last 3 entries in the dataframe
```
在Jupyter上的输出截图
![Jupyter screenshot][2]
从这个输出可以看到这个DataFramedf1包括以下几列数据
1. Date
2. Country
3. Confirmed
4. Recovered
5. Dead
之后还可以看到**Date**这一列包含了从1月22日到3月31日的条目信息。这个数据是每天更新的所以你会得到你当天的值。
### 第二步:清理和修改数据框
我们要往这个DataFrame中增加一列数据就是那个包含了ISO 3编码。可以通过以下三步完成这个任务
1. 创建一个包含所有国家的list。因为在**df**的**Country**列中,国家都是每个日期就重复一次。所以实际上**Country**列中对每个国家就会有多个条目。我使用**unique().tolist()**函数完成这个任务。
2. 我使用**d_country_code**字典对象初始为空然后将其键设置为国家的名称然后它的值设置为其对应的ISO 3编码。
3. 我使用**pycountry.countries.search_fuzzy(country)**为每个国家生成ISO 3编码。你需要明白的是这个函数的返回值是一个**Country**对象的list。我将这个函数的返回值赋给country_data对象。之以这个对象的第一个元素序号0为例。这个**\**对象有一个**alpha_3**属性。所以我使用**country_data[0].alpha_3**就能"获得"第一个元素的ISO 3编码。然而在这个DataFrame中有些国家的名称可能没有对应的ISO 3编码比如有争议的领土。那么对这些“国家”我就用一个空白字符串来替代ISO编码。你也可以用一个try-except代码来替换这部分。except中的语句可以写**print(_could not add ISO 3 code for -&gt;'_, country)**。这样就能在找不到这些“国家”对应的ISO 3编码时给出一个输出提示。实际上你会发现这些“国家”会在最后的输出中用白色来表示。
4. 在获得了每个国家的ISO 3编码有些是空白字符串之后我把这些国家的名称作为键还有国家对应的ISO 3编码作为值添加到之前的字典**d_country_code**中。可以使用Python中字典对象的**update()**方法来完成这个任务。
5. 在创建好了一个包含国家名称和对应ISO 3编码的字典之后我使用一个简单的循环将他们加入到DataFrame中。
### 第三步使用Plotly可视化传播路径
choropleth图是一个由彩色多边形组成的图。它常常用来表示一个变量在空间中的变化。我们使用Plotly中的**px**模块来创建choropleth图具体函数为**px.choropleth**。
这个函数的所包含的参数如下:
```
`plotly.express.choropleth(data_frame=None, lat=None, lon=None, locations=None, locationmode=None, geojson=None, featureidkey=None, color=None, hover_name=None, hover_data=None, custom_data=None, animation_frame=None, animation_group=None, category_orders={}, labels={}, color_discrete_sequence=None, color_discrete_map={}, color_continuous_scale=None, range_color=None, color_continuous_midpoint=None, projection=None, scope=None, center=None, title=None, template=None, width=None, height=None)`
```
**choropleth()**这个函数还有几点需要注意:
1. **geojson**是一个geometry对象上面函数第六个参数。这个对象有点让人困扰因为在函数文档中没有明确地提到这个对象。你可以提供也可以不提供**geojson**对象。如果你提供了**geojson**对象,那么这个对象就会被用来绘制地球特征,如果不提供**geojson**对象那这个函数默认就会使用一个内建的geometry对象。在我们的实验中我们使用内建的geometry对象因此我们不会为**geojson**参数提供值)
2. DataFrame对象有一个**data_frame**属性,在这里我们先前就提供了一个我们创建好的**df1**。
3. 我们用**Confirmed**(确诊)来决定每个国家多边形的颜色。
4. 最后,我们**Date**列创建一个**animation_frame**。这样我们就能通过日期来划分数据,国家的颜色会随着**Confirmed**的变化而变化。
最后完整的代码如下:
```
import pycountry
import plotly.express as px
import pandas as pd
# ----------- Step 1 ------------
URL_DATASET = r'<https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv>'
df1 = pd.read_csv(URL_DATASET)
# print(df1.head) # Uncomment to see what the dataframe is like
# ----------- Step 2 ------------
list_countries = df1['Country'].unique().tolist()
# print(list_countries) # Uncomment to see list of countries
d_country_code = {}  # To hold the country names and their ISO
for country in list_countries:
    try:
        country_data = pycountry.countries.search_fuzzy(country)
        # country_data is a list of objects of class pycountry.db.Country
        # The first item  ie at index 0 of list is best fit
        # object of class Country have an alpha_3 attribute
        country_code = country_data[0].alpha_3
        d_country_code.update({country: country_code})
    except:
        print('could not add ISO 3 code for -&gt;', country)
        # If could not find country, make ISO code ' '
        d_country_code.update({country: ' '})
# print(d_country_code) # Uncomment to check dictionary  
# create a new column iso_alpha in the df
# and fill it with appropriate iso 3 code
for k, v in d_country_code.items():
    df1.loc[(df1.Country == k), 'iso_alpha'] = v
# print(df1.head)  # Uncomment to confirm that ISO codes added
# ----------- Step 3 ------------
fig = px.choropleth(data_frame = df1,
                    locations= "iso_alpha",
                    color= "Confirmed",  # value in column 'Confirmed' determines color
                    hover_name= "Country",
                    color_continuous_scale= 'RdYlGn',  #  color scale red, yellow green
                    animation_frame= "Date")
fig.show()
```
这段代码的输出就是下面这个图的内容:
![Map][3]
你可以从这里下载并运行完整代码[complete code][4]。
最后这里还有一些关于Plotly绘制choropleth图的不错的资源。
* <https://github.com/plotly/plotly.py/blob/master/doc/python/choropleth-maps.md>
* [https://plotly.com/python/reference/#choropleth][5]
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/4/python-map-covid-19
作者:[AnuragGupta][a]
选题:[lujun9972][b]
译者:[zhangxiangping](https://github.com/zxp93)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/999anuraggupta
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/cloud-globe.png?itok=_drXt4Tn (Globe up in the clouds)
[2]: https://opensource.com/sites/default/files/uploads/jupyter_screenshot.png (Jupyter screenshot)
[3]: https://opensource.com/sites/default/files/uploads/map_2.png (Map)
[4]: https://github.com/ag999git/jupyter_notebooks/blob/master/corona_spread_visualization
[5]: tmp.azs72dmHFd#choropleth