From 362d5b81fa0689f3967f6003028cdf439bb04673 Mon Sep 17 00:00:00 2001 From: jasminepeng Date: Tue, 10 Oct 2017 11:24:57 +0800 Subject: [PATCH 01/79] =?UTF-8?q?=E6=A0=A1=E5=AF=B9=E4=B8=AD?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 校对中 --- ...ORWARDING MOUNT A SOCKS SERVER WITH SSH.md | 30 ++++++++++--------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md index a1a1ddf6e0..a09333be92 100644 --- a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md +++ b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md @@ -1,19 +1,19 @@ -动态端口转发:安装带有SSH的SOCKS服务器 +动态端口转发:安装带有 SSH 的 SOCKS 服务器 ================= -在上一篇文章中([Creating TCP / IP (port forwarding) tunnels with SSH: The 8 scenarios possible using OpenSSH][17]),我们看到了处理端口转发的所有可能情况。但是对于静态端口转发,我们只介绍了通过 SSH 连接来访问另一个系统的端口的情况。 +在上一篇文章([通过 SSH 实现 TCP / IP 隧道(端口转发):使用 OpenSSH 可能的 8 种场景][17])中,我们看到了处理端口转发的所有可能情况,不过只是静态端口转发。也就是说,我们只介绍了通过 SSH 连接来访问另一个系统的端口的情况。 -在这篇文章中,我们脱离动态端口转发的前端,而尝试补充它。 +在那篇文章中,我们未涉及动态端口转发,此外一些读者错过了该文章,本篇文章中将尝试补充完整。 当我们谈论使用 SSH 进行动态端口转发时,我们谈论的是将 SSH 服务器 转换为 [SOCKS][2] 服务器。那么什么是 SOCKS 服务器? -你知道 [Web 代理][3]是用来做什么的吗?答案可能是肯定的,因为很多公司都在使用它。它是一个直接连接到互联网的系统,允许没有互联网访问的[内部网][4]客户端通过配置浏览器的代理来请求(尽管也有[透明代理][5])浏览网页。Web 代理除了允许输出到 Internet 之外,还可以缓存页面,图像等。资源已经由某些客户端下载,所以您不必为另一个客户端而下载它们。此外,它允许过滤内容并监视用户的活动。当然了,它的基本功能是转发 HTTP 和 HTTPS 流量。 +你知道 [Web 代理][3]是用来做什么的吗?答案可能是肯定的,因为很多公司都在使用它。它是一个直接连接到互联网的系统,允许没有互联网访问的[内部网][4]客户端通过配置浏览器的代理来请求(尽管也有[透明代理][5])浏览网页。Web 代理除了允许输出到 Internet 之外,还可以缓存页面,图像等。已经由某客户端下载的资源,另一个客户端不必再下载它们。此外,它还可以过滤内容并监视用户的活动。当然了,它的基本功能是转发 HTTP 和 HTTPS 流量。 一个 SOCKS 服务器提供的服务类似于公司内部网络提供的代理服务器服务,但不限于 HTTP/HTTPS,它还允许转发任何 TCP/IP 流量(SOCKS 5 也是 UDP)。 -例如,假设我们希望在一个没有直接连接到互联网的内部网上使用基于 POP3 或 ICMP 的邮件服务和 Thunderbird 的 SMTP 服务。如果我们只有一个 web 代理可以用,我们可以使用的唯一的简单方式是使用一些 webmail(也可以使用 [Thunderbird 的 Webmail 扩展][6])。我们还可以[通过 HTTP 进行隧道传递][7]来利用代理。但最简单的方式是在网络中设置一个可用的 SOCKS 服务器,它可以让我们使用 POP3、ICMP 和 SMTP,而不会造成任何的不便。 +例如,假设我们希望在一个没有直接连接到互联网的内部网上使用基于 POP3 或 ICMP 的邮件服务和 Thunderbird 的 SMTP 服务。如果我们只有一个 web 代理可以用,我们可以使用的唯一的简单方式是使用某个 webmail(也可以使用 [Thunderbird 的 Webmail 扩展][6])。我们还可以[通过 HTTP 隧道][7]来利用代理。但最简单的方式是在网络中设置一个 SOCKS 服务器,它可以让我们使用 POP3、ICMP 和 SMTP,而不会造成任何的不便。 -虽然有很多软件可以配置非常专业的 SOCKS 服务器,我们这里使用 OpenSSH 简单地设置一个: +有很多软件可以配置非常专业的 SOCKS 服务器,我们这里使用 OpenSSH 简单地设置一个: > ``` > Clientessh $ ssh -D 1080 user @ servidorssh @@ -27,7 +27,7 @@ 其中: -* 选项 `-D` 类似于选项为 `-L` 和 `-R` 的静态端口转发。像这样,我们就可以让客户端只监听本地请求或从其他节点到达的请求,具体的取决于我们将请求关联到哪个地址: +* 选项 `-D` 类似于选项为 `-L` 和 `-R` 的静态端口转发。像这样,我们就可以让客户端只监听本地请求或从其他节点到达的请求,具体取决于我们将请求关联到哪个地址: > ``` > -D [bind_address:] port @@ -41,21 +41,21 @@ * 选项 `-f` 会使 `ssh` 停留在后台并将其与当前 `shell` 分离,以便使进程成为守护进程。如果没有选项 `-N`(或不指定命令),则不起作用,否则交互式 shell 将与后台进程不兼容。 - 使用 [PuTTY][8] 也可以非常简单地进行端口重定向。相当于 `ssh -D 0.0.0.0:1080` 使用此配置: + 使用 [PuTTY][8] 也可以非常简单地进行端口重定向。与 `ssh -D 0.0.0.0:1080` 相当的配置: ![PuTTY SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/putty_socks.png) -对于通过 SOCKS 服务器访问另一个网络的应用程序,如果应用程序提供了特殊的支持,就会非常方便(虽然不是必需的),就像浏览器支持使用代理服务器一样。浏览器(如 Firefox 或 Internet Explorer)是使用 SOCKS 服务器访问另一个网络的应用程序示例: +对于通过 SOCKS 服务器访问另一个网络的应用程序,如果应用程序提供了对 SOCKS 服务器的特别支持,就会非常方便(虽然不是必需的),就像浏览器支持使用代理服务器一样。如 Firefox 或 Internet Explorer 的浏览器是使用 SOCKS 服务器访问另一个网络的应用程序示例: ![Firefox SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/firefox_socks.png) ![Internet Explorer 
SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/internetexplorer_socks.png) -注意:使用 [IEs 4 Linux][1] 进行捕获:如果您需要 Internet Explorer 并使用 Linux,强烈推荐! +注意:截图来自 [IE for Linux][1] :如果您需要 Internet Explorer 并使用 Linux,强烈推荐! 然而,最常见的浏览器并不要求 SOCKS 服务器,因为它们通常与代理服务器配合得更好。 -Thunderbird 也允许这样做,而且很有用: +不过,Thunderbird 也支持 SOCKS,而且很有用: ![Thunderbird SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/thunderbird_socks.png) @@ -63,11 +63,13 @@ Thunderbird 也允许这样做,而且很有用: ![Spotify SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/spotify_socks.png) -我们需要记住的是名称解析。有时我们会发现,在目前的网络中,我们无法解析 SOCKS 服务器另一端所要访问的系统的名称。SOCKS 5 还允许我们传播 DNS 请求(UDP 允许我们使用 SOCKS 5)并将它们发送到另一端:可以指定是否要本地或远程解析(或者也可以测试两者)。支持这一点的应用程序也必须考虑到这一点。例如,Firefox 具有参数 `network.proxy.socks_remote_dns`(在 `about:config` 中),允许我们指定远程解析。默认情况下,它在本地解析。 +需要关注名称解析。有时我们会发现,在目前的网络中,我们无法解析 SOCKS 服务器另一端所要访问的系统的名称。SOCKS 5 还允许我们传播 DNS 请求(UDP 允许我们使用 SOCKS 5)并将它们发送到另一端:可以指定是本地还是远程解析(或者也可以测试两者)。支持此功能的应用程序也必须考虑到这一点。例如,Firefox 具有参数 `network.proxy.socks_remote_dns`(在 `about:config` 中),允许我们指定远程解析。默认情况下,它在本地解析。 -Thunderbird 也支持参数 `network.proxy.socks_remote_dns`,但由于没有地址栏来放置 `about:config`,我们需要改变它,就像在 [MozillaZine:about:config][10] 中读到的,依次点击工具→选项→高级→常规→配置编辑器(按钮)。 +Thunderbird 也支持参数 `network.proxy.socks_remote_dns`,但由于没有地址栏来放置 `about:config`,我们需要改变它,就像在 [MozillaZine:about:config][10] 中读到的,依次点击 工具 → 选项 → 高级 → 常规 → 配置编辑器(按钮)。 -没有对 SOCKS 特殊支持的应用程序可以被 “socksified”。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部,这将很好地工作。“Socksifier” 包括加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改它们以通过 SOCKS 服务器重定向它们,以便通信中不需要使用 SOCKS 支持进行特殊的编程。 +没有对 SOCKS 特别支持的应用程序可以被 sock化socksified。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部,这将很好地工作。“Socksifier” 包括加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改它们以通过 SOCKS 服务器重定向它们,以便通信中不需要使用 SOCKS 支持进行特殊的编程。 + +Applications that do not specifically support SOCKS can be “socksified”. This will work well with many applications that use TCP / IP without problems, but not with all. “Socksifier” consists of loading an additional library that detects requests to the TCP / IP stack and modifying them to redirect them through the SOCKS server, so that the communication goes through without the application being specifically programmed with SOCKS support . 
在 Windows 和 [Linux.][18] 上都有 “Socksifiers”。 From ceb5bff554212c646cce3305a2078ba5c8bc3b46 Mon Sep 17 00:00:00 2001 From: jasminepeng Date: Tue, 10 Oct 2017 16:34:19 +0800 Subject: [PATCH 02/79] =?UTF-8?q?=E6=A0=A1=E5=AF=B9=E5=AE=8C=E6=AF=95?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 校对完毕 --- ...FORWARDING MOUNT A SOCKS SERVER WITH SSH.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md index a09333be92..97c6e590ab 100644 --- a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md +++ b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md @@ -67,31 +67,29 @@ Thunderbird 也支持参数 `network.proxy.socks_remote_dns`,但由于没有地址栏来放置 `about:config`,我们需要改变它,就像在 [MozillaZine:about:config][10] 中读到的,依次点击 工具 → 选项 → 高级 → 常规 → 配置编辑器(按钮)。 -没有对 SOCKS 特别支持的应用程序可以被 sock化socksified。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部,这将很好地工作。“Socksifier” 包括加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改它们以通过 SOCKS 服务器重定向它们,以便通信中不需要使用 SOCKS 支持进行特殊的编程。 +没有对 SOCKS 特别支持的应用程序可以被 sock 化socksified。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部。“Sock 化” 包括加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改请求,并通过 SOCKS 服务器重定向,以便正常通信,而不需要特别编程来支持 SOCKS 。 -Applications that do not specifically support SOCKS can be “socksified”. This will work well with many applications that use TCP / IP without problems, but not with all. “Socksifier” consists of loading an additional library that detects requests to the TCP / IP stack and modifying them to redirect them through the SOCKS server, so that the communication goes through without the application being specifically programmed with SOCKS support . - -在 Windows 和 [Linux.][18] 上都有 “Socksifiers”。 +在 Windows 和 [Linux][18] 上都有 “Socksifiers”。 对于 Windows,我们举个例子,SocksCap 是一种非商业用途的闭源但免费的产品,我使用了很长时间都十分满意。SocksCap 由一家名为 Permeo 的公司制造,该公司是创建 SOCKS 参考技术的公司。Permeo 被 [Blue Coat][11] 买下后,它[停止了 SocksCap 项目][12]。现在你仍然可以在互联网上找到 `sc32r240.exe` 文件。[FreeCap][13] 也是面向 Windows 的免费代码项目,外观和使用都非常类似于 SocksCap。然而,它工作起来更加糟糕,多年来一直没有维护。看起来,它的作者倾向于推出需要付款的新产品 [WideCap][14]。 -这是 SocksCap 的一个方面,当我们 “socksified” 了几个应用程序。当我们从这里启动它们时,这些应用程序将通过 SOCKS 服务器访问网络: +这是 SocksCap 的一个界面,可以看到我们 “socksified” 了几个应用程序。当我们从这里启动它们时,这些应用程序将通过 SOCKS 服务器访问网络: ![SocksCap](https://wesharethis.com/wp-content/uploads/2017/07/sockscap.png) -在配置对话框中可以看到,如果选择了协议 SOCKS 5,我们必须选择在本地或远程解析名称: +在配置对话框中可以看到,如果选择了协议 SOCKS 5,我们可以选择在本地或远程解析名称: ![SocksCap settings](https://wesharethis.com/wp-content/uploads/2017/07/sockscap_settings.png) -在 Linux 上,一直以来我们都有许多方案来替换一个单一的远程命令。在 Debian/Ubuntu 中,命令行输出: +在 Linux 上,如同往常一样,对某个远程命令我们都有许多替代方案。在 Debian/Ubuntu 中,命令行: > ``` > $ Apt-cache search socks > ``` -输出会告诉我们很多东西 +其输出会告诉我们很多。 -最著名的是 [tsocks][15] 和 [proxychains][16]。他们的工作方式大致相同:只需启动我们想要与他们 “socksify” 的应用程序,就是这样。使用 `proxychains` 的 `wget` 的例子: +最著名的是 [tsocks][15] 和 [proxychains][16]。他们的工作方式大致相同:只需启动我们想要用他们 “socksify” 的应用程序,就是这样。使用 `proxychains` 的 `wget` 的例子: > ``` > $ Proxychains wget http://www.google.com @@ -114,7 +112,7 @@ Applications that do not specifically support SOCKS can be “socksified”. 
Thi > 19:13:21 (24.0 KB / s) - `index.html 'saved [6016] > ``` -为此,我们必须指定要在 `/etc/proxychains.conf` 中使用的代理服务器: +为此,我们必须在 `/etc/proxychains.conf` 中指定要使用的代理服务器: > ``` > [ProxyList] From 1a2657173fa5533eda4938f39a18cbaa8f972fdf Mon Sep 17 00:00:00 2001 From: jasminepeng Date: Tue, 10 Oct 2017 16:35:25 +0800 Subject: [PATCH 03/79] =?UTF-8?q?=E6=A0=A1=E5=AF=B9=E5=AE=8C=E6=AF=95?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 校对完毕 --- ...715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md index 97c6e590ab..ff0e4663f4 100644 --- a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md +++ b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md @@ -160,5 +160,5 @@ via: https://wesharethis.com/2017/07/15/dynamic-port-forwarding-mount-socks-serv [14]:https://wesharethis.com/goto/http://widecap.ru/en/support/ [15]:https://wesharethis.com/goto/http://tsocks.sourceforge.net/ [16]:https://wesharethis.com/goto/http://proxychains.sourceforge.net/ -[17]:https://wesharethis.com/2017/07/14/creating-tcp-ip-port-forwarding-tunnels-ssh-8-possible-scenarios-using-openssh/ +[17]:https://linux.cn/article-8945-1.html [18]:https://wesharethis.com/2017/07/10/linux-swap-partition/ From 5add4ed10bbca3ba743e5fdef3100a617d41d812 Mon Sep 17 00:00:00 2001 From: wxy Date: Tue, 10 Oct 2017 17:24:54 +0800 Subject: [PATCH 04/79] PRF:20170909 12 cool things you can do with GitHub.md @softpaopao --- ...9 12 cool things you can do with GitHub.md | 233 ++++++++---------- 1 file changed, 99 insertions(+), 134 deletions(-) diff --git a/translated/tech/20170909 12 cool things you can do with GitHub.md b/translated/tech/20170909 12 cool things you can do with GitHub.md index 1e5e1413de..0ef1b51c22 100644 --- a/translated/tech/20170909 12 cool things you can do with GitHub.md +++ b/translated/tech/20170909 12 cool things you can do with GitHub.md @@ -1,124 +1,109 @@ 12 件可以用 GitHub 完成的很酷的事情 ============================================================ -我不能为我的人生想出一个引子来,所以... +我不能为我的人生想出一个引子来,所以…… ### #1 在 GitHub.com 上编辑代码 -我想我要开始的第一件事是多数人都已经知道的(尽管一周之前的我并不知道)。 +我想我要开始介绍的第一件事是多数人都已经知道的(尽管我一周之前还不知道)。 -当你登录到 GitHub ,查看一个文件时(任何文本文件,任何版本库),右上方会有一只小铅笔。点击它,你就可以编辑文件了。 当你编辑完成后,GitHub 会给出文件变更的建议然后为你 fork 你的仓库并创建一个 pull 请求。 +当你登录到 GitHub ,查看一个文件时(任何文本文件,任何版本库),右上方会有一只小铅笔。点击它,你就可以编辑文件了。 当你编辑完成后,GitHub 会给出文件变更的建议,然后为你复刻fork该仓库并创建一个拉取请求pull request(PR)。 -是不是很疯狂?它为你创建了 fork ! - -不需要去 fork ,pull ,本地更改,push 然后创建一个 PR。 +是不是很疯狂?它为你创建了一个复刻! +你不需要自己去复刻、拉取,然后本地修改,再推送,然后创建一个 PR。 ![](https://cdn-images-1.medium.com/max/1600/1*w3yKOnVwomvK-gc7hlQNow.png) -不是一个真正的 PR +*不是一个真正的 PR* -这对于修改错误的拼写以及编辑代码时的一些糟糕的想法是很有用的。 +这对于修改错误拼写以及编辑代码时的一些糟糕的想法是很有用的。 ### #2 粘贴图像 -在评论和 issue 的描述中并不仅限于使用文字。你知道你可以直接从剪切板粘贴图像吗? 在你粘贴的时候,你会看到图片被上传 (到云端,这毫无疑问)并转换成 markdown 显示的图片格式。 +在评论和工单issue的描述中并不仅限于使用文字。你知道你可以直接从剪切板粘贴图像吗? 
在你粘贴的时候,你会看到图片被上传 (到云端,这毫无疑问),并转换成 markdown 显示的图片格式。 -整洁。 +棒极了。 ### #3 格式化代码 -如果你想写一个代码块的话,你可以用三个反引号作为开始 —— 就像你在浏览 [熟练掌握 Markdown][3] 页面所学的一样 —— 而且 GitHub 会尝试去推测你所写下的语言。 +如果你想写一个代码块的话,你可以用三个反引号(```)作为开始 —— 就像你在浏览 [精通 Markdown][3] 时所学到的一样 —— 而且 GitHub 会尝试去推测你所写下的编程语言。 -但如果你张贴像是 Vue ,Typescript 或 JSX 这样的代码,你就需要明确指出才能获得高亮显示。 +但如果你粘贴的像是 Vue、Typescript 或 JSX 这样的代码,你就需要明确指出才能获得高亮显示。 在首行注明 ````jsx`: - ![](https://cdn-images-1.medium.com/max/1600/1*xnt83oGWLtJzNzwp-YvSuA.png) …这意味着代码段已经正确的呈现: - ![](https://cdn-images-1.medium.com/max/1600/1*FnOcz-bZi3S9Tn3dDGiIbQ.png) -(顺便说一下,这些用法可以扩展到 gists。 如果你在 gist 中给出 `.jsx` 扩展,你的 JSX 语法就会高亮显示。) +(顺便说一下,这些用法也可以用到 gist。 如果你给一个 gist 用上 `.jsx` 扩展名,你的 JSX 语法就会高亮显示。) 这里是[所有被支持的语法][4]的清单。 -### #4 用 PRs 中的魔法词来关闭 issues +### #4 用 PR 中的魔法词来关闭工单 -比方说你已经创建了一个 pull 请求用来修复 issue #234 。那么你就可以把 “fixes #234” 这段文字放在你的 PR 描述中(或者是在 PR 的评论的任何位置)。 +比方说你已经创建了一个用来修复 `#234` 工单的拉取请求。那么你就可以把 `fixes #234` 这段文字放在你的 PR 的描述中(或者是在 PR 的评论的任何位置)。 -接下来,在合并 PR 时会自动关闭与之对应的问题。这是不是很酷? +接下来,在合并 PR 时会自动关闭与之对应的工单。这是不是很酷? 这里是[更详细的学习帮助][5]。 ### #5 链接到评论 -是否你曾经想要链接到一个特定的评论但却不知道该怎么做?这是因为你不知道如何去做到这些。但那都将是过去了我的朋友,因为我在这里告诉你,点击紧挨着名字的日期或时间,这就是如何链接到一个评论。 - - +是否你曾经想要链接到一个特定的评论但却无从着手?这是因为你不知道如何去做到这些。不过那都过去了,我的朋友,我告诉你啊,点击紧挨着名字的日期或时间,这就是如何链接到一个评论的方法。 ![](https://cdn-images-1.medium.com/max/1600/1*rSq4W-utQGga5GOW-w2QGg.png) -嘿,这里有 gaearon 的照片! +*嘿,这里有 gaearon 的照片!* ### #6 链接到代码 -那么你想要链接到代码的特定行。我明白。 +那么你想要链接到代码的特定行么。我了解了。 试试这个:在查看文件的时候,点击挨着代码的行号。 -哇哦,你看到了么?行号位置更新出了 URL !如果你按下 Shift 键并点击其他的行号,SHAZAAM ,URL 再一次更新并且现在出现了行范围的高亮。 +哇哦,你看到了么?URL 更新了,加上了行号!如果你按下 `Shift` 键并点击其他的行号,格里格里巴巴变!URL 再一次更新并且现在出现了行范围的高亮。 -分享这个 URL 将会链接到这个文件的那些行。但等一下,链接所指向的是当前分支。如果文件发生变更了怎么办?也许一个文件当前状态的永久链接就是你以后需要的。 +分享这个 URL 将会链接到这个文件的那些行。但等一下,链接所指向的是当前分支。如果文件发生变更了怎么办?也许一个文件当前状态的永久链接permalink就是你以后需要的。 我比较懒,所以我已经在一张截图中做完了上面所有的步骤: - - ![](https://cdn-images-1.medium.com/max/1600/1*5Qg2GqTkTKuXLARasZN57A.png) -说起 URLs… +*说起 URL…* ### #7 像命令行一样使用 GitHub URL -使用 UI 来浏览 GitHub 有着很好的体验。但有些时候最快到达你想去的地方的方法就是在地址栏输入。举个例子,如果我想要跳转到一个我正在工作的分支然后查看与 master 分支的 diff,我就可以在我的仓库名称的后边输入 `/compare/branch-name` 。 +使用 UI 来浏览 GitHub 有着很好的体验。但有些时候最快到达你想去的地方的方法就是在地址栏输入。举个例子,如果我想要跳转到一个我正在工作的分支,然后查看与 master 分支的差异,我就可以在我的仓库名称的后边输入 `/compare/branch-name` 。 -这样就会登录到指定分支的 diff 页面。 - - +这样就会访问到指定分支的 diff 页面。 ![](https://cdn-images-1.medium.com/max/2000/1*DqexM1y398gSaozLNllroA.png) -然而这就是与 master 分支的 diff ,如果我正在 integration 分支工作,我可以输入 `/compare/integration-branch...my-branch`。 - - +然而这就是与 master 分支的 diff,如果我要与 develoment 分支比较,我可以输入 `/compare/development...my-branch`。 ![](https://cdn-images-1.medium.com/max/2000/1*roOXDuo_-9QKI5NLKmveGQ.png) -对于键盘上的快捷键,`ctrl`+`L` 或 `cmd`+`L` 将会向上跳转光标进入 URL 那里(至少在 Chrome 中是这样)。这一点 —— 加上你的浏览器会自动补全的事实 —— 能够成为一种在分支间跳转的便捷方式。 +对于你这种键盘快枪手来说,`ctrl`+`L` 或 `cmd`+`L` 将会向上跳转光标进入 URL 那里(至少在 Chrome 中是这样)。这(再加上你的浏览器会自动补全)能够成为一种在分支间跳转的便捷方式。 -小贴士:使用方向键在 Chrome 的自动完成建议中移动同时按 `shift`+`delete` 来删除历史条目(e.g. 一旦分支被合并)。 +专家技巧:使用方向键在 Chrome 的自动完成建议中移动同时按 `shift`+`delete` 来删除历史条目(例如,一旦分支被合并后)。 -(我真的好奇如果我把快捷键写成 `shift + delete` 这样的话,是不是读起来会更加容易。但严格来说 ‘+’ 并不是快捷键的一部分,所以我并不觉得这很舒服。这一点让 _我_ 整晚难以入睡,Rhonda。) - -### #8 在 issue 中创建列表 - -你想要在你的 issue 中看到一个复选框列表吗? +(我真的好奇如果我把快捷键写成 `shift + delete` 这样的话,是不是读起来会更加容易。但严格来说 ‘+’ 并不是快捷键的一部分,所以我并不觉得这很舒服。这一点纠结让 _我_ 整晚难以入睡,Rhonda。) +### #8 在工单中创建列表 +你想要在你的工单issue中看到一个复选框列表吗? ![](https://cdn-images-1.medium.com/max/1600/1*QIe-XOKOXTB3hXaLesr0zw.png) -想要在查看列表中的 issue 时候显示为一个漂亮的 “2 of 5” bar(译者注:条形码)吗? - - +你想要在工单列表中显示为一个漂亮的 “2 of 5” 进度条吗? 
![](https://cdn-images-1.medium.com/max/1600/1*06WdEpxuasda2-lavjjvNw.png) -那很好!你可以使用这些的语法创建交互式的复选框: +很好!你可以使用这些的语法创建交互式的复选框: ``` - [ ] Screen width (integer) @@ -128,203 +113,183 @@ - [ ] Custom elements ``` -表示方法是空格,破折号,再空格,左括号,填入空格(或者一个 x ),然后封闭括号 ,接着空格,最后是一些话。 +它的表示方法是空格、破折号、再空格、左括号、填入空格(或者一个 `x` ),然后封闭括号,接着空格,最后是一些话。 -然后其实你可以选中或取消选中这些框!出于一些原因这些对我来说看上去就像是技术的魔法。你可以 _选中_ 这些框! 同时底层的文本会进行更新。 - -他们接下来会想什么? - -噢,如果你在一个 project board 上有这些 issue 的话,它也会在这里显示进度: +然后你可以实际选中或取消选中这些框!出于一些原因这些对我来说看上去就像是技术魔法。你可以_选中_这些框! 同时底层的文本会进行更新。 +他们接下来会想到什么魔法? +噢,如果你在一个项目面板project board上有这些工单的话,它也会在这里显示进度: ![](https://cdn-images-1.medium.com/max/1600/1*x_MzgCJXFp-ygsqFQB5qHA.png) -如果在我提到“在一个 project board 上”时你不知道我在说些什么,那么你会在下面的页面进一步了解。 +如果在我提到“在一个项目面板上”时你不知道我在说些什么,那么你会在本页下面进一步了解。 -比如,在页面下 2 厘米的地方。 +比如,在本页面下 2 厘米的地方。 -### #9 GitHub 上的 Project boards +### #9 GitHub 上的项目面板 -我常常在大项目中使用 Jira 。而对于个人项目我总是会使用 Trello 。我很喜欢他们两个。 +我常常在大项目中使用 Jira 。而对于个人项目我总是会使用 Trello 。我很喜欢它们两个。 -当我学会的几周后 GitHub 有它自己的产品,就在我的仓库上的 Project 标签,我想过我会照搬一套我已经在 Trello 上进行的任务。 - - +当我学会 GitHub 的几周后,它也有了自己的项目产品,就在我的仓库上的 Project 标签,我想我会照搬一套我已经在 Trello 上进行的任务。 ![](https://cdn-images-1.medium.com/max/2000/1*NF7ZnHndZQ2SFUc5PK-Cqw.png) -没有一个是有趣的 - -这里是在 GitHub project 上相同的内容: +*没有一个是有趣的任务* +这里是在 GitHub 项目上相同的内容: ![](https://cdn-images-1.medium.com/max/2000/1*CHsofapb4JtEDmveOvTYVQ.png) -你的眼睛会因为缺乏对比而适应。 +*你的眼睛最终会适应这种没有对比的显示* -出于速度的缘故,我把上面所有的都添加为 “notes” —— 意思是他们不是真正的 GitHub issue 。 - -但在 GitHub 上,管理任务的权限被集成在版本库的其他地方 —— 所以你可能想要从仓库添加存在的 issue 到 board 上。 - -你可以点击右上角的 Add Cards 然后找你想要添加的东西。这里特殊的[搜索语法][6]就派上用场了,举个例子,输入 `is:pr is:open` 然后现在你可以拖动任何开启的 PRs 到 board 上,或者要是你想清理一些 bug 的话就输入 `label:bug` +出于速度的缘故,我把上面所有的都添加为 “备注note” —— 意思是它们不是真正的 GitHub 工单。 +但在 GitHub 上,管理任务的能力被集成在版本库的其他地方 —— 所以你可能想要从仓库添加已有的工单到面板上。 +你可以点击右上角的添加卡片Add Cards,然后找你想要添加的东西。在这里,特殊的[搜索语法][6]就派上用场了,举个例子,输入 `is:pr is:open` 然后现在你可以拖动任何开启的 PR 到项目面板上,或者要是你想清理一些 bug 的话就输入 `label:bug`。 ![](https://cdn-images-1.medium.com/max/2000/1*rTVCR92HhIPhrVnOnXRZkQ.png) -或者你可以将现有的 notes 转换为 issues 。 - - +亦或者你可以将现有的备注转换为工单。 ![](https://cdn-images-1.medium.com/max/1600/1*pTm7dygsyLxsOUDkM7CTcg.png) -或者最后,从一个现有的 issue 屏幕,把它添加到在右边面板的一个 project 。 - - +再或者,从一个现有工单的屏幕上,把它添加到右边面板的项目上。 ![](https://cdn-images-1.medium.com/max/1600/1*Czs0cSc91tXv411uneEM9A.png) -它们将会进入一个 project board 的分类列表,这样你就能减少候选的列表的数量。 +它们将会进入那个项目面板的分类列表,这样你就能决定放到哪一类。 -作为实现那些 task 的代码,在同一仓库下你所拥有的 ‘task’ 定义有一个巨大(超大)的好处。这意味着今后的几年你能够用一行代码做一个 git blame 并且找出方法回到最初在这个 task 后面写下那些代码的根据,不需要在 Jira、Trello 或其他地方寻找蛛丝马迹。 +在实现那些任务的同一个仓库下放置任务的内容有一个巨大(超大)的好处。这意味着今后的几年你能够在一行代码上做一个 `git blame`,可以让你找出最初在这个任务背后写下那些代码的根据,而不需要在 Jira、Trello 或其它地方寻找蛛丝马迹。 #### 缺点 -在过去的三周我已经对所有的 tasks 使用 GitHub 取代 Jira 进行了测试(在有点看板风格的较小规模的项目上) ,到目前为止我都很喜欢。 +在过去的三周我已经对所有的任务使用 GitHub 取代 Jira 进行了测试(在有点看板风格的较小规模的项目上) ,到目前为止我都很喜欢。 -但是我无法想象在 scrum (译者注:迭代式增量软件开发过程)项目上使用,我想要在那里完成正确估算、速度的测算以及所有的好东西。 +但是我无法想象在 scrum(LCTT 译注:迭代式增量软件开发过程)项目上使用它,我想要在那里完成正确的工期估算、开发速度的测算以及所有的好东西怕是不行。 -好消息是,GitHub Projects 只有很少一些“功能”,并不会让你花很长时间去评估它是否值得让你去切换。因此留下一个悬念,看看你是怎么想的。 +好消息是,GitHub 项目只有很少一些“功能”,并不会让你花很长时间去评估它是否值得让你去切换。因此要不要试试,你自己看着办。 -总的来说,我有 _得知_ [ZenHub][7] 并且打开过 10 分钟,这也是有史以来的第一次。它是对 GitHub 高效的延伸,可以让你估计你的 issue 并创建 epics 和 dependencies。它也有速度和燃尽图功能;这看起来 _可能是_ 这地球上最伟大的事情。 +无论如何,我_听说过_ [ZenHub][7] 并且在 10 分钟前第一次打开了它。它是对 GitHub 高效的延伸,可以让你估计你的工单并创建 epic 和 dependency。它也有 velocity 和燃尽图burndown chart功能;这看起来_可能是_世界上最棒的东西了。 延伸阅读: [GitHub help on Projects][8]。 -### #10 GitHub wiki +### #10 GitHub 维基 -对于非结构化集合类的页面 —— 就像 Wikipedia —— GitHub Wiki 提供的(下文我会称之为 Gwiki )就很优秀。 +对于一堆非结构化页面(就像维基百科一样), GitHub 维基wiki提供的(下文我会称之为 
Gwiki)就很优秀。 -对于结构化集合类的页面 —— 举个例子,你的文档 —— 并没那么多。这里没办法说“这个页面是那个页面的子页”,或者有像‘下一节’和‘上一节’这样的按钮。Hansel 和 Gretel 将会完蛋,因为这里没有面包屑(译者注:引自童话故事《糖果屋》)。 +结构化的页面集合并没那么多,比如说你的文档。这里没办法说“这个页面是那个页面的子页”,或者有像‘下一节’和‘上一节’这样的按钮。Hansel 和 Gretel 将会完蛋,因为这里没有面包屑导航(LCTT 译注:引自童话故事《糖果屋》)。 -(边注,你有 _读过_ 那个故事吗? 这是个残酷的故事。两个混蛋小子将饥肠辘辘的老巫婆烧死在 _她自己的火炉_ 里。无疑留下她来收拾残局。我想这就是为什么如今的年轻人是如此的敏感 —— 今天的睡前故事没有太多的暴力内容。) +(边注,你有_读过_那个故事吗? 这是个残酷的故事。两个混蛋小子将饥肠辘辘的老巫婆烧死在_她自己的火炉_里。毫无疑问她是留下来收拾残局的。我想这就是为什么如今的年轻人是如此的敏感 —— 今天的睡前故事太不暴力了。) -继续 —— 把 Gwiki 拿出来接着讲,我输入一些 NodeJS 文档中的内容作为 wiki 页面,然后创建一个侧边栏让我能够模拟出一些真实结构。这个侧边栏会一直存在,尽管它无法高亮显示你当前所在的页面。 - -链接不得不手动维护,但总的来说,我认为这已经很好了。如果你觉得有需要的话可以[看一下][9]。  +继续 —— 把 Gwiki 拿出来接着讲,我输入一些 NodeJS 文档中的内容作为维基页面,然后创建一个侧边栏以模拟一些真实结构。这个侧边栏会一直存在,尽管它无法高亮显示你当前所在的页面。 +其中的链接必须手动维护,但总的来说,我认为这已经很好了。如果你觉得有需要的话可以[看一下][9]。  ![](https://cdn-images-1.medium.com/max/1600/1*BSKQpkLmVQpUML0Je9WsLQ.png) -它将不会与像 GitBook(它使用了[Redux 文档][10])或一个定制的网站这样的东西去竞争。但它仍然会占据 80% 的页面而且就在你的仓库里。 +它将不会与像 GitBook(它使用了 [Redux 文档][10])或定制的网站这样的东西相比较。但它八成够用了,而且它就在你的仓库里。 -我是一个粉丝。 +我是它的一个粉丝。 -我的建议:如果你已经拥有不止一个 `README.md` 文件并且想要一些不同的页面作为用户指南或是更详细的文档,那么下一步你就需要停止使用 Gwiki 了。 +我的建议:如果你已经拥有不止一个 `README.md` 文件,并且想要一些不同的页面作为用户指南或是更详细的文档,那么下一步你就需要停止使用 Gwiki 了。 如果你开始觉得缺少的结构或导航非常有必要的话,去切换到其他的产品吧。 -### #11 GitHub Pages (带有 Jekyll) - -你可能已经知道了可以使用 GitHub Pages 来托管静态站点。如果你不知道的话现在就可以去试试。不过这一节确切的说是关于使用 _Jekyll_ 来构建一个站点。 - -最简单的就是, GitHub Pages + Jekyll 会将你的 `README.md` 呈现在一个漂亮的主题中。举个例子,从 [关于 github][11] 看看我的 readme 页面: +### #11 GitHub 页面(带有 Jekyll) +你可能已经知道了可以使用 GitHub 页面Pages 来托管静态站点。如果你不知道的话现在就可以去试试。不过这一节确切的说是关于使用 Jekyll 来构建一个站点。 +最简单的来说, GitHub 页面 + Jekyll 会将你的 `README.md` 呈现在一个漂亮的主题中。举个例子,看看我的 [关于 github][11] 中的 readme 页面: ![](https://cdn-images-1.medium.com/max/2000/1*nU-vZfChZ0mZw9zO-6iJow.png) -如果我为我的 GitHub 站点点击 ‘settings’ 标签,开启 GitHub Pages,然后挑选一个 Jekyll 主题… - - +点击 GitHub 上我的站点的设置settings标签,开启 GitHub 页面功能,然后挑选一个 Jekyll 主题…… ![](https://cdn-images-1.medium.com/max/1600/1*tT9AS7tNfEjbAcT3mkzgdw.png) -我会得到一个[ Jekyll 主题页面][12]: - +我就会得到一个 [Jekyll 主题的页面][12]: ![](https://cdn-images-1.medium.com/max/2000/1*pIE2FMyWih7nFAdP-yGXtQ.png) -根据这一点我可以构建一个主要基于易于编辑的 markdown 文件的静态站点,本质上是把 GitHub 变成一个 CMS(译者注:内容管理系统)。 +由此我可以构建一个主要基于易于编辑的 markdown 文件的静态站点,其本质上是把 GitHub 变成一个 CMS(LCTT 译注:内容管理系统)。 -我还没有真正的使用过它,但这就是 React 和 Bootstrap 站点构建的过程,所以并不可怕。 +我还没有真正的使用过它,但这就是 React 和 Bootstrap 网站构建的过程,所以并不可怕。 -注意,在本地运行需要 Ruby ( Windows 用户就需要交换一下眼色并且转向其他的方向。macOS 用户会像这样 “出什么问题了,你要去哪里?Ruby 是一个通用平台!GEMS!”)。 +注意,在本地运行它需要 Ruby (Windows 用户会彼此交换一下眼神,然后转头看向其它的方向。macOS 用户会发出这样这样的声音 “出什么问题了,你要去哪里?Ruby 可是一个通用平台!GEMS 万岁!”)。 -(这里也有必要加上“暴力或威胁的内容或活动” 在 GitHub Pages 上是不被允许的,因此你不能去部署重启你的 Hansel 和 Gretel 。) +(这里也有必要加上,“暴力或威胁的内容或活动” 在 GitHub 页面上是不允许的,因此你不能去部署你的 Hansel 和 Gretel 重启之旅了。) #### 我的意见 -我观察的 GitHub Pages + Jekyll 越多(为了这篇文章),整件事情好像越是看起来有一点奇怪。 +为了这篇文章,我对 GitHub 页面 + Jekyll 研究越多,就越觉得这件事情有点奇怪。 -‘让所有的复杂性远离你所拥有的属于自己的网站’这样的想法是很棒的。但是你仍然需要在本地生成配置。而且可怕的是需要为这样“简单”的东西使用很多 CLI(译者注:命令行界面)命令。 +“拥有你自己的网站,让所有的复杂性远离”这样的想法是很棒的。但是你仍然需要在本地生成配置。而且可怕的是需要为这样“简单”的东西使用很多 CLI(LCTT 译注:命令行界面)命令。 -我只是略读了[入门部分][13]的七页,给我的感觉像是 _我是_ 这里仅有的简单的事情。此前我甚至从来没有学习过所谓简单的“Front Matter”的语法或者所谓简单的“Liquid 模板引擎”的来龙去脉。 +我只是略读了[入门部分][13]的七页,给我的感觉像是_我才是_那个小白。此前我甚至从来没有学习过所谓简单的 “Front Matter” 的语法或者所谓简单的 “Liquid 模板引擎” 的来龙去脉。 -我宁愿只写一个网站。 +我宁愿去手工编写一个网站。 -老实说我有点惊讶 Facebook 使用它来写 React 文档,因为他们能够用 React 来构件他们的帮助文档并且在一天之内 [pre-render 预渲染到静态的 HTML 文件][14]。 +老实说我有点惊讶 Facebook 使用它来写 React 文档,因为他们能够用 React 来构建他们的帮助文档,并且在一天之内[预渲染到静态的 HTML 文件][14]。 -他们所需要的就跟使用 CMS 中已有的 Markdown 文件一样。 +他们所需要做的就是利用已有的 Markdown 文件,就像跟使用 CMS 一样。 -我想是这样… +我想是这样…… ### #12 使用 GitHub 作为 CMS 
比如说你有一个带有一些文本的网站,但是你并不想在 HTML 的标记中储存那些文本。 -取而代之,你想要存放文本块到一个很容易被非开发者编辑的地方。也许使用一些版本控制的形式。甚至可能是一个审查过程。 +取而代之,你想要把这堆文本存放到某个地方,以便非开发者也可以很容易地编辑。也许要使用某种形式的版本控制。甚至还可能需要一个审查过程。 这里是我的建议:在你的版本库中使用 markdown 文件存储文本。然后在你的前端使用插件来获取这些文本块并在页面呈现。 -我是 React 的支持者,因此这里有一个 `` 插件的示例,给出一些 markdown 的路径,它们将被获取,解析,并以 HTML 的形式呈现。 +我是 React 的支持者,因此这里有一个 `` 插件的示例,给出一些 markdown 的路径,它就会被获取、解析,并以 HTML 的形式呈现。 +(我正在使用 [marked][1] npm 包来将 markdown 解析为 HTML。) -(我正在使用的是 [marked][1] npm 包来将 markdown 解析为 HTML。) +这里是我的示例仓库 [/text-snippets][2],里边有一些 markdown 文件 。 -这里是指向我的示例仓库 [/text-snippets][2],里边有一些 markdown 文件 。 +(你也可以使用 GitHub API 来[获取内容][15] —— 但我不确定你是否能搞定。) -(你也可以前往[获取内容][15]页面获取 GiHub API 来使用 —— 但我不确定你是否可以。) +你可以像这样使用插件: -你可以使用像这样的插件: +如此,GitHub 就是你的 CMS 了,可以说,不管有多少文本块都可以放进去。 -所以现在 GitHub 就是你的 CMS,可以说,不管有多少文本块都可以放进去。 +上边的示例只是在浏览器上安装好插件后获取 markdown 。如果你想要一个静态站点那么你需要服务器端渲染。 -上边的示例只是在浏览器上安装好插件后获取 markdown 。如果你想要一个静态站点那么你需要服务器端渲染(server-render)。 +有个好消息!没有什么能阻止你从服务器中获取所有的 markdown 文件 (并配上各种为你服务的缓存策略)。如果你沿着这条路继续走下去的话,你可能会想要去试试使用 GitHub API 去获取目录中的所有 markdown 文件的列表。 -好消息!没什么能阻止你从服务器中获取所有的 markdown 文件 (配上各种为你服务的缓存策略)。如果你沿着这条路继续走下去的话,你可能会想要去看看使用 GitHub API 去获取目录中的所有 markdown 文件的列表。 +### 奖励环节 —— GitHub 工具! -### Bonus round —— GitHub 工具! - -我曾经使用过一段时间的 [Chrome 的扩展 Octotree ][16] 而且现在我推荐它。虽然并非真心诚意,但不管怎样我还是推荐它。 - -它会在左侧提供一个带有树视图的面板以显示当前你所查看的仓库。 +我曾经使用过一段时间的 [Chrome 的扩展 Octotree][16],而且现在我推荐它。虽然不是吐血推荐,但不管怎样我还是推荐它。 +它会在左侧提供一个带有树状视图的面板以显示当前你所查看的仓库。 ![](https://cdn-images-1.medium.com/max/2000/1*-MgFq3TEjdys1coiF5-dCw.png) -从[这个视频][17]中我学会了 [octobox][18] ,到目前为止看起来还不错。它是一个 GitHub issues 的收件箱。这就是我要说的全部。 - -说到颜色,在上面所有的截图中我都使用了亮色主题,所以不要吓到你。不过说真的,我看到的其他东西都是在黑色的主题上,为什么我非要忍受 GitHub 这个苍白的主题呐? +从[这个视频][17]中我了解到了 [octobox][18] ,到目前为止看起来还不错。它是一个 GitHub 工单的收件箱。这一句介绍就够了。 +说到颜色,在上面所有的截图中我都使用了亮色主题,所以希望不要闪瞎你的双眼。不过说真的,我看到的其他东西都是黑色的主题,为什么我非要忍受 GitHub 这个苍白的主题呐? 
![](https://cdn-images-1.medium.com/max/2000/1*SUdLeoaq8AtVQyE-dCw-Tg.png) -这是由 Chrome 扩展 [Stylish][19](它可以在任何网站使用主题)和 [GitHub Dark][20] 风格的一个组合。同时为了完成这样的外观也需要,黑色主题的 Chrome 开发者工具(这是内建的,在设置中打开) 以及 [Atom One Dark for Chrome 主题][21]。 +这是由 Chrome 扩展 [Stylish][19](它可以在任何网站使用主题)和 [GitHub Dark][20] 风格的一个组合。要完全黑化,那黑色主题的 Chrome 开发者工具(这是内建的,在设置中打开) 以及 [Atom One Dark for Chrome 主题][21]你肯定也需要。 ### Bitbucket -这些并不完全适合这篇文章的所有地方,但是如果我不称赞 Bitbucket 的话,那就不对了。 +这些内容不适合放在这篇文章的任何地方,但是如果我不称赞 Bitbucket 的话,那就不对了。 两年前我开始了一个项目并花了大半天时间评估哪一个 git 托管服务更适合,最终 Bitbucket 赢得了相当不错的成绩。他们的代码审查流程遥遥领先(这甚至比 GitHub 拥有的指派审阅者的概念要早很长时间)。 -GitHub 在后来赶上了比赛,这是非常成功的。但不幸的是在过去的一年里我没有机会使用 Bitbucket —— 也许他们依然在某些方面领先。所以,我会力劝每一个选择 git 托管服务的人也要考虑 Bitbucket 。 +GitHub 后来在这次审查竞赛中追了上来,干的不错。不幸的是在过去的一年里我没有机会再使用 Bitbucket —— 也许他们依然在某些方面领先。所以,我会力劝每一个选择 git 托管服务的人考虑一下 Bitbucket 。 ### 结尾 -就是这样!我希望这里至少有三件事是你此前并不知道的,我也希望你拥有愉快的一天。 +就是这样!我希望这里至少有三件事是你此前并不知道的,祝好。 -编辑:在评论中有更多的建议;随便留下你自己喜欢的。真的,我真的希望你能拥有愉快的一天。 +修订:在评论中有更多的技巧;请尽管留下你自己喜欢的技巧。真的,真心祝好。 -------------------------------------------------------------------------------- @@ -332,7 +297,7 @@ via: https://hackernoon.com/12-cool-things-you-can-do-with-github-f3e0424cf2f0 作者:[David Gilbertson][a] 译者:[softpaopao](https://github.com/softpaopao) -校对:[jasminepeng](https://github.com/jasminepeng) +校对:[wxy](https://github.com/wxy) 本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 From 9fd96a0d2f2cae09f19b567e7984e0119bb8641e Mon Sep 17 00:00:00 2001 From: wxy Date: Tue, 10 Oct 2017 17:25:49 +0800 Subject: [PATCH 05/79] PUB:20170909 12 cool things you can do with GitHub.md @softpaopao https://linux.cn/article-8946-1.html --- .../20170909 12 cool things you can do with GitHub.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename {translated/tech => published}/20170909 12 cool things you can do with GitHub.md (100%) diff --git a/translated/tech/20170909 12 cool things you can do with GitHub.md b/published/20170909 12 cool things you can do with GitHub.md similarity index 100% rename from translated/tech/20170909 12 cool things you can do with GitHub.md rename to published/20170909 12 cool things you can do with GitHub.md From 83135286a64a6fd5e27f7194dd0f11e3d3aadb0d Mon Sep 17 00:00:00 2001 From: wxy Date: Tue, 10 Oct 2017 22:39:59 +0800 Subject: [PATCH 06/79] PUB:20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md @firmianay @jasminepeng https://linux.cn/article-8947-1.html --- ...ORWARDING MOUNT A SOCKS SERVER WITH SSH.md | 160 +++++++++++++++++ ...ORWARDING MOUNT A SOCKS SERVER WITH SSH.md | 164 ------------------ 2 files changed, 160 insertions(+), 164 deletions(-) create mode 100644 published/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md delete mode 100644 translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md diff --git a/published/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md b/published/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md new file mode 100644 index 0000000000..b96c3ba302 --- /dev/null +++ b/published/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md @@ -0,0 +1,160 @@ +动态端口转发:安装带有 SSH 的 SOCKS 服务器 +================= + +在上一篇文章([通过 SSH 实现 TCP / IP 隧道(端口转发):使用 OpenSSH 可能的 8 种场景][17])中,我们看到了处理端口转发的所有可能情况,不过那只是静态端口转发。也就是说,我们只介绍了通过 SSH 连接来访问另一个系统的端口的情况。 + +在那篇文章中,我们未涉及动态端口转发,此外一些读者没看过该文章,本篇文章中将尝试补充完整。 + +当我们谈论使用 SSH 进行动态端口转发时,我们说的是将 SSH 服务器转换为 [SOCKS][2] 服务器。那么什么是 SOCKS 服务器? 
+ +你知道 [Web 代理][3]是用来做什么的吗?答案可能是肯定的,因为很多公司都在使用它。它是一个直接连接到互联网的系统,允许没有互联网访问的[内部网][4]客户端让其浏览器通过代理来(尽管也有[透明代理][5])浏览网页。Web 代理除了允许输出到 Internet 之外,还可以缓存页面、图像等。已经由某客户端下载的资源,另一个客户端不必再下载它们。此外,它还可以过滤内容并监视用户的活动。当然了,它的基本功能是转发 HTTP 和 HTTPS 流量。 + +一个 SOCKS 服务器提供的服务类似于公司内部网络提供的代理服务器服务,但不限于 HTTP/HTTPS,它还允许转发任何 TCP/IP 流量(SOCKS 5 也支持 UDP)。 + +例如,假设我们希望在一个没有直接连接到互联网的内部网上通过 Thunderbird 使用 POP3 、 ICMP 和 SMTP 的邮件服务。如果我们只有一个 web 代理可以用,我们可以使用的唯一的简单方式是使用某个 webmail(也可以使用 [Thunderbird 的 Webmail 扩展][6])。我们还可以通过 [HTTP 隧道][7]来起到代理的用途。但最简单的方式是在网络中设置一个 SOCKS 服务器,它可以让我们使用 POP3、ICMP 和 SMTP,而不会造成任何的不便。 + +虽然有很多软件可以配置非常专业的 SOCKS 服务器,但用 OpenSSH 设置一个只需要简单的一条命令: + +``` +Clientessh $ ssh -D 1080 user@servidorssh +``` + +或者我们可以改进一下: + +``` +Clientessh $ ssh -fN -D 0.0.0.0:1080 user@servidorssh +``` + +其中: + +* 选项 `-D` 类似于选项为 `-L` 和 `-R` 的静态端口转发。像那些一样,我们可以让客户端只监听本地请求或从其他节点到达的请求,具体取决于我们将请求关联到哪个地址: + ``` + -D [bind_address:] port + ``` + + 在静态端口转发中可以看到,我们使用选项 `-R` 进行反向端口转发,而动态转发是不可能的。我们只能在 SSH 客户端创建 SOCKS 服务器,而不能在 SSH 服务器端创建。 +* 1080 是 SOCKS 服务器的典型端口,正如 8080 是 Web 代理服务器的典型端口一样。 +* 选项 `-N` 防止实际启动远程 shell 交互式会话。当我们只用 `ssh` 来建立隧道时很有用。 +* 选项 `-f` 会使 `ssh` 停留在后台并将其与当前 shell 分离,以便使该进程成为守护进程。如果没有选项 `-N`(或不指定命令),则不起作用,否则交互式 shell 将与后台进程不兼容。 + +使用 [PuTTY][8] 也可以非常简单地进行端口重定向。与 `ssh -D 0.0.0.0:1080` 相当的配置如下: + +![PuTTY SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/putty_socks.png) + +对于通过 SOCKS 服务器访问另一个网络的应用程序,如果应用程序提供了对 SOCKS 服务器的特别支持,就会非常方便(虽然不是必需的),就像浏览器支持使用代理服务器一样。作为一个例子,如 Firefox 或 Internet Explorer 这样的浏览器使用 SOCKS 服务器访问另一个网络的应用程序: + +![Firefox SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/firefox_socks.png) + +![Internet Explorer SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/internetexplorer_socks.png) + +注意:上述截图来自 [IE for Linux][1] :如果您需要在 Linux 上使用 Internet Explorer,强烈推荐! 
+ +然而,最常见的浏览器并不要求 SOCKS 服务器,因为它们通常与代理服务器配合得更好。 + +不过,Thunderbird 也支持 SOCKS,而且很有用: + +![Thunderbird SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/thunderbird_socks.png) + +另一个例子:[Spotify][9] 客户端同样支持 SOCKS: + +![Spotify SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/spotify_socks.png) + +需要关注一下名称解析。有时我们会发现,在目前的网络中,我们无法解析 SOCKS 服务器另一端所要访问的系统的名称。SOCKS 5 还允许我们通过隧道传播 DNS 请求( 因为 SOCKS 5 允许我们使用 UDP)并将它们发送到另一端:可以指定是本地还是远程解析(或者也可以两者都试试)。支持此功能的应用程序也必须考虑到这一点。例如,Firefox 具有参数 `network.proxy.socks_remote_dns`(在 `about:config` 中),允许我们指定远程解析。而默认情况下,它在本地解析。 + +Thunderbird 也支持参数 `network.proxy.socks_remote_dns`,但由于没有地址栏来放置 `about:config`,我们需要改变它,就像在 [MozillaZine:about:config][10] 中读到的,依次点击 工具 → 选项 → 高级 → 常规 → 配置编辑器(按钮)。 + +没有对 SOCKS 特别支持的应用程序可以被 sock 化socksified。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部。“sock 化” 需要加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改请求,以通过 SOCKS 服务器重定向,从而不需要特别编程来支持 SOCKS 便可以正常通信。 + +在 Windows 和 [Linux][18] 上都有 “Sock 化工具”。 + +对于 Windows,我们举个例子,SocksCap 是一种闭源,但对非商业使用免费的产品,我使用了很长时间都十分满意。SocksCap 由一家名为 Permeo 的公司开发,该公司是创建 SOCKS 参考技术的公司。Permeo 被 [Blue Coat][11] 买下后,它[停止了 SocksCap 项目][12]。现在你仍然可以在互联网上找到 `sc32r240.exe` 文件。[FreeCap][13] 也是面向 Windows 的免费代码项目,外观和使用都非常类似于 SocksCap。然而,它工作起来更加糟糕,多年来一直没有缺失维护。看起来,它的作者倾向于推出需要付款的新产品 [WideCap][14]。 + +这是 SocksCap 的一个界面,可以看到我们 “sock 化” 了的几个应用程序。当我们从这里启动它们时,这些应用程序将通过 SOCKS 服务器访问网络: + +![SocksCap](https://wesharethis.com/wp-content/uploads/2017/07/sockscap.png) + +在配置对话框中可以看到,如果选择了协议 SOCKS 5,我们可以选择在本地或远程解析名称: + +![SocksCap settings](https://wesharethis.com/wp-content/uploads/2017/07/sockscap_settings.png) + +在 Linux 上,如同往常一样,对某个远程命令我们都有许多替代方案。在 Debian/Ubuntu 中,命令行: + +``` +$ Apt-cache search socks +``` + +的输出会告诉我们很多。 + +最著名的是 [tsocks][15] 和 [proxychains][16]。它们的工作方式大致相同:只需用它们启动我们想要 “sock 化” 的应用程序就行。使用 `proxychains` 的 `wget` 的例子: + +``` +$ Proxychains wget http://www.google.com +ProxyChains-3.1 (http://proxychains.sf.net) +--19: 13: 20-- http://www.google.com/ +Resolving www.google.com ... +DNS-request | Www.google.com +| S-chain | - <- - 10.23.37.3:1080-<><>-4.2.2.2:53-<><>-OK +| DNS-response | Www.google.com is 72.14.221.147 +72.14.221.147 +Connecting to www.google.com | 72.14.221.147 |: 80 ... +| S-chain | - <- - 10.23.37.3:1080-<><>-72.14.221.147:80-<><>-OK +Connected. +HTTP request sent, awaiting response ... 
200 OK +Length: unspecified [text / html] +Saving to: `index.html ' + + [<=>] 6,016 24.0K / s in 0.2s + +19:13:21 (24.0 KB / s) - `index.html 'saved [6016] +``` + +要让它可以工作,我们必须在 `/etc/proxychains.conf` 中指定要使用的代理服务器: + +``` +[ProxyList] +Socks5 clientessh 1080 +``` + +我们也设置远程进行 DNS 请求: + +``` +# Proxy DNS requests - no leak for DNS data +Proxy_dns +``` + +另外,在前面的输出中,我们已经看到了同一个 `proxychains` 的几条信息性的消息, 非 `wget` 的行是标有字符串 `|DNS-request|`、`|S-chain|` 或 `|DNS-response|` 的。如果我们不想看到它们,也可以在配置中进行调整: + +``` +# Quiet mode (no output from library) +Quiet_mode +``` + +-------------------------------------------------------------------------------- + +via: https://wesharethis.com/2017/07/15/dynamic-port-forwarding-mount-socks-server-ssh/ + +作者:[Ahmad][a] +译者:[firmianay](https://github.com/firmianay) +校对:[jasminepeng](https://github.com/jasminepeng) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://wesharethis.com/author/ahmad/ +[1]:https://wesharethis.com/goto/http://www.tatanka.com.br/ies4linux/page/Main_Page +[2]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/SOCKS +[3]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Proxy_server +[4]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Intranet +[5]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Proxy_server#Transparent_and_non-transparent_proxy_server +[6]:https://wesharethis.com/goto/http://webmail.mozdev.org/ +[7]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/HTTP_tunnel_(software) +[8]:https://wesharethis.com/goto/http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html +[9]:https://wesharethis.com/goto/https://www.spotify.com/int/download/linux/ +[10]:https://wesharethis.com/goto/http://kb.mozillazine.org/About:config +[11]:https://wesharethis.com/goto/http://www.bluecoat.com/ +[12]:https://wesharethis.com/goto/http://www.bluecoat.com/products/sockscap +[13]:https://wesharethis.com/goto/http://www.freecap.ru/eng/ +[14]:https://wesharethis.com/goto/http://widecap.ru/en/support/ +[15]:https://wesharethis.com/goto/http://tsocks.sourceforge.net/ +[16]:https://wesharethis.com/goto/http://proxychains.sourceforge.net/ +[17]:https://linux.cn/article-8945-1.html +[18]:https://wesharethis.com/2017/07/10/linux-swap-partition/ diff --git a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md b/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md deleted file mode 100644 index ff0e4663f4..0000000000 --- a/translated/tech/20170715 DYNAMIC PORT FORWARDING MOUNT A SOCKS SERVER WITH SSH.md +++ /dev/null @@ -1,164 +0,0 @@ -动态端口转发:安装带有 SSH 的 SOCKS 服务器 -================= - -在上一篇文章([通过 SSH 实现 TCP / IP 隧道(端口转发):使用 OpenSSH 可能的 8 种场景][17])中,我们看到了处理端口转发的所有可能情况,不过只是静态端口转发。也就是说,我们只介绍了通过 SSH 连接来访问另一个系统的端口的情况。 - -在那篇文章中,我们未涉及动态端口转发,此外一些读者错过了该文章,本篇文章中将尝试补充完整。 - -当我们谈论使用 SSH 进行动态端口转发时,我们谈论的是将 SSH 服务器 转换为 [SOCKS][2] 服务器。那么什么是 SOCKS 服务器? 
- -你知道 [Web 代理][3]是用来做什么的吗?答案可能是肯定的,因为很多公司都在使用它。它是一个直接连接到互联网的系统,允许没有互联网访问的[内部网][4]客户端通过配置浏览器的代理来请求(尽管也有[透明代理][5])浏览网页。Web 代理除了允许输出到 Internet 之外,还可以缓存页面,图像等。已经由某客户端下载的资源,另一个客户端不必再下载它们。此外,它还可以过滤内容并监视用户的活动。当然了,它的基本功能是转发 HTTP 和 HTTPS 流量。 - -一个 SOCKS 服务器提供的服务类似于公司内部网络提供的代理服务器服务,但不限于 HTTP/HTTPS,它还允许转发任何 TCP/IP 流量(SOCKS 5 也是 UDP)。 - -例如,假设我们希望在一个没有直接连接到互联网的内部网上使用基于 POP3 或 ICMP 的邮件服务和 Thunderbird 的 SMTP 服务。如果我们只有一个 web 代理可以用,我们可以使用的唯一的简单方式是使用某个 webmail(也可以使用 [Thunderbird 的 Webmail 扩展][6])。我们还可以[通过 HTTP 隧道][7]来利用代理。但最简单的方式是在网络中设置一个 SOCKS 服务器,它可以让我们使用 POP3、ICMP 和 SMTP,而不会造成任何的不便。 - -有很多软件可以配置非常专业的 SOCKS 服务器,我们这里使用 OpenSSH 简单地设置一个: - -> ``` -> Clientessh $ ssh -D 1080 user @ servidorssh -> ``` - -或者我们可以改进一下: - -> ``` -> Clientessh $ ssh -fN -D 0.0.0.0:1080 user @ servidorssh -> ``` - -其中: - -* 选项 `-D` 类似于选项为 `-L` 和 `-R` 的静态端口转发。像这样,我们就可以让客户端只监听本地请求或从其他节点到达的请求,具体取决于我们将请求关联到哪个地址: - - > ``` - > -D [bind_address:] port - > ``` - - 在静态端口转发中可以看到,我们使用选项 `-R` 进行反向端口转发,而动态转发是不可能的。我们只能在 SSH 客户端创建 SOCKS 服务器,而不能在 SSH 服务器端创建。 - -* 1080 是 SOCKS 服务器的典型端口,正如 8080 是 Web 代理服务器的典型端口一样。 - -* 选项 `-N` 防止了远程 shell 交互式会话的实际启动。当我们只使用 `ssh` 来建立隧道时很有用。 - -* 选项 `-f` 会使 `ssh` 停留在后台并将其与当前 `shell` 分离,以便使进程成为守护进程。如果没有选项 `-N`(或不指定命令),则不起作用,否则交互式 shell 将与后台进程不兼容。 - - 使用 [PuTTY][8] 也可以非常简单地进行端口重定向。与 `ssh -D 0.0.0.0:1080` 相当的配置: - -![PuTTY SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/putty_socks.png) - -对于通过 SOCKS 服务器访问另一个网络的应用程序,如果应用程序提供了对 SOCKS 服务器的特别支持,就会非常方便(虽然不是必需的),就像浏览器支持使用代理服务器一样。如 Firefox 或 Internet Explorer 的浏览器是使用 SOCKS 服务器访问另一个网络的应用程序示例: - -![Firefox SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/firefox_socks.png) - -![Internet Explorer SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/internetexplorer_socks.png) - -注意:截图来自 [IE for Linux][1] :如果您需要 Internet Explorer 并使用 Linux,强烈推荐! 
- -然而,最常见的浏览器并不要求 SOCKS 服务器,因为它们通常与代理服务器配合得更好。 - -不过,Thunderbird 也支持 SOCKS,而且很有用: - -![Thunderbird SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/thunderbird_socks.png) - -另一个例子:[Spotify][9] 客户端同样支持 SOCKS: - -![Spotify SOCKS](https://wesharethis.com/wp-content/uploads/2017/07/spotify_socks.png) - -需要关注名称解析。有时我们会发现,在目前的网络中,我们无法解析 SOCKS 服务器另一端所要访问的系统的名称。SOCKS 5 还允许我们传播 DNS 请求(UDP 允许我们使用 SOCKS 5)并将它们发送到另一端:可以指定是本地还是远程解析(或者也可以测试两者)。支持此功能的应用程序也必须考虑到这一点。例如,Firefox 具有参数 `network.proxy.socks_remote_dns`(在 `about:config` 中),允许我们指定远程解析。默认情况下,它在本地解析。 - -Thunderbird 也支持参数 `network.proxy.socks_remote_dns`,但由于没有地址栏来放置 `about:config`,我们需要改变它,就像在 [MozillaZine:about:config][10] 中读到的,依次点击 工具 → 选项 → 高级 → 常规 → 配置编辑器(按钮)。 - -没有对 SOCKS 特别支持的应用程序可以被 sock 化socksified。这对于使用 TCP/IP 的许多应用程序都没有问题,但并不是全部。“Sock 化” 包括加载一个额外的库,它可以检测对 TCP/IP 堆栈的请求,并修改请求,并通过 SOCKS 服务器重定向,以便正常通信,而不需要特别编程来支持 SOCKS 。 - -在 Windows 和 [Linux][18] 上都有 “Socksifiers”。 - -对于 Windows,我们举个例子,SocksCap 是一种非商业用途的闭源但免费的产品,我使用了很长时间都十分满意。SocksCap 由一家名为 Permeo 的公司制造,该公司是创建 SOCKS 参考技术的公司。Permeo 被 [Blue Coat][11] 买下后,它[停止了 SocksCap 项目][12]。现在你仍然可以在互联网上找到 `sc32r240.exe` 文件。[FreeCap][13] 也是面向 Windows 的免费代码项目,外观和使用都非常类似于 SocksCap。然而,它工作起来更加糟糕,多年来一直没有维护。看起来,它的作者倾向于推出需要付款的新产品 [WideCap][14]。 - -这是 SocksCap 的一个界面,可以看到我们 “socksified” 了几个应用程序。当我们从这里启动它们时,这些应用程序将通过 SOCKS 服务器访问网络: - -![SocksCap](https://wesharethis.com/wp-content/uploads/2017/07/sockscap.png) - -在配置对话框中可以看到,如果选择了协议 SOCKS 5,我们可以选择在本地或远程解析名称: - -![SocksCap settings](https://wesharethis.com/wp-content/uploads/2017/07/sockscap_settings.png) - -在 Linux 上,如同往常一样,对某个远程命令我们都有许多替代方案。在 Debian/Ubuntu 中,命令行: - -> ``` -> $ Apt-cache search socks -> ``` - -其输出会告诉我们很多。 - -最著名的是 [tsocks][15] 和 [proxychains][16]。他们的工作方式大致相同:只需启动我们想要用他们 “socksify” 的应用程序,就是这样。使用 `proxychains` 的 `wget` 的例子: - -> ``` -> $ Proxychains wget http://www.google.com -> ProxyChains-3.1 (http://proxychains.sf.net) -> --19: 13: 20-- http://www.google.com/ -> Resolving www.google.com ... -> DNS-request | Www.google.com -> | S-chain | - <- - 10.23.37.3:1080-<><>-4.2.2.2:53-<><>-OK -> | DNS-response | Www.google.com is 72.14.221.147 -> 72.14.221.147 -> Connecting to www.google.com | 72.14.221.147 |: 80 ... -> | S-chain | - <- - 10.23.37.3:1080-<><>-72.14.221.147:80-<><>-OK -> Connected. -> HTTP request sent, awaiting response ... 
200 OK -> Length: unspecified [text / html] -> Saving to: `index.html ' -> -> [<=>] 6,016 24.0K / s in 0.2s -> -> 19:13:21 (24.0 KB / s) - `index.html 'saved [6016] -> ``` - -为此,我们必须在 `/etc/proxychains.conf` 中指定要使用的代理服务器: - -> ``` -> [ProxyList] -> Socks5 clientessh 1080 -> ``` - -DNS 请求是远程进行的: - -> ``` -> # Proxy DNS requests - no leak for DNS data -> Proxy_dns -> ``` - -另外,在前面的输出中,我们已经看到了同一个 `proxychains` 的几条信息性的消息,而不是标有字符串 `|DNS-request|`、`|S-chain|` 或 `|DNS-response|` 行中的 `wget`。如果我们不想看到它们,也可以在配置中进行调整: - -> ``` -> # Quiet mode (no output from library) -> Quiet_mode -> ``` - --------------------------------------------------------------------------------- - -via: https://wesharethis.com/2017/07/15/dynamic-port-forwarding-mount-socks-server-ssh/ - -作者:[Ahmad][a] -译者:[firmianay](https://github.com/firmianay) -校对:[jasminepeng](https://github.com/jasminepeng) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]:https://wesharethis.com/author/ahmad/ -[1]:https://wesharethis.com/goto/http://www.tatanka.com.br/ies4linux/page/Main_Page -[2]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/SOCKS -[3]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Proxy_server -[4]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Intranet -[5]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/Proxy_server#Transparent_and_non-transparent_proxy_server -[6]:https://wesharethis.com/goto/http://webmail.mozdev.org/ -[7]:https://wesharethis.com/goto/http://en.wikipedia.org/wiki/HTTP_tunnel_(software) -[8]:https://wesharethis.com/goto/http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html -[9]:https://wesharethis.com/goto/https://www.spotify.com/int/download/linux/ -[10]:https://wesharethis.com/goto/http://kb.mozillazine.org/About:config -[11]:https://wesharethis.com/goto/http://www.bluecoat.com/ -[12]:https://wesharethis.com/goto/http://www.bluecoat.com/products/sockscap -[13]:https://wesharethis.com/goto/http://www.freecap.ru/eng/ -[14]:https://wesharethis.com/goto/http://widecap.ru/en/support/ -[15]:https://wesharethis.com/goto/http://tsocks.sourceforge.net/ -[16]:https://wesharethis.com/goto/http://proxychains.sourceforge.net/ -[17]:https://linux.cn/article-8945-1.html -[18]:https://wesharethis.com/2017/07/10/linux-swap-partition/ From e8e774ec12c91fcd6fefa487ebe9c7daf9e06788 Mon Sep 17 00:00:00 2001 From: wxy Date: Tue, 10 Oct 2017 23:22:12 +0800 Subject: [PATCH 07/79] PRF&PUB:20170310 Why DevOps is the end of security as we know it.md @geekpi --- ...ps is the end of security as we know it.md | 66 ++++++++++++++++++ ...ps is the end of security as we know it.md | 67 ------------------- 2 files changed, 66 insertions(+), 67 deletions(-) create mode 100644 published/20170310 Why DevOps is the end of security as we know it.md delete mode 100644 translated/talk/20170310 Why DevOps is the end of security as we know it.md diff --git a/published/20170310 Why DevOps is the end of security as we know it.md b/published/20170310 Why DevOps is the end of security as we know it.md new file mode 100644 index 0000000000..ad7f437867 --- /dev/null +++ b/published/20170310 Why DevOps is the end of security as we know it.md @@ -0,0 +1,66 @@ +为什么 DevOps 如我们所知道的那样,是安全的终结 +========== + +![](https://techbeacon.com/sites/default/files/styles/article_hero_image/public/field/image/rugged-devops-end-of-security.jpg?itok=Gp1xxSMK) + +安全难以推行。在企业管理者迫使开发团队尽快发布程序的大环境下,很难说服他们花费有限的时间来修补安全漏洞。但是鉴于所有网络攻击中有 84% 发生在应用层,作为一个组织是无法承担其开发团队不包括安全性带来的后果。 + +DevOps 
的崛起为许多安全负责人带来了困境。Sonatype 的前 CTO [Josh Corman][2] 说:“这是对安全的威胁,但这也是让安全变得更好的机会。” Corman 是一个坚定的[将安全和 DevOps 实践整合起来创建 “坚固的 DevOps”][3]的倡导者。_Business Insights_ 与 Corman 谈论了安全和 DevOps 共同的价值,以及这些共同价值如何帮助组织更少地受到中断和攻击的影响。 + +### 安全和 DevOps 实践如何互惠互利? + +**Josh Corman:** 一个主要的例子是 DevOps 团队对所有可测量的东西进行检测的倾向。安全性一直在寻找更多的情报和遥测。你可以获取许多 DevOps 团队正在测量的信息,并将这些信息输入到你的日志管理或 SIEM (安全信息和事件管理系统)。 + +一个 OODA 循环(观察observe定向orient决定decide行为act)的前提是有足够普遍的眼睛和耳朵,以注意到窃窃私语和回声。DevOps 为你提供无处不在的仪器。 + +### 他们有分享其他文化观点吗? + +**JC:** “严肃对待你的代码”是一个共同的价值观。例如,由 Netflix 编写的软件工具 Chaos Monkey 是 DevOps 团队的分水岭。它是为了测试亚马逊网络服务的弹性和可恢复性而创建的,Chaos Monkey 使得 Netflix 团队更加强大,更容易为中断做好准备。 + +所以现在有个想法是我们的系统需要测试,因此,James Wickett 和我及其他人决定做一个邪恶的、带有攻击性的 Chaos Monkey,这就是 GAUNTLT 项目的来由。它基本上是一堆安全测试, 可以在 DevOps 周期和 DevOps 工具链中使用。它也有非常适合 DevOps 的API。 + +### 企业安全和 DevOps 价值在哪里相交? + +**JC:** 这两个团队都认为复杂性是一切事情的敌人。例如,[安全人员和 Rugged DevOps 人员][4]实际上可以说:“看,我们在我们的项目中使用了 11 个日志框架 - 也许我们不需要那么多,也许攻击面和复杂性可能会让我们受到伤害或者损害产品的质量或可用性。” + +复杂性往往是许多事情的敌人。通常情况下,你不会很难说服 DevOps 团队在架构层面使用更好的建筑材料:使用最新的、最不易受攻击的版本,并使用较少的组件。 + +### “更好的建筑材料”是什么意思? + +**JC:** 我是世界上最大的开源仓库的保管人,所以我能看到他们在使用哪些版本,里面有哪些漏洞,何时他们没有修复漏洞,以及等了多久。例如,某些日志记录框架从不会修复任何错误。其中一些会在 90 天内修复了大部分的安全漏洞。人们越来越多地遭到攻击,因为他们使用了一个毫无安全的框架。 + +除此之外,即使你不知道日志框架的质量,拥有 11 个不同的框架会变得非常笨重、出现 bug,还有额外的工作和复杂性。你暴露在漏洞中的风险是非常大的。你想把时间花在修复大量的缺陷上,还是在制造下一个大的破坏性的事情上? + +[Rugged DevOps 的关键是软件供应链管理][5],其中包含三个原则:使用更少和更好的供应商、使用这些供应商的最高质量的部分、并跟踪这些部分,以便在发生错误时,你可以有一个及时和敏捷的响应。 + +### 所以变更管理也很重要。 + +**JC:** 是的,这是另一个共同的价值。我发现,当一家公司想要执行诸如异常检测或净流量分析等安全测试时,他们需要知道“正常”的样子。让人们失误的许多基本事情与仓库和补丁管理有关。 + +我在 _Verizon 数据泄露调查报告_中看到,追踪去年被成功利用的漏洞后,其中 97% 归结为 10 个 CVE(常见漏洞和风险),而这 10 个已经被修复了十多年。所以,我们羞于谈论高级间谍活动。我们没有做基本的补丁工作。现在,我不是说如果你修复这 10 个CVE,那么你就没有被利用,而是这占据了人们实际失误的最大份额。 + +[DevOps 自动化工具][6]的好处是它们已经成为一个意外的变更管理数据库。其真实反应了谁在哪里什么时候做了变更。这是一个巨大的胜利,因为我们经常对安全性有最大影响的因素无法控制。你承受了 CIO 和 CTO 做出的选择的后果。随着 IT 通过自动化变得更加严格和可重复,你可以减少人为错误的机会,并且哪里发生了变化更加可追溯。 + +### 你认为什么是最重要的共同价值? 
+ +**JC:** DevOps 涉及到过程和工具链,但我认为定义这种属性的是文化,特别是同感。 DevOps 有用是因为开发人员和运维团队能够更好地了解彼此,并做出更明智的决策。不是在解决孤岛中的问题,而是为了活动流程和目标解决。如果你向 DevOps 的团队展示安全如何能使他们变得更好,那么作为回馈他们往往会问:“那么,我们是否有任何选择让你的生活更轻松?”因为他们通常不知道他们做的 X、Y 或 Z 的选择使它无法包含安全性。 + +对于安全团队,驱动价值的方法之一是在寻求帮助之前变得更有所帮助,在我们告诉 DevOps 团队要做什么之前提供定性和定量的价值。你必须获得 DevOps 团队的信任,并获得发挥的权利,然后才能得到回报。它通常比你想象的快很多。 + +-------------------------------------------------------------------------------- + +via: https://techbeacon.com/why-devops-end-security-we-know-it + +作者:[Mike Barton][a] +译者:[geekpi](https://github.com/geekpi) +校对:[wxy](https://github.com/wxy) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://twitter.com/intent/follow?original_referer=https%3A%2F%2Ftechbeacon.com%2Fwhy-devops-end-security-we-know-it%3Fimm_mid%3D0ee8c5%26cmp%3Dem-webops-na-na-newsltr_20170310&ref_src=twsrc%5Etfw®ion=follow_link&screen_name=mikebarton&tw_p=followbutton +[1]:https://techbeacon.com/resources/application-security-devops-true-state?utm_source=tb&utm_medium=article&utm_campaign=inline-cta +[2]:https://twitter.com/joshcorman +[3]:https://techbeacon.com/want-rugged-devops-team-your-release-security-engineers +[4]:https://techbeacon.com/rugged-devops-rsa-6-takeaways-security-ops-pros +[5]:https://techbeacon.com/josh-corman-security-devops-how-shared-team-values-can-reduce-threats +[6]:https://techbeacon.com/devops-automation-best-practices-how-much-too-much diff --git a/translated/talk/20170310 Why DevOps is the end of security as we know it.md b/translated/talk/20170310 Why DevOps is the end of security as we know it.md deleted file mode 100644 index 2044247476..0000000000 --- a/translated/talk/20170310 Why DevOps is the end of security as we know it.md +++ /dev/null @@ -1,67 +0,0 @@ -# 为什么 DevOps 是我们所知道的安全的终结 - -![](https://techbeacon.com/sites/default/files/styles/article_hero_image/public/field/image/rugged-devops-end-of-security.jpg?itok=Gp1xxSMK) - -安全可能是一个艰难的销售。在企业管理者迫使开发团队尽快发布程序的环境下,很难说服他们花费有限的周期来修补安全漏洞。但是鉴于所有网络攻击中有 84% 发生在应用层,组织无法承担其开发团队不包括安全性带来的后果。 - -DevOps 的崛起为许多安全领导者带来了困境。Sonatype 的前 CTO [Josh Corman][2] 说:“这是对安全的威胁,但这是安全变得更好的机会。” Corman 是一个坚定的[整合安全和 DevOps 实践来创建 “坚固的 DevOps”][3]的倡导者。_Business Insights_ 与 Corman 谈论了安全和 DevOps 共同的价值,以及这些共同价值如何帮助组织受到更少受到中断和攻击的影响。 - -DevOps 中真正的安全状态是什么?[获取报告][1] - -### 安全和 DevOps 实践如何互惠互利? - -** Josh Corman:** 一个主要的例子是 DevOps 团队对所有可测量的东西进行检测的倾向。安全性一直在寻找更多的情报和遥测。你可以采纳许多 DevOps 团队正在测量的内容, 并将这些信息输入到你的日志管理或 SIEM (安全信息和事件管理系统)。 - -一个 OODA 循环(观察、定向、决定、行为)的前提是有足够普遍的眼睛和耳朵, 以注意到窃窃私语和回声。DevOps 为你提供无处不在的仪器。 - -### 他们有分享的其他文化态度吗? - -** JC:** “严肃对待你的代码”是一个共同的价值。例如,由 Netflix 编写的软件工具 Chaos Monkey 是 DevOps 团队的分水岭。它是为了测试亚马逊网络服务的弹性和可恢复性,Chaos Monkey 使得 Netflix 团队更加强大,更容易为中断做好准备。 - -所以现在有个想法是我们的系统需要测试,因此,James Wickett 和我和其他人决定做一个邪恶的、武装的 Chaos Monkey,这就是 GAUNTLT 项目的来由。它基本上是一堆安全测试, 可以在 DevOps 周期和 DevOps 工具链中使用。它也有非常 DevOps 友好的API。 - -### 企业安全和 DevOps 价值在哪里相交? - -** JC:** 两个团队都认为复杂性是一切事情的敌人。例如,[安全人员和坚固 DevOps 人员][4]实际上可以说:“看,我们在我们的项目中使用了 11 个日志框架 - 也许我们不需要那么多,也许攻击面和复杂性可能会让我们受到伤害或者损害产品的质量或可用性。” - -复杂性往往是许多事情的敌人。通常情况下,你不会很难说服 DevOps 团队在架构层面使用更好的建筑材料:使用最新的,最不易受攻击的版本,并使用较少的。 - -### “更好的建筑材料”是什么意思? - -** JC:** 我是世界上最大的开源仓库的保管人,所以我能看到他们在使用哪些版本,里面有哪些漏洞,何时不为漏洞进行修复, 以及多久。例如,某些日志记录框架不会修复任何错误。其中一些在 90 天内修复了大部分的安全漏洞。人们越来越多地遭到破坏,因为他们使用了一个没有安全的框架。 - -除此之外,即使你不知道日志框架的质量,拥有 11 个不同的框架会变得非常笨重、出现 bug,还有额外的工作和复杂性。你暴露在漏洞中的风险要大得多。你想花时间在修复大量的缺陷上,还是在制造下一个大的破坏性的事情上? 
- -[坚固的 DevOps 的关键是软件供应链管理][5],其中包含三个原则:使用更少和更好的供应商、使用这些供应商的最高质量的部分、并跟踪这些部分,以便在发生错误时,你可以有一个及时和敏捷的响应。 - -### 所以改变管理也很重要。 - -** JC:** 是的,这是另一个共同的价值。我发现,当一家公司想要执行诸如异常检测或净流量分析等安全测试时,他们需要知道“正常”的样子。让人们失误的许多基本事情与仓库和补丁管理有关。 - -我在 _Verizon 数据泄露调查报告中看到_,去年成功利用 97% 的漏洞追踪后只有 10 个 CVE(常见漏洞和风险),而这 10 个已经被修复了十多年。所以,我们羞于谈论高级间谍活动。我们没有做基本的补丁。现在,我不是说如果你修复这 10 个CVE,那么你就没有被利用,而这占据了人们实际失误的最大份额。 - -[DevOps 自动化工具][6]的好处是它们已经成为一个意外的变更管理数据库。这真实反应了谁在哪里什么时候做了更改。这是一个巨大的胜利,因为我们经常对安全性有最大影响的因素无法控制。你承受了 CIO 和 CTO 做出的选择的后果。随着 IT 通过自动化变得更加严格和可重复,你可以减少人为错误的机会,并可在哪里发生变化更加可追溯。 - -### 你说什么是最重要的共同价值? - -** JC:** DevOps 涉及过程和工具链,但我认为定义属性是文化,特别是移情。 DevOps 有用是因为开发人员和运维团队更好地了解彼此,并能做出更明智的决策。不是在解决孤岛中的问题,而是为了活动流程和目标解决。如果你向 DevOps 的团队展示安全如何能使他们变得更好,那么作为回馈他们往往会问:“那么, 我们是否有任何选择让你的生活更轻松?”因为他们通常不知道他们做的 X、Y 或 Z 的选择使它无法包含安全性。 - -对于安全团队,驱动价值的方法之一是在寻求帮助之前变得更有所帮助,在我们告诉 DevOps 团队要做什么之前提供定性和定量的价值。你必须获得 DevOps 团队的信任,并获得发挥的权利,然后才能得到回报。它通常比你想象的快很多。 - --------------------------------------------------------------------------------- - -via: https://techbeacon.com/why-devops-end-security-we-know-it - -作者:[Mike Barton][a] -译者:[geekpi](https://github.com/geekpi) -校对:[校对者ID](https://github.com/校对者ID) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]:https://twitter.com/intent/follow?original_referer=https%3A%2F%2Ftechbeacon.com%2Fwhy-devops-end-security-we-know-it%3Fimm_mid%3D0ee8c5%26cmp%3Dem-webops-na-na-newsltr_20170310&ref_src=twsrc%5Etfw®ion=follow_link&screen_name=mikebarton&tw_p=followbutton -[1]:https://techbeacon.com/resources/application-security-devops-true-state?utm_source=tb&utm_medium=article&utm_campaign=inline-cta -[2]:https://twitter.com/joshcorman -[3]:https://techbeacon.com/want-rugged-devops-team-your-release-security-engineers -[4]:https://techbeacon.com/rugged-devops-rsa-6-takeaways-security-ops-pros -[5]:https://techbeacon.com/josh-corman-security-devops-how-shared-team-values-can-reduce-threats -[6]:https://techbeacon.com/devops-automation-best-practices-how-much-too-much From fc7c33a835624724d3ecf9106a4694c6836dbe75 Mon Sep 17 00:00:00 2001 From: geekpi Date: Wed, 11 Oct 2017 08:44:49 +0800 Subject: [PATCH 08/79] translated --- ...cing Flashback an Internet mocking tool.md | 211 ------------------ ...cing Flashback an Internet mocking tool.md | 208 +++++++++++++++++ 2 files changed, 208 insertions(+), 211 deletions(-) delete mode 100644 sources/tech/20170403 Introducing Flashback an Internet mocking tool.md create mode 100644 translated/tech/20170403 Introducing Flashback an Internet mocking tool.md diff --git a/sources/tech/20170403 Introducing Flashback an Internet mocking tool.md b/sources/tech/20170403 Introducing Flashback an Internet mocking tool.md deleted file mode 100644 index 0cb55ef14f..0000000000 --- a/sources/tech/20170403 Introducing Flashback an Internet mocking tool.md +++ /dev/null @@ -1,211 +0,0 @@ -translating---geekpi - -Introducing Flashback, an Internet mocking tool -============================================================ - -> Flashback is designed to mock HTTP and HTTPS resources, like web services and REST APIs, for testing purposes. - - ![Introducing Flashback, an Internet mocking tool](https://opensource.com/sites/default/files/styles/image-full-size/public/images/life/OSDC_Internet_Cables_520x292_0614_RD.png?itok=U4sZjWv5 "Introducing Flashback, an Internet mocking tool") ->Image by : Opensource.com - -At LinkedIn, we often develop web applications that need to interact with third-party websites. 
We also employ automatic testing to ensure the quality of our software before it is shipped to production. However, a test is only as useful as it is reliable. - -With that in mind, it can be highly problematic for a test to have external dependencies, such as on a third-party website, for instance. These external sites may change without notice, suffer from downtime, or otherwise become temporarily inaccessible, as the Internet is not 100% reliable. - -If one of our tests relies on being able to communicate with a third-party website, the cause of any failures is hard to pinpoint. A failure could be due to an internal change at LinkedIn, an external change made by the maintainers of the third-party website, or an issue with the network infrastructure. As you can imagine, there are many reasons why interactions with a third-party website may fail, so you may wonder, how will I deal with this problem? - -The good news is that there are many Internet mocking tools that can help. One such tool is [Betamax][4]. It works by intercepting HTTP connections initiated by a web application and then later replaying them. For a test, Betamax can be used to replace any interaction over HTTP with previously recorded responses, which can be served very reliably. - -Initially, we chose to use Betamax in our test automation at LinkedIn. It worked quite well, but we ran into a few problems: - -* For security reasons, our test environment does not have Internet access; however, as with most proxies, Betamax requires an Internet connection to function properly. -* We have many use cases that require using authentication protocols, such as OAuth and OpenId. Some of these protocols require complex interactions over HTTP. In order to mock them, we needed a sophisticated model for capturing and replaying the requests. - -To address these challenges, we decided to build upon ideas established by Betamax and create our own Internet mocking tool, called Flashback. We are also proud to announce that Flashback is now open source. - -### What is Flashback? - -Flashback is designed to mock HTTP and HTTPS resources, like web services and [REST][5] APIs, for testing purposes. It records HTTP/HTTPS requests and plays back a previously recorded HTTP transaction—which we call a "scene"—so that no external connection to the Internet is required in order to complete testing. - -Flashback can also replay scenes based on the partial matching of requests. It does so using "match rules." A match rule associates an incoming request with a previously recorded request, which is then used to generate a response. For example, the following code snippet implements a basic match rule, where the test method "matches" an incoming request via [this URL][6]. - -HTTP requests generally contain a URL, method, headers, and body. Flashback allows match rules to be defined for any combination of these components. Flashback also allows users to add whitelist or blacklist labels to URL query parameters, headers, and the body. 
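
That snippet is only linked, not shown here, so as a rough illustration of the idea only (the class and method names below are hypothetical and are not Flashback's actual API), a match rule can be thought of as a predicate over a recorded request and an incoming request that compares the stable parts and ignores a blacklist of volatile query parameters:

```
import java.net.URI;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: not Flashback's real interface, just the shape of a match rule.
public class MethodAndUriMatchRule {

    // Query parameters to ignore when comparing URLs (e.g. volatile OAuth fields).
    private final Set<String> blacklistedParams;

    public MethodAndUriMatchRule(Set<String> blacklistedParams) {
        this.blacklistedParams = blacklistedParams;
    }

    /** Returns true if the incoming request can be answered with the recorded response. */
    public boolean matches(RecordedRequest recorded, RecordedRequest incoming) {
        return recorded.method.equalsIgnoreCase(incoming.method)
                && normalize(recorded.uri).equals(normalize(incoming.uri));
    }

    // Drop blacklisted query parameters so time-varying values do not break the match.
    private String normalize(URI uri) {
        String query = uri.getQuery() == null ? "" : uri.getQuery();
        String filtered = Arrays.stream(query.split("&"))
                .filter(p -> !p.isEmpty() && !blacklistedParams.contains(p.split("=", 2)[0]))
                .sorted()
                .collect(Collectors.joining("&"));
        return uri.getHost() + uri.getPath() + "?" + filtered;
    }

    /** Minimal stand-in for a recorded or incoming HTTP request. */
    public static class RecordedRequest {
        final String method;
        final URI uri;

        public RecordedRequest(String method, URI uri) {
            this.method = method;
            this.uri = uri;
        }
    }

    public static void main(String[] args) {
        MethodAndUriMatchRule rule = new MethodAndUriMatchRule(
                Set.of("oauth_nonce", "oauth_timestamp", "oauth_signature"));
        RecordedRequest recorded = new RecordedRequest(
                "GET", URI.create("http://www.example.org/api?user=a&oauth_nonce=111"));
        RecordedRequest incoming = new RecordedRequest(
                "GET", URI.create("http://www.example.org/api?user=a&oauth_nonce=222"));
        // Prints "true": the nonce differs but is blacklisted, everything else matches.
        System.out.println(rule.matches(recorded, incoming));
    }
}
```

This kind of blacklisting is what makes flows with changing values testable, as the OAuth example below illustrates: the volatile fields are ignored while the stable ones still have to match.
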
- -For instance, in an OAuth authorization flow, the request query parameters may look like the following: - -``` -oauth_consumer_key="jskdjfljsdklfjlsjdfs", -      oauth_nonce="ajskldfjalksjdflkajsdlfjasldfja;lsdkj", -oauth_signature="asdfjaklsdjflasjdflkajsdklf", -oauth_signature_method="HMAC-SHA1", -oauth_timestamp="1318622958", -oauth_token="asdjfkasjdlfajsdklfjalsdjfalksdjflajsdlfa", -oauth_version="1.0" -``` - -Many of these values will change with every request because OAuth requires clients to generate a new value for **oauth_nonce** every time. In our testing, we need to verify values of **oauth_consumer_key, oauth_signature_method**, and **oauth_version** while also making sure that **oauth_nonce**, **oauth_signature**, **oauth_timestamp**, and **oauth_token** exist in the request. Flashback gives us the ability to create our own match rules to achieve this goal. This feature lets us test requests with time-varying data, signatures, tokens, etc. without any changes on the client side. - -This flexible matching and the ability to function without connecting to the Internet are the attributes that separate Flashback from other mocking solutions. Some other notable features include: - -* Flashback is a cross-platform and cross-language solution, with the ability to test both JVM (Java Virtual Machine) and non-JVM (C++, Python, etc.) apps. -* Flashback can generate SSL/TLS certificates on the fly to emulate secured channels for HTTPS requests. - -### How to record an HTTP transaction - -Recording an HTTP transaction for later playback using Flashback is a relatively straightforward process. Before we dive into the procedure, let us first lay out some terminology: - -* A** Scene** stores previously recorded HTTP transactions (in JSON format) that can be replayed later. For example, here is one sample [Flashback scene][1].      -* The **Root Path** is the file path of the directory that contains the Flashback scene data. -* A **Scene Name** is the name of a given scene. -* A **Scene Mode** is the mode in which the scene is being used—either "record" or "playback." -* A **Match Rule** is a rule that determines if the incoming client request matches the contents of a given scene. -* **Flashback Proxy** is an HTTP proxy with two modes of operation, record and playback.  -* **Host** and **port** are the proxy host and port. - -In order to record a scene, you must make a real, external request to the destination, and the HTTPS request and response will then be stored in the scene with the match rule that you have specified. When recording, Flashback behaves exactly like a typical MITM (Man in the Middle) proxy—it is only in playback mode that the connection flow and data flow are restricted to just between the client and the proxy. - -To see Flashback in action, let us create a scene that captures an interaction with example.org by doing the following: - -1\. Check out the Flashback source code: - -``` -git clone https://github.com/linkedin/flashback.git -``` - -2\. Start the Flashback admin server: - -``` -./startAdminServer.sh -port 1234 -``` - -3\. Start the [Flashback Proxy][7]. Note the Flashback above will be started in record mode on localhost, port 5555\. The match rule requires an exact match (match HTTP body, headers, and URL). The scene will be stored under **/tmp/test1**. - -4\. Flashback is now ready to record, so use it to proxy a request to example.org: - -``` -curl http://www.example.org -x localhost:5555 -X GET -``` - -5\. 
Flashback can (optionally) record multiple requests in a single. To finish recording, [shut down Flashback][8]. - -6\. To verify what has been recorded, we can view the contents of the scene in the output directory (**/tmp/test1**). It should [contain the following][9]. - -It's also easy to [use Flashback in your Java code][10]. - -### How to replay an HTTP transaction - -To replay a previously stored scene, use the same basic setup as is used when recording; the only difference is that you [set the "Scene Mode" to "playback" in Step 3 above][11]. - -One way to verify that the response is from the scene, and not the external source, is to disable your Internet connectivity temporarily when you go through Steps 1 through 6\. Another way is to modify your scene file and see if the response is the same as what you have in the file. - -Here is [an example in Java][12]. - -### How to record and replay an HTTPS transaction - -The process for recording and replaying an HTTPS transaction with Flashback is very similar to that used for HTTP transactions. However, special care needs to be given to the security certificates used for the SSL component of HTTPS. In order for Flashback to act as a MITM proxy, creating a Certificate Authority (CA) certificate is necessary. This certificate will be used during the creation of the secure channel between the client and Flashback, and will allow Flashback to inspect the data in HTTPS requests it proxies. This certificate should then be stored as a trusted source so that the client will be able to authenticate Flashback when making calls to it. For instructions on how to create a certificate, there are many resources [like this one][13] that can be quite helpful. Most companies have their own internal policies for administering and securing certificates—be sure to follow yours. - -It is worth noting here that Flashback is intended to be used for testing purposes only. Feel free to integrate Flashback with your service whenever you need it, but note that the record feature of Flashback will need to store everything from the wire, then use it during the replay mode. We recommend that you pay extra attention to ensure that no sensitive member data is being recorded or stored inadvertently. Anything that may violate your company's data protection or privacy policy is your responsibility. - -Once the security certificate is accounted for, the only difference between HTTP and HTTPS in terms of setup for recording is the addition of a few further parameters. - -* **RootCertificateInputStream**: This can be either a stream or file path that indicates the CA certificate's filename. -* **RootCertificatePassphrase**: This is the passphrase created for the CA certificate. -* **CertificateAuthority**: These are the CA certificate's properties. - -[View the code used to record an HTTPS transaction][14] with Flashback, including the above terms. - -Replaying an HTTPS transaction with Flashback uses the same process as recording. The only difference is that the scene mode is set to "playback." This is demonstrated in [this code][15]. - -### Supporting dynamic changes - -In order to allow for flexibility in testing, Flashback lets you dynamically change scenes and match rules. Changing scenes dynamically allows for testing the same requests with different responses, such as success, **time_out**, **rate_limit**, etc. [Scene changes][16] only apply to scenarios where we have POSTed data to update the external resource. See the following diagram as an example. 
- - ![Scenarios where we have POSTed data to update the external resource.](https://opensource.com/sites/default/files/changingscenes.jpg "Scenarios where we have POSTed data to update the external resource.") - -Being able to [change the match rule][17] dynamically allows us to test complicated scenarios. For example, we have a use case that requires us to test HTTP calls to both public and private resources of Twitter. For public resources, the HTTP requests are constant, so we can use the "MatchAll" rule. However, for private resources, we need to sign requests with an OAuth consumer secret and an OAuth access token. These requests contain a lot of parameters that have unpredictable values, so the static MatchAll rule wouldn't work. - -### Use cases - -At LinkedIn, Flashback is mainly used for mocking different Internet providers in integration tests, as illustrated in the diagrams below. The first diagram shows an internal service inside a LinkedIn production data center interacting with Internet providers (such as Google) via a proxy layer. We want to test this internal service in a testing environment. - - ![Testing this internal service in a testing environment.](https://opensource.com/sites/default/files/testingenvironment.jpg "Testing this internal service in a testing environment.") - -The second and third diagrams show how we can record and playback scenes in different environments. Recording happens in our dev environment, where the user starts Flashback on the same port as the proxy started. All external requests from the internal service to providers will go through Flashback instead of our proxy layer. After the necessary scenes get recorded, we can deploy them to our test environment. - - ![After the necessary scenes get recorded, we can deploy them to our test environment.](https://opensource.com/sites/default/files/testenvironmentimage2.jpg "After the necessary scenes get recorded, we can deploy them to our test environment.") - -In the test environment (which is isolated and has no Internet access), Flashback is started on the same port as in the dev environment. All HTTP requests are still coming from the internal service, but the responses will come from Flashback instead of the Internet providers. - - ![Responses will come from Flashback instead of the Internet providers.](https://opensource.com/sites/default/files/flashbackresponsesimage.jpg "Responses will come from Flashback instead of the Internet providers.") - -### Future directions - -We'd like to see if we can support non-HTTP protocols, such as FTP or JDBC, in the future, and maybe even give users the flexibility to inject their own customized protocol using the MITM proxy framework. We will continue improving the Flashback setup API to make supporting non-Java languages easier. - -### Now available as an open source project - -We were fortunate enough to present Flashback at GTAC 2015\. At the show, several members of the audience asked if we would be releasing Flashback as an open source project so they could use it for their own testing efforts. - -### Google TechTalks: GATC 2015—Mock the Internet - - - -We're happy to announce that Flashback is now open source and is available under a BSD (Berkeley Software Distribution) two-clause license. To get started, visit the [Flashback GitHub repo][18]. - - _Originally posted on the [LinkedIn Engineering blog][2]. 
Reposted with permission._ - -### Acknowledgements - -Flashback was created by [Shangshang Feng][19], [Yabin Kang][20], and [Dan Vinegrad][21], and inspired by [Betamax][22]. Special thanks to [Hwansoo Lee][23], [Eran Leshem][24], [Kunal Kandekar][25], [Keith Dsouza][26], and [Kang Wang][27] for help with code reviews. We would also thank our management—[Byron Ma][28], [Yaz Shimizu][29], [Yuliya Averbukh][30], [Christopher Hazlett][31], and [Brandon Duncan][32]—for their support in the development and open sourcing of Flashback. - --------------------------------------------------------------------------------- - -作者简介: - -Shangshang Feng - Shangshang is senior software engineer in LinkedIn's NYC office. He spent the last three and half years working on a gateway platform at LinkedIn. Before LinkedIn, he worked on infrastructure teams at Thomson Reuters and ViewTrade securities. - ---------- - -via: https://opensource.com/article/17/4/flashback-internet-mocking-tool - -作者:[ Shangshang Feng][a] -译者:[译者ID](https://github.com/译者ID) -校对:[校对者ID](https://github.com/校对者ID) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]:https://opensource.com/users/shangshangfeng -[1]:https://gist.github.com/anonymous/17d226050d8a9b79746a78eda9292382 -[2]:https://engineering.linkedin.com/blog/2017/03/flashback-mocking-tool -[3]:https://opensource.com/article/17/4/flashback-internet-mocking-tool?rate=Jwt7-vq6jP9kS7gOT6f6vgwVlZupbyzWsVXX41ikmGk -[4]:https://github.com/betamaxteam/betamax -[5]:https://en.wikipedia.org/wiki/Representational_state_transfer -[6]:https://gist.github.com/anonymous/91637854364287b38897c0970aad7451 -[7]:https://gist.github.com/anonymous/2f5271191edca93cd2e03ce34d1c2b62 -[8]:https://gist.github.com/anonymous/f899ebe7c4246904bc764b4e1b93c783 -[9]:https://gist.github.com/sf1152/c91d6d62518fe62cc87157c9ce0e60cf -[10]:https://gist.github.com/anonymous/fdd972f1dfc7363f4f683a825879ce19 -[11]:https://gist.github.com/anonymous/ae1c519a974c3bc7de2a925254b6550e -[12]:https://gist.github.com/anonymous/edcc1d60847d51b159c8fd8a8d0a5f8b -[13]:https://jamielinux.com/docs/openssl-certificate-authority/introduction.html -[14]:https://gist.github.com/anonymous/091d13179377c765f63d7bf4275acc11 -[15]:https://gist.github.com/anonymous/ec6a0fd07aab63b7369bf8fde69c1f16 -[16]:https://gist.github.com/anonymous/1f1660280acb41277fbe2c257bab2217 -[17]:https://gist.github.com/anonymous/0683c43f31bd916b76aff348ff87f51b -[18]:https://github.com/linkedin/flashback -[19]:https://www.linkedin.com/in/shangshangfeng -[20]:https://www.linkedin.com/in/benykang -[21]:https://www.linkedin.com/in/danvinegrad/ -[22]:https://github.com/betamaxteam/betamax -[23]:https://www.linkedin.com/in/hwansoo/ -[24]:https://www.linkedin.com/in/eranl/ -[25]:https://www.linkedin.com/in/kunalkandekar/ -[26]:https://www.linkedin.com/in/dsouzakeith/ -[27]:https://www.linkedin.com/in/kang-wang-44960b4/ -[28]:https://www.linkedin.com/in/byronma/ -[29]:https://www.linkedin.com/in/yazshimizu/ -[30]:https://www.linkedin.com/in/yuliya-averbukh-818a41/ -[31]:https://www.linkedin.com/in/chazlett/ -[32]:https://www.linkedin.com/in/dudcat/ -[33]:https://opensource.com/user/125361/feed -[34]:https://opensource.com/users/shangshangfeng diff --git a/translated/tech/20170403 Introducing Flashback an Internet mocking tool.md b/translated/tech/20170403 Introducing Flashback an Internet mocking tool.md new file mode 100644 index 0000000000..f4383b10b1 --- /dev/null +++ b/translated/tech/20170403 Introducing 
Flashback an Internet mocking tool.md @@ -0,0 +1,208 @@ +介绍 Flashback,一个互联网模拟工具 +============================================================ + +> Flashback 用于模拟 HTTP 和 HTTPS 资源,如 Web 服务和 REST API,用于测试目的。 + + ![Introducing Flashback, an Internet mocking tool](https://opensource.com/sites/default/files/styles/image-full-size/public/images/life/OSDC_Internet_Cables_520x292_0614_RD.png?itok=U4sZjWv5 "Introducing Flashback, an Internet mocking tool") +>图片提供: Opensource.com + +在 LinkedIn,我们经常开发需要与第三方网站交互的 Web 应用程序。我们还采用自动测试,以确保我们的软件在发布到生产环境之前的质量。然而,测试只是在可靠时才有用。 + +考虑到这一点,有外部依赖关系的测试是有很大的问题的,例如在第三方网站上。这些外部网站可能会没有通知地发生改变、遭受停机、或者由于互联网的不可靠性暂时无法访问。 + +如果我们的一个测试依赖于能够与第三方网站通信,那么任何故障的原因都很难确定。失败可能是因为 LinkedIn 的内部变更,第三方网站的维护人员进行的外部变更或网络基础设施的问题。你可以想像,与第三方网站的交互可能会有很多失败的原因,因此你可能想要知道,我将如何处理这个问题? + +好消息是有许多互联网模拟工具可以帮助。其中一个是 [Betamax][4]。它通过拦截 Web 应用程序发起的 HTTP 连接,之后重放起作用。对于测试,Betamax 可以用来替换以前记录的响应的 HTTP 上的任何交互,它可以非常可靠地提供这个服务。 + +最初,我们选择在 LinkedIn 的自动化测试中使用 Betamax。它工作得很好,但我们遇到了一些问题: + +* 出于安全考虑,我们的测试环境没有接入互联网。然而,与大多数代理一样,Betamax 需要 Internet 连接才能正常运行。 +* 我们有许多需要使用身份验证协议的情况,例如 OAuth 和 OpenId。其中一些协议需要通过 HTTP 进行复杂的交互。为了模拟它们,我们需要一个复杂的模型来捕获和重放请求。 + +为了应对这些挑战,我们决定基于 Betamax 的想法,构建我们自己的互联网模拟工具,名为 Flashback。我们也很自豪地宣布 Flashback 现在是开源的。 + +### 什么是 Flashback? + +Flashback 用于模拟 HTTP 和 HTTPS 资源,如 Web 服务和 [REST][5] API,用于测试目的。它记录 HTTP/HTTPS 请求并重放以前记录的 HTTP 事务 - 我们称之为“场景”,这样就不需要连接到 Internet 才能完成测试。 + +Flashback 也可以根据请求的部分匹配重放场景。它使用的是“匹配规则”。匹配规则将传入请求与先前记录的请求相关联,然后将其用于生成响应。例如,以下代码片段实现了一个基本匹配规则,其中测试方法“匹配”[此 URL][6]的传入请求。 + +HTTP 请求通常包含URL、方法、标头和正文。Flashback 允许为这些组件的任意组合定义匹配规则。Flashback 还允许用户向 URL 查询参数,标头和正文添加白名单或黑名单标签。 + +例如,在 OAuth 授权流程中,请求查询参数可能如下所示: + +``` +oauth_consumer_key="jskdjfljsdklfjlsjdfs", +      oauth_nonce="ajskldfjalksjdflkajsdlfjasldfja;lsdkj", +oauth_signature="asdfjaklsdjflasjdflkajsdklf", +oauth_signature_method="HMAC-SHA1", +oauth_timestamp="1318622958", +oauth_token="asdjfkasjdlfajsdklfjalsdjfalksdjflajsdlfa", +oauth_version="1.0" +``` + +这些值许多将随着每个请求而改变,因为 OAuth 要求客户端每次为 **oauth_nonce** 生成一个新值。在我们的测试中,我们需要验证 **oauth_consumer_key、oauth_signature_method** 和 **oauth_version** 的值,同时确保 **oauth_nonce**、**oauth_signature**、**oauth_timestamp** 和 **oauth_token** 存在于请求中。Flashback 使我们有能力创建我们自己的匹配规则来实现这一目标。此功能允许我们测试随时间变化的数据、签名、令牌等的请求,而客户端没有任何更改。 + +这种灵活的匹配和在不连接互联网的情况下运行的功能是将 Flashback 与其他模拟解决方案分开的属性。其他一些显著特点包括: + +* Flashback 是一种跨平台和跨语言解决方案,能够测试 JVM(Java虚拟机)和非 JVM(C++、Python等)应用程序。 +* Flashback 可以随时生成 SSL/TLS 证书,以模拟 HTTPS 请求的安全通道。 + +### 如何记录 HTTP 事务 + +使用 Flashback 记录 HTTP 事务以便稍后重放是一个比较简单的过程。在我们深入了解流程之前,我们首先列出一些术语: + +* **场景** 存储以前记录的 HTTP 事务 (以 JSON 格式),它可以在以后重放。例如,这里是一个[Flashback 场景][1]示例。 +* **根路径** 是包含 Flashback 场景数据的目录的文件路径。 +* **场景名称** 是给定场景的名称。 +* **场景模式** 是使用场景的模式, 即“录制”或“重放”。 +* **匹配规则** 确定传入的客户端请求是否与给定场景的内容匹配的规则。 +* **Flashback 代理** 是一个 HTTP 代理,共有录制和重放两种操作模式。 +* **主机** 和 **端口** 是代理主机和端口。 + +为了录制场景,你必须向目的地址发出真实的外部请求,然后 HTTPS 请求和响应将使用你指定的匹配规则存储在场景中。在录制时,Flashback 的行为与典型的 MITM(中间人)代理完全相同 - 只有在重放模式下,连接流和数据流仅限于客户端和代理之间。 + +要实际看下 Flashback,让我们创建一个场景,通过执行以下操作捕获与 example.org 的交互: + +1\. 取回 Flashback 的源码: + +``` +git clone https://github.com/linkedin/flashback.git +``` + +2\. 启动 Flashback 管理服务器: + +``` +./startAdminServer.sh -port 1234 +``` + +3\. 注意上面的 Flashback 将在本地端口 5555 上启动录制模式。匹配规则需要完全匹配(匹配 HTTP 正文、标题和 URL)。场景将存储在 **/tmp/test1** 下。 +4\. Flashback 现在可以记录了,所以用它来代理对 example.org 的请求: + +``` +curl http://www.example.org -x localhost:5555 -X GET +``` + +5\. Flashback可以(可选)在一个记录中记录多个请求。要完成录制,[关闭 Flashback][8]。 + +6\. 
要验证已记录的内容,我们可以在输出目录(**/tmp/test1**)中查看场景的内容。它应该[包含以下内容][9]。
+
+[在 Java 代码中使用 Flashback][10] 也很容易。
+
+### 如何重放 HTTP 事务
+
+要重放先前存储的场景,使用与录制时相同的基本设置即可;唯一的区别是[在上述步骤 3 中将“场景模式”设置为“重放”][11]。
+
+验证响应来自场景而不是外部源的一种方法,是在执行步骤 1 到 6 时临时禁用互联网连接;另一种方法是修改场景文件,看看响应是否与文件中的内容相同。
+
+这里有[一个 Java 示例][12]。
+
+### 如何记录并重放 HTTPS 事务
+
+使用 Flashback 记录并重放 HTTPS 事务的过程与 HTTP 事务非常类似,但需要特别注意 HTTPS 中 SSL 部分所用的安全证书。为了让 Flashback 充当 MITM 代理,必须创建一个证书颁发机构(CA)证书。在客户端和 Flashback 之间建立安全通道时会使用该证书,它使 Flashback 能够检查所代理的 HTTPS 请求中的数据。随后需要将该证书保存为受信任的证书,以便客户端在调用 Flashback 时能够对其进行身份验证。关于如何创建证书,网上有许多很有帮助的资源,[比如这个][13]。大多数公司都有自己管理和保护证书的内部政策,请务必遵循贵公司的政策。
+
+这里值得一提的是,Flashback 仅用于测试目的。你可以随时将 Flashback 与你的服务集成在一起,但需要注意的是,Flashback 的录制功能需要存储线路上传输的所有数据,然后在重放模式中使用这些数据。我们建议你特别注意,确保不会无意中记录或存储敏感的会员数据。任何可能违反贵公司数据保护或隐私政策的行为都是你的责任。
+
+一旦解决了安全证书的问题,HTTP 和 HTTPS 在录制设置上的唯一区别就是增加了几个额外的参数。
+
+* **RootCertificateInputStream**: 表示 CA 证书文件路径或流。
+* **RootCertificatePassphrase**: 为 CA 证书创建的密码。
+* **CertificateAuthority**: CA 证书的属性。
+
+[查看 Flashback 中用于记录 HTTPS 事务的代码][14],其中包括上述参数。
+
+使用 Flashback 重放 HTTPS 事务的过程与录制相同,唯一的区别是场景模式要设置为“重放”。[这段代码][15]中有相应演示。
+
+### 支持动态修改
+
+为了保证测试的灵活性,Flashback 允许你动态地更改场景和匹配规则。动态更改场景可以用不同的响应(如 success、**time_out**、**rate_limit** 等)来测试相同的请求。[场景更改][16]仅适用于我们通过 POST 数据更新外部资源的情形,请参见下图示例。
+
+ ![Scenarios where we have POSTed data to update the external resource.](https://opensource.com/sites/default/files/changingscenes.jpg "Scenarios where we have POSTed data to update the external resource.")
+
+能够动态地[更改匹配规则][17],让我们可以测试复杂的场景。例如,我们有一个用例需要测试对 Twitter 公共资源和私有资源的 HTTP 调用。对于公共资源,HTTP 请求是不变的,所以我们可以使用 “MatchAll” 规则。然而,对于私有资源,我们需要使用 OAuth 消费者密钥和 OAuth 访问令牌对请求进行签名。这些请求包含大量值不可预测的参数,因此静态的 MatchAll 规则无法正常工作。
+
+### 使用案例
+
+在 LinkedIn,Flashback 主要用于在集成测试中模拟不同的互联网提供商,如下图所示。第一张图展示了 LinkedIn 生产数据中心内的一个内部服务,它通过代理层与互联网提供商(如 Google)进行交互。我们想在测试环境中测试这个内部服务。
+
+ ![Testing this internal service in a testing environment.](https://opensource.com/sites/default/files/testingenvironment.jpg "Testing this internal service in a testing environment.")
+
+第二和第三张图展示了我们如何在不同的环境中录制和重放场景。录制在我们的开发环境中进行,用户在代理层所用的同一端口上启动 Flashback。内部服务发往提供商的所有外部请求都将经过 Flashback 而不是我们的代理层。必要的场景录制完成后,我们可以将它们部署到我们的测试环境中。
+
+ ![After the necessary scenes get recorded, we can deploy them to our test environment.](https://opensource.com/sites/default/files/testenvironmentimage2.jpg "After the necessary scenes get recorded, we can deploy them to our test environment.")
+
+在测试环境(隔离并且没有互联网访问)中,Flashback 在与开发环境相同的端口上启动。所有 HTTP 请求仍然来自内部服务,但响应将来自 Flashback 而不是互联网提供商。
+
+ ![Responses will come from Flashback instead of the Internet providers.](https://opensource.com/sites/default/files/flashbackresponsesimage.jpg "Responses will come from Flashback instead of the Internet providers.")
+
+### 未来方向
+
+我们希望将来可以支持非 HTTP 协议(如 FTP 或 JDBC),甚至可以让用户基于这个 MITM 代理框架注入自己的定制协议。我们还将继续改进 Flashback 的设置 API,使它更容易支持非 Java 语言。
+
+### 现已作为开源项目发布
+
+我们很荣幸能在 GTAC 2015 上展示 Flashback。会上有几位观众询问我们是否会将 Flashback 作为开源项目发布,以便他们可以将其用于自己的测试工作。
+
+### Google TechTalks:GATC 2015 - 模拟互联网
+
+
+
+我们很高兴地宣布,Flashback 现在以 BSD(Berkeley Software Distribution)双条款许可证开源。要开始使用,请访问 [Flashback GitHub 仓库][18]。
+
+ _该文最初发表于 [LinkedIn 工程博客][2],经授权转载。_
+
+### 致谢
+
+Flashback 由 [Shangshang Feng][19]、[Yabin Kang][20] 和 [Dan Vinegrad][21] 创建,并受到 [Betamax][22] 启发。特别感谢 [Hwansoo Lee][23]、[Eran Leshem][24]、[Kunal Kandekar][25]、[Keith Dsouza][26] 和 [Kang Wang][27] 帮助审阅代码。同样感谢我们的管理层 - [Byron Ma][28]、[Yaz Shimizu][29]、[Yuliya Averbukh][30]、[Christopher Hazlett][31] 和 [Brandon Duncan][32] - 感谢他们在开发和开源 Flashback 过程中的支持。
+ 
+-------------------------------------------------------------------------------- + +作者简介: + +Shangshang Feng - Shangshang 是 LinkedIn 纽约市办公室的高级软件工程师。他花了三年半的时间在 LinkedIn 的网关平台工作。在加入 LinkedIn 之前,他曾在 Thomson Reuters 和 ViewTrade 证券的基础设施团队工作。 + +--------- + +via: https://opensource.com/article/17/4/flashback-internet-mocking-tool + +作者:[ Shangshang Feng][a] +译者:[geekpi](https://github.com/geekpi) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://opensource.com/users/shangshangfeng +[1]:https://gist.github.com/anonymous/17d226050d8a9b79746a78eda9292382 +[2]:https://engineering.linkedin.com/blog/2017/03/flashback-mocking-tool +[3]:https://opensource.com/article/17/4/flashback-internet-mocking-tool?rate=Jwt7-vq6jP9kS7gOT6f6vgwVlZupbyzWsVXX41ikmGk +[4]:https://github.com/betamaxteam/betamax +[5]:https://en.wikipedia.org/wiki/Representational_state_transfer +[6]:https://gist.github.com/anonymous/91637854364287b38897c0970aad7451 +[7]:https://gist.github.com/anonymous/2f5271191edca93cd2e03ce34d1c2b62 +[8]:https://gist.github.com/anonymous/f899ebe7c4246904bc764b4e1b93c783 +[9]:https://gist.github.com/sf1152/c91d6d62518fe62cc87157c9ce0e60cf +[10]:https://gist.github.com/anonymous/fdd972f1dfc7363f4f683a825879ce19 +[11]:https://gist.github.com/anonymous/ae1c519a974c3bc7de2a925254b6550e +[12]:https://gist.github.com/anonymous/edcc1d60847d51b159c8fd8a8d0a5f8b +[13]:https://jamielinux.com/docs/openssl-certificate-authority/introduction.html +[14]:https://gist.github.com/anonymous/091d13179377c765f63d7bf4275acc11 +[15]:https://gist.github.com/anonymous/ec6a0fd07aab63b7369bf8fde69c1f16 +[16]:https://gist.github.com/anonymous/1f1660280acb41277fbe2c257bab2217 +[17]:https://gist.github.com/anonymous/0683c43f31bd916b76aff348ff87f51b +[18]:https://github.com/linkedin/flashback +[19]:https://www.linkedin.com/in/shangshangfeng +[20]:https://www.linkedin.com/in/benykang +[21]:https://www.linkedin.com/in/danvinegrad/ +[22]:https://github.com/betamaxteam/betamax +[23]:https://www.linkedin.com/in/hwansoo/ +[24]:https://www.linkedin.com/in/eranl/ +[25]:https://www.linkedin.com/in/kunalkandekar/ +[26]:https://www.linkedin.com/in/dsouzakeith/ +[27]:https://www.linkedin.com/in/kang-wang-44960b4/ +[28]:https://www.linkedin.com/in/byronma/ +[29]:https://www.linkedin.com/in/yazshimizu/ +[30]:https://www.linkedin.com/in/yuliya-averbukh-818a41/ +[31]:https://www.linkedin.com/in/chazlett/ +[32]:https://www.linkedin.com/in/dudcat/ +[33]:https://opensource.com/user/125361/feed +[34]:https://opensource.com/users/shangshangfeng From 0d64104ad725bc44d0bd9f25c9998a7201a614d8 Mon Sep 17 00:00:00 2001 From: geekpi Date: Wed, 11 Oct 2017 08:49:11 +0800 Subject: [PATCH 09/79] translating --- sources/tech/20170811 UP – deploy serverless apps in seconds.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sources/tech/20170811 UP – deploy serverless apps in seconds.md b/sources/tech/20170811 UP – deploy serverless apps in seconds.md index 460aa6bfac..ae0e48fb83 100644 --- a/sources/tech/20170811 UP – deploy serverless apps in seconds.md +++ b/sources/tech/20170811 UP – deploy serverless apps in seconds.md @@ -1,3 +1,5 @@ +translating----geekpi + UP – deploy serverless apps in seconds ============================================================ From bf61691227b11fc4c3164c6f68b8c128b3eb538c Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 14:09:23 +0800 Subject: [PATCH 10/79] =?UTF-8?q?20171011-1=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...oncurrent Servers Part 1 - Introduction.md | 214 ++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md diff --git a/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md b/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md new file mode 100644 index 0000000000..c6a6983b37 --- /dev/null +++ b/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md @@ -0,0 +1,214 @@ +[Concurrent Servers: Part 1 - Introduction][18] +============================================================ + +This is the first post in a series about concurrent network servers. My plan is to examine several popular concurrency models for network servers that handle multiple clients simultaneously, and judge those models on scalability and ease of implementation. All servers will listen for socket connections and implement a simple protocol to interact with clients. + +All posts in the series: + +* [Part 1 - Introduction][7] + +* [Part 2 - Threads][8] + +* [Part 3 - Event-driven][9] + +### The protocol + +The protocol used throughout this series is very simple, but should be sufficient to demonstrate many interesting aspects of concurrent server design. Notably, the protocol is  _stateful_  - the server changes internal state based on the data clients send, and its behavior depends on that internal state. Not all protocols all stateful - in fact, many protocols over HTTP these days are stateless - but stateful protocols are sufficiently common to warrant a serious discussion. + +Here's the protocol, from the server's point of view: + +![](https://raw.githubusercontent.com/LCTT/wiki-images/master/TranslateProject/ref_img/005.png) + +In words: the server waits for a new client to connect; when a client connects, the server sends it a `*` character and enters a "wait for message state". In this state, the server ignores everything the client sends until it sees a `^` character that signals that a new message begins. At this point it moves to the "in message" state, where it echoes back everything the client sends, incrementing each byte [[1]][10]. When the client sends a `$`, the server goes back to waiting for a new message. The `^` and `$` characters are only used to delimit messages - they are not echoed back. + +An implicit arrow exists from each state back to the "wait for client" state, in case the client disconnects. By corollary, the only way for a client to signal "I'm done" is to simply close its side of the connection. + +Obviously, this protocol is a simplification of more realistic protocols that have complicated headers, escape sequences (to support `$` inside a message body, for example) and additional state transitions, but for our goals this will do just fine. + +Another note: this series is introductory, and assumes clients are generally well behaved (albeit potentially slow); therefore there are no timeouts and no special provisions made to ensure that the server doesn't end up being blocked indefinitely by rogue (or buggy) clients. + +### A sequential server + +Our first server in this series is a simple "sequential" server, written in C without using any libraries beyond standard POSIX fare for sockets. 
The server is sequential because it can only handle a single client at any given time; when a client connects, the server enters the state machine shown above and won't even listen on the socket for new clients until the current client is done. Obviously this isn't concurrent and doesn't scale beyond very light loads, but it's helpful to discuss since we need a simple-to-understand baseline. + +The full code for this server [is here][11]; in what follows, I'll focus on some highlights. The outer loop in `main` listens on the socket for new clients to connect. Once a client connects, it calls `serve_connection` which runs through the protocol until the client disconnects. + +To accept new connections, the sequential server calls `accept` on a listening socket in a loop: + +``` +while (1) { + struct sockaddr_in peer_addr; + socklen_t peer_addr_len = sizeof(peer_addr); + + int newsockfd = + accept(sockfd, (struct sockaddr*)&peer_addr, &peer_addr_len); + + if (newsockfd < 0) { + perror_die("ERROR on accept"); + } + + report_peer_connected(&peer_addr, peer_addr_len); + serve_connection(newsockfd); + printf("peer done\n"); +} +``` + +Each time `accept` returns a new connected socket, the server calls `serve_connection`; note that this is a  _blocking_ call - until `serve_connection` returns, `accept` is not called again; the server blocks until one client is done before accepting a new client. In other words, clients are serviced  _sequentially_ . + +Here's `serve_connection`: + +``` +typedef enum { WAIT_FOR_MSG, IN_MSG } ProcessingState; + +void serve_connection(int sockfd) { + if (send(sockfd, "*", 1, 0) < 1) { + perror_die("send"); + } + + ProcessingState state = WAIT_FOR_MSG; + + while (1) { + uint8_t buf[1024]; + int len = recv(sockfd, buf, sizeof buf, 0); + if (len < 0) { + perror_die("recv"); + } else if (len == 0) { + break; + } + + for (int i = 0; i < len; ++i) { + switch (state) { + case WAIT_FOR_MSG: + if (buf[i] == '^') { + state = IN_MSG; + } + break; + case IN_MSG: + if (buf[i] == '$') { + state = WAIT_FOR_MSG; + } else { + buf[i] += 1; + if (send(sockfd, &buf[i], 1, 0) < 1) { + perror("send error"); + close(sockfd); + return; + } + } + break; + } + } + } + + close(sockfd); +} +``` + +It pretty much follows the protocol state machine. Each time around the loop, the server attempts to receive data from the client. Receiving 0 bytes means the client disconnected, and the loop exits. Otherwise, the received buffer is examined byte by byte, and each byte can potentially trigger a state change. + +The number of bytes `recv` returns is completely independent of the number of messages (`^...$` enclosed sequences of bytes) the client sends. Therefore, it's important to go through the whole buffer in a state-keeping loop. Critically, each received buffer may contain multiple messages, but also the start of a new message without its actual ending; the ending can arrive in the next buffer, which is why the processing state is maintained across loop iterations. + +For example, suppose the `recv` function in the main loop returned non-empty buffers three times for some connection: + +1. `^abc$de^abte$f` + +2. `xyz^123` + +3. `25$^ab$abab` + +What data is the server sending back? Tracing the code manually is very useful to understand the state transitions (for the answer see [[2]][12]). + +### Multiple concurrent clients + +What happens when multiple clients attempt to connect to the sequential server at roughly the same time? 
+ +The server's code (and its name - `sequential-server`) make it clear that clients are only handled  _one at a time_ . As long as the server is busy dealing with a client in `serve_connection`, it doesn't accept new client connections. Only when the current client disconnects does `serve_connection` return and the outer-most loop may accept new client connections. + +To show this in action, [the sample code for this series][13] includes a Python script that simulates several clients trying to connect at the same time. Each client sends the three buffers shown above [[3]][14], with some delays between them. + +The client script runs the clients concurrently in separate threads. Here's a transcript of the client's interaction with our sequential server: + +``` +$ python3.6 simple-client.py -n 3 localhost 9090 +INFO:2017-09-16 14:14:17,763:conn1 connected... +INFO:2017-09-16 14:14:17,763:conn1 sending b'^abc$de^abte$f' +INFO:2017-09-16 14:14:17,763:conn1 received b'b' +INFO:2017-09-16 14:14:17,802:conn1 received b'cdbcuf' +INFO:2017-09-16 14:14:18,764:conn1 sending b'xyz^123' +INFO:2017-09-16 14:14:18,764:conn1 received b'234' +INFO:2017-09-16 14:14:19,764:conn1 sending b'25$^ab0000$abab' +INFO:2017-09-16 14:14:19,765:conn1 received b'36bc1111' +INFO:2017-09-16 14:14:19,965:conn1 disconnecting +INFO:2017-09-16 14:14:19,966:conn2 connected... +INFO:2017-09-16 14:14:19,967:conn2 sending b'^abc$de^abte$f' +INFO:2017-09-16 14:14:19,967:conn2 received b'b' +INFO:2017-09-16 14:14:20,006:conn2 received b'cdbcuf' +INFO:2017-09-16 14:14:20,968:conn2 sending b'xyz^123' +INFO:2017-09-16 14:14:20,969:conn2 received b'234' +INFO:2017-09-16 14:14:21,970:conn2 sending b'25$^ab0000$abab' +INFO:2017-09-16 14:14:21,970:conn2 received b'36bc1111' +INFO:2017-09-16 14:14:22,171:conn2 disconnecting +INFO:2017-09-16 14:14:22,171:conn0 connected... +INFO:2017-09-16 14:14:22,172:conn0 sending b'^abc$de^abte$f' +INFO:2017-09-16 14:14:22,172:conn0 received b'b' +INFO:2017-09-16 14:14:22,210:conn0 received b'cdbcuf' +INFO:2017-09-16 14:14:23,173:conn0 sending b'xyz^123' +INFO:2017-09-16 14:14:23,174:conn0 received b'234' +INFO:2017-09-16 14:14:24,175:conn0 sending b'25$^ab0000$abab' +INFO:2017-09-16 14:14:24,176:conn0 received b'36bc1111' +INFO:2017-09-16 14:14:24,376:conn0 disconnecting +``` + +The thing to note here is the connection name: `conn1` managed to get through to the server first, and interacted with it for a while. The next connection - `conn2` - only got through after the first one disconnected, and so on for the third connection. As the logs show, each connection is keeping the server busy for ~2.2 seconds (which is exactly what the artificial delays in the client code add up to), and during this time no other client can connect. + +Clearly, this is not a scalable strategy. In our case, the client incurs the delay leaving the server completely idle for most of the interaction. A smarter server could handle dozens of other clients while the original one is busy on its end (and we'll see how to achieve that later in the series). Even if the delay is on the server side, this delay is often something that doesn't really keep the CPU too busy; for example, looking up information in a database (which is mostly network waiting time for a database server, or disk lookup time for local databases). + +### Summary and next steps + +The goal of presenting this simple sequential server is twofold: + +1. Introduce the problem domain and some basics of socket programming used throughout the series. + +2. 
Provide motivation for concurrent serving - as the previous section demonstrates, the sequential server doesn't scale beyond very trivial loads and is not an efficient way of using resources, in general. + +Before reading the next posts in the series, make sure you understand the server/client protocol described here and the code for the sequential server. I've written about such simple protocols before; for example,[framing in serial communications][15] and [co-routines as alternatives to state machines][16]. For basics of network programming with sockets, [Beej's guide][17] is not a bad starting point, but for a deeper understanding I'd recommend a book. + +If anything remains unclear, please let me know in comments or by email. On to concurrent servers! + +* * * + + +[[1]][1] The In/Out notation on state transitions denotes a [Mealy machine][2]. + +[[2]][3] The answer is `bcdbcuf23436bc`. + +[[3]][4] With a small difference of an added string of `0000` at the end - the server's answer to this sequence is a signal for the client to disconnect; it's a simplistic handshake that ensures the client had time to receive all of the server's reply. + +-------------------------------------------------------------------------------- + +via: https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ + +作者:[Eli Bendersky][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://eli.thegreenplace.net/pages/about +[1]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id1 +[2]:https://en.wikipedia.org/wiki/Mealy_machine +[3]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id2 +[4]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id3 +[5]:https://eli.thegreenplace.net/tag/concurrency +[6]:https://eli.thegreenplace.net/tag/c-c +[7]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ +[8]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ +[9]:http://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/ +[10]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id4 +[11]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/sequential-server.c +[12]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id5 +[13]:https://github.com/eliben/code-for-blog/tree/master/2017/async-socket-server +[14]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/#id6 +[15]:http://eli.thegreenplace.net/2009/08/12/framing-in-serial-communications/ +[16]:http://eli.thegreenplace.net/2009/08/29/co-routines-as-an-alternative-to-state-machines +[17]:http://beej.us/guide/bgnet/ +[18]:https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ From bd0d707409b0386f7057d62d9833caa672d436c5 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 14:12:03 +0800 Subject: [PATCH 11/79] =?UTF-8?q?20171011-2=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...004 Concurrent Servers Part 2 - Threads.md | 295 ++++++++++++++++++ 1 file changed, 295 insertions(+) create mode 100644 sources/tech/20171004 Concurrent Servers Part 2 - Threads.md diff --git a/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md new file mode 100644 
index 0000000000..8ac1e9f490 --- /dev/null +++ b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md @@ -0,0 +1,295 @@ +[Concurrent Servers: Part 2 - Threads][19] +============================================================ +This is part 2 of a series on writing concurrent network servers. [Part 1][20] presented the protocol implemented by the server, as well as the code for a simple sequential server, as a baseline for the series. + +In this part, we're going to look at multi-threading as one approach to concurrency, with a bare-bones threaded server implementation in C, as well as a thread pool based implementation in Python. + +All posts in the series: + +* [Part 1 - Introduction][8] + +* [Part 2 - Threads][9] + +* [Part 3 - Event-driven][10] + +### The multi-threaded approach to concurrent server design + +When discussing the performance of the sequential server in part 1, it was immediately obvious that a lot of compute resources are wasted while the server processes a client connection. Even assuming a client that sends messages immediately and doesn't do any waiting, network communication is still involved; networks tend to be millions (or more) times slower than a modern CPU, so the CPU running the sequential server will spend the vast majority of time in gloriuos boredom waiting for new socket traffic to arrive. + +Here's a chart showing how sequential client processing happens over time: + +![Sequential client-handling flow](https://eli.thegreenplace.net/images/2017/sequential-flow.png) + +The diagrams shows 3 clients. The diamond shapes denote the client's "arrival time" (the time at which the client attempted to connect to the server). The black lines denote "wait time" (the time clients spent waiting for the server to actually accept their connection), and the colored bars denote actual "processing time" (the time server and client are interacting using the protocol). At the end of the colored bar, the client disconnects. + +In the diagram above, even though the green and orange clients arrived shortly after the blue one, they have to wait for a while until the server is done with the blue client. At this point the green client is accepted, while the orange one has to wait even longer. + +A multi-threaded server would launch multiple control threads, letting the OS manage concurrency on the CPU (and across multiple CPU cores). When a client connects, a thread is created to serve it, while the server is ready to accept more clients in the main thread. The time chart for this mode looks like the following: + +![Concurrent client-handling flow](https://eli.thegreenplace.net/images/2017/concurrent-flow.png) + +### One thread per client, in C using pthreads + +Our [first code sample][11] in this post is a simple "one thread per client" server, written in C using the foundational [pthreads API][12] for multi-threading. Here's the main loop: + +``` +while (1) { + struct sockaddr_in peer_addr; + socklen_t peer_addr_len = sizeof(peer_addr); + + int newsockfd = + accept(sockfd, (struct sockaddr*)&peer_addr, &peer_addr_len); + + if (newsockfd < 0) { + perror_die("ERROR on accept"); + } + + report_peer_connected(&peer_addr, peer_addr_len); + pthread_t the_thread; + + thread_config_t* config = (thread_config_t*)malloc(sizeof(*config)); + if (!config) { + die("OOM"); + } + config->sockfd = newsockfd; + pthread_create(&the_thread, NULL, server_thread, config); + + // Detach the thread - when it's done, its resources will be cleaned up. 
+ // Since the main thread lives forever, it will outlive the serving threads. + pthread_detach(the_thread); +} +``` + +And this is the `server_thread` function: + +``` +void* server_thread(void* arg) { + thread_config_t* config = (thread_config_t*)arg; + int sockfd = config->sockfd; + free(config); + + // This cast will work for Linux, but in general casting pthread_id to an + // integral type isn't portable. + unsigned long id = (unsigned long)pthread_self(); + printf("Thread %lu created to handle connection with socket %d\n", id, + sockfd); + serve_connection(sockfd); + printf("Thread %lu done\n", id); + return 0; +} +``` + +The thread "configuration" is passed as a `thread_config_t` structure: + +``` +typedef struct { int sockfd; } thread_config_t; +``` + +The `pthread_create` call in the main loop launches a new thread that runs the `server_thread` function. This thread terminates when `server_thread` returns. In turn, `server_thread` returns when `serve_connection` returns.`serve_connection` is exactly the same function from part 1. + +In part 1 we used a script to launch multiple clients concurrently and observe how the server handles them. Let's do the same with the multithreaded server: + +``` +$ python3.6 simple-client.py -n 3 localhost 9090 +INFO:2017-09-20 06:31:56,632:conn1 connected... +INFO:2017-09-20 06:31:56,632:conn2 connected... +INFO:2017-09-20 06:31:56,632:conn0 connected... +INFO:2017-09-20 06:31:56,632:conn1 sending b'^abc$de^abte$f' +INFO:2017-09-20 06:31:56,632:conn2 sending b'^abc$de^abte$f' +INFO:2017-09-20 06:31:56,632:conn0 sending b'^abc$de^abte$f' +INFO:2017-09-20 06:31:56,633:conn1 received b'b' +INFO:2017-09-20 06:31:56,633:conn2 received b'b' +INFO:2017-09-20 06:31:56,633:conn0 received b'b' +INFO:2017-09-20 06:31:56,670:conn1 received b'cdbcuf' +INFO:2017-09-20 06:31:56,671:conn0 received b'cdbcuf' +INFO:2017-09-20 06:31:56,671:conn2 received b'cdbcuf' +INFO:2017-09-20 06:31:57,634:conn1 sending b'xyz^123' +INFO:2017-09-20 06:31:57,634:conn2 sending b'xyz^123' +INFO:2017-09-20 06:31:57,634:conn1 received b'234' +INFO:2017-09-20 06:31:57,634:conn0 sending b'xyz^123' +INFO:2017-09-20 06:31:57,634:conn2 received b'234' +INFO:2017-09-20 06:31:57,634:conn0 received b'234' +INFO:2017-09-20 06:31:58,635:conn1 sending b'25$^ab0000$abab' +INFO:2017-09-20 06:31:58,635:conn2 sending b'25$^ab0000$abab' +INFO:2017-09-20 06:31:58,636:conn1 received b'36bc1111' +INFO:2017-09-20 06:31:58,636:conn2 received b'36bc1111' +INFO:2017-09-20 06:31:58,637:conn0 sending b'25$^ab0000$abab' +INFO:2017-09-20 06:31:58,637:conn0 received b'36bc1111' +INFO:2017-09-20 06:31:58,836:conn2 disconnecting +INFO:2017-09-20 06:31:58,836:conn1 disconnecting +INFO:2017-09-20 06:31:58,837:conn0 disconnecting +``` + +Indeed, all clients connected at the same time, and their communication with the server occurs concurrently. + +### Challenges with one thread per client + +Even though threads are fairly efficient in terms of resource usage on modern OSes, the approach outlined in the previous section can still present challenges with some workloads. + +Imagine a scenario where many clients are connecting simultaneously, and some of the sessions are long-lived. This means that many threads may be active at the same time in the server. Too many threads can consume a large amount of memory and CPU time just for the context switching [[1]][13]. 
An alternative way to look at it is as a security problem: this design makes the server an easy target for a [DoS attack][14] - connect a few 100,000s of clients at the same time and let them all sit idle - this will likely kill the server due to excessive resource usage.
+
+A larger problem occurs when there's a non-trivial amount of CPU-bound computation the server has to do for each client. In this case, swamping the server is considerably easier - just a few dozen clients can bring a server to its knees.
+
+For these reasons, it's prudent to do some  _rate-limiting_  on the number of concurrent clients handled by a multi-threaded server. There are a number of ways to do this. The simplest that comes to mind is to simply count the number of clients currently connected and restrict that number to some quantity (determined by careful benchmarking, hopefully). A variation on this approach that's very popular in concurrent application design is using a  _thread pool_ .
+
+### Thread pools
+
+The idea of a [thread pool][15] is simple, yet powerful. The server creates a number of working threads that all expect to get tasks from some queue. This is the "pool". Then, each client connection is dispatched as a task to the pool. As long as there's an idle thread in the pool, it's handed the task. If all the threads in the pool are currently busy, the server blocks until the pool accepts the task (which happens after one of the busy threads has finished processing its current task and gone back to an idle state).
+
+Here's a diagram showing a pool of 4 threads, each processing a task. Tasks (client connections in our case) are waiting until one of the threads in the pool is ready to accept new tasks.
+
+It should be fairly obvious that the thread pool approach provides a rate-limiting mechanism in its very definition. We can decide ahead of time how many threads we want our server to have. Then, this is the maximal number of clients processed concurrently - the rest are waiting until one of the threads becomes free. If we have 8 threads in the pool, 8 is the maximal number of concurrent clients the server handles - even if thousands are attempting to connect simultaneously.
+
+How do we decide how many threads should be in the pool? By a careful analysis of the problem domain, benchmarking, experimentation and also by the hardware we have. If we have a single-core cloud instance, that's one answer; if we have a 100-core dual-socket server available, the answer is different. Picking the thread pool size can also be done dynamically at runtime based on load - I'll touch upon this topic in future posts in this series.
+
+Servers that use thread pools manifest  _graceful degradation_  in the face of high load - clients are accepted at some steady rate, potentially slower than their rate of arrival for some periods of time; that said, no matter how many clients are trying to connect simultaneously, the server will remain responsive and will just churn through the backlog of clients to the best of its ability. Contrast this with the one-thread-per-client server, which can merrily accept a large number of clients until it gets overloaded, at which point it's likely to either crash or start working very slowly for  _all_  processed clients due to resource exhaustion (such as virtual memory thrashing).
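+
+To make the simple count-and-restrict idea mentioned above concrete, here is a minimal sketch in Python that caps the number of concurrently served clients in a thread-per-client design with a counting semaphore. This is an illustrative sketch only, not code from the accompanying repository; the cap of 64, the port and the trivial handler are placeholder assumptions.
+
+```
+import socket
+import threading
+
+MAX_CLIENTS = 64  # the cap would be chosen by benchmarking; purely illustrative
+
+limiter = threading.BoundedSemaphore(MAX_CLIENTS)
+
+def serve_connection(sockobj, addr):
+    # Stand-in for the real protocol handler (a full one appears later in this post).
+    sockobj.sendall(b'*')
+    sockobj.close()
+
+def handle_client(sockobj, addr):
+    try:
+        serve_connection(sockobj, addr)
+    finally:
+        limiter.release()        # free a slot for the next waiting client
+
+def accept_loop(listener):
+    while True:
+        limiter.acquire()        # blocks once MAX_CLIENTS clients are being served
+        sockobj, addr = listener.accept()
+        threading.Thread(target=handle_client, args=(sockobj, addr), daemon=True).start()
+
+listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+listener.bind(('localhost', 9090))
+listener.listen(15)
+accept_loop(listener)
+```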
+ +### Using a thread pool for our network server + +For [this variation of the server][16] I've switched to Python, which comes with a robust implementation of a thread pool in the standard library (`ThreadPoolExecutor` from the `concurrent.futures` module) [[2]][17]. + +This server creates a thread pool, then loops to accept new clients on the main listening socket. Each connected client is dispatched into the pool with `submit`: + +``` +pool = ThreadPoolExecutor(args.n) +sockobj = socket.socket(socket.AF_INET, socket.SOCK_STREAM) +sockobj.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) +sockobj.bind(('localhost', args.port)) +sockobj.listen(15) + +try: + while True: + client_socket, client_address = sockobj.accept() + pool.submit(serve_connection, client_socket, client_address) +except KeyboardInterrupt as e: + print(e) + sockobj.close() +``` + +The `serve_connection` function is very similar to its C counterpart, serving a single client until the client disconnects, while following our protocol: + +``` +ProcessingState = Enum('ProcessingState', 'WAIT_FOR_MSG IN_MSG') + +def serve_connection(sockobj, client_address): + print('{0} connected'.format(client_address)) + sockobj.sendall(b'*') + state = ProcessingState.WAIT_FOR_MSG + + while True: + try: + buf = sockobj.recv(1024) + if not buf: + break + except IOError as e: + break + for b in buf: + if state == ProcessingState.WAIT_FOR_MSG: + if b == ord(b'^'): + state = ProcessingState.IN_MSG + elif state == ProcessingState.IN_MSG: + if b == ord(b'$'): + state = ProcessingState.WAIT_FOR_MSG + else: + sockobj.send(bytes([b + 1])) + else: + assert False + + print('{0} done'.format(client_address)) + sys.stdout.flush() + sockobj.close() +``` + +Let's see how the thread pool size affects the blocking behavior for multiple concurrent clients. For demonstration purposes, I'll run the threadpool server with a pool size of 2 (only two threads are created to service clients): + +``` +$ python3.6 threadpool-server.py -n 2 +``` + +And in a separate terminal, let's run the client simulator again, with 3 concurrent clients: + +``` +$ python3.6 simple-client.py -n 3 localhost 9090 +INFO:2017-09-22 05:58:52,815:conn1 connected... +INFO:2017-09-22 05:58:52,827:conn0 connected... +INFO:2017-09-22 05:58:52,828:conn1 sending b'^abc$de^abte$f' +INFO:2017-09-22 05:58:52,828:conn0 sending b'^abc$de^abte$f' +INFO:2017-09-22 05:58:52,828:conn1 received b'b' +INFO:2017-09-22 05:58:52,828:conn0 received b'b' +INFO:2017-09-22 05:58:52,867:conn1 received b'cdbcuf' +INFO:2017-09-22 05:58:52,867:conn0 received b'cdbcuf' +INFO:2017-09-22 05:58:53,829:conn1 sending b'xyz^123' +INFO:2017-09-22 05:58:53,829:conn0 sending b'xyz^123' +INFO:2017-09-22 05:58:53,830:conn1 received b'234' +INFO:2017-09-22 05:58:53,831:conn0 received b'2' +INFO:2017-09-22 05:58:53,831:conn0 received b'34' +INFO:2017-09-22 05:58:54,831:conn1 sending b'25$^ab0000$abab' +INFO:2017-09-22 05:58:54,832:conn1 received b'36bc1111' +INFO:2017-09-22 05:58:54,832:conn0 sending b'25$^ab0000$abab' +INFO:2017-09-22 05:58:54,833:conn0 received b'36bc1111' +INFO:2017-09-22 05:58:55,032:conn1 disconnecting +INFO:2017-09-22 05:58:55,032:conn2 connected... 
+INFO:2017-09-22 05:58:55,033:conn2 sending b'^abc$de^abte$f'
+INFO:2017-09-22 05:58:55,033:conn0 disconnecting
+INFO:2017-09-22 05:58:55,034:conn2 received b'b'
+INFO:2017-09-22 05:58:55,071:conn2 received b'cdbcuf'
+INFO:2017-09-22 05:58:56,036:conn2 sending b'xyz^123'
+INFO:2017-09-22 05:58:56,036:conn2 received b'234'
+INFO:2017-09-22 05:58:57,037:conn2 sending b'25$^ab0000$abab'
+INFO:2017-09-22 05:58:57,038:conn2 received b'36bc1111'
+INFO:2017-09-22 05:58:57,238:conn2 disconnecting
+```
+
+Recall the behavior of previously discussed servers:
+
+1. In the sequential server, all connections were serialized. One finished, and only then the next started.
+
+2. In the thread-per-client server earlier in this post, all connections were accepted and serviced concurrently.
+
+Here we see another possibility: two connections are serviced concurrently, and only when one of them is done is the third admitted. This is a direct result of the thread pool size set to 2\. For a more realistic use case we'd set the thread pool size much higher, depending on the machine and the exact protocol. This buffering behavior of thread pools is well understood - I've written about it in more detail [just a few months ago][18] in the context of Clojure's `core.async` module.
+
+### Summary and next steps
+
+This post discusses multi-threading as a means of concurrency in network servers. The one-thread-per-client approach is presented for an initial discussion, but this method is not common in practice since it's a security hazard.
+
+Thread pools are much more common, and most popular programming languages have solid implementations (for some, like Python, it's in the standard library). The thread pool server presented here doesn't suffer from the problems of one-thread-per-client.
+
+However, threads are not the only way to handle multiple clients concurrently. In the next post we're going to look at some solutions using  _asynchronous_ , or  _event-driven_  programming.
+
+* * *
+
+[[1]][1] To be fair, modern Linux kernels can tolerate a significant number of concurrent threads - as long as these threads are mostly blocked on I/O, of course. [Here's a sample program][2] that launches a configurable number of threads that sleep in a loop, waking up every 50 ms. On my 4-core Linux machine I can easily launch 10000 threads; even though these threads sleep almost all the time, they still consume between one and two cores for the context switching. Also, they occupy 80 GB of virtual memory (8 MB is the default per-thread stack size for Linux). More realistic threads that actually use memory and not just sleep in a loop can therefore exhaust the physical memory of a machine fairly quickly.
+
+[[2]][3] Implementing a thread pool from scratch is a fun exercise, but I'll leave it for another day. I've written about hand-rolled [thread pools for specific tasks][4] in the past. That's in Python; doing it in C would be more challenging, but shouldn't take more than a few hours for an experienced programmer. 
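+
+As a rough idea of what that exercise involves, here is a toy-sized sketch in Python built on the standard `queue` module. It is an illustration only - not the implementation linked above - and it omits shutdown handling, error reporting and result futures; the class and method names are made up for the sketch.
+
+```
+import queue
+import threading
+
+class TinyThreadPool:
+    """A toy fixed-size pool: N workers pull (func, args) tasks off a shared queue."""
+
+    def __init__(self, num_workers):
+        self.tasks = queue.Queue()
+        for _ in range(num_workers):
+            threading.Thread(target=self._worker, daemon=True).start()
+
+    def _worker(self):
+        while True:
+            func, args = self.tasks.get()   # blocks until a task is available
+            try:
+                func(*args)
+            finally:
+                self.tasks.task_done()
+
+    def submit(self, func, *args):
+        self.tasks.put((func, args))        # queues the task for the next idle worker
+
+# Hypothetical usage, mirroring the server above:
+#   pool = TinyThreadPool(args.n)
+#   pool.submit(serve_connection, client_socket, client_address)
+```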
+ +-------------------------------------------------------------------------------- + +via: https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ + +作者:[Eli Bendersky][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://eli.thegreenplace.net/pages/about +[1]:https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/#id1 +[2]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/threadspammer.c +[3]:https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/#id2 +[4]:http://eli.thegreenplace.net/2011/12/27/python-threads-communication-and-stopping +[5]:https://eli.thegreenplace.net/tag/concurrency +[6]:https://eli.thegreenplace.net/tag/c-c +[7]:https://eli.thegreenplace.net/tag/python +[8]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ +[9]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ +[10]:http://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/ +[11]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/threaded-server.c +[12]:http://eli.thegreenplace.net/2010/04/05/pthreads-as-a-case-study-of-good-api-design +[13]:https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/#id3 +[14]:https://en.wikipedia.org/wiki/Denial-of-service_attack +[15]:https://en.wikipedia.org/wiki/Thread_pool +[16]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/threadpool-server.py +[17]:https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/#id4 +[18]:http://eli.thegreenplace.net/2017/clojure-concurrency-and-blocking-with-coreasync/ +[19]:https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ +[20]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ From 2fae2d0dddd624115c47d5ee7a11b3ea5c55659b Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 14:15:24 +0800 Subject: [PATCH 12/79] =?UTF-8?q?20171011-3=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...oncurrent Servers Part 3 - Event-driven.md | 620 ++++++++++++++++++ 1 file changed, 620 insertions(+) create mode 100644 sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md diff --git a/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md b/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md new file mode 100644 index 0000000000..dc8e1ebb75 --- /dev/null +++ b/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md @@ -0,0 +1,620 @@ +[Concurrent Servers: Part 3 - Event-driven][25] +============================================================ + +This is part 3 of a series of posts on writing concurrent network servers. [Part 1][26] introduced the series with some building blocks, and [part 2 - Threads][27] discussed multiple threads as one viable approach for concurrency in the server. + +Another common approach to achieve concurrency is called  _event-driven programming_ , or alternatively  _asynchronous_  programming [[1]][28]. The range of variations on this approach is very large, so we're going to start by covering the basics - using some of the fundamental APIs than form the base of most higher-level approaches. Future posts in the series will cover higher-level abstractions, as well as various hybrid approaches. 
+ +All posts in the series: + +* [Part 1 - Introduction][12] + +* [Part 2 - Threads][13] + +* [Part 3 - Event-driven][14] + +### Blocking vs. nonblocking I/O + +As an introduction to the topic, let's talk about the difference between blocking and nonblocking I/O. Blocking I/O is easier to undestand, since this is the "normal" way we're used to I/O APIs working. While receiving data from a socket, a call to `recv`  _blocks_  until some data is received from the peer connected to the other side of the socket. This is precisely the issue with the sequential server of part 1. + +So blocking I/O has an inherent performance problem. We saw one way to tackle this problem in part 2, using multiple threads. As long as one thread is blocked on I/O, other threads can continue using the CPU. In fact, blocking I/O is usually very efficient on resource usage while the thread is waiting - the thread is put to sleep by the OS and only wakes up when whatever it was waiting for is available. + + _Nonblocking_  I/O is a different approach. When a socket is set to nonblocking mode, a call to `recv` (and to `send`, but let's just focus on receiving here) will always return very quickly, even if there's no data to receive. In this case, it will return a special error status [[2]][15] notifying the caller that there's no data to receive at this time. The caller can then go do something else, or try to call `recv` again. + +The difference between blocking and nonblocking `recv` is easiest to demonstrate with a simple code sample. Here's a small program that listens on a socket, continuously blocking on `recv`; when `recv` returns data, the program just reports how many bytes were received [[3]][16]: + +``` +int main(int argc, const char** argv) { + setvbuf(stdout, NULL, _IONBF, 0); + + int portnum = 9988; + if (argc >= 2) { + portnum = atoi(argv[1]); + } + printf("Listening on port %d\n", portnum); + + int sockfd = listen_inet_socket(portnum); + struct sockaddr_in peer_addr; + socklen_t peer_addr_len = sizeof(peer_addr); + + int newsockfd = accept(sockfd, (struct sockaddr*)&peer_addr, &peer_addr_len); + if (newsockfd < 0) { + perror_die("ERROR on accept"); + } + report_peer_connected(&peer_addr, peer_addr_len); + + while (1) { + uint8_t buf[1024]; + printf("Calling recv...\n"); + int len = recv(newsockfd, buf, sizeof buf, 0); + if (len < 0) { + perror_die("recv"); + } else if (len == 0) { + printf("Peer disconnected; I'm done.\n"); + break; + } + printf("recv returned %d bytes\n", len); + } + + close(newsockfd); + close(sockfd); + + return 0; +} +``` + +The main loop repeatedly calls `recv` and reports what it returned (recall that `recv` returns 0 when the peer has disconnected). To try it out, we'll run this program in one terminal, and in a separate terminal connect to it with `nc`, sending a couple of short lines, separated by a delay of a couple of seconds: + +``` +$ nc localhost 9988 +hello # wait for 2 seconds after typing this +socket world +^D # to end the connection> +``` + +The listening program will print the following: + +``` +$ ./blocking-listener 9988 +Listening on port 9988 +peer (localhost, 37284) connected +Calling recv... +recv returned 6 bytes +Calling recv... +recv returned 13 bytes +Calling recv... +Peer disconnected; I'm done. +``` + +Now let's try a nonblocking version of the same listening program. 
Here it is: + +``` +int main(int argc, const char** argv) { + setvbuf(stdout, NULL, _IONBF, 0); + + int portnum = 9988; + if (argc >= 2) { + portnum = atoi(argv[1]); + } + printf("Listening on port %d\n", portnum); + + int sockfd = listen_inet_socket(portnum); + struct sockaddr_in peer_addr; + socklen_t peer_addr_len = sizeof(peer_addr); + + int newsockfd = accept(sockfd, (struct sockaddr*)&peer_addr, &peer_addr_len); + if (newsockfd < 0) { + perror_die("ERROR on accept"); + } + report_peer_connected(&peer_addr, peer_addr_len); + + // Set nonblocking mode on the socket. + int flags = fcntl(newsockfd, F_GETFL, 0); + if (flags == -1) { + perror_die("fcntl F_GETFL"); + } + + if (fcntl(newsockfd, F_SETFL, flags | O_NONBLOCK) == -1) { + perror_die("fcntl F_SETFL O_NONBLOCK"); + } + + while (1) { + uint8_t buf[1024]; + printf("Calling recv...\n"); + int len = recv(newsockfd, buf, sizeof buf, 0); + if (len < 0) { + if (errno == EAGAIN || errno == EWOULDBLOCK) { + usleep(200 * 1000); + continue; + } + perror_die("recv"); + } else if (len == 0) { + printf("Peer disconnected; I'm done.\n"); + break; + } + printf("recv returned %d bytes\n", len); + } + + close(newsockfd); + close(sockfd); + + return 0; +} +``` + +A couple of notable differences from the blocking version: + +1. The `newsockfd` socket returned by `accept` is set to nonblocking mode by calling `fcntl`. + +2. When examining the return status of `recv`, we check whether `errno` is set to a value saying that no data is available for receiving. In this case we just sleep for 200 milliseconds and continue to the next iteration of the loop. + +The same expermient with `nc` yields the following printout from this nonblocking listener: + +``` +$ ./nonblocking-listener 9988 +Listening on port 9988 +peer (localhost, 37288) connected +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +recv returned 6 bytes +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +Calling recv... +recv returned 13 bytes +Calling recv... +Calling recv... +Calling recv... +Peer disconnected; I'm done. +``` + +As an exercise, add a timestamp to the printouts and convince yourself that the total time elapsed between fruitful calls to `recv` is more or less the delay in typing the lines into `nc` (rounded to the next 200 ms). + +So there we have it - using nonblocking `recv` makes it possible for the listener the check in with the socket, and regain control if no data is available yet. Another word to describe this in the domain of programming is _polling_  - the main program periodically polls the socket for its readiness. + +It may seem like a potential solution to the sequential serving issue. Nonblocking `recv` makes it possible to work with multiple sockets simulatenously, polling them for data and only handling those that have new data. This is true - concurrent servers  _could_  be written this way; but in reality they don't, because the polling approach scales very poorly. + +First, the 200 ms delay I introduced in the code above is nice for the demonstration (the listener prints only a few lines of "Calling recv..." between my typing into `nc` as opposed to thousands), but it also incurs a delay of up to 200 ms to the server's response time, which is almost certainly undesirable. 
In real programs the delay would have to be much shorter, and the shorter the sleep, the more CPU the process consumes. These are cycles consumed for just waiting, which isn't great, especially on mobile devices where power matters. + +But the bigger problem happens when we actually have to work with multiple sockets this way. Imagine this listener is handling 1000 clients concurrently. This means that in every loop iteration, it has to do a nonblocking `recv` on  _each and every one of those 1000 sockets_ , looking for one which has data ready. This is terribly inefficient, and severely limits the number of clients this server can handle concurrently. There's a catch-22 here: the longer we wait between polls, the less responsive the server is; the shorter we wait, the more CPU resources we burn on useless polling. + +Frankly, all this polling also feels like useless work. Surely somewhere in the OS it is known which socket is actually ready with data, so we don't have to scan all of them. Indeed, it is, and the rest of this post will showcase a couple of APIs that let us handle multiple clients much more gracefully. + +### select + +The `select` system call is a portable (POSIX), venerable part of the standard Unix API. It was designed precisely for the problem described towards the end of the previous section - to allow a single thread to "watch" a non-trivial number of file descriptors [[4]][17] for changes, without needlessly spinning in a polling loop. I don't plan to include a comprehensive tutorial for `select` in this post - there are many websites and book chapters for that - but I will describe its API in the context of the problem we're trying to solve, and will present a fairly complete example. + +`select` enables  _I/O multiplexing_  - monitoring multiple file descriptors to see if I/O is possible on any of them. + +``` +int select(int nfds, fd_set *readfds, fd_set *writefds, + fd_set *exceptfds, struct timeval *timeout); +``` + +`readfds` points to a buffer of file descriptors we're watching for read events; `fd_set` is an opaque data structure users manipulate using `FD_*` macros. `writefds` is the same for write events. `nfds` is the highest file descriptor number (file descriptors are just integers) in the watched buffers. `timeout` lets the user specify how long `select` should block waiting for one of the file descriptors to be ready (`timeout == NULL` means block indefinitely). I'll ignore `exceptfds` for now. + +The contract of calling `select` is as follows: + +1. Prior to the call, the user has to create `fd_set` instances for all the different kinds of descriptors to watch. If we want to watch for both read events and write events, both `readfds` and `writefds` should be created and populated. + +2. The user uses `FD_SET` to set specific descriptors to watch in the set. For example, if we want to watch descriptors 2, 7 and 10 for read events, we call `FD_SET` three times on `readfds`, once for each of 2, 7 and 10. + +3. `select` is called. + +4. When `select` returns (let's ignore timeouts for now), it says how many descriptors in the sets passed to it are ready. It also modifies the `readfds` and `writefds` sets to mark only those descriptors that are ready. All the other descriptors are cleared. + +5. At this point the user has to iterate over `readfds` and `writefds` to find which descriptors are ready (using `FD_ISSET`). + +As a complete example, I've reimplemented our protocol in a concurrent server that uses `select`. 
The [full code is here][18]; what follows is some highlights from the code, with explanations. Warning: this code sample is fairly substantial - so feel free to skip it on first reading if you're short on time. + +### A concurrent server using select + +Using an I/O multiplexing API like `select` imposes certain constraints on the design of our server; these may not be immediately obvious, but are worth discussing since they are key to understanding what event-driven programming is all about. + +Most importantly, always keep in mind that such an approach is, in its core, single-threaded [[5]][19]. The server really is just doing  _one thing at a time_ . Since we want to handle multiple clients concurrently, we'll have to structure the code in an unusual way. + +First, let's talk about the main loop. How would that look? To answer this question let's imagine our server during a flurry of activity - what should it watch for? Two kinds of socket activities: + +1. New clients trying to connect. These clients should be `accept`-ed. + +2. Existing client sending data. This data has to go through the usual protocol described in [part 1][11], with perhaps some data being sent back. + +Even though these two activities are somewhat different in nature, we'll have to mix them into the same loop, because there can only be one main loop. Our loop will revolve around calls to `select`. This `select` call will watch for the two kinds of events described above. + +Here's the part of the code that sets up the file descriptor sets and kicks off the main loop with a call to `select`: + +``` +// The "master" sets are owned by the loop, tracking which FDs we want to +// monitor for reading and which FDs we want to monitor for writing. +fd_set readfds_master; +FD_ZERO(&readfds_master); +fd_set writefds_master; +FD_ZERO(&writefds_master); + +// The listenting socket is always monitored for read, to detect when new +// peer connections are incoming. +FD_SET(listener_sockfd, &readfds_master); + +// For more efficiency, fdset_max tracks the maximal FD seen so far; this +// makes it unnecessary for select to iterate all the way to FD_SETSIZE on +// every call. +int fdset_max = listener_sockfd; + +while (1) { + // select() modifies the fd_sets passed to it, so we have to pass in copies. + fd_set readfds = readfds_master; + fd_set writefds = writefds_master; + + int nready = select(fdset_max + 1, &readfds, &writefds, NULL, NULL); + if (nready < 0) { + perror_die("select"); + } + ... +``` + +A couple of points of interest here: + +1. Since every call to `select` overwrites the sets given to the function, the caller has to maintain a "master" set to keep track of all the active sockets it monitors across loop iterations. + +2. Note how, initially, the only socket we care about is `listener_sockfd`, which is the original socket on which the server accepts new clients. + +3. The return value of `select` is the number of descriptors that are ready among those in the sets passed as arguments. The sets are modified by `select` to mark ready descriptors. The next step is iterating over the descriptors. + +``` +... +for (int fd = 0; fd <= fdset_max && nready > 0; fd++) { + // Check if this fd became readable. + if (FD_ISSET(fd, &readfds)) { + nready--; + + if (fd == listener_sockfd) { + // The listening socket is ready; this means a new peer is connecting. + ... 
+ } else { + fd_status_t status = on_peer_ready_recv(fd); + if (status.want_read) { + FD_SET(fd, &readfds_master); + } else { + FD_CLR(fd, &readfds_master); + } + if (status.want_write) { + FD_SET(fd, &writefds_master); + } else { + FD_CLR(fd, &writefds_master); + } + if (!status.want_read && !status.want_write) { + printf("socket %d closing\n", fd); + close(fd); + } + } +``` + +This part of the loop checks the  _readable_  descriptors. Let's skip the listener socket (for the full scoop - [read the code][20]) and see what happens when one of the client sockets is ready. When this happens, we call a  _callback_ function named `on_peer_ready_recv` with the file descriptor for the socket. This call means the client connected to that socket sent some data and a call to `recv` on the socket isn't expected to block [[6]][21]. This callback returns a struct of type `fd_status_t`: + +``` +typedef struct { + bool want_read; + bool want_write; +} fd_status_t; +``` + +Which tells the main loop whether the socket should be watched for read events, write events, or both. The code above shows how `FD_SET` and `FD_CLR` are called on the appropriate descriptor sets accordingly. The code for a descriptor being ready for writing in the main loop is similar, except that the callback it invokes is called `on_peer_ready_send`. + +Now it's time to look at the code for the callback itself: + +``` +typedef enum { INITIAL_ACK, WAIT_FOR_MSG, IN_MSG } ProcessingState; + +#define SENDBUF_SIZE 1024 + +typedef struct { + ProcessingState state; + + // sendbuf contains data the server has to send back to the client. The + // on_peer_ready_recv handler populates this buffer, and on_peer_ready_send + // drains it. sendbuf_end points to the last valid byte in the buffer, and + // sendptr at the next byte to send. + uint8_t sendbuf[SENDBUF_SIZE]; + int sendbuf_end; + int sendptr; +} peer_state_t; + +// Each peer is globally identified by the file descriptor (fd) it's connected +// on. As long as the peer is connected, the fd is uqique to it. When a peer +// disconnects, a new peer may connect and get the same fd. on_peer_connected +// should initialize the state properly to remove any trace of the old peer on +// the same fd. +peer_state_t global_state[MAXFDS]; + +fd_status_t on_peer_ready_recv(int sockfd) { + assert(sockfd < MAXFDs); + peer_state_t* peerstate = &global_state[sockfd]; + + if (peerstate->state == INITIAL_ACK || + peerstate->sendptr < peerstate->sendbuf_end) { + // Until the initial ACK has been sent to the peer, there's nothing we + // want to receive. Also, wait until all data staged for sending is sent to + // receive more data. + return fd_status_W; + } + + uint8_t buf[1024]; + int nbytes = recv(sockfd, buf, sizeof buf, 0); + if (nbytes == 0) { + // The peer disconnected. + return fd_status_NORW; + } else if (nbytes < 0) { + if (errno == EAGAIN || errno == EWOULDBLOCK) { + // The socket is not *really* ready for recv; wait until it is. 
+ return fd_status_R; + } else { + perror_die("recv"); + } + } + bool ready_to_send = false; + for (int i = 0; i < nbytes; ++i) { + switch (peerstate->state) { + case INITIAL_ACK: + assert(0 && "can't reach here"); + break; + case WAIT_FOR_MSG: + if (buf[i] == '^') { + peerstate->state = IN_MSG; + } + break; + case IN_MSG: + if (buf[i] == '$') { + peerstate->state = WAIT_FOR_MSG; + } else { + assert(peerstate->sendbuf_end < SENDBUF_SIZE); + peerstate->sendbuf[peerstate->sendbuf_end++] = buf[i] + 1; + ready_to_send = true; + } + break; + } + } + // Report reading readiness iff there's nothing to send to the peer as a + // result of the latest recv. + return (fd_status_t){.want_read = !ready_to_send, + .want_write = ready_to_send}; +} +``` + +A `peer_state_t` is the full state object used to represent a client connection between callback calls from the main loop. Since a callback is invoked on some partial data sent by the client, it cannot assume it will be able to communicate with the client continuously, and it has to run quickly without blocking. It never blocks because the socket is set to non-blocking mode and `recv` will always return quickly. Other than calling `recv`, all this handler does is manipulate the state - there are no additional calls that could potentially block. + +An an exercise, can you figure out why this code needs an extra state? Our servers so far in the series managed with just two states, but this one needs three. + +Let's also have a look at the "socket ready to send" callback: + +``` +fd_status_t on_peer_ready_send(int sockfd) { + assert(sockfd < MAXFDs); + peer_state_t* peerstate = &global_state[sockfd]; + + if (peerstate->sendptr >= peerstate->sendbuf_end) { + // Nothing to send. + return fd_status_RW; + } + int sendlen = peerstate->sendbuf_end - peerstate->sendptr; + int nsent = send(sockfd, peerstate->sendbuf, sendlen, 0); + if (nsent == -1) { + if (errno == EAGAIN || errno == EWOULDBLOCK) { + return fd_status_W; + } else { + perror_die("send"); + } + } + if (nsent < sendlen) { + peerstate->sendptr += nsent; + return fd_status_W; + } else { + // Everything was sent successfully; reset the send queue. + peerstate->sendptr = 0; + peerstate->sendbuf_end = 0; + + // Special-case state transition in if we were in INITIAL_ACK until now. + if (peerstate->state == INITIAL_ACK) { + peerstate->state = WAIT_FOR_MSG; + } + + return fd_status_R; + } +} +``` + +Same here - the callback calls a non-blocking `send` and performs state manipulation. In asynchronous code, it's critical for callbacks to do their work quickly - any delay blocks the main loop from making progress, and thus blocks the whole server from handling other clients. + +Let's once again repeat a run of the server with the script that connects 3 clients simultaneously. In one terminal window we'll run: + +``` +$ ./select-server +``` + +In another: + +``` +$ python3.6 simple-client.py -n 3 localhost 9090 +INFO:2017-09-26 05:29:15,864:conn1 connected... +INFO:2017-09-26 05:29:15,864:conn2 connected... +INFO:2017-09-26 05:29:15,864:conn0 connected... 
+INFO:2017-09-26 05:29:15,865:conn1 sending b'^abc$de^abte$f' +INFO:2017-09-26 05:29:15,865:conn2 sending b'^abc$de^abte$f' +INFO:2017-09-26 05:29:15,865:conn0 sending b'^abc$de^abte$f' +INFO:2017-09-26 05:29:15,865:conn1 received b'bcdbcuf' +INFO:2017-09-26 05:29:15,865:conn2 received b'bcdbcuf' +INFO:2017-09-26 05:29:15,865:conn0 received b'bcdbcuf' +INFO:2017-09-26 05:29:16,866:conn1 sending b'xyz^123' +INFO:2017-09-26 05:29:16,867:conn0 sending b'xyz^123' +INFO:2017-09-26 05:29:16,867:conn2 sending b'xyz^123' +INFO:2017-09-26 05:29:16,867:conn1 received b'234' +INFO:2017-09-26 05:29:16,868:conn0 received b'234' +INFO:2017-09-26 05:29:16,868:conn2 received b'234' +INFO:2017-09-26 05:29:17,868:conn1 sending b'25$^ab0000$abab' +INFO:2017-09-26 05:29:17,869:conn1 received b'36bc1111' +INFO:2017-09-26 05:29:17,869:conn0 sending b'25$^ab0000$abab' +INFO:2017-09-26 05:29:17,870:conn0 received b'36bc1111' +INFO:2017-09-26 05:29:17,870:conn2 sending b'25$^ab0000$abab' +INFO:2017-09-26 05:29:17,870:conn2 received b'36bc1111' +INFO:2017-09-26 05:29:18,069:conn1 disconnecting +INFO:2017-09-26 05:29:18,070:conn0 disconnecting +INFO:2017-09-26 05:29:18,070:conn2 disconnecting +``` + +Similarly to the threaded case, there's no delay between clients - they are all handled concurrently. And yet, there are no threads in sight in `select-server`! The main loop  _multiplexes_  all the clients by efficient polling of multiple sockets using `select`. Recall the sequential vs. multi-threaded client handling diagrams from [part 2][22]. For our `select-server`, the time flow for three clients looks something like this: + +![Multiplexed client-handling flow](https://eli.thegreenplace.net/images/2017/multiplexed-flow.png) + +All clients are handled concurrently within the same thread, by multiplexing - doing some work for a client, switching to another, then another, then going back to the original client, etc. Note that there's no specific round-robin order here - the clients are handled when they send data to the server, which really depends on the client. + +### Synchronous, asynchronous, event-driven, callback-based + +The `select-server` code sample provides a good background for discussing just what is meant by "asynchronous" programming, and how it relates to event-driven and callback-based programming, because all these terms are common in the (rather inconsistent) discussion of concurrent servers. + +Let's start with a quote from `select`'s man page: + +> select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing + +So `select` is for  _synchronous_  multiplexing. But I've just presented a substantial code sample using `select` as an example of an  _asynchronous_  server; what gives? + +The answer is: it depends on your point of view. Synchronous is often used as a synonym for blocking, and the calls to `select` are, indeed, blocking. So are the calls to `send` and `recv` in the sequential and threaded servers presented in parts 1 and 2\. So it is fair to say that `select` is a  _synchronous_  API. However, the server design emerging from the use of `select` is actually  _asynchronous_ , or  _callback-based_ , or  _event-driven_ . Note that the `on_peer_*` functions presented in this post are callbacks; they should never block, and they get invoked due to network events. They can get partial data, and are expected to retain coherent state in-between invocations. + +If you've done any amont of GUI programming in the past, all of this is very familiar. 
There's an "event loop" that's often entirely hidden in frameworks, and the application's "business logic" is built out of callbacks that get invoked by the event loop due to various events - user mouse clicks, menu selections, timers firing, data arriving on sockets, etc. The most ubiquitous model of programming these days is, of course, client-side Javascript, which is written as a bunch of callbacks invoked by user activity on a web page. + +### The limitations of select + +Using `select` for our first example of an asynchronous server makes sense to present the concept, and also because `select` is such an ubiquitous and portable API. But it also has some significant limitations that manifest when the number of watched file descriptors is very large: + +1. Limited file descriptor set size. + +2. Bad performance. + +Let's start with the file descriptor size. `FD_SETSIZE` is a compile-time constant that's usually equal to 1024 on modern systems. It's hard-coded deep in the guts of `glibc`, and isn't easy to modify. It limits the number of file descriptors a `select` call can watch to 1024\. These days folks want to write servers that handle 10s of thousands of concurrent clients and more, so this problem is real. There are workarounds, but they aren't portable and aren't easy. + +The bad performance issue is a bit more subtle, but still very serious. Note that when `select` returns, the information it provides to the caller is the number of "ready" descriptors, and the updated descriptor sets. The descriptor sets map from desrciptor to "ready/not ready" but they don't provide a way to iterate over all the ready descriptors efficiently. If there's only a single descriptor that is ready in the set, in the worst case the caller has to iterate over  _the entire set_  to find it. This works OK when the number of descriptors watched is small, but if it gets to high numbers this overhead starts hurting [[7]][23]. + +For these reasons `select` has recently fallen out of favor for writing high-performance concurrent servers. Every popular OS has its own, non-portable APIs that permit users to write much more performant event loops; higher-level interfaces like frameworks and high-level languages usually wrap these APIs in a single portable interface. + +### epoll + +As an example, let's look at `epoll`, Linux's solution to the high-volume I/O event notification problem. The key to `epoll`'s efficiency is greater cooperation from the kernel. Instead of using a file descriptor set, `epoll_wait`fills a buffer with events that are currently ready. Only the ready events are added to the buffer, so there is no need to iterate over  _all_  the currently watched file descriptors in the client. This changes the process of discovering which descriptors are ready from O(N) in `select`'s case to O(1). + +A full presentation of the `epoll` API is not the goal here - there are plenty of online resources for that. As you may have guessed, though, I am going to write yet another version of our concurrent server - this time using `epoll` instead of `select`. The full code sample [is here][24]. 
In fact, since the vast majority of the code is the same as `select-server`, I'll only focus on the novelty - the use of `epoll` in the main loop: + +``` +struct epoll_event accept_event; +accept_event.data.fd = listener_sockfd; +accept_event.events = EPOLLIN; +if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listener_sockfd, &accept_event) < 0) { + perror_die("epoll_ctl EPOLL_CTL_ADD"); +} + +struct epoll_event* events = calloc(MAXFDS, sizeof(struct epoll_event)); +if (events == NULL) { + die("Unable to allocate memory for epoll_events"); +} + +while (1) { + int nready = epoll_wait(epollfd, events, MAXFDS, -1); + for (int i = 0; i < nready; i++) { + if (events[i].events & EPOLLERR) { + perror_die("epoll_wait returned EPOLLERR"); + } + + if (events[i].data.fd == listener_sockfd) { + // The listening socket is ready; this means a new peer is connecting. + ... + } else { + // A peer socket is ready. + if (events[i].events & EPOLLIN) { + // Ready for reading. + ... + } else if (events[i].events & EPOLLOUT) { + // Ready for writing. + ... + } + } + } +} +``` + +We start by configuring `epoll` with a call to `epoll_ctl`. In this case, the configuration amounts to adding the listening socket to the descriptors `epoll` is watching for us. We then allocate a buffer of ready events to pass to `epoll` for modification. The call to `epoll_wait` in the main loop is where the magic's at. It blocks until one of the watched descriptors is ready (or until a timeout expires), and returns the number of ready descriptors. This time, however, instead of blindly iterating over all the watched sets, we know that `epoll_write` populated the `events` buffer passed to it with the ready events, from 0 to `nready-1`, so we iterate only the strictly necessary number of times. + +To reiterate this critical difference from `select`: if we're watching 1000 descriptors and two become ready, `epoll_waits` returns `nready=2` and populates the first two elements of the `events` buffer - so we only "iterate" over two descriptors. With `select` we'd still have to iterate over 1000 descriptors to find out which ones are ready. For this reason `epoll` scales much better than `select` for busy servers with many active sockets. + +The rest of the code is straightforward, since we're already familiar with `select-server`. In fact, all the "business logic" of `epoll-server` is exactly the same as for `select-server` - the callbacks consist of the same code. + +This similarity is tempting to exploit by abstracting away the event loop into a library/framework. I'm going to resist this itch, because so many great programmers succumbed to it in the past. Instead, in the next post we're going to look at `libuv` - one of the more popular event loop abstractions emerging recently. Libraries like `libuv` allow us to write concurrent asynchronous servers without worrying about the greasy details of the underlying system calls. + +* * * + + +[[1]][1] I tried enlightening myself on the actual semantic difference between the two by doing some web browsing and reading, but got a headache fairly quickly. There are many different opinions ranging from "they're the same thing", to "one is a subset of another" to "they're completely different things". When faced with such divergent views on the semantics, it's best to abandon the issue entirely, focusing instead on specific examples and use cases. + +[[2]][2] POSIX mandates that this can be either `EAGAIN` or `EWOULDBLOCK`, and portable applications should check for both. 
+ +[[3]][3] Similarly to all C samples in this series, this code uses some helper utilities to set up listening sockets. The full code for these utilities lives in the `utils` module [in the repository][4]. + +[[4]][5] `select` is not a network/socket-specific function; it watches arbitrary file descriptors, which could be disk files, pipes, terminals, sockets or anything else Unix systems represent with file descriptors. In this post we're focusing on its uses for sockets, of course. + +[[5]][6] There are ways to intermix event-driven programming with multiple threads, but I'll defer this discussion to later in the series. + + +[[6]][7] Due to various non-trivial reasons it could  _still_  block, even after `select` says it's ready. Therefore, all sockets opened by this server are set to nonblocking mode, and if the call to `recv` or `send` returns `EAGAIN` or `EWOULDBLOCK`, the callbacks just assumed no event really happened. Read the code sample comments for more details. + + +[[7]][8] Note that this still isn't as bad as the asynchronous polling example presented earlier in the post. The polling has to happen  _all the time_ , while `select` actually blocks until one or more sockets are ready for reading/writing; far less CPU time is wasted with `select` than with repeated polling. + + +-------------------------------------------------------------------------------- + +via: https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/ + +作者:[Eli Bendersky][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://eli.thegreenplace.net/pages/about +[1]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id1 +[2]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id3 +[3]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id4 +[4]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/utils.h +[5]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id5 +[6]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id6 +[7]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id8 +[8]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id9 +[9]:https://eli.thegreenplace.net/tag/concurrency +[10]:https://eli.thegreenplace.net/tag/c-c +[11]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ +[12]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ +[13]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ +[14]:http://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/ +[15]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id11 +[16]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id12 +[17]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id13 +[18]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/select-server.c +[19]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id14 +[20]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/select-server.c +[21]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id15 +[22]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ 
+[23]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id16 +[24]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/epoll-server.c +[25]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/ +[26]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/ +[27]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/ +[28]:https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/#id10 From 20209cc8b796efb97e4acc9bf037a36d2b8f1e64 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 14:19:14 +0800 Subject: [PATCH 13/79] =?UTF-8?q?20171011-2=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- sources/tech/20171004 Concurrent Servers Part 2 - Threads.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md index 8ac1e9f490..e24ef5e8dd 100644 --- a/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md +++ b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md @@ -138,6 +138,8 @@ The idea of a [thread pool][15] is simple, yet powerful. The server creates a Here's a diagram showing a pool of 4 threads, each processing a task. Tasks (client connections in our case) are waiting until one of the threads in the pool is ready to accept new tasks. +![](https://raw.githubusercontent.com/LCTT/wiki-images/master/TranslateProject/ref_img/006.png) + It should be fairly obvious that the thread pool approach provides a rate-limiting mechanism in its very definition. We can decide ahead of time how many threads we want our server to have. Then, this is the maximal number of clients processed concurrently - the rest are waiting until one of the threads becomes free. If we have 8 threads in the pool, 8 is the maximal number of concurrent clients the server handles - even if thousands are attempting to connect simultaneously. How do we decide how many threads should be in the pool? By a careful analysis of the problem domain, benchmarking, experimentation and also by the HW we have. If we have a single-core cloud instance that's one answer, if we have a 100-core dual socket server available, the answer is different. Picking the thread pool size can also be done dynamically at runtime based on load - I'll touch upon this topic in future posts in this series. From 2d67cfd3d41c229aeab66b1a4b681a5560e4d8f7 Mon Sep 17 00:00:00 2001 From: GitFuture <752736341@qq.com> Date: Wed, 11 Oct 2017 14:28:42 +0800 Subject: [PATCH 14/79] Translating... Concurrent servers --- .../tech/20171002 Concurrent Servers Part 1 - Introduction.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md b/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md index c6a6983b37..d2efe488a2 100644 --- a/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md +++ b/sources/tech/20171002 Concurrent Servers Part 1 - Introduction.md @@ -1,6 +1,8 @@ [Concurrent Servers: Part 1 - Introduction][18] ============================================================ +GitFuture is Translating + This is the first post in a series about concurrent network servers. My plan is to examine several popular concurrency models for network servers that handle multiple clients simultaneously, and judge those models on scalability and ease of implementation. 
All servers will listen for socket connections and implement a simple protocol to interact with clients. All posts in the series: From f941b00a8a6dcacdc961a1bdcfd67f3f053f1d91 Mon Sep 17 00:00:00 2001 From: GitFuture <752736341@qq.com> Date: Wed, 11 Oct 2017 14:29:13 +0800 Subject: [PATCH 15/79] Translating... Concurrent servers -2 --- sources/tech/20171004 Concurrent Servers Part 2 - Threads.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md index e24ef5e8dd..655c7ea3da 100644 --- a/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md +++ b/sources/tech/20171004 Concurrent Servers Part 2 - Threads.md @@ -1,5 +1,8 @@ [Concurrent Servers: Part 2 - Threads][19] ============================================================ + +GitFuture is Translating + This is part 2 of a series on writing concurrent network servers. [Part 1][20] presented the protocol implemented by the server, as well as the code for a simple sequential server, as a baseline for the series. In this part, we're going to look at multi-threading as one approach to concurrency, with a bare-bones threaded server implementation in C, as well as a thread pool based implementation in Python. From aec90d07ab4fb5adff3676205e65d28d9ebecdd5 Mon Sep 17 00:00:00 2001 From: GitFuture <752736341@qq.com> Date: Wed, 11 Oct 2017 14:29:46 +0800 Subject: [PATCH 16/79] Translating... Concurrent servers -3 --- .../tech/20171006 Concurrent Servers Part 3 - Event-driven.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md b/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md index dc8e1ebb75..4fb433c9cb 100644 --- a/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md +++ b/sources/tech/20171006 Concurrent Servers Part 3 - Event-driven.md @@ -1,6 +1,8 @@ [Concurrent Servers: Part 3 - Event-driven][25] ============================================================ +GitFuture is Translating + This is part 3 of a series of posts on writing concurrent network servers. [Part 1][26] introduced the series with some building blocks, and [part 2 - Threads][27] discussed multiple threads as one viable approach for concurrency in the server. Another common approach to achieve concurrency is called  _event-driven programming_ , or alternatively  _asynchronous_  programming [[1]][28]. The range of variations on this approach is very large, so we're going to start by covering the basics - using some of the fundamental APIs than form the base of most higher-level approaches. Future posts in the series will cover higher-level abstractions, as well as various hybrid approaches. 
From 92e20758c8aeca95404d08d219cf0e9bab861da5 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:42:09 +0800 Subject: [PATCH 17/79] =?UTF-8?q?20171011-4=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...20171008 8 best languages to blog about.md | 351 ++++++++++++++++++ 1 file changed, 351 insertions(+) create mode 100644 sources/tech/20171008 8 best languages to blog about.md diff --git a/sources/tech/20171008 8 best languages to blog about.md b/sources/tech/20171008 8 best languages to blog about.md new file mode 100644 index 0000000000..a3f0c665df --- /dev/null +++ b/sources/tech/20171008 8 best languages to blog about.md @@ -0,0 +1,351 @@ +8 best languages to blog about +============================================================ + + +TL;DR: In this post we’re going to do some metablogging and analyze different blogs popularity against their ranking in Google. All the code is on [GitHub repo][38]. + +### The idea + +I’ve been wondering, how many page views actually do different blogs get daily, as well as what programming languages are most popular today among blog reading audience. It was also interesting to me, whether Google ranking of websites directly correlates with their popularity. + +In order to answer these questions, I decided to make a Scrapy project that will scrape some data and then perform certain Data Analysis and Data Visualization on the obtained information. + +### Part I: Scraping + +We will use [Scrapy][39] for our endeavors, as it provides clean and robust framework for scraping and managing feeds of processed requests. We’ll also use [Splash][40] in order to parse Javascript pages we’ll have to deal with. Splash uses its own Web server that acts like a proxy and processes the Javascript response before redirecting it further to our Spider process. + +I don’t describe Scrapy project setup here as well as Splash integration. You can find example of Scrapy project backbone [here][34] and Scrapy+Splash guide [here][35]. + +### Getting relevant blogs + +The first step is obviously getting the data. We’ll need Google search results about programming blogs. See, if we just start scraping Google itself with, let’s say query “Python”, we’ll get lots of other stuff besides blogs. What we need is some kind of filtering that leaves exclusively blogs in the results set. Luckily, there is a thing called [Google Custom Search Engine][41], that achieves exactly that. There’s also this website [www.blogsearchengine.org][42] that performs exactly what we need, delegating user requests to CSE, so we can look at its queries and repeat them. + +So what we’re going to do is go to [www.blogsearchengine.org][43] and search for “python” having Network tab in Chrome Developer tools open by our side. Here’s the screenshot of what we’re going to see. + +![](https://i1.wp.com/www.databrawl.com/wp-content/uploads/2017/10/CSE_request.png?zoom=1.25&w=750&ssl=1) + +The highlighted query is the one that blogsearchengine delegates to Google, so we’re just going to copy it and use in our scraper. 
+ +The blog scraping spider class would then look like this: + +``` +class BlogsSpider(scrapy.Spider): + name = 'blogs' + allowed_domains = ['cse.google.com'] + + def __init__(self, queries): + super(BlogsSpider, self).__init__() + self.queries = queries +``` + +[view raw][3][blogs.py][4] hosted with  + + by [GitHub][5] + +Unlike typical Scrapy spiders, ours has overridden `__init__` method that accepts additional argument `queries` that specifies the list of queries we want to perform. + +Now, the most important part is the actual query building and execution. This process is performed in the `start_requests` Spider’s method, which we happily override as well: + + +``` + def start_requests(self): + params_dict = { + 'cx': ['partner-pub-9634067433254658:5laonibews6'], + 'cof': ['FORID:10'], + 'ie': ['ISO-8859-1'], + 'q': ['query'], + 'sa.x': ['0'], + 'sa.y': ['0'], + 'sa': ['Search'], + 'ad': ['n9'], + 'num': ['10'], + 'rurl': [ + 'http://www.blogsearchengine.org/search.html?cx=partner-pub' + '-9634067433254658%3A5laonibews6&cof=FORID%3A10&ie=ISO-8859-1&' + 'q=query&sa.x=0&sa.y=0&sa=Search' + ], + 'siteurl': ['http://www.blogsearchengine.org/'] + } + + params = urllib.parse.urlencode(params_dict, doseq=True) + url_template = urllib.parse.urlunparse( + ['https', self.allowed_domains[0], '/cse', + '', params, 'gsc.tab=0&gsc.q=query&gsc.page=page_num']) + for query in self.queries: + for page_num in range(1, 11): + url = url_template.replace('query', urllib.parse.quote(query)) + url = url.replace('page_num', str(page_num)) + yield SplashRequest(url, self.parse, endpoint='render.html', + args={'wait': 0.5}) +``` + +[view raw][6][blogs.py][7] hosted with  + + by [GitHub][8] + +Here you can see quite complex `params_dict` dictionary holding all the parameters of the Google CSE URL we found earlier. We then prepare `url_template` with everything but query and page number filled. We request 10 pages about each programming language, each page contains 10 links, so it’s 100 different blogs for each language to analyze. + +On lines `42-43` we use special `SplashRequest` instead of Scrapy’s own Request class, which wraps internal redirect logic of Splash library, so we don’t have to worry about that. Neat. + +Finally, here’s the parsing routine: + +``` + def parse(self, response): + urls = response.css('div.gs-title.gsc-table-cell-thumbnail') \ + .xpath('./a/@href').extract() + gsc_fragment = urllib.parse.urlparse(response.url).fragment + fragment_dict = urllib.parse.parse_qs(gsc_fragment) + page_num = int(fragment_dict['gsc.page'][0]) + query = fragment_dict['gsc.q'][0] + page_size = len(urls) + for i, url in enumerate(urls): + parsed_url = urllib.parse.urlparse(url) + rank = (page_num - 1) * page_size + i + yield { + 'rank': rank, + 'url': parsed_url.netloc, + 'query': query + } +``` + +[view raw][9][blogs.py][10] hosted with  + + by [GitHub][11] + +The heart and soul of any scraper is parser’s logic. There are multiple ways to understand the response page structure and build the XPath query string. You can use [Scrapy shell][44] to try and adjust your XPath query on the fly, without running a spider. I prefer a more visual method though. It involves Google Chrome’s Developer console again. Simply right-click the element you want to get in your spider and press Inspect. It opens the console with HTML code set to the place where it’s being defined. In our case, we want to get the actual search result links. 
Their source location looks like this:
+
+![](https://i0.wp.com/www.databrawl.com/wp-content/uploads/2017/10/result_inspection.png?zoom=1.25&w=750&ssl=1)
+
+So, after looking at the element description, we see that the `<div>` we're searching for has the `.gsc-table-cell-thumbnail` CSS class and is a child of the `.gs-title` `<div>`, so we put it into the `css` method of the response object we have (line `46`). After that, we just need to get the URL of the blog post. It is easily achieved by the `'./a/@href'` XPath string, which takes the `href` attribute of the `<a>` tag found as a direct child of our `<div>`
. + +### Finding traffic data + +The next task is estimating the number of views per day each of the blogs receives. There are [various options][45] to get such data, both free and paid. After quick googling I decided to stick to this simple and free to use website [www.statshow.com][46]. The Spider for this website should take as an input blog URLs we’ve obtained in the previous step, go through them and add traffic information. Spider initialization looks like this: + +``` +class TrafficSpider(scrapy.Spider): + name = 'traffic' + allowed_domains = ['www.statshow.com'] + + def __init__(self, blogs_data): + super(TrafficSpider, self).__init__() + self.blogs_data = blogs_data +``` + +[view raw][12][traffic.py][13] hosted with  + + by [GitHub][14] + +`blogs_data` is expected to be list of dictionaries in the form: `{"rank": 70, "url": "www.stat.washington.edu", "query": "Python"}`. + +Request building function looks like this: + +``` + def start_requests(self): + url_template = urllib.parse.urlunparse( + ['http', self.allowed_domains[0], '/www/{path}', '', '', '']) + for blog in self.blogs_data: + url = url_template.format(path=blog['url']) + request = SplashRequest(url, endpoint='render.html', + args={'wait': 0.5}, meta={'blog': blog}) + yield request +``` + +[view raw][15][traffic.py][16] hosted with  + + by [GitHub][17] + +It’s quite simple, we just add `/www/web-site-url/` string to the `'www.statshow.com'` url. + +Now let’s see how does the parser look: + +``` + def parse(self, response): + site_data = response.xpath('//div[@id="box_1"]/span/text()').extract() + views_data = list(filter(lambda r: '$' not in r, site_data)) + if views_data: + blog_data = response.meta.get('blog') + traffic_data = { + 'daily_page_views': int(views_data[0].translate({ord(','): None})), + 'daily_visitors': int(views_data[1].translate({ord(','): None})) + } + blog_data.update(traffic_data) + yield blog_data +``` + +[view raw][18][traffic.py][19] hosted with  + + by [GitHub][20] + +Similarly to the blog parsing routine, we just make our way through the sample return page of the StatShow and track down the elements containing daily page views and daily visitors. Both of these parameters identify website popularity, so we’ll just pick page views for our analysis. + +### Part II: Analysis + +The next part is analyzing all the data we got after scraping. We then visualize the prepared data sets with the lib called [Bokeh][47]. I don’t give the runner/visualization code here but it can be found in the [GitHub repo][48] in addition to everything else you see in this post. + +The initial result set has few outlying items representing websites with HUGE amount of traffic (such as google.com, linkedin.com, Oracle.com etc.). They obviously shouldn’t be considered. Even if some of those have blogs, they aren’t language specific. That’s why we filter the outliers based on the approach suggested in [this StackOverflow answer][36]. + +### Language popularity comparison + +At first, let’s just make a head-to-head comparison of all the languages we have and see which one has most daily views among the top 100 blogs. 
+ +Here’s the function that can take care of such a task: + + +``` +def get_languages_popularity(data): + query_sorted_data = sorted(data, key=itemgetter('query')) + result = {'languages': [], 'views': []} + popularity = [] + for k, group in groupby(query_sorted_data, key=itemgetter('query')): + group = list(group) + daily_page_views = map(lambda r: int(r['daily_page_views']), group) + total_page_views = sum(daily_page_views) + popularity.append((group[0]['query'], total_page_views)) + sorted_popularity = sorted(popularity, key=itemgetter(1), reverse=True) + languages, views = zip(*sorted_popularity) + result['languages'] = languages + result['views'] = views + return result + +``` + +[view raw][21][analysis.py][22] hosted with  + + by [GitHub][23] + +Here we first group our data by languages (‘query’ key in the dict) and then use python’s `groupby`wonderful function borrowed from SQL to generate groups of items from our data list, each representing some programming language. Afterwards, we calculate total page views for each language on line `14` and then add tuples of the form `('Language', rank)` in the `popularity`list. After the loop, we sort the popularity data based on the total views and unpack these tuples in 2 separate lists and return those in the `result` variable. + +There was some huge deviation in the initial dataset. I checked what was going on and realized that if I make query “C” in the [blogsearchengine.org][37], I get lots of irrelevant links, containing “C” letter somewhere. So, I had to exclude C from the analysis. It almost doesn’t happen with “R” in contrast as well as other C-like names: “C++”, “C#”. + +So, if we remove C from the consideration and look at other languages, we can see the following picture: + +![](https://raw.githubusercontent.com/LCTT/wiki-images/master/TranslateProject/ref_img/8%20best%20languages%20to%20blog%20about%201.png) + +Evaluation. Java made it with over 4 million views daily, PHP and Go have over 2 million, R and JavaScript close up the “million scorers” list. + +### Daily Page Views vs Google Ranking + +Let’s now take a look at the connection between the number of daily views and Google ranking of blogs. Logically, less popular blogs should be further in ranking, It’s not so easy though, as other factors influence ranking as well, for example, if the article in the less popular blog is more recent, it’ll likely pop up first. + +The data preparation is performed in the following fashion: + +``` +def get_languages_popularity(data): + query_sorted_data = sorted(data, key=itemgetter('query')) + result = {'languages': [], 'views': []} + popularity = [] + for k, group in groupby(query_sorted_data, key=itemgetter('query')): + group = list(group) + daily_page_views = map(lambda r: int(r['daily_page_views']), group) + total_page_views = sum(daily_page_views) + popularity.append((group[0]['query'], total_page_views)) + sorted_popularity = sorted(popularity, key=itemgetter(1), reverse=True) + languages, views = zip(*sorted_popularity) + result['languages'] = languages + result['views'] = views + return result +``` + +[view raw][24][analysis.py][25] hosted with  + + by [GitHub][26] + +The function accepts scraped data and list of languages to consider. We sort the data in the same way we did for languages popularity. Afterwards, in a similar language grouping loop, we build `(rank, views_number)` tuples (with 1-based ranks) that are being converted to 2 separate lists. This pair of lists is then written to the resulting dictionary. 
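+
+Since the plots below report PCC values, here is a minimal sketch of how the Pearson correlation coefficient for one language could be computed from such a pair of lists (this helper is illustrative and not taken from the project's repo):
+
+```
+import numpy as np
+
+def rank_views_pcc(ranks, views):
+    """Pearson correlation between 1-based Google ranks and daily page views."""
+    return float(np.corrcoef(ranks, views)[0, 1])
+
+# Toy numbers only: a negative value means views tend to fall as the rank number grows.
+print(rank_views_pcc([1, 2, 3, 4, 5], [90000, 60000, 70000, 30000, 20000]))
+```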
+ +The results for the top 8 GitHub languages (except C) are the following: + +![](https://raw.githubusercontent.com/LCTT/wiki-images/master/TranslateProject/ref_img/8%20best%20languages%20to%20blog%20about%202.png) + +![](https://raw.githubusercontent.com/LCTT/wiki-images/master/TranslateProject/ref_img/8%20best%20languages%20to%20blog%20about%203.png) + +Evaluation. We see that the [PCC (Pearson correlation coefficient)][49] of all graphs is far from 1/-1, which signifies lack of correlation between the daily views and the ranking. It’s important to note though that in most of the graphs (7 out of 8) the correlation is negative, which means that decrease in ranking leads to decrease in views indeed. + +### Conclusion + +So, according to our analysis, Java is by far most popular programming language, followed by PHP, Go, R and JavaScript. Neither of top 8 languages has a strong correlation between daily views and ranking in Google, so you can definitely get high in search results even if you’re just starting your blogging path. What exactly is required for that top hit a topic for another discussion though. + +These results are quite biased and can’t be taken into consideration without additional analysis. At first, it would be a good idea to collect more traffic feeds for an extended period of time and then analyze the mean (median?) values of daily views and rankings. Maybe I’ll return to it sometime in the future. + +### References + +1. Scraping: + +1. [blog.scrapinghub.com: Handling Javascript In Scrapy With Splash][27] + +2. [BlogSearchEngine.org][28] + +3. [twingly.com: Twingly Real-Time Blog Search][29] + +4. [searchblogspot.com: finding blogs on blogspot platform][30] + +3. Traffic estimation: + +1. [labnol.org: Find Out How Much Traffic a Website Gets][31] + +2. [quora.com: What are the best free tools that estimate visitor traffic…][32] + +3. 
[StatShow.com: The Stats Maker][33] + +-------------------------------------------------------------------------------- + +via: https://www.databrawl.com/2017/10/08/blog-analysis/ + +作者:[Serge Mosin ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://www.databrawl.com/author/svmosingmail-com/ +[1]:https://bokeh.pydata.org/ +[2]:https://bokeh.pydata.org/ +[3]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/blogs.py +[4]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-blogs-py +[5]:https://github.com/ +[6]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/blogs.py +[7]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-blogs-py +[8]:https://github.com/ +[9]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/blogs.py +[10]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-blogs-py +[11]:https://github.com/ +[12]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/traffic.py +[13]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-traffic-py +[14]:https://github.com/ +[15]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/traffic.py +[16]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-traffic-py +[17]:https://github.com/ +[18]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/traffic.py +[19]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-traffic-py +[20]:https://github.com/ +[21]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/analysis.py +[22]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-analysis-py +[23]:https://github.com/ +[24]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee/raw/4ebb94aa41e9ab25fc79af26b49272b2eff47e00/analysis.py +[25]:https://gist.github.com/Greyvend/f730ccd5dc1e7eacc4f27b0c9da86eee#file-analysis-py +[26]:https://github.com/ +[27]:https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/ +[28]:http://www.blogsearchengine.org/ +[29]:https://www.twingly.com/ +[30]:http://www.searchblogspot.com/ +[31]:https://www.labnol.org/internet/find-website-traffic-hits/8008/ +[32]:https://www.quora.com/What-are-the-best-free-tools-that-estimate-visitor-traffic-for-a-given-page-on-a-particular-website-that-you-do-not-own-or-operate-3rd-party-sites +[33]:http://www.statshow.com/ +[34]:https://docs.scrapy.org/en/latest/intro/tutorial.html +[35]:https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/ +[36]:https://stackoverflow.com/a/16562028/1573766 +[37]:http://blogsearchengine.org/ +[38]:https://github.com/Databrawl/blog_analysis +[39]:https://scrapy.org/ +[40]:https://github.com/scrapinghub/splash +[41]:https://en.wikipedia.org/wiki/Google_Custom_Search +[42]:http://www.blogsearchengine.org/ +[43]:http://www.blogsearchengine.org/ +[44]:https://doc.scrapy.org/en/latest/topics/shell.html +[45]:https://www.labnol.org/internet/find-website-traffic-hits/8008/ +[46]:http://www.statshow.com/ 
+[47]:https://bokeh.pydata.org/en/latest/ +[48]:https://github.com/Databrawl/blog_analysis +[49]:https://en.wikipedia.org/wiki/Pearson_correlation_coefficient +[50]:https://www.databrawl.com/author/svmosingmail-com/ +[51]:https://www.databrawl.com/2017/10/08/ From fdd02cbdaf1febeb1e81338b2769b6a28302d91b Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:44:36 +0800 Subject: [PATCH 18/79] =?UTF-8?q?20171011-5=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...009 Considering Pythons Target Audience.md | 283 ++++++++++++++++++ 1 file changed, 283 insertions(+) create mode 100644 sources/tech/20171009 Considering Pythons Target Audience.md diff --git a/sources/tech/20171009 Considering Pythons Target Audience.md b/sources/tech/20171009 Considering Pythons Target Audience.md new file mode 100644 index 0000000000..8ca5c86be7 --- /dev/null +++ b/sources/tech/20171009 Considering Pythons Target Audience.md @@ -0,0 +1,283 @@ +[Considering Python's Target Audience][40] +============================================================ + +Who is Python being designed for? + +* [Use cases for Python's reference interpreter][8] + +* [Which audience does CPython primarily serve?][9] + +* [Why is this relevant to anything?][10] + +* [Where does PyPI fit into the picture?][11] + +* [Why are some APIs changed when adding them to the standard library?][12] + +* [Why are some APIs added in provisional form?][13] + +* [Why are only some standard library APIs upgraded?][14] + +* [Will any parts of the standard library ever be independently versioned?][15] + +* [Why do these considerations matter?][16] + +Several years ago, I [highlighted][38] "CPython moves both too fast and too slowly" as one of the more common causes of conflict both within the python-dev mailing list, as well as between the active CPython core developers and folks that decide that participating in that process wouldn't be an effective use of their personal time and energy. + +I still consider that to be the case, but it's also a point I've spent a lot of time reflecting on in the intervening years, as I wrote that original article while I was still working for Boeing Defence Australia. The following month, I left Boeing for Red Hat Asia-Pacific, and started gaining a redistributor level perspective on [open source supply chain management][39] in large enterprises. + +### [Use cases for Python's reference interpreter][17] + +While it's a gross oversimplification, I tend to break down CPython's use cases as follows (note that these categories aren't fully distinct, they're just aimed at focusing my thinking on different factors influencing the rollout of new software features and versions): + +* Education: educator's main interest is in teaching ways of modelling and manipulating the world computationally,  _not_  writing or maintaining production software). Examples: + * Australia's [Digital Curriculum][1] + + * Lorena A. Barba's [AeroPython][2] + +* Personal automation & hobby projects: software where the main, and often only, user is the individual that wrote it. Examples: + * my Digital Blasphemy [image download notebook][3] + + * Paul Fenwick's (Inter)National [Rick Astley Hotline][4] + +* Organisational process automation: software where the main, and often only, user is the organisation it was originally written to benefit. 
Examples: + * CPython's [core workflow tools][5] + + * Development, build & release management tooling for Linux distros + +* Set-and-forget infrastructure: software where, for sometimes debatable reasons, in-life upgrades to the software itself are nigh impossible, but upgrades to the underlying platform may be feasible. Examples: + * most self-managed corporate and institutional infrastructure (where properly funded sustaining engineering plans are disturbingly rare) + + * grant funded software (where maintenance typically ends when the initial grant runs out) + + * software with strict certification requirements (where recertification is too expensive for routine updates to be economically viable unless absolutely essential) + + * Embedded software systems without auto-upgrade capabilities + +* Continuously upgraded infrastructure: software with a robust sustaining engineering model, where dependency and platform upgrades are considered routine, and no more concerning than any other code change. Examples: + * Facebook's Python service infrastructure + + * Rolling release Linux distributions + + * most public PaaS and serverless environments (Heroku, OpenShift, AWS Lambda, Google Cloud Functions, Azure Cloud Functions, etc) + +* Intermittently upgraded standard operating environments: environments that do carry out routine upgrades to their core components, but those upgrades occur on a cycle measured in years, rather than weeks or months. Examples: + * [VFX Platform][6] + + * LTS Linux distributions + + * CPython and the Python standard library + + * Infrastructure management & orchestration tools (e.g. OpenStack, Ansible) + + * Hardware control systems + +* Ephemeral software: software that tends to be used once and then discarded or ignored, rather than being subsequently upgraded in place. Examples: + * Ad hoc automation scripts + + * Single-player games with a defined "end" (once you've finished them, even if you forget to uninstall them, you probably won't reinstall them on a new device) + + * Single-player games with little or no persistent state (if you uninstall and reinstall them, it doesn't change much about your play experience) + + * Event-specific applications (the application was tied to a specific physical event, and once the event is over, that app doesn't matter any more) + +* Regular use applications: software that tends to be regularly upgraded after deployment. Examples: + * Business management software + + * Personal & professional productivity applications (e.g. Blender) + + * Developer tools & services (e.g. Mercurial, Buildbot, Roundup) + + * Multi-player games, and other games with significant persistent state, but no real defined "end" + + * Embedded software systems with auto-upgrade capabilities + +* Shared abstraction layers: software components that are designed to make it possible to work effectively in a particular problem domain even if you don't personally grasp all the intricacies of that domain yet. Examples: + * most runtime libraries and frameworks fall into this category (e.g. Django, Flask, Pyramid, SQL Alchemy, NumPy, SciPy, requests) + + * many testing and type inference tools also fit here (e.g. pytest, Hypothesis, vcrpy, behave, mypy) + + * plugins for other applications (e.g. 
Blender plugins, OpenStack hardware adapters) + + * the standard library itself represents the baseline "world according to Python" (and that's an [incredibly complex][7] world view) + +### [Which audience does CPython primarily serve?][18] + +Ultimately, the main audiences that CPython and the standard library specifically serve are those that, for whatever reason, aren't adequately served by the combination of a more limited standard library and the installation of explicitly declared third party dependencies from PyPI. + +To oversimplify the above review of different usage and deployment models even further, it's possible to summarise the single largest split in Python's user base as the one between those that are using Python as a  _scripting language_  for some environment of interest, and those that are using it as an  _application development language_ , where the eventual artifact that will be distributed is something other than the script that they're working on. + +Typical developer behaviours when using Python as a scripting language include: + +* the main working unit consists of a single Python file (or Jupyter notebook!), rather than a directory of Python and metadata files + +* there's no separate build step of any kind - the script is distributed  _as_  a script, similar to the way standalone shell scripts are distributed + +* there's no separate install step (other than downloading the file to an appropriate location), as it is expected that the required runtime environment will be preconfigured on the destination system + +* no explicit dependencies stated, except perhaps a minimum Python version, or else a statement of the expected execution environment. If dependencies outside the standard library are needed, they're expected to be provided by the environment being scripted (whether that's an operating system, a data analysis platform, or an application that embeds a Python runtime) + +* no separate test suite, with the main test of correctness being "Did the script do what you wanted it to do with the input that you gave it?" + +* if testing prior to live execution is needed, it will be in the form of a "dry run" or "preview" mode that conveys to the user what the software  _would_  do if run that way + +* if static code analysis tools are used at all, it's via integration into the user's software development environment, rather than being set up separately for each individual script + +By contrast, typical developer behaviours when using Python as an application development language include: + +* the main working unit consists of a directory of Python and metadata files, rather than a single Python file + +* these is a separate build step to prepare the application for publication, even if it's just bundling the files together into a Python sdist, wheel or zipapp archive + +* whether there's a separate install step to prepare the application for use will depend on how the application is packaged, and what the supported target environments are + +* external dependencies are expressed in a metadata file, either directly in the project directory (e.g. `pyproject.toml`, `requirements.txt`, `Pipfile`), or as part of the generated publication archive (e.g. 
`setup.py`, `flit.ini`) + +* a separate test suite exists, either as unit tests for the Python API, integration tests for the functional interfaces, or a combination of the two + +* usage of static analysis tools is configured at the project level as part of its testing regime, rather than being dependent on + +As a result of that split, the main purpose that CPython and the standard library end up serving is to define the redistributor independent baseline of assumed functionality for educational and ad hoc Python scripting environments 3-5 years after the corresponding CPython feature release. + +For ad hoc scripting use cases, that 3-5 year latency stems from a combination of delays in redistributors making new releases available to their users, and users of those redistributed versions taking time to revise their standard operating environments. + +In the case of educational environments, educators need that kind of time to review the new features and decide whether or not to incorporate them into the courses they offer their students. + +### [Why is this relevant to anything?][19] + +This post was largely inspired by the Twitter discussion following on from [this comment of mine][20] citing the Provisional API status defined in [PEP 411][21] as an example of an open source project issuing a de facto invitation to users to participate more actively in the design & development process as co-creators, rather than only passively consuming already final designs. + +The responses included several expressions of frustration regarding the difficulty of supporting provisional APIs in higher level libraries, without those libraries making the provisional status transitive, and hence limiting support for any related features to only the latest version of the provisional API, and not any of the earlier iterations. + +My [main reaction][22] was to suggest that open source publishers should impose whatever support limitations they need to impose to make their ongoing maintenance efforts personally sustainable. That means that if supporting older iterations of provisional APIs is a pain, then they should only be supported if the project developers themselves need that, or if somebody is paying them for the inconvenience. This is similar to my view on whether or not volunteer-driven projects should support older commercial LTS Python releases for free when it's a hassle for them to do: I [don't think they should][23], as I expect most such demands to be stemming from poorly managed institutional inertia, rather than from genuine need (and if the need  _is_  genuine, then it should instead be possible to find some means of paying to have it addressed). + +However, my [second reaction][24], was to realise that even though I've touched on this topic over the years (e.g. in the original 2011 article linked above, as well as in Python 3 Q & A answers [here][25], [here][26], and [here][27], and to a lesser degree in last year's article on the [Python Packaging Ecosystem][28]), I've never really attempted to directly explain the impact it has on the standard library design process. + +And without that background, some aspects of the design process, such as the introduction of provisional APIs, or the introduction of inspired-by-but-not-the-same-as, seem completely nonsensical, as they appear to be an attempt to standardise APIs without actually standardising them. 
+ +### [Where does PyPI fit into the picture?][29] + +The first hurdle that  _any_  proposal sent to python-ideas or python-dev has to clear is answering the question "Why isn't a module on PyPI good enough?". The vast majority of proposals fail at this step, but there are several common themes for getting past it: + +* rather than downloading a suitable third party library, novices may be prone to copying & pasting bad advice from the internet at large (e.g. this is why the `secrets` library now exists: to make it less likely people will use the `random` module, which is intended for games and statistical simulations, for security-sensitive purposes) + +* the module is intended to provide a reference implementation and to enable interoperability between otherwise competing implementations, rather than necessarily being all things to all people (e.g. `asyncio`, `wsgiref`, `unittest``, and `logging` all fall into this category) + +* the module is intended for use in other parts of the standard library (e.g. `enum` falls into this category, as does `unittest`) + +* the module is designed to support a syntactic addition to the language (e.g. the `contextlib`, `asyncio` and `typing` modules fall into this category) + +* the module is just plain useful for ad hoc scripting purposes (e.g. `pathlib`, and `ipaddress` fall into this category) + +* the module is useful in an educational context (e.g. the `statistics` module allows for interactive exploration of statistic concepts, even if you wouldn't necessarily want to use it for full-fledged statistical analysis) + +Passing this initial "Is PyPI obviously good enough?" check isn't enough to ensure that a module will be accepted for inclusion into the standard library, but it's enough to shift the question to become "Would including the proposed library result in a net improvement to the typical introductory Python software developer experience over the next few years?" + +The introduction of `ensurepip` and `venv` modules into the standard library also makes it clear to redistributors that we expect Python level packaging and installation tools to be supported in addition to any platform specific distribution mechanisms. + +### [Why are some APIs changed when adding them to the standard library?][30] + +While existing third party modules are sometimes adopted wholesale into the standard library, in other cases, what actually gets added is a redesigned and reimplemented API that draws on the user experience of the existing API, but drops or revises some details based on the additional design considerations and privileges that go with being part of the language's reference implementation. + +For example, unlike its popular third party predecessor, `path.py`, ``pathlib` does  _not_  define string subclasses, but instead independent types. Solving the resulting interoperability challenges led to the definition of the filesystem path protocol, allowing a wider range of objects to be used with interfaces that work with filesystem paths. + +The API design for the `ipaddress` module was adjusted to explicitly separate host interface definitions (IP addresses associated with particular IP networks) from the definitions of addresses and networks in order to serve as a better tool for teaching IP addressing concepts, whereas the original `ipaddr` module is less strict in the way it uses networking terminology. 
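+
+To make that distinction concrete, here is a quick sketch using nothing but the standard library: `pathlib` paths are not `str` subclasses, yet they satisfy the filesystem path protocol, and `ipaddress` keeps addresses, networks and host interfaces as separate types:
+
+```
+import ipaddress
+import os
+import pathlib
+
+# pathlib paths are not str subclasses, but they implement the filesystem path protocol
+p = pathlib.PurePosixPath("some/file.txt")
+print(isinstance(p, str))   # False
+print(os.fspath(p))         # some/file.txt
+
+# ipaddress keeps addresses, networks and host interfaces distinct
+addr = ipaddress.ip_address("192.0.2.10")       # a bare address
+net = ipaddress.ip_network("192.0.2.0/24")      # a network (host bits must be zero)
+host = ipaddress.ip_interface("192.0.2.10/24")  # an address attached to a particular network
+
+print(addr in net)      # True
+print(host.ip)          # 192.0.2.10
+print(host.network)     # 192.0.2.0/24
+```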
+ +In other cases, standard library modules are constructed as a synthesis of multiple existing approaches, and may also rely on syntactic features that didn't exist when the APIs for pre-existing libraries were defined. Both of these considerations apply for the `asyncio` and `typing` modules, while the latter consideration applies for the `dataclasses` API being considered in PEP 557 (which can be summarised as "like attrs, but using variable annotations for field declarations"). + +The working theory for these kinds of changes is that the existing libraries aren't going away, and their maintainers often aren't all that interested in putitng up with the constraints associated with standard library maintenance (in particular, the relatively slow release cadence). In such cases, it's fairly common for the documentation of the standard library version to feature a "See Also" link pointing to the original module, especially if the third party version offers additional features and flexibility that were omitted from the standard library module. + +### [Why are some APIs added in provisional form?][31] + +While CPython does maintain an API deprecation policy, we generally prefer not to use it without a compelling justification (this is especially the case while other projects are attempting to maintain compatibility with Python 2.7). + +However, when adding new APIs that are inspired by existing third party ones without being exact copies of them, there's a higher than usual risk that some of the design decisions may turn out to be problematic in practice. + +When we consider the risk of such changes to be higher than usual, we'll mark the related APIs as provisional, indicating that conservative end users may want to avoid relying on them at all, and that developers of shared abstraction layers may want to consider imposing stricter than usual constraints on which versions of the provisional API they're prepared to support. + +### [Why are only some standard library APIs upgraded?][32] + +The short answer here is that the main APIs that get upgraded are those where: + +* there isn't likely to be a lot of external churn driving additional updates + +* there are clear benefits for either ad hoc scripting use cases or else in encouraging future interoperability between multiple third party solutions + +* a credible proposal is submitted by folks interested in doing the work + +If the limitations of an existing module are mainly noticeable when using the module for application development purposes (e.g. `datetime`), if redistributors already tend to make an improved alternative third party option readily available (e.g. `requests`), or if there's a genuine conflict between the release cadence of the standard library and the needs of the package in question (e.g. `certifi`), then the incentives to propose a change to the standard library version tend to be significantly reduced. + +This is essentially the inverse to the question about PyPI above: since PyPI usually  _is_  a sufficiently good distribution mechanism for application developer experience enhancements, it makes sense for such enhancements to be distributed that way, allowing redistributors and platform providers to make their own decisions about what they want to include as part of their default offering. + +Changing CPython and the standard library only comes into play when there is perceived value in changing the capabilities that can be assumed to be present by default in 3-5 years time. 
+ +### [Will any parts of the standard library ever be independently versioned?][33] + +Yes, it's likely the bundling model used for `ensurepip` (where CPython releases bundle a recent version of `pip` without actually making it part of the standard library) may be applied to other modules in the future. + +The most probable first candidate for that treatment would be the `distutils` build system, as switching to such a model would allow the build system to be more readily kept consistent across multiple releases. + +Other potential candidates for this kind of treatment would be the Tcl/Tk graphics bindings, and the IDLE editor, which are already unbundled and turned into an optional addon installations by a number of redistributors. + +### [Why do these considerations matter?][34] + +By the very nature of things, the folks that tend to be most actively involved in open source development are those folks working on open source applications and shared abstraction layers. + +The folks writing ad hoc scripts or designing educational exercises for their students often won't even think of themselves as software developers - they're teachers, system administrators, data analysts, quants, epidemiologists, physicists, biologists, business analysts, market researchers, animators, graphical designers, etc. + +When all we have to worry about for a language is the application developer experience, then we can make a lot of simplifying assumptions around what people know, the kinds of tools they're using, the kinds of development processes they're following, and the ways they're going to be building and deploying their software. + +Things get significantly more complicated when an application runtime  _also_  enjoys broad popularity as a scripting engine. Doing either job well is already difficult, and balancing the needs of both audiences as part of a single project leads to frequent incomprehension and disbelief on both sides. + +This post isn't intended to claim that we never make incorrect decisions as part of the CPython development process - it's merely pointing out that the most reasonable reaction to seemingly nonsensical feature additions to the Python standard library is going to be "I'm not part of the intended target audience for that addition" rather than "I have no interest in that, so it must be a useless and pointless addition of no value to anyone, added purely to annoy me". 
+ +-------------------------------------------------------------------------------- + +via: http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html + +作者:[Nick Coghlan ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:http://www.curiousefficiency.org/pages/about.html +[1]:https://aca.edu.au/#home-unpack +[2]:https://github.com/barbagroup/AeroPython +[3]:https://nbviewer.jupyter.org/urls/bitbucket.org/ncoghlan/misc/raw/default/notebooks/Digital%20Blasphemy.ipynb +[4]:https://github.com/pjf/rickastley +[5]:https://github.com/python/core-workflow +[6]:http://www.vfxplatform.com/ +[7]:http://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-python.html#broadening-our-horizons +[8]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#use-cases-for-python-s-reference-interpreter +[9]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#which-audience-does-cpython-primarily-serve +[10]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#why-is-this-relevant-to-anything +[11]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#where-does-pypi-fit-into-the-picture +[12]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#why-are-some-apis-changed-when-adding-them-to-the-standard-library +[13]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#why-are-some-apis-added-in-provisional-form +[14]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#why-are-only-some-standard-library-apis-upgraded +[15]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#will-any-parts-of-the-standard-library-ever-be-independently-versioned +[16]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#why-do-these-considerations-matter +[17]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id1 +[18]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id2 +[19]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id3 +[20]:https://twitter.com/ncoghlan_dev/status/916994106819088384 +[21]:https://www.python.org/dev/peps/pep-0411/ +[22]:https://twitter.com/ncoghlan_dev/status/917092464355241984 +[23]:http://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html +[24]:https://twitter.com/ncoghlan_dev/status/917088410162012160 +[25]:http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#wouldn-t-a-python-2-8-release-help-ease-the-transition +[26]:http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#doesn-t-this-make-python-look-like-an-immature-and-unstable-platform +[27]:http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-about-insert-other-shiny-new-feature-here +[28]:http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.html +[29]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id4 +[30]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id5 
+[31]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id6 +[32]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id7 +[33]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id8 +[34]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#id9 +[35]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html# +[36]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html#disqus_thread +[37]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.rst +[38]:http://www.curiousefficiency.org/posts/2011/04/musings-on-culture-of-python-dev.html +[39]:http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastructure/ +[40]:http://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-audience.html# From 8acebc67eb59ee599ac0fe7b7ee80648dcf0af55 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:50:39 +0800 Subject: [PATCH 19/79] =?UTF-8?q?20171011-6=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...c Range Imaging using OpenCV Cpp python.md | 421 ++++++++++++++++++ 1 file changed, 421 insertions(+) create mode 100644 sources/tech/20171002 High Dynamic Range Imaging using OpenCV Cpp python.md diff --git a/sources/tech/20171002 High Dynamic Range Imaging using OpenCV Cpp python.md b/sources/tech/20171002 High Dynamic Range Imaging using OpenCV Cpp python.md new file mode 100644 index 0000000000..9a9bd9d543 --- /dev/null +++ b/sources/tech/20171002 High Dynamic Range Imaging using OpenCV Cpp python.md @@ -0,0 +1,421 @@ +High Dynamic Range (HDR) Imaging using OpenCV (C++/Python) +============================================================ + + + +In this tutorial, we will learn how to create a High Dynamic Range (HDR) image using multiple images taken with different exposure settings. We will share code in both C++ and Python. + +### What is High Dynamic Range (HDR) imaging? + +Most digital cameras and displays capture or display color images as 24-bits matrices. There are 8-bits per color channel and the pixel values are therefore in the range 0 – 255 for each channel. In other words, a regular camera or a display has a limited dynamic range. + +However, the world around us has a very large dynamic range. It can get pitch black inside a garage when the lights are turned off and it can get really bright if you are looking directly at the Sun. Even without considering those extremes, in everyday situations, 8-bits are barely enough to capture the scene. So, the camera tries to estimate the lighting and automatically sets the exposure so that the most interesting aspect of the image has good dynamic range, and the parts that are too dark and too bright are clipped off to 0 and 255 respectively. + +In the Figure below, the image on the left is a normally exposed image. Notice the sky in the background is completely washed out because the camera decided to use a setting where the subject (my son) is properly photographed, but the bright sky is washed out. The image on the right is an HDR image produced by the iPhone. + + [![High Dynamic Range (HDR)](http://www.learnopencv.com/wp-content/uploads/2017/09/high-dynamic-range-hdr.jpg)][3] + +How does an iPhone capture an HDR image? It actually takes 3 images at three different exposures. 
The images are taken in quick succession so there is almost no movement between the three shots. The three images are then combined to produce the HDR image. We will see the details in the next section. + +The process of combining different images of the same scene acquired under different exposure settings is called High Dynamic Range (HDR) imaging. + +### How does High Dynamic Range (HDR) imaging work? + +In this section, we will go through the steps of creating an HDR image using OpenCV. + +To easily follow this tutorial, please [download][4] the C++ and Python code and images by clicking [here][5]. If you are interested to learn more about AI, Computer Vision and Machine Learning, please [subscribe][6] to our newsletter. + +### Step 1: Capture multiple images with different exposures + +When we take a picture using a camera, we have only 8-bits per channel to represent the dynamic range ( brightness range ) of the scene. But we can take multiple images of the scene at different exposures by changing the shutter speed. Most SLR cameras have a feature called Auto Exposure Bracketing (AEB) that allows us to take multiple pictures at different exposures with just one press of a button. If you are using an iPhone, you can use this [AutoBracket HDR app][7] and if you are an android user you can try [A Better Camera app][8]. + +Using AEB on a camera or an auto bracketing app on the phone, we can take multiple pictures quickly one after the other so the scene does not change. When we use HDR mode in an iPhone, it takes three pictures. + +1. An underexposed image: This image is darker than the properly exposed image. The goal is the capture parts of the image that very bright. + +2. A properly exposed image: This is the regular image the camera would have taken based on the illumination it has estimated. + +3. An overexposed image: This image is brighter than the properly exposed image. The goal is the capture parts of the image that very dark. + +However, if the dynamic range of the scene is very large, we can take more than three pictures to compose the HDR image. In this tutorial, we will use 4 images taken with exposure time 1/30, 0.25, 2.5 and 15 seconds. The thumbnails are shown below. + + [![Auto Exposure Bracketed HDR image sequence](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-image-sequence.jpg)][9] + +The information about the exposure time and other settings used by an SLR camera or a Phone are usually stored in the EXIF metadata of the JPEG file. Check out this [link][10] to see EXIF metadata stored in a JPEG file in Windows and Mac. Alternatively, you can use my favorite command line utility for EXIF called [EXIFTOOL ][11]. 
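+
+The exposure times are simply hard-coded in the next step, but if you would rather read them straight from the EXIF data in Python, a small sketch along these lines should work (an assumption on my part: it relies on the third-party Pillow package, which is not otherwise used in this post, and on the JPEGs still carrying their EXIF metadata):
+
+```
+# Sketch: read the ExposureTime EXIF tag (id 33434) from each bracketed JPEG.
+# Assumes Pillow is installed (pip install Pillow).
+from PIL import Image
+
+EXPOSURE_TIME_TAG = 33434  # standard EXIF tag id for ExposureTime
+
+def exposure_time(filename):
+    exif = Image.open(filename)._getexif() or {}
+    value = exif.get(EXPOSURE_TIME_TAG)
+    if value is None:
+        return None
+    if isinstance(value, tuple):          # older Pillow returns (numerator, denominator)
+        return value[0] / float(value[1])
+    return float(value)                   # newer Pillow returns a Rational value
+
+for name in ["img_0.033.jpg", "img_0.25.jpg", "img_2.5.jpg", "img_15.jpg"]:
+    print(name, exposure_time(name))
+```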
+
+Let’s start by reading in the images and assigning the exposure times
+
+C++
+
+```
+void readImagesAndTimes(vector<Mat> &images, vector<float> &times)
+{
+
+  int numImages = 4;
+
+  // List of exposure times
+  static const float timesArray[] = {1/30.0f,0.25,2.5,15.0};
+  times.assign(timesArray, timesArray + numImages);
+
+  // List of image filenames
+  static const char* filenames[] = {"img_0.033.jpg", "img_0.25.jpg", "img_2.5.jpg", "img_15.jpg"};
+  for(int i=0; i < numImages; i++)
+  {
+    Mat im = imread(filenames[i]);
+    images.push_back(im);
+  }
+
+}
+```
+
+Python
+
+```
+def readImagesAndTimes():
+  # List of exposure times
+  times = np.array([ 1/30.0, 0.25, 2.5, 15.0 ], dtype=np.float32)
+
+  # List of image filenames
+  filenames = ["img_0.033.jpg", "img_0.25.jpg", "img_2.5.jpg", "img_15.jpg"]
+  images = []
+  for filename in filenames:
+    im = cv2.imread(filename)
+    images.append(im)
+
+  return images, times
+```
+
+### Step 2: Align Images
+
+Misalignment of images used in composing the HDR image can result in severe artifacts. In the Figure below, the image on the left is an HDR image composed using unaligned images and the image on the right is one using aligned images. By zooming into a part of the image, shown using red circles, we see severe ghosting artifacts in the left image.
+
+ [![Misalignment problem in HDR](http://www.learnopencv.com/wp-content/uploads/2017/10/aligned-unaligned-hdr-comparison.jpg)][12]
+
+Naturally, while taking the pictures for creating an HDR image, professional photographers mount the camera on a tripod. They also use a feature called [mirror lockup][13] to reduce additional vibrations. Even then, the images may not be perfectly aligned because there is no way to guarantee a vibration-free environment. The problem of alignment gets a lot worse when images are taken using a handheld camera or a phone.
+
+Fortunately, OpenCV provides an easy way to align these images using `AlignMTB`. This algorithm converts all the images to median threshold bitmaps (MTB). An MTB for an image is calculated by assigning the value 1 to pixels brighter than median luminance and 0 otherwise. An MTB is invariant to the exposure time. Therefore, the MTBs can be aligned without requiring us to specify the exposure time.
+
+MTB based alignment is performed using the following lines of code.
+
+C++
+
+```
+// Align input images
+Ptr<AlignMTB> alignMTB = createAlignMTB();
+alignMTB->process(images, images);
+```
+
+Python
+
+```
+# Align input images
+alignMTB = cv2.createAlignMTB()
+alignMTB.process(images, images)
+```
+
+### Step 3: Recover the Camera Response Function
+
+The response of a typical camera is not linear to scene brightness. What does that mean? Suppose two objects are photographed by a camera and one of them is twice as bright as the other in the real world. When you measure the pixel intensities of the two objects in the photograph, the pixel values of the brighter object will not be twice that of the darker object! Without estimating the Camera Response Function (CRF), we will not be able to merge the images into one HDR image.
+
+What does it mean to merge multiple exposure images into an HDR image?
+
+Consider just ONE pixel at some location (x,y) of the images. If the CRF was linear, the pixel value would be directly proportional to the exposure time unless the pixel is too dark ( i.e. nearly 0 ) or too bright ( i.e. nearly 255) in a particular image. We can filter out these bad pixels ( too dark or too bright ), and estimate the brightness at a pixel by dividing the pixel value by the exposure time and then averaging this brightness value across all images where the pixel is not bad ( too dark or too bright ). We can do this for all pixels and obtain a single image where all pixels are obtained by averaging “good” pixels.
+
+But the CRF is not linear and we need to make the image intensities linear before we can merge/average them by first estimating the CRF.
+
+The good news is that the CRF can be estimated from the images if we know the exposure times for each image. Like many problems in computer vision, the problem of finding the CRF is set up as an optimization problem where the goal is to minimize an objective function consisting of a data term and a smoothness term. These problems usually reduce to a linear least squares problem which is solved using Singular Value Decomposition (SVD) that is part of all linear algebra packages. The details of the CRF recovery algorithm are in the paper titled [Recovering High Dynamic Range Radiance Maps from Photographs][14].
+
+Finding the CRF is done using just two lines of code in OpenCV using `CalibrateDebevec` or `CalibrateRobertson`. In this tutorial we will use `CalibrateDebevec`.
+
+C++
+
+```
+// Obtain Camera Response Function (CRF)
+Mat responseDebevec;
+Ptr<CalibrateDebevec> calibrateDebevec = createCalibrateDebevec();
+calibrateDebevec->process(images, responseDebevec, times);
+```
+
+Python
+
+```
+# Obtain Camera Response Function (CRF)
+calibrateDebevec = cv2.createCalibrateDebevec()
+responseDebevec = calibrateDebevec.process(images, times)
+```
+
+The figure below shows the CRF recovered using the images for the red, green and blue channels.
+
+ [![Camera Response Function](http://www.learnopencv.com/wp-content/uploads/2017/10/camera-response-function.jpg)][15]
+
+### Step 4: Merge Images
+
+Once the CRF has been estimated, we can merge the exposure images into one HDR image using `MergeDebevec`. The C++ and Python code is shown below.
+
+C++
+
+```
+// Merge images into an HDR linear image
+Mat hdrDebevec;
+Ptr<MergeDebevec> mergeDebevec = createMergeDebevec();
+mergeDebevec->process(images, hdrDebevec, times, responseDebevec);
+// Save HDR image.
+imwrite("hdrDebevec.hdr", hdrDebevec);
+```
+
+Python
+
+```
+# Merge images into an HDR linear image
+mergeDebevec = cv2.createMergeDebevec()
+hdrDebevec = mergeDebevec.process(images, times, responseDebevec)
+# Save HDR image.
+cv2.imwrite("hdrDebevec.hdr", hdrDebevec)
+```
+
+The HDR image saved above can be loaded in Photoshop and tonemapped. An example is shown below.
+
+ [![HDR Photoshop tone mapping](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Photoshop-Tonemapping-1024x770.jpg)][16] HDR Photoshop tone mapping
+
+### Step 5: Tone mapping
+
+Now we have merged our exposure images into one HDR image. Can you guess the minimum and maximum pixel values for this image? The minimum value is obviously 0 for a pitch black condition. What is the theoretical maximum value? Infinite! In practice, the maximum value is different for different situations. If the scene contains a very bright light source, we will see a very large maximum value.
+
+Even though we have recovered the relative brightness information using multiple images, we now have the challenge of saving this information as a 24-bit image for display purposes.
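+
+If you are curious about the actual numbers for your own scene, a quick sketch like the one below prints them. It assumes the `hdrDebevec.hdr` file written in Step 4 is in the current directory and that your OpenCV build can read Radiance `.hdr` files:
+
+```
+# Sketch: inspect the dynamic range of the merged HDR image saved in Step 4.
+# The data is stored as 32-bit floats per channel, so values are not limited to 0-255.
+import cv2
+
+hdr = cv2.imread("hdrDebevec.hdr", cv2.IMREAD_ANYDEPTH | cv2.IMREAD_ANYCOLOR)
+print("dtype:", hdr.dtype)   # float32
+print("min  :", hdr.min())
+print("max  :", hdr.max())
+```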
+
+The process of converting a High Dynamic Range (HDR) image to an 8-bit per channel image while preserving as much detail as possible is called Tone mapping.
+
+There are several tone mapping algorithms. OpenCV implements four of them. The thing to keep in mind is that there is no right way to do tone mapping. Usually, we want to see more detail in the tonemapped image than in any one of the exposure images. Sometimes the goal of tone mapping is to produce realistic images and often the goal is to produce surreal images. The algorithms implemented in OpenCV tend to produce realistic and therefore less dramatic results.
+
+Let’s look at the various options. Some of the common parameters of the different tone mapping algorithms are listed below.
+
+1. gamma : This parameter compresses the dynamic range by applying a gamma correction. When gamma is equal to 1, no correction is applied. A gamma of less than 1 darkens the image, while a gamma greater than 1 brightens the image.
+
+2. saturation : This parameter is used to increase or decrease the amount of saturation. When saturation is high, the colors are richer and more intense. A saturation value closer to zero makes the colors fade away to grayscale.
+
+3. contrast : Controls the contrast ( i.e. log (maxPixelValue/minPixelValue) ) of the output image.
+
+Let us explore the four tone mapping algorithms available in OpenCV.
+
+#### Drago Tonemap
+
+The parameters for Drago Tonemap are shown below.
+
+```
+createTonemapDrago
+(
+float gamma = 1.0f,
+float saturation = 1.0f,
+float bias = 0.85f
+)
+```
+
+Here, bias is the value for the bias function in the [0, 1] range. Values from 0.7 to 0.9 usually give the best results. The default value is 0.85. For more technical details, please see this [paper][17].
+
+The C++ and Python code are shown below. The parameters were obtained by trial and error. The final output is multiplied by 3 just because it gave the most pleasing results.
+
+C++
+
+```
+// Tonemap using Drago's method to obtain 24-bit color image
+Mat ldrDrago;
+Ptr<TonemapDrago> tonemapDrago = createTonemapDrago(1.0, 0.7);
+tonemapDrago->process(hdrDebevec, ldrDrago);
+ldrDrago = 3 * ldrDrago;
+imwrite("ldr-Drago.jpg", ldrDrago * 255);
+```
+
+Python
+
+```
+# Tonemap using Drago's method to obtain 24-bit color image
+tonemapDrago = cv2.createTonemapDrago(1.0, 0.7)
+ldrDrago = tonemapDrago.process(hdrDebevec)
+ldrDrago = 3 * ldrDrago
+cv2.imwrite("ldr-Drago.jpg", ldrDrago * 255)
+```
+
+Result
+
+ [![HDR tone mapping using Drago's algorithm](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Drago-1024x770.jpg)][18] HDR tone mapping using Drago’s algorithm
+
+#### Durand Tonemap
+
+The parameters for Durand Tonemap are shown below.
+
+```
+createTonemapDurand
+(
+  float gamma = 1.0f,
+  float contrast = 4.0f,
+  float saturation = 1.0f,
+  float sigma_space = 2.0f,
+  float sigma_color = 2.0f
+);
+```
+
+The algorithm is based on the decomposition of the image into a base layer and a detail layer. The base layer is obtained using an edge-preserving filter called the bilateral filter. sigma_space and sigma_color are the parameters of the bilateral filter that control the amount of smoothing in the spatial and color domains respectively.
+
+For more details, check out this [paper][19].
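+
+One small robustness tweak worth considering for the `imwrite` calls in this section: the tonemapped result is a floating point image and, after the extra scaling by 3 used here, values can fall outside the 0-255 range once multiplied by 255, so it is safer to clip and convert explicitly before saving. A sketch, shown for the Drago result above (it reuses the `ldrDrago` variable from that step):
+
+```
+# Clip to the valid range and convert to 8-bit before saving.
+import numpy as np
+import cv2
+
+ldrDrago8 = np.clip(ldrDrago * 255, 0, 255).astype("uint8")
+cv2.imwrite("ldr-Drago.jpg", ldrDrago8)
+```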
+
+C++
+
+```
+// Tonemap using Durand's method to obtain 24-bit color image
+Mat ldrDurand;
+Ptr<TonemapDurand> tonemapDurand = createTonemapDurand(1.5,4,1.0,1,1);
+tonemapDurand->process(hdrDebevec, ldrDurand);
+ldrDurand = 3 * ldrDurand;
+imwrite("ldr-Durand.jpg", ldrDurand * 255);
+```
+
+Python
+
+```
+# Tonemap using Durand's method to obtain 24-bit color image
+tonemapDurand = cv2.createTonemapDurand(1.5,4,1.0,1,1)
+ldrDurand = tonemapDurand.process(hdrDebevec)
+ldrDurand = 3 * ldrDurand
+cv2.imwrite("ldr-Durand.jpg", ldrDurand * 255)
+```
+
+Result
+
+ [![HDR tone mapping using Durand's algorithm](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Durand-1024x770.jpg)][20] HDR tone mapping using Durand’s algorithm
+
+#### Reinhard Tonemap
+
+```
+createTonemapReinhard
+(
+float gamma = 1.0f,
+float intensity = 0.0f,
+float light_adapt = 1.0f,
+float color_adapt = 0.0f
+)
+```
+
+The parameter intensity should be in the [-8, 8] range. Greater intensity value produces brighter results. light_adapt controls the light adaptation and is in the [0, 1] range. A value of 1 indicates adaptation based only on pixel value and a value of 0 indicates global adaptation. An in-between value can be used for a weighted combination of the two. The parameter color_adapt controls chromatic adaptation and is in the [0, 1] range. The channels are treated independently if the value is set to 1 and the adaptation level is the same for every channel if the value is set to 0. An in-between value can be used for a weighted combination of the two.
+
+For more details, check out this [paper][21].
+
+C++
+
+```
+// Tonemap using Reinhard's method to obtain 24-bit color image
+Mat ldrReinhard;
+Ptr<TonemapReinhard> tonemapReinhard = createTonemapReinhard(1.5, 0,0,0);
+tonemapReinhard->process(hdrDebevec, ldrReinhard);
+imwrite("ldr-Reinhard.jpg", ldrReinhard * 255);
+```
+
+Python
+
+```
+# Tonemap using Reinhard's method to obtain 24-bit color image
+tonemapReinhard = cv2.createTonemapReinhard(1.5, 0,0,0)
+ldrReinhard = tonemapReinhard.process(hdrDebevec)
+cv2.imwrite("ldr-Reinhard.jpg", ldrReinhard * 255)
+```
+
+Result
+
+ [![HDR tone mapping using Reinhard's algorithm](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Reinhard-1024x770.jpg)][22] HDR tone mapping using Reinhard’s algorithm
+
+#### Mantiuk Tonemap
+
+```
+createTonemapMantiuk
+(
+float gamma = 1.0f,
+float scale = 0.7f,
+float saturation = 1.0f
+)
+```
+
+The parameter scale is the contrast scale factor. Values from 0.6 to 0.9 produce the best results.
+
+For more details, check out this [paper][23].
+
+C++
+
+```
+// Tonemap using Mantiuk's method to obtain 24-bit color image
+Mat ldrMantiuk;
+Ptr<TonemapMantiuk> tonemapMantiuk = createTonemapMantiuk(2.2,0.85, 1.2);
+tonemapMantiuk->process(hdrDebevec, ldrMantiuk);
+ldrMantiuk = 3 * ldrMantiuk;
+imwrite("ldr-Mantiuk.jpg", ldrMantiuk * 255);
+```
+
+Python
+
+```
+# Tonemap using Mantiuk's method to obtain 24-bit color image
+tonemapMantiuk = cv2.createTonemapMantiuk(2.2,0.85, 1.2)
+ldrMantiuk = tonemapMantiuk.process(hdrDebevec)
+ldrMantiuk = 3 * ldrMantiuk
+cv2.imwrite("ldr-Mantiuk.jpg", ldrMantiuk * 255)
+```
+
+Result
+
+ [![HDR tone mapping using Mantiuk's algorithm](http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Mantiuk-1024x770.jpg)][24] HDR Tone mapping using Mantiuk’s algorithm
+
+### Subscribe & Download Code
+
+If you liked this article and would like to download code (C++ and Python) and example images used in this post, please [subscribe][25] to our newsletter.
You will also receive a free [Computer Vision Resource][26]Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news. + +[Subscribe Now][27] + +Image Credits +The four exposure images used in this post are licensed under [CC BY-SA 3.0][28] and were downloaded from [Wikipedia’s HDR page][29]. They were photographed by Kevin McCoy. + +-------------------------------------------------------------------------------- + +作者简介: + +I am an entrepreneur with a love for Computer Vision and Machine Learning with a dozen years of experience (and a Ph.D.) in the field. + +In 2007, right after finishing my Ph.D., I co-founded TAAZ Inc. with my advisor Dr. David Kriegman and Kevin Barnes. The scalability, and robustness of our computer vision and machine learning algorithms have been put to rigorous test by more than 100M users who have tried our products. + +--------------------------- + +via: http://www.learnopencv.com/high-dynamic-range-hdr-imaging-using-opencv-cpp-python/ + +作者:[ SATYA MALLICK ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:http://www.learnopencv.com/about/ +[1]:http://www.learnopencv.com/author/spmallick/ +[2]:http://www.learnopencv.com/high-dynamic-range-hdr-imaging-using-opencv-cpp-python/#disqus_thread +[3]:http://www.learnopencv.com/wp-content/uploads/2017/09/high-dynamic-range-hdr.jpg +[4]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr.zip +[5]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr.zip +[6]:https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/ +[7]:https://itunes.apple.com/us/app/autobracket-hdr/id923626339?mt=8&ign-mpt=uo%3D8 +[8]:https://play.google.com/store/apps/details?id=com.almalence.opencam&hl=en +[9]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-image-sequence.jpg +[10]:https://www.howtogeek.com/289712/how-to-see-an-images-exif-data-in-windows-and-macos +[11]:https://www.sno.phy.queensu.ca/~phil/exiftool +[12]:http://www.learnopencv.com/wp-content/uploads/2017/10/aligned-unaligned-hdr-comparison.jpg +[13]:https://www.slrlounge.com/workshop/using-mirror-up-mode-mirror-lockup +[14]:http://www.pauldebevec.com/Research/HDR/debevec-siggraph97.pdf +[15]:http://www.learnopencv.com/wp-content/uploads/2017/10/camera-response-function.jpg +[16]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Photoshop-Tonemapping.jpg +[17]:http://resources.mpi-inf.mpg.de/tmo/logmap/logmap.pdf +[18]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Drago.jpg +[19]:https://people.csail.mit.edu/fredo/PUBLI/Siggraph2002/DurandBilateral.pdf +[20]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Durand.jpg +[21]:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8100&rep=rep1&type=pdf +[22]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Reinhard.jpg +[23]:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.4077&rep=rep1&type=pdf +[24]:http://www.learnopencv.com/wp-content/uploads/2017/10/hdr-Mantiuk.jpg +[25]:https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/ +[26]:https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/ +[27]:https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/ 
+[28]:https://creativecommons.org/licenses/by-sa/3.0/ +[29]:https://en.wikipedia.org/wiki/High-dynamic-range_imaging From ca1496b7ac291e978e0f2e099b79daa1aa437ada Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:53:01 +0800 Subject: [PATCH 20/79] =?UTF-8?q?20171011-7=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...Multiple Linux Distributions on One USB.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md diff --git a/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md b/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md new file mode 100644 index 0000000000..215be4e4b7 --- /dev/null +++ b/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md @@ -0,0 +1,93 @@ +How to Install Multiple Linux Distributions on One USB +============================================================ + + + _Brief: This tutorial shows you how to install multiple Linux distributions on one USB. This way, you can enjoy more than one live Linux distros on a single USB key._ + +I enjoy trying out different Linux distributions via live USB. It gives me the option to test the OS on a real hardware, not in a virtualized environment. Also, I can plug in the USB to any system (read Windows), do whatever I want and enjoy the same Linux experience. And yes, in case something goes wrong with my system I can use the USB drive to recover! + +Creating a single [bootable live USB of Linux][8] is easy, you just download an ISO file and burn it to a USB drive. But, what if you want to try more than one Linux distribution? You can either use more than one USB or you can overwrite the same USB to try other Linux distributions. Neither of these methods is very convenient. + +So, how about installing more than one Linux distributions on a single USB? We are going to see how to do it in this tutorial. + +### How to create a bootable USB with multiple Linux distributions on it + +![How to install multiple linux distributions on a single USB](https://itsfoss.com/wp-content/uploads/2017/10/multiple-linux-on-one-usb-800x450.jpg) + +Well, we have a tool which does exactly the same by keeping  _more than one Linux distribution on a single USB drive_ . All you have to do is select the distributions you want to install. In this tutorial, we will cover  _how to install multiple Linux distribution on a USB stick_  for live sessions. + +Just to make sure, you should have a USB drive big enough to have several Linux distributions on it, so an 8 GB USB key should be enough for three or four Linux distributions. + +### Step 1 + +[MultiBootUSB][9] is a free and open source cross-platform application which allows you to create a USB drive with multiple Linux distributions. It also supports uninstalling any distribution at any point in time, so you can reclaim space on your drive for another one. + +Download the .deb package and install it by double-clicking on it. + +[Download MultiBootUSB][10] + +### Step 2 + +The recommended filesystem is FAT32, so make sure to format your USB drive before creating a multi-boot USB stick. + +### Step 3 + +Download the ISO images of Linux distributions you want to install. + +### Step 4 + +Once you have everything, start MultiBootUSB. 
+ +![MultiBootUSB](https://itsfoss.com/wp-content/uploads/2017/09/1.png) + +The home screen asks you to select the USB disk and the image file for the Linux distribution which you want to put on your USB. + +MultiBootUSB supports persistence for Ubuntu, Fedora and Debian distros, which means that changes made to the live version of the Linux distributions are saved to the USB disk. + +You can select the persistence size by dragging the slider under MultiBootUSB tab. Persistence gives you an option to save changes to the USB drive in runtime. + +![MultiBootUSB persistence storage](https://itsfoss.com/wp-content/uploads/2017/09/2-1.png) + +### Step 5 + +Click on Install distro option and proceed with the installation. It will take some time to complete before showing a successful installation message. + +You can now see the distribution in the installed section. For another OS, repeat the process. This is what it looks like when I installed a copy of Ubuntu 16.10 and Fedora 24. + +![MultiBootSystem with Ubuntu and Fedora](https://itsfoss.com/wp-content/uploads/2017/09/3.png) + +### Step 6 + +Next time I boot through the USB, I get the option of choosing either of the distributions. + +![Boot Menu](https://itsfoss.com/wp-content/uploads/2017/09/VirtualBox_Ubuntu1610_23_09_2017_14_16_05-1.png) + +You can add as many distros as you want and your USB storage allows. For removing a distro, select it from the list and click on Uninstall Distro. + +### Final Words + +MultiBootUSB really looks handy for installing multiple Linux distribution on a USB stick. With just a few clicks, I have a working drive with two of my favorite OS and I can boot into them on any system. + +Let us know in the comments if you face any issue while installing or using MultiBootUSB. 
+ +-------------------------------------------------------------------------------- + +via: https://itsfoss.com/multiple-linux-one-usb/ + +作者:[Ambarish Kumar ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://itsfoss.com/author/ambarish/ +[1]:https://itsfoss.com/author/ambarish/ +[2]:https://itsfoss.com/multiple-linux-one-usb/#comments +[3]:https://www.facebook.com/share.php?u=https%3A%2F%2Fitsfoss.com%2Fmultiple-linux-one-usb%2F%3Futm_source%3Dfacebook%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[4]:https://twitter.com/share?original_referer=/&text=How+to+Install+Multiple+Linux+Distributions+on+One+USB&url=https://itsfoss.com/multiple-linux-one-usb/%3Futm_source%3Dtwitter%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare&via=itsfoss2 +[5]:https://plus.google.com/share?url=https%3A%2F%2Fitsfoss.com%2Fmultiple-linux-one-usb%2F%3Futm_source%3DgooglePlus%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[6]:https://www.linkedin.com/cws/share?url=https%3A%2F%2Fitsfoss.com%2Fmultiple-linux-one-usb%2F%3Futm_source%3DlinkedIn%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[7]:https://www.reddit.com/submit?url=https://itsfoss.com/multiple-linux-one-usb/&title=How+to+Install+Multiple+Linux+Distributions+on+One+USB +[8]:https://itsfoss.com/create-live-usb-of-ubuntu-in-windows/ +[9]:http://multibootusb.org/ +[10]:https://github.com/mbusb/multibootusb/releases/download/v8.8.0/python3-multibootusb_8.8.0-1_all.deb From ee908990cbf4bafd169f9cc95282ecdf7d2212de Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:55:20 +0800 Subject: [PATCH 21/79] =?UTF-8?q?20171011-8=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...Source Code... and Remove it Afterwards.md | 516 ++++++++++++++++++ 1 file changed, 516 insertions(+) create mode 100644 sources/tech/20171006 How to Install Software from Source Code... and Remove it Afterwards.md diff --git a/sources/tech/20171006 How to Install Software from Source Code... and Remove it Afterwards.md b/sources/tech/20171006 How to Install Software from Source Code... and Remove it Afterwards.md new file mode 100644 index 0000000000..5f61ca124a --- /dev/null +++ b/sources/tech/20171006 How to Install Software from Source Code... and Remove it Afterwards.md @@ -0,0 +1,516 @@ +How to Install Software from Source Code… and Remove it Afterwards +============================================================ + +![How to install software from source code](https://itsfoss.com/wp-content/uploads/2017/10/install-software-from-source-code-linux-800x450.jpg) + + _Brief: This detailed guide explains how to install a program from source code in Linux and how to remove the software installed from the source code._ + +One of the greatest strength of your Linux distribution is its package manager and the associated software repository. With them, you have all the necessary tools and resources to download and install a new software on your computer in a completely automated manner. + +But despite all their efforts, the package maintainers cannot handle each and every use cases. Nor can they package all the software available out there. So there are still situations where you will have to compile and install a new software by yourself. 
As of myself, the most common reason, by far, I have to compile some software is when I need to run a very specific version. Or because I want to modify the source code or use some fancy compilation options. + +If your needs belong to that latter category, there are chances you already know what you do. But for the vast majority of Linux users, compiling and installing a software from the sources for the first time might look like an initiation ceremony: somewhat frightening; but with the promise to enter a new world of possibilities and to be part of a privileged community if you overcome that. + +[Suggested readHow To Install And Remove Software In Ubuntu [Complete Guide]][8] + +### A. Installing software from source code in Linux + +And that’s exactly what we will do here. For the purpose of that article, let’s say I need to install [NodeJS][9] 8.1.1 on my system. That version exactly. A version which is not available from the Debian repository: + +``` +sh$ apt-cache madison nodejs | grep amd64 + nodejs | 6.11.1~dfsg-1 | http://deb.debian.org/debian experimental/main amd64 Packages + nodejs | 4.8.2~dfsg-1 | http://ftp.fr.debian.org/debian stretch/main amd64 Packages + nodejs | 4.8.2~dfsg-1~bpo8+1 | http://ftp.fr.debian.org/debian jessie-backports/main amd64 Packages + nodejs | 0.10.29~dfsg-2 | http://ftp.fr.debian.org/debian jessie/main amd64 Packages + nodejs | 0.10.29~dfsg-1~bpo70+1 | http://ftp.fr.debian.org/debian wheezy-backports/main amd64 Packages +``` + +### Step 1: Getting the source code from GitHub + +Like many open-source projects, the sources of NodeJS can be found on GitHub: [https://github.com/nodejs/node][10] + +So, let’s go directly there. + +![The NodeJS official GitHub repository](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-account.png) + +If you’re not familiar with [GitHub][11], [git][12] or any other [version control system][13] worth mentioning the repository contains the current source for the software, as well as a history of all the modifications made through the years to that software. Eventually up to the very first line written for that project. For the developers, keeping that history has many advantages. For us today, the main one is we will be able to get the sources from for the project as they were at any given point in time. More precisely, I will be able to get the sources as they were when the 8.1.1 version I want was released. Even if there were many modifications since then. + +![Choose the v8.1.1 tag in the NodeJS GitHub repository](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-choose-revision-tag.png) + +On GitHub, you can use the “branch” button to navigate between different versions of the software. [“Branch” and “tags” are somewhat related concepts in Git][14]. Basically, the developers create “branch” and “tags” to keep track of important events in the project history, like when they start working on a new feature or when they publish a release. I will not go into the details here, all you need to know is I’m looking for the version  _tagged_  “v8.1.1” + +![The NodeJS GitHub repository as it was at the time the v8.1.1 tag was created](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-revision-811.png) + +After having chosen on the “v8.1.1” tag, the page is refreshed, the most obvious change being the tag now appears as part of the URL. In addition, you will notice the file change date are different too. The source tree you are now seeing is the one that existed at the time the v8.1.1 tag was created. 
In some sense, you can think of a version control tool like git as a time travel machine, allowing you to go back and forth through a project's history.

![NodeJS GitHub repository download as a ZIP button](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-revision-download-zip.png)

At this point, we can download the sources of NodeJS 8.1.1. You can't miss the big blue button suggesting to download the ZIP archive of the project. As for myself, I will download and extract the ZIP from the command line for the sake of the explanation. But if you prefer using a [GUI][15] tool, don't hesitate to do that instead:

```
wget https://github.com/nodejs/node/archive/v8.1.1.zip
unzip v8.1.1.zip
cd node-8.1.1/
```

Downloading the ZIP archive works great. But if you want to do it "like a pro", I would suggest using the `git` tool directly to download the sources. It is not complicated at all, and it will be a nice first contact with a tool you will often encounter:

```
# first ensure git is installed on your system
sh$ sudo apt-get install git
# Make a shallow clone of the NodeJS repository at v8.1.1
sh$ git clone --depth 1 \
              --branch v8.1.1 \
              https://github.com/nodejs/node
sh$ cd node/
```

By the way, if you have any issue, just consider this first part of the article as a general introduction. Later I have more detailed explanations for Debian- and Red Hat-based distributions in order to help you troubleshoot common issues.

Anyway, whether you downloaded the source using `git` or as a ZIP archive, you should now have exactly the same source files in the current directory:

```
sh$ ls
android-configure BUILDING.md common.gypi doc Makefile src
AUTHORS CHANGELOG.md configure GOVERNANCE.md node.gyp test
benchmark CODE_OF_CONDUCT.md CONTRIBUTING.md lib node.gypi tools
BSDmakefile COLLABORATOR_GUIDE.md deps LICENSE README.md vcbuild.bat
```

### Step 2: Understanding the Build System of the program

We usually talk about "compiling the sources", but compilation is only one of the phases required to produce working software from its source. A build system is a set of tools and practices used to automate and articulate those different tasks in order to build the software entirely just by issuing a few commands.

If the concept is simple, the reality is somewhat more complicated, because different projects or programming languages may have different requirements. Or because of the programmer's tastes. Or the supported platforms. Or for historical reasons. Or... there is an almost endless list of reasons to choose or create another build system. All that to say there are many different solutions used out there.

NodeJS uses a [GNU-style build system][16]. This is a popular choice in the open source community and, once again, a good way to start your journey.

Writing and tuning a build system is a pretty complex task. But for the "end user", GNU-style build systems boil down to using two tools: `configure` and `make`.

The `configure` file is a project-specific script that will check the target system configuration and available features in order to ensure the project can be built, dealing along the way with the specificities of the current platform.

An important part of a typical `configure` job is to build the `Makefile`. That is the file containing the instructions required to effectively build the project.

The [`make` tool][17], on the other hand, is a POSIX tool available on any Unix-like system.
It will read the project-specific `Makefile` and perform the required operations to build and install your program.

But, as always in the Linux world, you still have some latitude to customize the build for your specific needs.

```
./configure --help
```

The `configure --help` command will show you all the available configuration options. Once again, this is very project-specific. And to be honest, it is sometimes required to dig into the project before fully understanding the meaning of each and every configure option.

But there is at least one standard GNU Autotools option that you must know: the `--prefix` option. This has to do with the file system hierarchy and the place your software will be installed.

### Step 3: The FHS

The Linux file system hierarchy on a typical distribution mostly complies with the [Filesystem Hierarchy Standard (FHS)][19].

That standard explains the purpose of the various directories of your system: `/usr`, `/tmp`, `/var` and so on.

When using the GNU Autotools, and most other build systems, the default installation location for your new software will be `/usr/local`. Which is a good choice, as according to the FHS _"The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr."_

The `/usr/local` hierarchy somehow replicates the root directory: you will find there `/usr/local/bin` for the executable programs, `/usr/local/lib` for the libraries, `/usr/local/share` for architecture-independent files and so on.

The only issue when using the `/usr/local` tree for custom software installations is that the files of all your software will be mixed there. Especially, after having installed a couple of programs, it will be hard to track which file of `/usr/local/bin` and `/usr/local/lib` belongs to which software. That will not cause any issue to the system though. After all, `/usr/bin` is just about the same mess. But it will become an issue the day you want to remove a manually installed software.

To solve that issue, I usually prefer installing custom software in the `/opt` sub-tree instead. Once again, to quote the FHS:

_"/opt is reserved for the installation of add-on application software packages._

_A package to be installed in /opt must locate its static files in a separate /opt/&lt;package&gt; or /opt/&lt;provider&gt; directory tree, where &lt;package&gt; is a name that describes the software package and &lt;provider&gt; is the provider's LANANA registered name."_

So we will create a sub-directory of `/opt` specifically for our custom NodeJS installation. And if someday I want to remove that software, I will simply have to remove that directory:

```
sh$ sudo mkdir /opt/node-v8.1.1
sh$ sudo ln -sT node-v8.1.1 /opt/node
# What is the purpose of the symbolic link above?
# Read the article till the end--then try to answer that
# question in the comment section!

sh$ ./configure --prefix=/opt/node-v8.1.1
sh$ make -j9 && echo ok
# -j9 means run up to 9 parallel tasks to build the software.
# As a rule of thumb, use -j(N+1) where N is the number of cores
# of your system. That will maximize the CPU usage (one task per
# CPU thread/core + a provision of one extra task when a process
# is blocked by an I/O operation).
```

Anything but "ok" after the `make` command has completed would mean there was an error during the build process. As we ran a parallel build because of the `-j` option, it is not always easy to retrieve the error message given the large volume of output produced by the build system.

In case of an issue, just restart `make`, but without the `-j` option this time. The error should then appear near the end of the output:

```
sh$ make
```

Finally, once the compilation has completed, you can install your software to its location by running the command:

```
sh$ sudo make install
```

And test it:

```
sh$ /opt/node/bin/node --version
v8.1.1
```

### B. What if things go wrong while installing from source code?

What I've explained above is mostly what you can see on the "build instructions" page of a well-documented project. But given that the goal of this article is to let you compile your first software from sources, it might be worth taking the time to investigate some common issues. So, I will do the whole procedure again, but this time from fresh and minimal Debian 9.0 and CentOS 7.0 systems, so you can see the errors I encountered and how I solved them.

### From Debian 9.0 "Stretch"

```
itsfoss@debian:~$ git clone --depth 1 \
                            --branch v8.1.1 \
                            https://github.com/nodejs/node
-bash: git: command not found
```

This problem is quite easy to diagnose and solve. Just install the `git` package:

```
itsfoss@debian:~$ sudo apt-get install git
```

```
itsfoss@debian:~$ git clone --depth 1 \
                            --branch v8.1.1 \
                            https://github.com/nodejs/node && echo ok
[...]
ok
```

```
itsfoss@debian:~/node$ sudo mkdir /opt/node-v8.1.1
itsfoss@debian:~/node$ sudo ln -sT node-v8.1.1 /opt/node
```

No problem here.

```
itsfoss@debian:~/node$ ./configure --prefix=/opt/node-v8.1.1/
WARNING: failed to autodetect C++ compiler version (CXX=g++)
WARNING: failed to autodetect C compiler version (CC=gcc)
Node.js configure error: No acceptable C compiler found!
  Please make sure you have a C compiler installed on your system and/or
  consider adjusting the CC environment variable if you installed
  it in a non-standard prefix.
```

Obviously, to compile a project, you need a compiler. NodeJS being written in [C++][20], we need a C++ [compiler][21]. Here I will install `g++`, the GNU C++ compiler, for that purpose:

```
itsfoss@debian:~/node$ sudo apt-get install g++
itsfoss@debian:~/node$ ./configure --prefix=/opt/node-v8.1.1/ && echo ok
[...]
ok
```

```
itsfoss@debian:~/node$ make -j9 && echo ok
-bash: make: command not found
```

One other missing tool. Same symptoms. Same solution:

```
itsfoss@debian:~/node$ sudo apt-get install make
itsfoss@debian:~/node$ make -j9 && echo ok
[...]
ok
```

```
itsfoss@debian:~/node$ sudo make install
[...]
itsfoss@debian:~/node$ /opt/node/bin/node --version
v8.1.1
```

Success!

Please notice: I've installed the various tools one by one to show how to diagnose compilation issues and the typical way to solve them. But if you search more about that topic or read other tutorials, you will discover that most distributions have "meta-packages" acting as an umbrella to install some or all of the typical tools used for compiling software. On Debian-based systems, you will probably encounter the [build-essential][22] package for that purpose. And on Red Hat-based distributions, that will be the _"Development Tools"_ group.
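If you would rather install the whole typical toolchain in one go before configuring, those meta-packages can be pulled in up front. A minimal sketch, assuming current package and group names (they may vary slightly between releases):

```
# Debian/Ubuntu: compiler, make and friends, plus git
sh$ sudo apt-get install build-essential git

# CentOS/RHEL/Fedora: the equivalent package group, plus git
sh$ sudo yum groupinstall "Development Tools"
sh$ sudo yum install git
```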
### From CentOS 7.0

```
[itsfoss@centos ~]$ git clone --depth 1 \
                              --branch v8.1.1 \
                              https://github.com/nodejs/node
-bash: git: command not found
```

Command not found? Just install it using the `yum` package manager:

```
[itsfoss@centos ~]$ sudo yum install git
```

```
[itsfoss@centos ~]$ git clone --depth 1 \
                              --branch v8.1.1 \
                              https://github.com/nodejs/node && echo ok
[...]
ok
```

```
[itsfoss@centos ~]$ sudo mkdir /opt/node-v8.1.1
[itsfoss@centos ~]$ sudo ln -sT node-v8.1.1 /opt/node
```

```
[itsfoss@centos ~]$ cd node
[itsfoss@centos node]$ ./configure --prefix=/opt/node-v8.1.1/
WARNING: failed to autodetect C++ compiler version (CXX=g++)
WARNING: failed to autodetect C compiler version (CC=gcc)
Node.js configure error: No acceptable C compiler found!

  Please make sure you have a C compiler installed on your system and/or
  consider adjusting the CC environment variable if you installed
  it in a non-standard prefix.
```

You guessed it: NodeJS is written in C++, but my system lacks the corresponding compiler. Yum to the rescue. As I'm not a regular CentOS user, I actually had to search the Internet for the exact name of the package containing the g++ compiler, which led me to this page: [https://superuser.com/questions/590808/yum-install-gcc-g-doesnt-work-anymore-in-centos-6-4][23]

```
[itsfoss@centos node]$ sudo yum install gcc-c++
[itsfoss@centos node]$ ./configure --prefix=/opt/node-v8.1.1/ && echo ok
[...]
ok
```

```
[itsfoss@centos node]$ make -j9 && echo ok
[...]
ok
```

```
[itsfoss@centos node]$ sudo make install && echo ok
[...]
ok
```

```
[itsfoss@centos node]$ /opt/node/bin/node --version
v8.1.1
```

Success. Again.

### C. Making changes to the software installed from source code

You may install software from source because you need a very specific version not available in your distribution repository, or because you want to _modify_ that program, either to fix a bug or to add a feature. After all, open source is all about that. So I will take that opportunity to give you a taste of the power you have at hand now that you are able to compile your own software.

Here, we will make a minor change to the sources of NodeJS, and we will see if our change is incorporated into the compiled version of the software.

Open the file `node/src/node.cc` in your favorite [text editor][24] (vim, nano, gedit, …) and try to locate this fragment of code:

```
    if (debug_options.ParseOption(argv[0], arg)) {
      // Done, consumed by DebugOptions::ParseOption().
    } else if (strcmp(arg, "--version") == 0 || strcmp(arg, "-v") == 0) {
      printf("%s\n", NODE_VERSION);
      exit(0);
    } else if (strcmp(arg, "--help") == 0 || strcmp(arg, "-h") == 0) {
      PrintHelp();
      exit(0);
    }
```

It is around [line 3830 of the file][25]. Then modify the line containing `printf` to match that one instead:

```
      printf("%s (compiled by myself)\n", NODE_VERSION);
```

Then head back to your terminal. Before going further, and to give you some more insight into the power behind git, you can check if you've modified the right file:

```
sh$ git diff
diff --git a/src/node.cc b/src/node.cc
index bbce1022..a5618b57 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -3828,7 +3828,7 @@ static void ParseArgs(int* argc,
     if (debug_options.ParseOption(argv[0], arg)) {
       // Done, consumed by DebugOptions::ParseOption().
     } else if (strcmp(arg, "--version") == 0 || strcmp(arg, "-v") == 0) {
-      printf("%s\n", NODE_VERSION);
+      printf("%s (compiled by myself)\n", NODE_VERSION);
       exit(0);
     } else if (strcmp(arg, "--help") == 0 || strcmp(arg, "-h") == 0) {
       PrintHelp();
```

You should see a "-" (minus sign) before the line as it was before you changed it, and a "+" (plus sign) before the line after your changes.

It is now time to recompile and re-install your software:

```
make -j9 && sudo make install && echo ok
[...]
ok
```

This time, the only reason it might fail is that you've made a typo while changing the code. If this is the case, re-open the `node/src/node.cc` file in your text editor and fix the mistake.

Once you've managed to compile and install that new modified NodeJS version, you will be able to check if your modifications were actually incorporated into the software:

```
itsfoss@debian:~/node$ /opt/node/bin/node --version
v8.1.1 (compiled by myself)
```

Congratulations! You've made your first change to an open-source program!

### D. Let the shell locate our custom built software

You may have noticed that until now, I always launched my newly compiled NodeJS software by specifying the absolute path to the binary file:

```
/opt/node/bin/node
```

It works. But this is annoying, to say the least. There are actually two common ways of fixing that. But to understand them, you must first know that your shell locates executable files by looking for them only in the directories specified by the `PATH` [environment variable][26].

```
itsfoss@debian:~/node$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
```

Here, on that Debian system, if you do not explicitly specify any directory as part of a command name, the shell will first look for the executable program in `/usr/local/bin`; if not found, in `/usr/bin`; if not found, in `/bin`; if not found, in `/usr/local/games`; if not found, in `/usr/games`; and if still not found, the shell will report a _"command not found"_ error.

Given that, we have two ways to make a command accessible to the shell: by adding it to one of the already configured `PATH` directories, or by adding the directory containing our executable file to the `PATH`.

### Adding a link from /usr/local/bin

Just _copying_ the node binary executable from `/opt/node/bin` to `/usr/local/bin` would be a bad idea since, by doing so, the executable program would no longer be able to locate the other required components belonging to `/opt/node/` (it's a common practice for a software to locate its resource files relative to its own location).

So, the traditional way of doing that is by using a symbolic link:

```
itsfoss@debian:~/node$ sudo ln -sT /opt/node/bin/node /usr/local/bin/node
itsfoss@debian:~/node$ which -a node || echo not found
/usr/local/bin/node
itsfoss@debian:~/node$ node --version
v8.1.1 (compiled by myself)
```

This is a simple and effective solution, especially if a software package is made of just a few well-known executable programs, since you have to create a symbolic link for each and every user-invokable command. For example, if you're familiar with NodeJS, you know there is the `npm` companion application that I should symlink from `/usr/local/bin` too. But I leave that to you as an exercise (see the sketch right below for a possible answer).
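If you want to check your answer to that exercise, here is a possible solution; a minimal sketch, assuming `make install` placed the `npm` wrapper in `/opt/node/bin` next to `node`:

```
# Create the companion symlink and check that it is the one the shell finds
sh$ sudo ln -sT /opt/node/bin/npm /usr/local/bin/npm
sh$ which -a npm
/usr/local/bin/npm
```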
### Modifying the PATH

First, if you tried the preceding solution, remove the node symbolic link created previously to start from a clean state:

```
itsfoss@debian:~/node$ sudo rm /usr/local/bin/node
itsfoss@debian:~/node$ which -a node || echo not found
not found
```

And now, here is the magic command to change your `PATH`:

```
itsfoss@debian:~/node$ export PATH="/opt/node/bin:${PATH}"
itsfoss@debian:~/node$ echo $PATH
/opt/node/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
```

Simply said, I replaced the content of the `PATH` environment variable by its previous content, but prefixed with `/opt/node/bin`. So, as you can now imagine, the shell will look first into the `/opt/node/bin` directory for executable programs. We can confirm that using the `which` command:

```
itsfoss@debian:~/node$ which -a node || echo not found
/opt/node/bin/node
itsfoss@debian:~/node$ node --version
v8.1.1 (compiled by myself)
```

Whereas the "link" solution is permanent as soon as you've created the symbolic link in `/usr/local/bin`, the `PATH` change is effective only in the current shell. I leave it to you to do some research on how to make `PATH` changes permanent. As a hint, it has to do with your "profile" (see the short sketch at the end of this article for a head start). If you find the solution, don't hesitate to share it with the other readers by using the comment section below!

### E. How to remove that newly installed software from source code

Since our custom compiled NodeJS software sits completely in the `/opt/node-v8.1.1` directory, removing that software requires no more work than using the `rm` command to remove that directory:

```
sudo rm -rf /opt/node-v8.1.1
```

BEWARE: `sudo` and `rm -rf` are a dangerous cocktail! Always check your command twice before pressing the "enter" key. You won't have any confirmation message and no undelete if you remove the wrong directory…

Then, if you've modified your `PATH`, you will have to revert those changes, which is not complicated at all.

And if you've created links from `/usr/local/bin`, you will have to remove them all:

```
itsfoss@debian:~/node$ sudo find /usr/local/bin \
                            -type l \
                            -ilname "/opt/node/*" \
                            -print -delete
/usr/local/bin/node
```

### Wait? Where was the Dependency Hell?

As a final comment, if you read about compiling your own custom software, you might have heard about [dependency hell][27]. This is a nickname for that annoying situation where, before being able to successfully compile a software, you must first compile a prerequisite library, which in turn requires another library that might in turn be incompatible with some other software you've already installed.

Part of the job of the package maintainers of your distribution is to actually resolve that dependency hell and to ensure the various software of your system use compatible libraries and are installed in the right order.

For this article, I chose on purpose to install NodeJS as it virtually doesn't have dependencies. I said "virtually" because, in fact, it _has_ dependencies. But the source code of those dependencies is present in the source repository of the project (in the `node/deps` subdirectory), so you don't have to download and install them manually beforehand.

But if you're interested in understanding more about that problem and learning how to deal with it, let me know using the comment section below: that would be a great topic for a more advanced article!
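As promised above, here is a head start on making the `PATH` change permanent; a minimal sketch, assuming a Bash login shell that reads `~/.profile` (the exact file depends on your shell and distribution):

```
# Append the PATH change to your profile so new login shells pick it up
sh$ echo 'export PATH="/opt/node/bin:${PATH}"' >> ~/.profile
# Re-read the profile in the current shell to test it right away
sh$ . ~/.profile
sh$ which node
/opt/node/bin/node
```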
+ +-------------------------------------------------------------------------------- + +作者简介: + +Engineer by Passion, Teacher by Vocation. My goals : to share my enthusiasm for what I teach and prepare my students to develop their skills by themselves. You can find me on my website as well. + +-------------------- + +via: https://itsfoss.com/install-software-from-source-code/ + +作者:[Sylvain Leroux ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://itsfoss.com/author/sylvain/ +[1]:https://itsfoss.com/author/sylvain/ +[2]:https://itsfoss.com/install-software-from-source-code/#comments +[3]:https://www.facebook.com/share.php?u=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3Dfacebook%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[4]:https://twitter.com/share?original_referer=/&text=How+to+Install+Software+from+Source+Code%E2%80%A6+and+Remove+it+Afterwards&url=https://itsfoss.com/install-software-from-source-code/%3Futm_source%3Dtwitter%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare&via=Yes_I_Know_IT +[5]:https://plus.google.com/share?url=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3DgooglePlus%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[6]:https://www.linkedin.com/cws/share?url=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3DlinkedIn%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare +[7]:https://www.reddit.com/submit?url=https://itsfoss.com/install-software-from-source-code/&title=How+to+Install+Software+from+Source+Code%E2%80%A6+and+Remove+it+Afterwards +[8]:https://itsfoss.com/remove-install-software-ubuntu/ +[9]:https://nodejs.org/en/ +[10]:https://github.com/nodejs/node +[11]:https://en.wikipedia.org/wiki/GitHub +[12]:https://en.wikipedia.org/wiki/Git +[13]:https://en.wikipedia.org/wiki/Version_control +[14]:https://stackoverflow.com/questions/1457103/how-is-a-tag-different-from-a-branch-which-should-i-use-here +[15]:https://en.wikipedia.org/wiki/Graphical_user_interface +[16]:https://en.wikipedia.org/wiki/GNU_Build_System +[17]:https://en.wikipedia.org/wiki/Make_%28software +[18]:https://itsfoss.com/pro-vim-tips/ +[19]:http://www.pathname.com/fhs/ +[20]:https://en.wikipedia.org/wiki/C%2B%2B +[21]:https://en.wikipedia.org/wiki/Compiler +[22]:https://packages.debian.org/sid/build-essential +[23]:https://superuser.com/questions/590808/yum-install-gcc-g-doesnt-work-anymore-in-centos-6-4 +[24]:https://en.wikipedia.org/wiki/List_of_text_editors +[25]:https://github.com/nodejs/node/blob/v8.1.1/src/node.cc#L3830 +[26]:https://en.wikipedia.org/wiki/Environment_variable +[27]:https://en.wikipedia.org/wiki/Dependency_hell From a9122a0ddad62157fd7352d237aba7f30e4b84d3 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 20:58:01 +0800 Subject: [PATCH 22/79] =?UTF-8?q?20171011-9=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...20170903 Genymotion vs Android Emulator.md | 139 ++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 sources/tech/20170903 Genymotion vs Android Emulator.md diff --git a/sources/tech/20170903 Genymotion vs Android Emulator.md b/sources/tech/20170903 Genymotion vs Android Emulator.md new file mode 100644 index 0000000000..d28ec170a1 --- /dev/null +++ b/sources/tech/20170903 Genymotion vs Android Emulator.md @@ -0,0 +1,139 @@ 
Genymotion vs Android Emulator
============================================================

### Has the Android emulator improved enough to take on Genymotion?

There has always been a debate about whether to choose an Android emulator or to go with Genymotion, and I've seen most of those discussions end in favor of Genymotion.
I've gathered some data around what I consider the most common use case, and based on this I'll be evaluating the Android emulators along with Genymotion.

TL;DR: The Android emulator is faster than Genymotion when configured right.
Use an x86 (32 bit) image with Google APIs, 3 GB of RAM, and a quad-core CPU.

> Pheww, glad we're past that
> Now, let's dive deep

Disclaimer: I've tested my own use case, which to me looks like the general use case, i.e. running tests. All benchmarks were done on a mid-2015 MacBook Pro.
Wherever I say Genymotion I mean Genymotion Desktop. They have other products like Genymotion on Cloud & Genymotion on Demand which are not being considered here.
I'm not saying Genymotion is inadequate, but it is slower when running tests compared to certain Android emulators.

A little background on the subject and then we'll jump to the good stuff.

 _Psst: I have some benchmarks down the line, stick around._ 

Long ago the Android emulator was the only way to go. But it was too slow to use, the reason being the difference of architecture.
What can you expect out of an ARM emulator running on an x86 machine? Every instruction had to be translated from ARM to x86, which makes it really slow.

Then came along the x86 images of Android, which are way faster as they get rid of the ARM-to-x86 translation.
Now you can run an x86 Android emulator on an x86 machine.

> _Problem solved!!!_
>
> NO!

Android emulators were still slower than what people wanted.
Then came along Genymotion, which is just an Android VM running in VirtualBox. But it is quite stable & fast compared to the plain old Android emulators which run on QEMU.

Let's jump to how the situation is today.

My team is using Genymotion in our CI infrastructure and on developer machines. The task at hand was to get rid of all the Genymotion used in our CI infrastructure and on developer machines.

> You ask why?
> Licenses cost money. Duh…

At a quick glance it seems like a stupid move: Android emulators are slow and buggy, so they seem counterproductive. But when you get into the nitty-gritty of the situation, you'll actually find the Android emulator to be superior.

Our use case is to run integration tests on them (mostly Espresso).
We have just over 1100 tests in our app and Genymotion takes ~23 minutes to run all of them.

A few other problems we were facing with Genymotion:

* Limited command line tools ([GMTool][1]).

* They needed a periodic restart because of memory issues. This was a manual task; imagine doing it on a CI infrastructure with lots of machines.

Enter the Android Emulator

The first time you try to set up one of these, it gives you so many options that you'll feel like you are in a Subway restaurant.
The biggest question of all is x86 or x86_64, and with Google APIs or without them.

I did some research and benchmarking with these combinations and this is what we came up with.

Drum roll…

> The winner of the competition is x86 with Google APIs
> But how? why?

Well, I'll tell you the problem with every one of them.

x86_64 is slower compared to x86

> By how much, you ask.
>
> 28.2% much !!!
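For reference, here is roughly how such an x86 image with Google APIs can be created from the command line. This is only a sketch: the API level, device definition and config keys below are assumptions, not the exact setup used for these numbers:

```
# Download an x86 system image with Google APIs (API 25 is just an example)
sdkmanager "system-images;android-25;google_apis;x86"

# Create the AVD from that image
avdmanager create avd -n benchmark-avd -d "Nexus 5" \
    -k "system-images;android-25;google_apis;x86"

# Optionally edit ~/.android/avd/benchmark-avd.avd/config.ini to set
# hw.ramSize=3072 and hw.cpu.ncore=4, then boot it:
emulator -avd benchmark-avd
```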
The emulator with Google APIs is more stable; things tend to crash without them.

This brings us to the conclusion that the best one is x86 with Google APIs.

Before we pit our winning emulator against Genymotion, there are a few more details that are of great importance.

* I've used the Nexus 5 system image with Google APIs.

* I noticed that giving the emulator less RAM caused a lot of Google API crashes. So I've settled on 3 GB of RAM for an emulator.

* The emulator has a quad-core CPU.

* HAXM was installed on the host machine.

Time for a few benchmarks

![Genymotion and Android Emulator Espresso Benchmark](https://d33wubrfki0l68.cloudfront.net/5ffb16e99dbccd5f6e4848d7a1b6b92646fea15f/1356a/assets/images/genymotion-vs-android-emulator/espressobenchmark.png)

![Linpack](https://d33wubrfki0l68.cloudfront.net/e5c28d737abf8dee69333f83657928c362157b4e/ede85/assets/images/genymotion-vs-android-emulator/linpack.png)

![Geekbench 4](https://d33wubrfki0l68.cloudfront.net/b5af78db6d6eddd090d601fcf32c11e7622759f0/b00c1/assets/images/genymotion-vs-android-emulator/geekbench4.png)

From the benchmarks, you can see that the Android emulator beats Genymotion except in Geekbench 4, which to me feels more like a VirtualBox-beats-QEMU thing.

> All hail the King of the Emulators

We now have faster test execution times and better command line tools. Also, with the latest [Android Emulator][2], things have gone up a notch: faster boot time and what not.

Google has been working very hard to

> Make Android Emulator great again

If you haven't been using the Android emulator for some time, I'd suggest you revisit it and save some money.

One other solution which I tried, but couldn't really get to work, was running an [Android-x86][3] image on AWS.
I was able to run it on a vSphere ESXi hypervisor, but not on AWS or any other cloud platform. If someone knows anything about it, do comment below.

PS: [VMWare is now available on AWS][4], so [Android-x86][5] on AWS might be possible after all.

--------------------------------------------------------------------------------

作者简介:

Hi, my name is Sumit Gupta. I'm a software/application/web developer from Gurgaon, India.
I'm in this business because I love technology and it never fails to fascinate me. I've been working for more than 3 years, and in this time I've learned a lot. Don't they say, if you have knowledge, let others light their candles in it?

When my code is compiling, I read articles, lots of them, or listen to music.

Below are my social feeds & [email][6] if you want to reach me.
+ +via: https://www.plightofbyte.com/android/2017/09/03/genymotion-vs-android-emulator/ + +作者:[Sumit Gupta ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://www.plightofbyte.com/about-me +[1]:https://docs.genymotion.com/Content/04_Tools/GMTool/GMTool.htm +[2]:https://developer.android.com/studio/releases/emulator.html +[3]:http://www.android-x86.org/ +[4]:https://aws.amazon.com/vmware/ +[5]:http://www.android-x86.org/ +[6]:thesumitgupta@outlook.com From 22ec0afc6e48aabe0fe0e8ee9956ccaad4b57fbf Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:05:42 +0800 Subject: [PATCH 23/79] =?UTF-8?q?20171011-10=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...ng Languages and Code Quality in GitHub.md | 412 ++++++++++++++++++ 1 file changed, 412 insertions(+) create mode 100644 sources/tech/20171007 A Large-Scale Study of Programming Languages and Code Quality in GitHub.md diff --git a/sources/tech/20171007 A Large-Scale Study of Programming Languages and Code Quality in GitHub.md b/sources/tech/20171007 A Large-Scale Study of Programming Languages and Code Quality in GitHub.md new file mode 100644 index 0000000000..22986eaa19 --- /dev/null +++ b/sources/tech/20171007 A Large-Scale Study of Programming Languages and Code Quality in GitHub.md @@ -0,0 +1,412 @@ +A Large-Scale Study of Programming Languages and Code Quality in GitHub +============================================================ + + +![A Large-Scale Study of Programming Languages, illustration](https://cacm.acm.org/system/assets/0002/8759/092117_Getty_Large-Scale-Study1.large.jpg?1506007488&1506007487 "A Large-Scale Study of Programming Languages, illustration") + +What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static versus dynamic typing and allowing versus disallowing type confusion on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion. + +[Back to Top][46] + +### 1\. 
Introduction + +A variety of debates ensue during discussions whether a given programming language is "the right tool for the job." While some of these debates may appear to be tinged with an almost religious fervor, most agree that programming language choice can impact both the coding process and the resulting artifact. + +Advocates of strong, static typing tend to believe that the static approach catches defects early; for them, an ounce of prevention is worth a pound of cure. Dynamic typing advocates argue, however, that conservative static type checking is wasteful of developer resources, and that it is better to rely on strong dynamic type checking to catch type errors as they arise. These debates, however, have largely been of the armchair variety, supported only by anecdotal evidence. + +This is perhaps not unreasonable; obtaining empirical evidence to support such claims is a challenging task given the number of other factors that influence software engineering outcomes, such as code quality, language properties, and usage domains. Considering, for example, software quality, there are a number of well-known influential factors, such as code size,[6][1] team size,[2][2]and age/maturity.[9][3] + +Controlled experiments are one approach to examining the impact of language choice in the face of such daunting confounds, however, owing to cost, such studies typically introduce a confound of their own, that is, limited scope. The tasks completed in such studies are necessarily limited and do not emulate  _real world_  development. There have been several such studies recently that use students, or compare languages with static or dynamic typing through an experimental factor.[7][4], [12][5],[15][6] + +Fortunately, we can now study these questions over a large body of real-world software projects. GitHub contains many projects in multiple languages that substantially vary across size, age, and number of developers. Each project repository provides a detailed record, including contribution history, project size, authorship, and defect repair. We then use a variety of tools to study the effects of language features on defect occurrence. Our approach is best described as mixed-methods, or triangulation[5][7] approach; we use text analysis, clustering, and visualization to confirm and support the findings of a quantitative regression study. This empirical approach helps us to understand the practical impact of programming languages, as they are used colloquially by developers, on software quality. + +[Back to Top][47] + +### 2\. Methodology + +Our methods are typical of large scale observational studies in software engineering. We first gather our data from several sources using largely automated methods. We then filter and clean the data in preparation for building a statistical model. We further validate the model using qualitative methods. Filtering choices are driven by a combination of factors including the nature of our research questions, the quality of the data and beliefs about which data is most suitable for statistical study. In particular, GitHub contains many projects written in a large number of programming languages. For this study, we focused our data collection efforts on the most popular projects written in the most popular languages. We choose statistical methods appropriate for evaluating the impact of factors on count data. + +![*](http://dl.acm.org/images/bullet.gif) + **2.1\. Data collection** + +We choose the top 19 programming languages from GitHub. 
We disregard CSS, Shell script, and Vim script as they are not considered to be general purpose languages. We further include `Typescript`, a typed superset of `JavaScript`. Then, for each of the studied languages we retrieve the top 50 projects that are primarily written in that language. In total, we analyze 850 projects spanning 17 different languages. + +Our language and project data was extracted from the  _GitHub Archive_ , a database that records all public GitHub activities. The archive logs 18 different GitHub events including new commits, fork events, pull request, developers' information, and issue tracking of all the open source GitHub projects on an hourly basis. The archive data is uploaded to Google BigQuery to provide an interface for interactive data analysis. + +**Identifying top languages.** We aggregate projects based on their primary language. Then we select the languages with the most projects for further analysis, as shown in [Table 1][48]. A given project can use many languages; assigning a single language to it is difficult. Github Archive stores information gathered from GitHub Linguist which measures the language distribution of a project repository using the source file extensions. The language with the maximum number of source files is assigned as the  _primary language_  of the project. + + [![t1.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t1.jpg)][49] +**Table 1\. Top 3 projects in each language.** + +**Retrieving popular projects.** For each selected language, we filter the project repositories written primarily in that language by its popularity based on the associated number of  _stars._ This number indicates how many people have actively expressed interest in the project, and is a reasonable proxy for its popularity. Thus, the top 3 projects in C are  _linux, git_ , and  _php-src_ ; and for C++ they are  _node-webkit, phantomjs_ , and  _mongo_ ; and for `Java` they are  _storm, elasticsearch_ , and  _ActionBarSherlock._  In total, we select the top 50 projects in each language. + +To ensure that these projects have a sufficient development history, we drop the projects with fewer than 28 commits (28 is the first quartile commit count of considered projects). This leaves us with 728 projects. [Table 1][50] shows the top 3 projects in each language. + +**Retrieving project evolution history.** For each of 728 projects, we downloaded the non-merged commits, commit logs, author date, and author name using  _git._  We compute code churn and the number of files modified per commit from the number of added and deleted lines per file. We retrieve the languages associated with each commit from the extensions of the modified files (a commit can have multiple language tags). For each commit, we calculate its  _commit age_  by subtracting its commit date from the first commit of the corresponding project. We also calculate other project-related statistics, including maximum commit age of a project and the total number of developers, used as control variables in our regression model, and discussed in Section 3\. We identify bug fix commits made to individual projects by searching for error related keywords: "error," "bug," "fix," "issue," "mistake," "incorrect," "fault," "defect," and "flaw," in the commit log, similar to a prior study.[18][8] + +[Table 2][51] summarizes our data set. Since a project may use multiple languages, the second column of the table shows the total number of projects that use a certain language at some capacity. 
We further exclude some languages from a project that have fewer than 20 commits in that language, where 20 is the first quartile value of the total number of commits per project per language. For example, we find 220 projects that use more than 20 commits in C. This ensures sufficient activity for each language–project pair. + + [![t2.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t2.jpg)][52] +**Table 2\. Study subjects.** + +In summary, we study 728 projects developed in 17 languages with 18 years of history. This includes 29,000 different developers, 1.57 million commits, and 564,625 bug fix commits. + +![*](http://dl.acm.org/images/bullet.gif) + **2.2\. Categorizing languages** + +We define language classes based on several properties of the language thought to influence language quality,[7][9], [8][10], [12][11] as shown in [Table 3][53]. The  _Programming Paradigm_  indicates whether the project is written in an imperative procedural, imperative scripting, or functional language. In the rest of the paper, we use the terms procedural and scripting to indicate imperative procedural and imperative scripting respectively. + + [![t3.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t3.jpg)][54] +**Table 3\. Different types of language classes.** + + _Type Checking_  indicates static or dynamic typing. In statically typed languages, type checking occurs at compile time, and variable names are bound to a value and to a type. In addition, expressions (including variables) are classified by types that correspond to the values they might take on at run-time. In dynamically typed languages, type checking occurs at run-time. Hence, in the latter, it is possible to bind a variable name to objects of different types in the same program. + + _Implicit Type Conversion_  allows access of an operand of type T1 as a different type T2, without an explicit conversion. Such implicit conversion may introduce type-confusion in some cases, especially when it presents an operand of specific type T1, as an instance of a different type T2\. Since not all implicit type conversions are immediately a problem, we operationalize our definition by showing examples of the implicit type confusion that can happen in all the languages we identified as allowing it. For example, in languages like `Perl, JavaScript`, and `CoffeeScript` adding a string to a number is permissible (e.g., "5" + 2 yields "52"). The same operation yields 7 in `Php`. Such an operation is not permitted in languages such as `Java` and `Python` as they do not allow implicit conversion. In C and C++ coercion of data types can result in unintended results, for example, `int x; float y; y=3.5; x=y`; is legal C code, and results in different values for x and y, which, depending on intent, may be a problem downstream.[a][12] In `Objective-C` the data type  _id_  is a generic object pointer, which can be used with an object of any data type, regardless of the class.[b][13] The flexibility that such a generic data type provides can lead to implicit type conversion and also have unintended consequences.[c][14]Hence, we classify a language based on whether its compiler  _allows_  or  _disallows_  the implicit type conversion as above; the latter explicitly detects type confusion and reports it. + +Disallowing implicit type conversion could result from static type inference within a compiler (e.g., with `Java`), using a type-inference algorithm such as Hindley[10][15] and Milner,[17][16] or at run-time using a dynamic type checker. 
In contrast, a type-confusion can occur silently because it is either undetected or is unreported. Either way, implicitly allowing type conversion provides flexibility but may eventually cause errors that are difficult to localize. To abbreviate, we refer to languages allowing implicit type conversion as  _implicit_  and those that disallow it as  _explicit._ + + _Memory Class_  indicates whether the language requires developers to manage memory. We treat `Objective-C` as unmanaged, in spite of it following a hybrid model, because we observe many memory errors in its codebase, as discussed in RQ4 in Section 3. + +Note that we classify and study the languages as they are colloquially used by developers in real-world software. For example, `TypeScript` is intended to be used as a static language, which disallows implicit type conversion. However, in practice, we notice that developers often (for 50% of the variables, and across `TypeScript`-using projects in our dataset) use the `any` type, a catch-all union type, and thus, in practice, `TypeScript` allows dynamic, implicit type conversion. To minimize the confusion, we exclude `TypeScript` from our language classifications and the corresponding model (see [Table 3][55] and [7][56]). + +![*](http://dl.acm.org/images/bullet.gif) + **2.3\. Identifying project domain** + +We classify the studied projects into different domains based on their features and function using a mix of automated and manual techniques. The projects in GitHub come with `project descriptions` and README files that describe their features. We used Latent Dirichlet Allocation (LDA)[3][17] to analyze this text. Given a set of documents, LDA identifies a set of topics where each topic is represented as probability of generating different words. For each document, LDA also estimates the probability of assigning that document to each topic. + +We detect 30 distinct domains, that is, topics, and estimate the probability that each project belonging to each domain. Since these auto-detected domains include several project-specific keywords, for example, facebook, it is difficult to identify the underlying common functions. In order to assign a meaningful name to each domain, we manually inspect each of the 30 domains to identify projectname-independent, domain-identifying keywords. We manually rename all of the 30 auto-detected domains and find that the majority of the projects fall under six domains: Application, Database, CodeAnalyzer, Middleware, Library, and Framework. We also find that some projects do not fall under any of the above domains and so we assign them to a catchall domain labeled as  _Other_ . This classification of projects into domains was subsequently checked and confirmed by another member of our research group. [Table 4][57] summarizes the identified domains resulting from this process. + + [![t4.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t4.jpg)][58] +**Table 4\. Characteristics of domains.** + +![*](http://dl.acm.org/images/bullet.gif) + **2.4\. Categorizing bugs** + +While fixing software bugs, developers often leave important information in the commit logs about the nature of the bugs; for example, why the bugs arise and how to fix the bugs. We exploit such information to categorize the bugs, similar to Tan  _et al._ [13][18], [24][19] + +First, we categorize the bugs based on their  _Cause_  and  _Impact. 
Causes_  are further classified into disjoint subcategories of errors: Algorithmic, Concurrency, Memory, generic Programming, and Unknown. The bug  _Impact_  is also classified into four disjoint subcategories: Security, Performance, Failure, and Other unknown categories. Thus, each bug-fix commit also has an induced Cause and an Impact type. [Table 5][59] shows the description of each bug category. This classification is performed in two phases: + + [![t5.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg)][60] +**Table 5\. Categories of bugs and their distribution in the whole dataset.** + +**(1) Keyword search.** We randomly choose 10% of the bug-fix messages and use a keyword based search technique to automatically categorize them as potential bug types. We use this annotation, separately, for both Cause and Impact types. We chose a restrictive set of keywords and phrases, as shown in [Table 5][61]. Such a restrictive set of keywords and phrases helps reduce false positives. + +**(2) Supervised classification.** We use the annotated bug fix logs from the previous step as training data for supervised learning techniques to classify the remainder of the bug fix messages by treating them as test data. We first convert each bug fix message to a bag-of- words. We then remove words that appear only once among all of the bug fix messages. This reduces project specific keywords. We also stem the bag-of- words using standard natural language processing techniques. Finally, we use Support Vector Machine to classify the test data. + +To evaluate the accuracy of the bug classifier, we manually annotated 180 randomly chosen bug fixes, equally distributed across all of the categories. We then compare the result of the automatic classifier with the manually annotated data set. The performance of this process was acceptable with precision ranging from a low of 70% for performance bugs to a high of 100% for concurrency bugs with an average of 84%. Recall ranged from 69% to 91% with an average of 84%. + +The result of our bug classification is shown in [Table 5][62]. Most of the defect causes are related to generic programming errors. This is not surprising as this category involves a wide variety of programming errors such as type errors, typos, compilation error, etc. Our technique could not classify 1.04% of the bug fix messages in any Cause or Impact category; we classify these as Unknown. + +![*](http://dl.acm.org/images/bullet.gif) + **2.5\. Statistical methods** + +We model the number of defective commits against other factors related to software projects using regression. All models use  _negative binomial regression_  (NBR) to model the counts of project attributes such as the number of commits. NBR is a type of generalized linear model used to model non-negative integer responses.[4][20] + +In our models we control for several language per-project dependent factors that are likely to influence the outcome. Consequently, each (language, project) pair is a row in our regression and is viewed as a sample from the population of open source projects. We log-transform dependent count variables as it stabilizes the variance and usually improves the model fit.[4][21] We verify this by comparing transformed with non transformed data using the AIC and Vuong's test for non-nested models. 
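As a rough sketch of what such a specification looks like (this is a schematic reading of the setup, not necessarily the authors' exact model; covariate names are abbreviated), an NBR with a log link models the expected number of defect-fixing commits $\mu_i$ of a (language, project) row as

$$\log \mu_i = \beta_0 + \beta_1 \log(\mathrm{age}_i) + \beta_2 \log(\mathrm{commits}_i) + \beta_3 \log(\mathrm{devs}_i) + \beta_4 \log(\mathrm{size}_i) + \sum_{\ell} \beta_{\ell}\,\mathrm{lang}_{i\ell},$$

where the counts are assumed to follow a negative binomial distribution to accommodate over-dispersion, and the language indicator terms use the weighted effects coding described below.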
+ +To check that excessive multicollinearity is not an issue, we compute the variance inflation factor of each dependent variable in all of the models with a conservative maximum value of 5.[4][22]We check for and remove high leverage points through visual examination of the residuals versus leverage plot for each model, looking for both separation and large values of Cook's distance. + +We employ  _effects_ , or  _contrast_ , coding in our study to facilitate interpretation of the language coefficients.[4][23] Weighted effects codes allow us to compare each language to the average effect across all languages while compensating for the unevenness of language usage across projects.[23][24]To test for the relationship between two factor variables we use a Chi-square test of independence.[14][25] After confirming a dependence we use Cramer's V, an  _r_  ×  _c_  equivalent of the phi coefficient for nominal data, to establish an effect size. + +[Back to Top][63] + +### 3\. Results + +We begin with a straightforward question that directly addresses the core of what some fervently believe must be true, namely: + +**RQ1\. Are some languages more defect-prone than others?** + +We use a regression model to compare the impact of each language on the number of defects with the average impact of all languages, against defect fixing commits (see [Table 6][64]). + + [![t6.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t6.jpg)][65] +**Table 6\. Some languages induce fewer defects than other languages.** + +We include some variables as controls for factors that will clearly influence the response. Project age is included as older projects will generally have a greater number of defect fixes. Trivially, the number of commits to a project will also impact the response. Additionally, the number of developers who touch a project and the raw size of the project are both expected to grow with project activity. + +The sign and magnitude of the estimated coefficients in the above model relates the predictors to the outcome. The first four variables are control variables and we are not interested in their impact on the outcome other than to say that they are all positive and significant. The language variables are indicator variables, viz. factor variables, for each project. The coefficient compares each language to the grand weighted mean of all languages in all projects. The language coefficients can be broadly grouped into three general categories. The first category is those for which the coefficient is statistically insignificant and the modeling procedure could not distinguish the coefficient from zero. These languages may behave similar to the average or they may have wide variance. The remaining coefficients are significant and either positive or negative. For those with positive coefficients we can expect that the language is associated with a greater number of defect fixes. These languages include `C, C++, Objective-C, Php`, and `Python`. The languages `Clojure, Haskell, Ruby`, and `Scala`, all have negative coefficients implying that these languages are less likely than average to result in defect fixing commits. + +One should take care not to overestimate the impact of language on defects. While the observed relationships are statistically significant, the effects are quite small. Analysis of deviance reveals that language accounts for less than 1% of the total explained deviance. 
+ + [![ut1.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/ut1.jpg)][66] + +We can read the model coefficients as the expected change in the log of the response for a one unit change in the predictor with all other predictors held constant; that is, for a coefficient  _βi_ , a one unit change in  _βi_  yields an expected change in the response of e _βi_ . For the factor variables, this expected change is compared to the average across all languages. Thus, if, for some number of commits, a particular project developed in an  _average_  language had four defective commits, then the choice to use C++ would mean that we should expect one additional defective commit since e0.18 × 4 = 4.79\. For the same project, choosing `Haskell` would mean that we should expect about one fewer defective commit as  _e_ −0.26 × 4 = 3.08\. The accuracy of this prediction depends on all other factors remaining the same, a challenging proposition for all but the most trivial of projects. All observational studies face similar limitations; we address this concern in more detail in Section 5. + +**Result 1:**  _Some languages have a greater association with defects than other languages, although the effect is small._ + +In the remainder of this paper we expand on this basic result by considering how different categories of application, defect, and language, lead to further insight into the relationship between languages and defect proneness. + +Software bugs usually fall under two broad categories: (1)  _Domain Specific bug_ : specific to project function and do not depend on the underlying programming language. (2)  _Generic bug_ : more generic in nature and has less to do with project function, for example, typeerrors, concurrency errors, etc. + +Consequently, it is reasonable to think that the interaction of application domain and language might impact the number of defects within a project. Since some languages are believed to excel at some tasks more so than others, for example, C for low level work, or `Java` for user applications, making an inappropriate choice might lead to a greater number of defects. To study this we should ideally ignore the domain specific bugs, as generic bugs are more likely to depend on the programming language featured. However, since a domain-specific bugs may also arise due to a generic programming error, it is difficult to separate the two. A possible workaround is to study languages while controlling the domain. Statistically, however, with 17 languages across 7 domains, the large number of terms would be challenging to interpret given the sample size. + +Given this, we first consider testing for the dependence between domain and language usage within a project, using a Chi-square test of independence. Of 119 cells, 46, that is, 39%, are below the value of 5 which is too high. No more than 20% of the counts should be below 5.[14][26] We include the value here for completeness[d][27]; however, the low strength of association of 0.191 as measured by Cramer's V, suggests that any relationship between domain and language is small and that inclusion of domain in regression models would not produce meaningful results. + +One option to address this concern would be to remove languages or combine domains, however, our data here presents no clear choices. Alternatively, we could combine languages; this choice leads to a related but slightly different question. + +**RQ2\. 
Which language properties relate to defects?** + +Rather than considering languages individually, we aggregate them by language class, as described in Section 2.2, and analyze the relationship to defects. Broadly, each of these properties divides languages along lines that are often discussed in the context of errors, drives user debate, or has been the subject of prior work. Since the individual properties are highly correlated, we create six model factors that combine all of the individual factors across all of the languages in our study. We then model the impact of the six different factors on the number of defects while controlling for the same basic covariates that we used in the model in  _RQ1_ . + +As with language (earlier in [Table 6][67]), we are comparing language  _classes_  with the average behavior across all language classes. The model is presented in [Table 7][68]. It is clear that `Script-Dynamic-Explicit-Managed` class has the smallest magnitude coefficient. The coefficient is insignificant, that is, the z-test for the coefficient cannot distinguish the coefficient from zero. Given the magnitude of the standard error, however, we can assume that the behavior of languages in this class is very close to the average across all languages. We confirm this by recoding the coefficient using `Proc-Static-Implicit-Unmanaged` as the base level and employing treatment, or dummy coding that compares each language class with the base level. In this case, `Script-Dynamic-Explicit-Managed` is significantly different with  _p_  = 0.00044\. We note here that while choosing different coding methods affects the coefficients and z-scores, the models are identical in all other respects. When we change the coding we are rescaling the coefficients to reflect the comparison that we wish to make.[4][28] Comparing the other language classes to the grand mean, `Proc-Static-Implicit-Unmanaged` languages are more likely to induce defects. This implies that either implicit type conversion or memory management issues contribute to greater defect proneness as compared with other procedural languages. + + [![t7.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t7.jpg)][69] +**Table 7\. Functional languages have a smaller relationship to defects than other language classes whereas procedural languages are greater than or similar to the average.** + +Among scripting languages we observe a similar relationship between languages that allow versus those that do not allow implicit type conversion, providing some evidence that implicit type conversion (vs. explicit) is responsible for this difference as opposed to memory management. We cannot state this conclusively given the correlation between factors. However when compared to the average, as a group, languages that do not allow implicit type conversion are less error-prone while those that do are more error-prone. The contrast between static and dynamic typing is also visible in functional languages. + +The functional languages as a group show a strong difference from the average. Statically typed languages have a substantially smaller coefficient yet both functional language classes have the same standard error. This is strong evidence that functional static languages are less error-prone than functional dynamic languages, however, the z-tests only test whether the coefficients are different from zero. 
In order to strengthen this assertion, we recode the model as above using treatment coding and observe that the `Functional-Static-Explicit-Managed` language class is significantly less defect-prone than the `Functional-Dynamic-Explicit-Managed`language class with  _p_  = 0.034. + + [![ut2.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/ut2.jpg)][70] + +As with language and defects, the relationship between language class and defects is based on a small effect. The deviance explained is similar, albeit smaller, with language class explaining much less than 1% of the deviance. + +We now revisit the question of application domain. Does domain have an interaction with language class? Does the choice of, for example, a functional language, have an advantage for a particular domain? As above, a Chi-square test for the relationship between these factors and the project domain yields a value of 99.05 and  _df_  = 30 with  _p_  = 2.622e–09 allowing us to reject the null hypothesis that the factors are independent. Cramer's-V yields a value of 0.133, a weak level of association. Consequently, although there is some relation between domain and language, there is only a weak relationship between domain and language class. + +**Result 2:**  _There is a small but significant relationship between language class and defects. Functional languages are associated with fewer defects than either procedural or scripting languages._ + +It is somewhat unsatisfying that we do not observe a strong association between language, or language class, and domain within a project. An alternative way to view this same data is to disregard projects and aggregate defects over all languages and domains. Since this does not yield independent samples, we do not attempt to analyze it statistically, rather we take a descriptive, visualization-based approach. + +We define  _Defect Proneness_  as the ratio of bug fix commits over total commits per language per domain. [Figure 1][71] illustrates the interaction between domain and language using a heat map, where the defect proneness increases from lighter to darker zone. We investigate which language factors influence defect fixing commits across a collection of projects written across a variety of languages. This leads to the following research question: + + [![f1.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/f1.jpg)][72] +**Figure 1\. Interaction of language's defect proneness with domain. Each cell in the heat map represents defect proneness of a language (row header) for a given domain (column header). The "Overall" column represents defect proneness of a language over all the domains. The cells with white cross mark indicate null value, that is, no commits were made corresponding to that cell.** + +**RQ3\. Does language defect proneness depend on domain?** + +In order to answer this question we first filtered out projects that would have been viewed as outliers, filtered as high leverage points, in our regression models. This was necessary here as, even though this is a nonstatistical method, some relationships could impact visualization. For example, we found that a single project, Google's v8, a `JavaScript` project, was responsible for all of the errors in Middleware. This was surprising to us since `JavaScript` is typically not used for Middleware. This pattern repeats in other domains, consequently, we filter out the projects that have defect density below 10 and above 90 percentile. The result is in [Figure 1][73]. 
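+
+The aggregation behind such a heat map can be sketched in a few lines of Kotlin. The snippet below is purely illustrative: the data class, field names, and the crude percentile cut are assumptions made for the example, not the study's actual tooling, but it shows how per-language, per-domain defect proneness and the 10th/90th percentile filtering fit together.
+
+```
+// Illustrative sketch: defect proneness = bug-fix commits / total commits,
+// aggregated per (language, domain), with a crude 10th/90th percentile filter.
+data class ProjectStats(
+    val language: String,
+    val domain: String,
+    val totalCommits: Int,
+    val bugFixCommits: Int
+) {
+    val defectProneness: Double
+        get() = if (totalCommits == 0) 0.0 else bugFixCommits.toDouble() / totalCommits
+}
+
+// Keep only projects between the 10th and 90th percentile of defect proneness.
+fun filterMiddle(projects: List<ProjectStats>): List<ProjectStats> {
+    if (projects.size < 2) return projects
+    val sorted = projects.map { it.defectProneness }.sorted()
+    val lo = sorted[(sorted.size * 10) / 100]
+    val hi = sorted[minOf(sorted.lastIndex, (sorted.size * 90) / 100)]
+    return projects.filter { it.defectProneness in lo..hi }
+}
+
+// One heat-map cell per (language, domain) pair.
+fun heatMap(projects: List<ProjectStats>): Map<Pair<String, String>, Double> =
+    filterMiddle(projects)
+        .groupBy { it.language to it.domain }
+        .mapValues { (_, group) ->
+            group.sumBy { it.bugFixCommits }.toDouble() / group.sumBy { it.totalCommits }
+        }
+```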
+ +We see only a subdued variation in this heat map which is a result of the inherent defect proneness of the languages as seen in RQ1\. To validate this, we measure the pairwise rank correlation between the language defect proneness for each domain with the overall. For all of the domains except Database, the correlation is positive, and p-values are significant (<0.01). Thus, w.r.t. defect proneness, the language ordering in each domain is strongly correlated with the overall language ordering. + + [![ut3.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/ut3.jpg)][74] + +**Result 3:**  _There is no general relationship between application domain and language defect proneness._ + +We have shown that different languages induce a larger number of defects and that this relationship is not only related to particular languages but holds for general classes of languages; however, we find that the type of project does not mediate this relationship to a large degree. We now turn our attention to categorization of the response. We want to understand how language relates to specific kinds of defects and how this relationship compares to the more general relationship that we observe. We divide the defects into categories as described in [Table 5][75] and ask the following question: + +**RQ4\. What is the relation between language and bug category?** + +We use an approach similar to RQ3 to understand the relation between languages and bug categories. First, we study the relation between bug categories and language class. A heat map ([Figure 2][76]) shows aggregated defects over language classes and bug types. To understand the interaction between bug categories and languages, we use an NBR regression model for each category. For each model we use the same control factors as RQ1 as well as languages encoded with weighted effects to predict defect fixing commits. + + [![f2.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/f2.jpg)][77] +**Figure 2\. Relation between bug categories and language class. Each cell represents percentage of bug fix commit out of all bug fix commits per language class (row header) per bug category (column header). The values are normalized column wise.** + +The results along with the anova value for language are shown in [Table 8][78]. The overall deviance for each model is substantially smaller and the proportion explained by language for a specific defect type is similar in magnitude for most of the categories. We interpret this relationship to mean that language has a greater impact on specific categories of bugs, than it does on bugs overall. In the next section we expand on these results for the bug categories with significant bug counts as reported in [Table 5][79]. However, our conclusion generalizes for all categories. + + [![t8.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg)][80] +**Table 8\. While the impact of language on defects varies across defect category, language has a greater impact on specific categories than it does on defects in general.** + +**Programming errors.** Generic programming errors account for around 88.53% of all bug fix commits and occur in all the language classes. Consequently, the regression analysis draws a similar conclusion as of RQ1 (see [Table 6][81]). All languages incur programming errors such as faulty error-handling, faulty definitions, typos, etc. + +**Memory errors.** Memory errors account for 5.44% of all the bug fix commits. 
The heat map in [Figure 2][82] shows a strong relationship between the `Proc-Static-Implicit-Unmanaged` class and memory errors. This is expected as languages with unmanaged memory are known for memory bugs. [Table 8][83] confirms that such languages, for example, C, C++, and `Objective-C`, introduce more memory errors. Among the managed languages, `Java` induces more memory errors, although fewer than the unmanaged languages. Although `Java` has its own garbage collector, memory leaks are not surprising since unused object references often prevent the garbage collector from reclaiming memory.[11][29] In our data, 28.89% of all the memory errors in `Java` are the result of a memory leak. In terms of effect size, language has a larger impact on memory defects than all other _cause_ categories.
+
+**Concurrency errors.** 1.99% of the total bug fix commits are related to concurrency errors. The heat map shows that `Proc-Static-Implicit-Unmanaged` dominates this error type. C and C++ introduce 19.15% and 7.89% of the errors, and they are distributed across the projects.
+
+ [![ut4.jpg](http://deliveryimages.acm.org/10.1145/3130000/3126905/ut4.jpg)][84]
+
+Both of the `Static-Strong-Managed` language classes are in the darker zone in the heat map, confirming that, in general, static languages produce more concurrency errors than others. Among the dynamic languages, only `Erlang` is more prone to concurrency errors, perhaps relating to the greater use of this language for concurrent applications. Likewise, the negative coefficients in [Table 8][85] show that projects written in dynamic languages like `Ruby` and `Php` have fewer concurrency errors. Note that certain languages like `JavaScript, CoffeeScript`, and `TypeScript` do not support concurrency in its traditional form, while `Php` has limited support depending on its implementation. These languages introduce artificial zeros in the data, and thus the concurrency model coefficients in [Table 8][86] for those languages cannot be interpreted like the other coefficients. Due to these artificial zeros, the average over all languages in this model is smaller, which may affect the sizes of the coefficients, since they are given w.r.t. the average, but it will not affect their relative relationships, which is what we are after.
+
+A textual analysis based on word-frequency of the bug fix messages suggests that most of the concurrency errors occur due to a race condition, deadlock, or incorrect synchronization, as shown in the table above. Across all languages, race conditions are the most frequent cause of such errors, for example, 92% in `Go`. The enrichment of race condition errors in `Go` is probably due to an accompanying race-detection tool that may help developers locate races. The synchronization errors are primarily related to message passing interface (MPI) or shared memory operation (SHM). `Erlang` and `Go` use MPI[e][30] for inter-thread communication, which explains why these two languages do not have any SHM related errors such as locking, mutex, etc. In contrast, projects in the other languages use SHM primitives for communication and may thus have locking-related errors.
+
+**Security and other impact errors.** Around 7.33% of all the bug fix commits are related to Impact errors. Among them `Erlang, C++`, and `Python` associate with more security errors than average ([Table 8][87]). `Clojure` projects associate with fewer security errors ([Figure 2][88]).
From the heat map we also see that `Static` languages are in general more prone to failure and performance errors, these are followed by `Functional-Dynamic-Explicit-Managed` languages such as `Erlang`. The analysis of deviance results confirm that language is strongly associated with failure impacts. While security errors are the weakest among the categories, the deviance explained by language is still quite strong when compared with the residual deviance. + +**Result 4:**  _Defect types are strongly associated with languages; some defect type like memory errors and concurrency errors also depend on language primitives. Language matters more for specific categories than it does for defects overall._ + +[Back to Top][89] + +### 4\. Related Work + +Prior work on programming language comparison falls in three categories: + +**(1)  _Controlled experiment._**  For a given task, developers are monitored while programming in different languages. Researchers then compare outcomes such as development effort and program quality. Hanenberg[7][31] compared static versus dynamic typing by monitoring 48 programmers for 27 h while developing a parser program. He found no significant difference in code quality between the two; however, dynamic type-based languages were found to have shorter development time. Their study was conducted with undergraduate students in a lab setting with custom-designed language and IDE. Our study, by contrast is a field study of popular software applications. While we can only indirectly (and  _post facto_ ) control for confounding factors using regression, we benefit from much larger sample sizes, and more realistic, widely-used software. We find that statically typed languages in general are less defect-prone than the dynamic types, and that disallowing implicit type conversion is better than allowing it, in the same regard. The effect sizes are modest; it could be reasonably argued that they are visible here precisely because of the large sample sizes. + +Harrison et al.[8][32] compared C++, a procedural language, with `SML`, a functional language, finding no significant difference in total number of errors, although `SML` has higher defect density than C++. `SML` was not represented in our data, which however, suggest that functional languages are generally less defect-prone than procedural languages. Another line of work primarily focuses on comparing development effort across different languages.[12][33], [20][34] However, they do not analyze language defect proneness. + +**(2)  _Surveys._**  Meyerovich and Rabkin[16][35] survey developers' views of programming languages, to study why some languages are more popular than others. They report strong influence from non-linguistic factors: prior language skills, availability of open source tools, and existing legacy systems. Our study also confirms that the availability of external tools also impacts software quality; for example, concurrency bugs in `Go` (see RQ4 in Section 3). + +**(3)  _Repository mining._**  Bhattacharya and Neamtiu[1][36] study four projects developed in both C and C++ and find that the software components developed in C++ are in general more reliable than C. We find that both C and C++ are more defect-prone than average. However, for certain bug types like concurrency errors, C is more defect-prone than C++ (see RQ4 in Section 3). + +[Back to Top][90] + +### 5\. Threats to Validity + +We recognize few threats to our reported results. 
First, to identify bug fix commits we rely on the keywords that developers often use to indicate a bug fix. Our choice was deliberate. We wanted to capture the issues that developers continuously face in an ongoing development process, rather than reported bugs. However, this choice possesses threats of over estimation. Our categorization of domains is subject to interpreter bias, although another member of our group verified the categories. Also, our effort to categorize bug fix commits could potentially be tainted by the initial choice of keywords. The descriptiveness of commit logs vary across projects. To mitigate these threats, we evaluate our classification against manual annotation as discussed in Section 2.4. + +We determine the language of a file based on its extension. This can be error-prone if a file written in a different language takes a common language extension that we have studied. To reduce such error, we manually verified language categorization against a randomly sampled file set. + +To interpret language class in Section 2.2, we make certain assumptions based on how a language property is most commonly used, as reflected in our data set, for example, we classify `Objective-C` as unmanaged memory type rather than hybrid. Similarly, we annotate `Scala` as functional and C# as procedural, although both support either design choice.[19][37], [21][38] We do not distinguish object-oriented languages (OOP) from procedural languages in this work as there is no clear distinction, the difference largely depends on programming style. We categorize C++ as allowing implicit type conversion because a memory region of a certain type can be treated differently using pointer manipulation.[22][39] We note that most C++ compilers can detect type errors at compile time. + +Finally, we associate defect fixing commits to language properties, although they could reflect reporting style or other developer properties. Availability of external tools or libraries may also impact the extent of bugs associated with a language. + +[Back to Top][91] + +### 6\. Conclusion + +We have presented a large-scale study of language type and use as it relates to software quality. The Github data we used is characterized by its complexity and variance along multiple dimensions. Our sample size allows a mixed-methods study of the effects of language, and of the interactions of language, domain, and defect type while controlling for a number of confounds. The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than bugs overall. + +On the other hand, even large datasets become small and insufficient when they are sliced and diced many ways simultaneously. Consequently, with an increasing number of dependent variables it is difficult to answer questions about a specific variable's effect, especially where variable interactions exist. Hence, we are unable to quantify the specific effects of language type on usage. Additional methods such as surveys could be helpful here. Addressing these challenges remains for future work. 
+ +[Back to Top][92] + +### Acknowledgments + +This material is based upon work supported by the National Science Foundation under grant nos. 1445079, 1247280, 1414172, 1446683 and from AFOSR award FA955-11-1-0246. + +[Back to Top][93] + +### References + +1\. Bhattacharya, P., Neamtiu, I. Assessing programming language impact on development and maintenance: A study on C and C++. In  _Proceedings of the 33rd International Conference on Software Engineering, ICSE'11_  (New York, NY USA, 2011). ACM, 171–180. + +2\. Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P. Don't touch my code! Examining the effects of ownership on software quality. In  _Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering_  (2011). ACM, 4–14. + +3\. Blei, D.M. Probabilistic topic models.  _Commun. ACM 55_ , 4 (2012), 77–84. + +4\. Cohen, J.  _Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences._ Lawrence Erlbaum, 2003. + +5\. Easterbrook, S., Singer, J., Storey, M.-A., Damian, D. Selecting empirical methods for software engineering research. In  _Guide to Advanced Empirical Software Engineering_  (2008). Springer, 285–311. + +6\. El Emam, K., Benlarbi, S., Goel, N., Rai, S.N. The confounding effect of class size on the validity of object-oriented metrics.  _IEEE Trans. Softw. Eng. 27_ , 7 (2001), 630–650. + +7\. Hanenberg, S. An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time. In  _Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'10_  (New York, NY, USA, 2010). ACM, 22–35. + +8\. Harrison, R., Smaraweera, L., Dobie, M., Lewis, P. Comparing programming paradigms: An evaluation of functional and object-oriented programs.  _Softw. Eng. J. 11_ , 4 (1996), 247–254. + +9\. Harter, D.E., Krishnan, M.S., Slaughter, S.A. Effects of process maturity on quality, cycle time, and effort in software product development.  _Manage. Sci. 46_  4 (2000), 451–466. + +10\. Hindley, R. The principal type-scheme of an object in combinatory logic.  _Trans. Am. Math. Soc._  (1969), 29–60. + +11\. Jump, M., McKinley, K.S. Cork: Dynamic memory leak detection for garbage-collected languages. In  _ACM SIGPLAN Notices_ , Volume 42 (2007). ACM, 31–38. + +12\. Kleinschmager, S., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. Do static type systems improve the maintainability of software systems? An empirical study. In  _2012 IEEE 20th International Conference on Program Comprehension (ICPC)_  (2012). IEEE, 153–162. + +13\. Li, Z., Tan, L., Wang, X., Lu, S., Zhou, Y., Zhai, C. Have things changed now? An empirical study of bug characteristics in modern open source software. In  _ASID'06: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability_  (October 2006). + +14\. Marques De Sá, J.P.  _Applied Statistics Using SPSS, Statistica and Matlab_ , 2003. + +15\. Mayer, C., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. An empirical study of the influence of static type systems on the usability of undocumented software. In  _ACM SIGPLAN Notices_ , Volume 47 (2012). ACM, 683–702. + +16\. Meyerovich, L.A., Rabkin, A.S. Empirical analysis of programming language adoption. In  _Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications_  (2013). ACM, 1–18. + +17\. Milner, R. 
A theory of type polymorphism in programming.  _J. Comput. Syst. Sci. 17_ , 3 (1978), 348–375. + +18\. Mockus, A., Votta, L.G. Identifying reasons for software changes using historic databases. In  _ICSM'00\. Proceedings of the International Conference on Software Maintenance_  (2000). IEEE Computer Society, 120. + +19\. Odersky, M., Spoon, L., Venners, B.  _Programming in Scala._  Artima Inc, 2008. + +20\. Pankratius, V., Schmidt, F., Garretón, G. Combining functional and imperative programming for multicore software: An empirical study evaluating scala and java. In  _Proceedings of the 2012 International Conference on Software Engineering_  (2012). IEEE Press, 123–133. + +21\. Petricek, T., Skeet, J.  _Real World Functional Programming: With Examples in F# and C#._ Manning Publications Co., 2009. + +22\. Pierce, B.C.  _Types and Programming Languages._  MIT Press, 2002. + +23\. Posnett, D., Bird, C., Dévanbu, P. An empirical study on the influence of pattern roles on change-proneness.  _Emp. Softw. Eng. 16_ , 3 (2011), 396–423. + +24\. Tan, L., Liu, C., Li, Z., Wang, X., Zhou, Y., Zhai, C. Bug characteristics in open source software.  _Emp. Softw. Eng._  (2013). + +-------------------------------------------------------------------------------- + +via: https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007 + +作者:[ Baishakhi Ray][a], [Daryl Posnett][b], [Premkumar Devanbu][c], [Vladimir Filkov ][d] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:http://delivery.acm.org/10.1145/3130000/3126905/mailto:rayb@virginia.edu +[b]:http://delivery.acm.org/10.1145/3130000/3126905/mailto:dpposnett@ucdavis.edu +[c]:http://delivery.acm.org/10.1145/3130000/3126905/mailto:devanbu@cs.ucdavis.edu +[d]:http://delivery.acm.org/10.1145/3130000/3126905/mailto:filkov@cs.ucdavis.edu +[1]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R6 +[2]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R2 +[3]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R9 +[4]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R7 +[5]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R12 +[6]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R15 +[7]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R5 +[8]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R18 
+[9]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R7 +[10]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R8 +[11]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R12 +[12]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#FNA +[13]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#FNB +[14]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#FNC +[15]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R10 +[16]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R17 +[17]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R3 +[18]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R13 +[19]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R24 +[20]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R4 +[21]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R4 +[22]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R4 +[23]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R4 +[24]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R23 +[25]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R14 +[26]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R14 +[27]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#FND 
+[28]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R4 +[29]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R11 +[30]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#FNE +[31]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R7 +[32]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R8 +[33]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R12 +[34]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R20 +[35]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R16 +[36]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R1 +[37]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R19 +[38]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R21 +[39]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#R22 +[40]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#comments +[41]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007# +[42]:https://cacm.acm.org/about-communications/mobile-apps/ +[43]:http://dl.acm.org/citation.cfm?id=3144574.3126905&coll=portal&dl=ACM +[44]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/pdf +[45]:http://dl.acm.org/ft_gateway.cfm?id=3126905&ftid=1909469&dwn=1 +[46]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[47]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[48]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t1.jpg +[49]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t1.jpg +[50]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t1.jpg 
+[51]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t2.jpg +[52]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t2.jpg +[53]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t3.jpg +[54]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t3.jpg +[55]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t3.jpg +[56]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t7.jpg +[57]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t4.jpg +[58]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t4.jpg +[59]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[60]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[61]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[62]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[63]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[64]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t6.jpg +[65]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t6.jpg +[66]:http://deliveryimages.acm.org/10.1145/3130000/3126905/ut1.jpg +[67]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t6.jpg +[68]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t7.jpg +[69]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t7.jpg +[70]:http://deliveryimages.acm.org/10.1145/3130000/3126905/ut2.jpg +[71]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f1.jpg +[72]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f1.jpg +[73]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f1.jpg +[74]:http://deliveryimages.acm.org/10.1145/3130000/3126905/ut3.jpg +[75]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[76]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f2.jpg +[77]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f2.jpg +[78]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[79]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t5.jpg +[80]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[81]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t6.jpg +[82]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f2.jpg +[83]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[84]:http://deliveryimages.acm.org/10.1145/3130000/3126905/ut4.jpg +[85]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[86]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[87]:http://deliveryimages.acm.org/10.1145/3130000/3126905/t8.jpg +[88]:http://deliveryimages.acm.org/10.1145/3130000/3126905/f2.jpg +[89]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[90]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[91]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop +[92]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop 
+[93]:https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#PageTop From ce1f6c331dd25d68916830a19e745934e68d9f09 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:07:26 +0800 Subject: [PATCH 24/79] =?UTF-8?q?20171011-12=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...nes and Android Architecture Components.md | 201 ++++++++++++++++++ 1 file changed, 201 insertions(+) create mode 100644 sources/tech/20171006 Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components.md diff --git a/sources/tech/20171006 Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components.md b/sources/tech/20171006 Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components.md new file mode 100644 index 0000000000..0ff40cdd6e --- /dev/null +++ b/sources/tech/20171006 Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components.md @@ -0,0 +1,201 @@ +Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components +============================================================ + +### Full demo weather app included. + +Android development is evolving fast. A lot of developers and companies are trying to address common problems and create some great tools or libraries that can totally change the way we structure our apps. + + +![](https://cdn-images-1.medium.com/max/800/1*4z7VB5NWS2PMqD5k0hG4vQ.png) + +We get excited by the new possibilities, but it’s difficult to find time to rewrite our app to really benefit from a new programming style. But what if we actually start a new project? Which of those breakthrough ideas to employ? Which solutions are stable enough? Should we use RxJava extensively and structure our app with reactive-first mindset? + +> The Cycle.js library (by [André Staltz][6]) contains a great explanation of reactive-first mindset: [Cycle.js — Streams][7]. + +Rx is highly composable and it has great potential, but it’s so different from regular object-oriented programming style, that it will be really hard to understand for any developer without RxJava experience. + +There are more questions to ask before starting a new project. For example: + +* Should we use Kotlin instead of Java? + (actually here the answer is simple: [YES][1]) + +* Should we use experimental Kotlin Coroutines? (which, again, promote totally new programming style) + +* Should we use the new experimental library from Google: + Android Architecture Components? + +It’s necessary to try it all first in a small app to really make an informed decision. This is exactly what [I did][8], getting some useful insights in the process. If you want to find out what I learned, read on! + +### About [The App][9] + +The aim of the experiment was to create an [app][10] that downloads weather data for cities selected by user and then displays forecasts with graphical charts (and some fancy animations). It’s simple, yet it contains most of the typical features of Android projects. + +It turns out that coroutines and architecture components play really well together and give us clean app architecture with good separation of concerns. Coroutines allow to express ideas in a natural and concise way. 
Suspendable functions are great if you want to code line-by-line the exact logic you have in mind — even if you need to make some asynchronous calls in between. + +Also: no more jumping between callbacks. In this example app, coroutines also completely removed the need of using RxJava. Functions with suspendable points are easier to read and understand than some RxJava operator chains — these chains can quickly become too  _functional. _ ;-) + +> Having said that, I don’t think that RxJava can be replaced with coroutines in every use case. Observables give us a different kind of expressiveness that can not be mapped one to one to suspendable functions. In particular once constructed observable operator chain allow many events to flow through it, while a suspendable point resumes only once per invocation. + +Back to our weather app: +You can watch it in action below — but beware, I’m not a designer. :-) +Chart animations show how easily you can implement them arbitrarily by hand with simple coroutine — without any ObjectAnimators, Interpolators, Evaluators, PropertyValuesHolders, etc. + + ** 此处有Canvas,请手动处理 ** + + ** 此处有iframe,请手动处理 ** + +The most important source code snippets are displayed below. However, if you’d like to see the full project, it’s available [on GitHub.][11] + +[https://github.com/elpassion/crweather][12] + +There is not a lot of code and it should be easy to go through. + +I will present the app structure starting from the network layer. Then I will move to the business logic (in the [MainModel.kt][13] file) which is  _(almost)_  not Android-specific. And finish with the UI part (which obviously is Android-specific). + +Here is the general architecture diagram with text reference numbers added for your convenience. I will especially focus on  _green_  elements —  _suspendable functions_  and  _actors_  (an actor is a really useful kind of  _coroutine builder_ ). + +> The actor model in general is a mathematical model of concurrent computation — more about it in my next blog post. + + +![](https://cdn-images-1.medium.com/max/800/1*DL--eDRDLPPCDR1nsAmILg.png) + +### 01 Weather Service + +This service downloads weather forecasts for a given city from [Open Weather Map][14] REST API. + +I use simple but powerful library from [Square][15] called [Retrofit][16]. I guess by now every Android developer knows it, but in case you never used it: it’s the most popular HTTP client on Android. It makes network calls and parses responses to [POJO][17]. Nothing fancy here — just a typical Retrofit configuration. I plug in the [Moshi][18] converter to convert JSON responses to data classes. + + +![](https://cdn-images-1.medium.com/max/800/1*QGvoMVNbR_nHjmn0WCCFsw.png) +[https://github.com/elpassion/crweather/…/OpenWeatherMapApi.kt][2] + +One important thing to note here is that I set a return types of functions generated by Retrofit to: [Call][19]. + +I use [Call.enqueue(Callback)][20] to actually make a call to Open Weather Map. I don’t use any [call adapter][21] provided by Retrofit, because I wrap the Call object in the  _suspendable function_  myself. + +### 02 Utils + +This is where we enter the ([brave new][22])  _coroutines_  world: we want to create a generic  _suspendable function_  that wraps a [Call][23] object. + +> I assume you know at least the very basics of coroutines. Please read the first chapter of [Coroutines Guide][24] (written by [Roman Elizarov][25]) if you don’t. 
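+
+For contrast, this is roughly what the plain callback style being replaced looks like. The sketch is illustrative: it only assumes Retrofit 2's Call, Callback, and Response types, and the helper name and signature are invented for the example.
+
+```
+import retrofit2.Call
+import retrofit2.Callback
+import retrofit2.Response
+
+// Plain callback-based execution of a Retrofit Call<T>, the style that the
+// suspendable await() wrapper described below lets us avoid.
+fun <T> enqueueWithCallbacks(
+    call: Call<T>,
+    onResult: (T?) -> Unit,
+    onError: (Throwable) -> Unit
+) {
+    call.enqueue(object : Callback<T> {
+        override fun onResponse(call: Call<T>, response: Response<T>) {
+            // A real app would also check response.isSuccessful here.
+            onResult(response.body())
+        }
+
+        override fun onFailure(call: Call<T>, t: Throwable) {
+            onError(t)
+        }
+    })
+}
+```
+
+Every nested callback like this is exactly the kind of jump between contexts that the suspendable wrapper removes.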
+ +It will be an extension function:  [_suspend_  fun Call.await()][26] that invokes the [Call.enqueue(…)][27] (to actually make a network call), then  _suspends_  and later  _resumes_  (when the response comes back). + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/800/1*T6QT9tRQbqOS9pKJfyh0og.png) +[https://github.com/elpassion/crweather/…/CommonUtils.kt][3] + +To turn any asynchronous computation into a  _suspendable function,_  we use the [suspendCoroutine][28] function from The Kotlin Standard Library. It gives us a [Continuation][29] object which is kind of a universal callback. We just have to call its [resume][30] method (or [resumeWithException][31]) anytime we want our new  _suspendable function_  to resume (normally or by throwing an exception). + +The next step will be to use our new  _suspend_  fun Call.await() function to convert asynchronous functions generated by Retrofit into convenient  _suspendable functions_ . + +### 03 Repository + +The Repository object is a source of the data ([charts][32]) displayed in our app. + + + +![](https://cdn-images-1.medium.com/max/800/1*rie-ith-AXP8-ajuBiNdzw.png) +[https://github.com/elpassion/crweather/…/Repository.kt][4] + +Here we have some private  _suspendable functions_  created by applying our  _suspend_  fun Call.await() extension to weather service functions. This way all of them return ready to use data like Forecast instead of Call. Then we use it in our one public  _suspendable function_ :  _suspend_  fun getCityCharts(city: String): List. It converts the data from api to a ready to display list of charts. I use some custom extension properties on List to actually convert the data to List. Important note: only  _suspendable functions_  can call other  _suspendable functions_ . + +> We have the [appid][33] hardcoded here for simplicity. Please generate new appid [here][34]if you want to test the app — this hardcoded one will be automatically blocked for 24h if it is used too frequently by too many people. + +In the next step we will create the main app model (implementing the Android [ViewModel][35] architecture component), that uses an  _actor (coroutine builder)_  to implement the application logic. + +### 04 Model + +In this app we only have one simple model: [MainModel][36] : [ViewModel][37] used by our one activity: [MainActivity][38]. + + + +![](https://cdn-images-1.medium.com/max/800/1*2frMeRS2T_3jwPpFeRInlQ.png) +[https://github.com/elpassion/crweather/…/MainModel.kt][5] + +This class represents the app itself. It will be instantiated by our activity (actually by the Android system [ViewModelProvider][39]), but it will survive configuration changes such as a screen rotation — new activity instance will get the same model instance. We don’t have to worry about activity lifecycle here at all. Instead of implementing all those activity lifecycle related methods (onCreate, onDestroy, …), we have just one onCleared() method called when the user exits the app. + +> To be precise onCleared method is called when the activity is finished. + +Even though we are not tightly coupled to activity lifecycle anymore, we still have to somehow publish current state of our app model to display it somewhere (in the activity). This is where the [LiveData][40] comes into play. + +The [LiveData][41] is like [RxJava][42] [BehaviorSubject][43] reinvented once again… It holds a mutable value that is observable. The most important difference is how we subscribe to it and we will see it later in the [MainActivity][44]. 
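+
+To give this a concrete shape, the following is a minimal illustrative sketch of a ViewModel holding state in LiveData; the class and property names are invented for the example, and the real MainModel (shown above only as a screenshot) is richer and updates its state from an actor rather than directly.
+
+```
+import android.arch.lifecycle.LiveData
+import android.arch.lifecycle.MutableLiveData
+import android.arch.lifecycle.ViewModel
+
+// A stripped-down model: it holds observable state, survives configuration
+// changes, and is cleaned up once when the user finally leaves the app.
+class CityModel : ViewModel() {
+
+    private val mutableCity = MutableLiveData<String>()
+    private val mutableLoading = MutableLiveData<Boolean>()
+
+    // Only read-only LiveData is exposed, so the UI can observe but not push values.
+    val city: LiveData<String> = mutableCity
+    val loading: LiveData<Boolean> = mutableLoading
+
+    fun selectCity(name: String) {
+        mutableCity.value = name   // setValue must be called on the main thread
+    }
+
+    override fun onCleared() {
+        // Cancel any long-running work here.
+    }
+}
+```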
+ +> Also LiveData doesn’t have all those powerful composable operators Observable has. There are only some simple [Transformations][45]. + +> Another difference is that LiveData is Android-specific and RxJava subjects are not, so they can be easily tested with regular non-android JUnit tests. + +> Yet another difference is that LiveData is “lifecycle aware” — more about it in my next posts, where I present the [MainActivity][46] class. + +In here we are actually using the [MutableLiveData][47] : [LiveData][48] objects that allow to push new values into it freely. The app state is represented by four LiveData objects: city, charts, loading, and message. The most important of these is the charts: LiveData> object which represents current list of charts to display. + +All the work of changing the app state and reacting to user actions is performed by an  _ACTOR_ . + + _Actors_  are awesome and will be explained in my next blog post :-) + +### Summary + +We have already prepared everything for our main  _actor_ . And if you look at the  _actor_  code itself — you can (kind of) see how it works even without knowing  _coroutines_  or  _actors_  theory. Even though it has only a few lines, it actually contains all important business logic of this app. The magic is where we call  _suspendable functions_  (marked by gray arrows with green line). One  _suspendable point_  is the iteration over user actions and second is the network call. Thanks to  _coroutines_  it looks like synchronous blocking code but it doesn’t block the thread at all. + +Stay tuned for my next post, where I will explain  _actors_  (and  _channels_ ) in detail. + +-------------------------------------------------------------------------------- + +via: https://blog.elpassion.com/create-a-clean-code-app-with-kotlin-coroutines-and-android-architecture-components-f533b04b5431 + +作者:[Marek Langiewicz][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://blog.elpassion.com/@marek.langiewicz?source=post_header_lockup +[1]:https://www.quora.com/Does-Kotlin-make-Android-development-easier-and-faster/answer/Michal-Przadka?srid=Gu6q +[2]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/OpenWeatherMapApi.kt +[3]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/CommonUtils.kt +[4]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/Repository.kt +[5]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainModel.kt +[6]:https://medium.com/@andrestaltz +[7]:https://cycle.js.org/streams.html +[8]:https://github.com/elpassion/crweather +[9]:https://github.com/elpassion/crweather +[10]:https://github.com/elpassion/crweather +[11]:https://github.com/elpassion/crweather +[12]:https://github.com/elpassion/crweather +[13]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainModel.kt +[14]:http://openweathermap.org/api +[15]:https://github.com/square +[16]:http://square.github.io/retrofit/ +[17]:https://en.wikipedia.org/wiki/Plain_old_Java_object +[18]:https://github.com/square/retrofit/tree/master/retrofit-converters/moshi +[19]:https://github.com/square/retrofit/blob/master/retrofit/src/main/java/retrofit2/Call.java 
+[20]:https://github.com/square/retrofit/blob/b3ea768567e9e1fb1ba987bea021dbc0ead4acd4/retrofit/src/main/java/retrofit2/Call.java#L48 +[21]:https://github.com/square/retrofit/tree/master/retrofit-adapters +[22]:https://www.youtube.com/watch?v=_Lvf7Zu4XJU +[23]:https://github.com/square/retrofit/blob/master/retrofit/src/main/java/retrofit2/Call.java +[24]:https://github.com/Kotlin/kotlinx.coroutines/blob/master/coroutines-guide.md +[25]:https://medium.com/@elizarov +[26]:https://github.com/elpassion/crweather/blob/9c3e3cb803b7e4fffbb010ff085ac56645c9774d/app/src/main/java/com/elpassion/crweather/CommonUtils.kt#L24 +[27]:https://github.com/square/retrofit/blob/b3ea768567e9e1fb1ba987bea021dbc0ead4acd4/retrofit/src/main/java/retrofit2/Call.java#L48 +[28]:https://github.com/JetBrains/kotlin/blob/8f452ed0467e1239a7639b7ead3fb7bc5c1c4a52/libraries/stdlib/src/kotlin/coroutines/experimental/CoroutinesLibrary.kt#L89 +[29]:https://github.com/JetBrains/kotlin/blob/8fa8ba70558cfd610d91b1c6ba55c37967ac35c5/libraries/stdlib/src/kotlin/coroutines/experimental/Coroutines.kt#L23 +[30]:https://github.com/JetBrains/kotlin/blob/8fa8ba70558cfd610d91b1c6ba55c37967ac35c5/libraries/stdlib/src/kotlin/coroutines/experimental/Coroutines.kt#L32 +[31]:https://github.com/JetBrains/kotlin/blob/8fa8ba70558cfd610d91b1c6ba55c37967ac35c5/libraries/stdlib/src/kotlin/coroutines/experimental/Coroutines.kt#L38 +[32]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/DataTypes.kt +[33]:http://openweathermap.org/appid +[34]:http://openweathermap.org/appid +[35]:https://developer.android.com/topic/libraries/architecture/viewmodel.html +[36]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainModel.kt +[37]:https://developer.android.com/topic/libraries/architecture/viewmodel.html +[38]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainActivity.kt +[39]:https://developer.android.com/reference/android/arch/lifecycle/ViewModelProvider.html +[40]:https://developer.android.com/topic/libraries/architecture/livedata.html +[41]:https://developer.android.com/topic/libraries/architecture/livedata.html +[42]:https://github.com/ReactiveX/RxJava +[43]:https://github.com/ReactiveX/RxJava/wiki/Subject +[44]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainActivity.kt +[45]:https://developer.android.com/reference/android/arch/lifecycle/Transformations.html +[46]:https://github.com/elpassion/crweather/blob/master/app/src/main/java/com/elpassion/crweather/MainActivity.kt +[47]:https://developer.android.com/reference/android/arch/lifecycle/MutableLiveData.html +[48]:https://developer.android.com/topic/libraries/architecture/livedata.html From b93d47442c1e53085b9fe66f5c7bd049281316e2 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:08:50 +0800 Subject: [PATCH 25/79] =?UTF-8?q?20171011-13=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...velopment Beginner should know — Part 1.md | 215 ++++++++++++++++++ 1 file changed, 215 insertions(+) create mode 100644 sources/tech/20170929 12 Practices every Android Development Beginner should know — Part 1.md diff --git a/sources/tech/20170929 12 Practices every Android Development Beginner should know — Part 1.md b/sources/tech/20170929 12 Practices every Android Development Beginner should know — Part 1.md new file mode 100644 index 
0000000000..224267b08c --- /dev/null +++ b/sources/tech/20170929 12 Practices every Android Development Beginner should know — Part 1.md @@ -0,0 +1,215 @@ +12 Practices every Android Development Beginner should know — Part 1 +============================================================ + +### One practice at a time to become a better Android beginner + + + +![](https://cdn-images-1.medium.com/max/800/1*RwCbsNdykQYr6vDa6aCGKQ.jpeg) + +It’s been more than 12 years since Andy Rubin and team started working on the idea of a mobile operating system that would change the way mobile phones, rather smartphones were seen by consumers as well as the people who developed software for it. Smartphones back then were limited to texting and checking emails (and of course, making phone calls), giving users and developers a boundary to work within. + +Android, the breaker of chains, with its excellent framework design gave both the parties the freedom to explore more than just a limited set of functionalities. One would argue that the iPhone brought the revolution in the mobile industry, but the thing is no matter how cool (and pricey, eh?) an iPhone is, it again brings that boundary, that limitation we never wanted. + +However, as Uncle Ben said — with great power comes great responsibility — we also need to be extra careful with our Android application design approach. I have often seen, in many courses offered, the negligence to teach beginners that value, the value to understand the architecture well enough before starting. We just throw things at people without correctly explaining what the upsides and downsides are, how they impact design or what to use, what not to. + +In this post, we will see some of the practices that a beginner or an intermediate (if missed any) level developer should know in order to get better out of the Android framework. This post will be followed by more in this series of posts where we will talk about more such useful practices. Let’s begin. + +* * * + +### 1\. Difference between @+id and @id + +In order to access a widget (or component) in Java or to make others dependent on it, we need a unique value to represent it. That unique value is provided by android:id attribute which essentially adds id provided as a suffix to @+id/ to the  _id resource file_  for others to query. An id for Toolbar can be defined like this, + +``` +android:id=”@+id/toolbar +``` + +The following id can now be tracked by  _findViewById(…)_  which looks for it in the res file for id, or simply R.id directory and returns the type of View in question. + +The other one, @id, behaves the same as findViewById(…) — looks for the component by the id provided but is reserved for layouts only. The most general use of it is to place a component relative to the component it returns. + +``` +android:layout_below=”@id/toolbar” +``` + +### 2\. Using @string res for providing Strings in XML + +In simpler words, don’t use hard coded strings in XML. The reason behind it is fairly simple. When we use hard coded string in XML, we often use the same word over and over again. Just imagine the nightmare of changing the same word at multiple places which could have been just one had it been a string resource. The other benefit it provides is multi-language support as different string resource files can be created for different languages. 
+
+```
+android:text="My Awesome Application"
+```
+
+When using hard coded strings, you will often see a warning over the use of such strings in Android Studio, offering to change that hard coded string into a string resource. Try clicking on the string and then hitting ALT + ENTER to get the resource extractor. You can also go to strings.xml, located in the values folder under res, and declare a string resource like this,
+
+```
+<string name="app_name">My Awesome Application</string>
+```
+
+and then use it in place of the hard coded string,
+
+```
+android:text="@string/app_name"
+```
+
+### 3\. Using @android and ?attr constants
+
+It is a fairly effective practice to use predefined constants instead of declaring new ones. Take the example of #ffffff, or white, which is used several times in a layout. Now instead of writing #ffffff every single time, or declaring a color resource for white, we could directly use this,
+
+```
+@android:color/white
+```
+
+Android has several color constants declared, mainly for general colors like white, black or pink. Its best use case is setting a transparent color with,
+
+```
+@android:color/transparent
+```
+
+Another constant holder is ?attr, which is used for setting predefined attribute values on different attributes. Just take the example of a custom Toolbar. This Toolbar needs a defined width and height. The width can normally be set to MATCH_PARENT, but what about the height? Most of us aren't aware of the guidelines, and we simply set the desired height that seems fitting. That's bad practice. Instead of setting our own height, we should rather be using,
+
+```
+android:layout_height="?attr/actionBarSize"
+```
+
+Another use of ?attr is to draw ripples on views when clicked. SelectableItemBackground is a predefined drawable that can be set as the background of any view which needs a ripple effect,
+
+```
+android:background="?attr/selectableItemBackground"
+```
+
+or we can use
+
+```
+android:background="?attr/selectableItemBackgroundBorderless"
+```
+
+to enable a borderless ripple.
+
+### 4\. Difference between SP and DP
+
+While there's no huge difference between these two, it's important to know what they are and where to use them for best results.
+
+SP, or scale-independent pixels, are recommended for use with TextViews, where the font size should not depend on the display density alone. Instead, the content of a TextView needs to scale as per the needs of the user, that is, the font size the user prefers.
+
+For anything else that needs a dimension or position, DP, or density-independent pixels, can be used. DP and SP behave much the same way: both scale with changing densities, as the Android system dynamically calculates the actual pixels from them, making them suitable for components that need to look and feel the same on devices with different display densities. SP simply adds the user's font size preference on top of that.
+
+### 5\. Use of Drawables and Mipmaps
+
+This is the most confusing of them all — how are drawable and mipmap different?
+
+While it may seem that both serve the same purpose, they are inherently different. Mipmaps are meant to be used for storing icons, whereas drawables are used for any other format. Let's see how they are used by the system internally and why not to use one in place of the other.
+
+You'll notice that your application has several mipmap and drawable folders, each representing a different display resolution. When it comes to choosing from the drawable folders, the system picks from the folder that belongs to the current device density.
However, with mipmaps, the system can choose an icon from any folder that fits the need, mainly because some launchers display larger icons than intended, so the system chooses the next size up.
+
+In short, use mipmaps for icons or markers that see a change in resolution when used on different device densities, and use drawables for other resource types that can be stripped out when required.
+
+For example, a Nexus 5 is xxhdpi. Now when we put icons in the mipmap folders, all of the mipmap folders will be retained. But when it comes to drawables, only drawable-xxhdpi will be retained, rendering every other folder useless.
+
+### 6\. Using Vector Drawables
+
+It's a very common practice to add multiple versions (sizes) of the same asset in order to support different screen densities. While this approach may work, it also adds certain performance overheads, like a larger apk size and extra development effort. To eliminate these overheads, the Android team at Google announced the addition of Vector Drawables.
+
+Vector Drawables are SVGs (scalable vector graphics), but in XML, representing an image drawn using a set of dots, lines and curves with fill colors. The very fact that Vector Drawables are made of lines and dots gives them the ability to scale to different densities without losing resolution. The other associated benefit of Vector Drawables is the ease of animation. Add multiple vector drawables to a single AnimatedVectorDrawable file and we're good to go, instead of adding multiple images and handling them separately.
+
+```
+
+```
+
+The above vector definition will result in the following drawable,
+
+![](https://cdn-images-1.medium.com/max/600/1*KGmMIhrQR0UyrpIP_niEZw.png)
+
+To add a vector drawable to your Android project, right click on the app module of your project, then New >> Vector Asset. This will open the Asset Studio, which gives you two options to configure the vector drawable: first, picking from Material Icons, and second, choosing a local SVG or PSD file.
+
+Google recommends using Material Icons for anything app related to maintain the continuity and feel of Android. Be sure to check out all of the icons [here][1].
+
+### 7\. Setting End/Start Margin
+
+This is one of the easiest things people miss out on. Margin! Sure, adding a margin is easy, but what about supporting older platforms?
+
+Start and end are supersets of left and right respectively, so if the application has a minSdkVersion below 17, the start or end margin/padding should be declared along with the older left/right one. On platforms where start and end are missing, they are safely ignored in favor of left/right. A sample declaration looks like this,
+
+```
+android:layout_marginEnd="20dp"
+android:paddingStart="20dp"
+```
+
+### 8\. Using Getter/Setter Generator
+
+One of the most frustrating things to do while creating a holder class (which simply holds variable data) is creating multiple getters and setters — copy/pasting the method body and renaming them for each variable.
+
+Luckily, Android Studio has a solution for it. It goes like this — declare all the variables you need inside the class, and go to Toolbar >> Code. The shortcut for it is ALT + Insert. Clicking Code will get you Generate; tap on it and, among many other options, there will be a Getter and Setter option. Tapping on it while maintaining focus on your class page will add all the getters and setters to the class (handle the previous window on your own). Neat, isn't it?
+
+### 9\. Using Override/Implement Generator
+
+Another helpful generator.
Writing custom classes and extending them is easy, but what about classes you have little idea about? Take PagerAdapter, for example. You want a ViewPager to show a few pages, and for that you will need a custom PagerAdapter that will work as you define inside its overridden methods. But where are those methods? Android Studio may be gracious enough to force you to add a constructor to your custom class, or even give you a shortcut for it (that's ALT + Enter), but the rest of the (abstract) methods from the parent PagerAdapter need to be added manually, which I am sure is tiring for most of us.
+
+To get a list of all the overridable methods available, go to Code >> Generate and then Override Methods or Implement Methods, whichever you need. You can even choose to add multiple methods to your class at once; just hold Ctrl, select the methods and hit OK.
+
+### 10\. Understanding Contexts Properly
+
+Context is scary, and I believe a lot of beginners never care to understand the architecture of the Context class — what it is, and why it is needed everywhere.
+
+In simpler terms, it is the thing that binds all that you see on the screen together. All the views (or their extensions) are tied to the current environment using a Context. Context is responsible for allowing access to application level resources such as the display density or the current activity associated with it. Activities, Services, and the Application class all extend Context to provide their in-house resources to the components associated with them. Take the example of a TextView which has to be added to a MainActivity. You would notice while creating an object that the TextView constructor needs a Context. This is to resolve any resources needed within the TextView definition. Say the TextView needs to internally load the Roboto font; for doing this, the TextView needs a Context. Also, when we are providing a context (or this) to the TextView, we're telling it to bind to the current activity's lifecycle.
+
+Another key use of Context is to initiate application level operations such as initializing a library. A library lives throughout the application lifecycle and thus it needs to be initialized with getApplicationContext() instead of  _getContext_  or  _this_  or  _getActivity()_ . It's important to know the correct use of the different Context types to avoid a memory leak. Other uses of Context include starting an Activity or Service. Remember startActivity(…)? When you need to start an Activity from a non-activity class, you will need a context object to access the startActivity method, since it belongs to the Context class, not the Activity class. Note that startActivity takes an Intent describing the Activity to launch:
+
+```
+// start SecondActivity from a non-activity class using its Context
+getContext().startActivity(new Intent(getContext(), SecondActivity.class));
+```
+
+If you want to know more about the behavior of Context, go [here][2] or [here][3]. The first one is a nice article on Contexts and where to use them, while the latter is the Android documentation for Context, which explains all of its available features — methods, static flags and more — in detail.
+
+### Bonus #1: Formatting Code
+
+Who doesn't like clean, properly formatted code? Well, almost every one of us working on classes that tend to go up to 1000 lines in size wants our code to stay structured. And it's not that only larger classes need formatting; even smaller, modular classes need to keep their code readable.
+
+With Android Studio, or any of the JetBrains IDEs, you don't even need to care about manually structuring your code, like adding indentation or a space before =.
Write code the way you want and when you feel like formatting it, just hit ALT + CTRL + L on Windows or ALT + CTRL + SHIFT + L on Linux. *Code Auto-Formatted* + +### Bonus #2: Using Libraries + +One of the key principles of Object Oriented Programming is to increase reuse of code or rather decrease the habit of reinventing the wheel. It’s a very common approach that a lot of beginners follow wrongly. The approach has two ends, + +- Don’t use libraries, write every code on your own. + +- Use a library for everything. + +Going completely to either of the ends is wrong practice. If you go to the first end, you’re going to eat up a lot of resources just to live up to your pride to own everything. Plus chances are there that your code will be less tested than that library you should have gone with, increasing the chances of a buggy module. Don’t reinvent the wheel when there is a limited resource. Go with a tested library and when you’ve got the complete idea and resources, replace the library with your own reliable code. + +With the second end, there is an even bigger issue — reliance on foreign code. Don’t get used to the idea of relying on others code for everything. Write your own code for things that need lesser resources or things that are within your reach. You don’t need a library that sets up custom TypeFaces (fonts) for you, that you can do on your own. + +So remember, stay in the middle of the two ends — don’t reinvent everything but also don’t over-rely on foreign code. Stay neutral and code to your abilities. + +* * * + +This article was first published on [What’s That Lambda][4]. Be sure to visit for more articles like this one on Android, Node.js, Angular.js and more. + +-------------------------------------------------------------------------------- + +via: https://android.jlelse.eu/12-practices-every-android-beginner-should-know-cd43c3710027 + +作者:[ Nilesh Singh][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://android.jlelse.eu/@nileshsingh?source=post_header_lockup +[1]:https://material.io/icons/ +[2]:https://blog.mindorks.com/understanding-context-in-android-application-330913e32514 +[3]:https://developer.android.com/reference/android/content/Context.html +[4]:https://www.whatsthatlambda.com/android/android-dev-101-things-every-beginner-must-know From c0c50fb2daecb044690eec4157abc4f7d457c112 Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:11:52 +0800 Subject: [PATCH 26/79] =?UTF-8?q?20171011-14=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...0928 3 Python web scrapers and crawlers.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 sources/tech/20170928 3 Python web scrapers and crawlers.md diff --git a/sources/tech/20170928 3 Python web scrapers and crawlers.md b/sources/tech/20170928 3 Python web scrapers and crawlers.md new file mode 100644 index 0000000000..68fc9455e7 --- /dev/null +++ b/sources/tech/20170928 3 Python web scrapers and crawlers.md @@ -0,0 +1,114 @@ +3 Python web scrapers and crawlers +============================================================ + +### Check out these great Python tools for crawling and scraping the web, and parsing out the data you need. 
+
+
+![Python web scrapers and crawlers](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/openweb-osdc-lead.png?itok=yjU4KliG "Python web scrapers and crawlers")
+Image credits: [You as a Machine][13]. Modified by Rikki Endsley. [CC BY-SA 2.0][14].
+
+In a perfect world, all of the data you need would be cleanly presented in an open and well-documented format that you could easily download and use for whatever purpose you need.
+
+In the real world, data is messy, rarely packaged how you need it, and often out-of-date.
+
+More Python Resources
+
+* [What is Python?][1]
+
+* [Top Python IDEs][2]
+
+* [Top Python GUI frameworks][3]
+
+* [Latest Python content][4]
+
+* [More developer resources][5]
+
+Often, the information you need is trapped inside of a website. While some websites make an effort to present data in a clean, structured format, many do not. [Crawling][33], [scraping][34], processing, and cleaning data is a necessary activity for a whole host of tasks, from mapping a website's structure to collecting data that's in a web-only format, or perhaps locked away in a proprietary database.
+
+Sooner or later, you're going to find a need to do some crawling and scraping to get the data you need, and almost certainly you're going to need to do a little coding to get it done right. How you do this is up to you, but I've found the Python community to be a great provider of tools, frameworks, and documentation for grabbing data off of websites.
+
+Before we jump in, just a quick request: think before you do, and be nice. In the context of scraping, this can mean a lot of things. Don't crawl websites just to duplicate them and present someone else's work as your own (without permission, of course). Be aware of copyrights and licensing, and how each might apply to whatever you have scraped. Respect [robots.txt][15] files. And don't hit a website so frequently that the actual human visitors have trouble accessing the content.
+
+With that caution stated, here are some great Python tools for crawling and scraping the web, and parsing out the data you need.
+
+### Pyspider
+
+Let's kick things off with [pyspider][16], a web crawler with a web-based user interface that makes it easy to keep track of multiple crawls. It's an extensible option, with multiple backend databases and message queues supported, and several handy features baked in, from prioritization to the ability to retry failed pages, crawling pages by age, and others. Pyspider supports both Python 2 and 3, and for faster crawling, you can use it in a distributed format with multiple crawlers going at once.
+
+Pyspider's basic usage is well [documented][17], including sample code snippets, and you can check out an [online demo][18] to get a sense of the user interface. Licensed under the Apache 2 license, pyspider is still being actively developed on GitHub.
+
+### MechanicalSoup
+
+[MechanicalSoup][19] is a crawling library built around the hugely popular and incredibly versatile HTML parsing library [Beautiful Soup][20]. If your crawling needs are fairly simple, but require you to check a few boxes or enter some text and you don't want to build your own crawler for this task, it's a good option to consider.
+
+MechanicalSoup is licensed under an MIT license. For more on how to use it, check out the example source file [example.py][21] on the project's GitHub page.
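+If you just want a quick feel for what that looks like, here is a minimal sketch built around MechanicalSoup's StatefulBrowser interface. The URL and the form selector are made up for illustration, so treat it as a rough starting point rather than canonical usage; the example.py linked above remains the reference.
+
+```
+import mechanicalsoup
+
+# A browser object that keeps track of cookies and the current page.
+browser = mechanicalsoup.StatefulBrowser()
+
+# Open a page and select a (hypothetical) search form on it.
+browser.open("https://example.com")
+browser.select_form('form[action="/search"]')
+
+# Fill in a field and submit the form.
+browser["q"] = "web scraping"
+response = browser.submit_selected()
+
+# The parsed result is available as a Beautiful Soup object.
+print(browser.get_current_page().title)
+```
+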
Unfortunately, the project does not have robust documentation at this time + +### Scrapy + +[Scrapy][22] is a scraping framework supported by an active community with which you can build your own scraping tool. In addition to scraping and parsing tools, it can easily export the data it collects in a number of formats like JSON or CSV and store the data on a backend of your choosing. It also has a number of built-in extensions for tasks like cookie handling, user-agent spoofing, restricting crawl depth, and others, as well as an API for easily building your own additions. + +For an introduction to Scrapy, check out the [online documentation][23] or one of their many [community][24] resources, including an IRC channel, Subreddit, and a healthy following on their StackOverflow tag. Scrapy's code base can be found [on GitHub][25] under a 3-clause BSD license. + +If you're not all that comfortable with coding, [Portia][26] provides a visual interface that makes it easier. A hosted version is available at [scrapinghub.com][27]. + +### Others + +* [Cola][6] describes itself as a “high-level distributed crawling framework” that might meet your needs if you're looking for a Python 2 approach, but note that it has not been updated in over two years. + +* [Demiurge][7], which supports both Python 2 and Python 3, is another potential candidate to look at, although development on this project is relatively quiet as well. + +* [Feedparser][8] might be a helpful project to check out if the data you are trying to parse resides primarily in RSS or Atom feeds. + +* [Lassie][9] makes it easy to retrieve basic content like a description, title, keywords, or a list of images from a webpage. + +* [RoboBrowser][10] is another simple library for Python 2 or 3 with basic functionality, including button-clicking and form-filling. Though it hasn't been updated in a while, it's still a reasonable choice. + +* * * + +This is far from a comprehensive list, and of course, if you're a master coder you may choose to take your own approach rather than use one of these frameworks. Or, perhaps, you've found a great alternative built for a different language. For example, Python coders would probably appreciate checking out the [Python bindings][28] for [Selenium][29] for sites that are trickier to crawl without using an actual web browser. If you've got a favorite tool for crawling and scraping, let us know in the comments below. 
+ +-------------------------------------------------------------------------------- + +via: https://opensource.com/resources/python/web-scraper-crawler + +作者:[Jason Baker ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://opensource.com/users/jason-baker +[1]:https://opensource.com/resources/python?intcmp=7016000000127cYAAQ +[2]:https://opensource.com/resources/python/ides?intcmp=7016000000127cYAAQ +[3]:https://opensource.com/resources/python/gui-frameworks?intcmp=7016000000127cYAAQ +[4]:https://opensource.com/tags/python?intcmp=7016000000127cYAAQ +[5]:https://developers.redhat.com/?intcmp=7016000000127cYAAQ +[6]:https://github.com/chineking/cola +[7]:https://github.com/matiasb/demiurge +[8]:https://github.com/kurtmckee/feedparser +[9]:https://github.com/michaelhelmick/lassie +[10]:https://github.com/jmcarp/robobrowser +[11]:https://opensource.com/resources/python/web-scraper-crawler?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007&rate=Wn1vUb9FpPK-IGQ1waRzgdIsDN3pXBH6rO2xnjoK_t4 +[12]:https://opensource.com/user/19894/feed +[13]:https://www.flickr.com/photos/youasamachine/8025582590/in/photolist-decd6C-7pkccp-aBfN9m-8NEffu-3JDbWb-aqf5Tx-7Z9MTZ-rnYTRu-3MeuPx-3yYwA9-6bSLvd-irmvxW-5Asr4h-hdkfCA-gkjaSQ-azcgct-gdV5i4-8yWxCA-9G1qDn-5tousu-71V8U2-73D4PA-iWcrTB-dDrya8-7GPuxe-5pNb1C-qmnLwy-oTxwDW-3bFhjL-f5Zn5u-8Fjrua-bxcdE4-ddug5N-d78G4W-gsYrFA-ocrBbw-pbJJ5d-682rVJ-7q8CbF-7n7gDU-pdfgkJ-92QMx2-aAmM2y-9bAGK1-dcakkn-8rfyTz-aKuYvX-hqWSNP-9FKMkg-dyRPkY +[14]:https://creativecommons.org/licenses/by/2.0/ +[15]:http://www.robotstxt.org/ +[16]:https://github.com/binux/pyspider +[17]:http://docs.pyspider.org/en/latest/ +[18]:http://demo.pyspider.org/ +[19]:https://github.com/hickford/MechanicalSoup +[20]:https://www.crummy.com/software/BeautifulSoup/ +[21]:https://github.com/hickford/MechanicalSoup/blob/master/example.py +[22]:https://scrapy.org/ +[23]:https://doc.scrapy.org/en/latest/ +[24]:https://scrapy.org/community/ +[25]:https://github.com/scrapy/scrapy +[26]:https://github.com/scrapinghub/portia +[27]:https://portia.scrapinghub.com/ +[28]:https://selenium-python.readthedocs.io/ +[29]:https://github.com/SeleniumHQ/selenium +[30]:https://opensource.com/users/jason-baker +[31]:https://opensource.com/users/jason-baker +[32]:https://opensource.com/resources/python/web-scraper-crawler?imm_mid=0f7103&cmp=em-prog-na-na-newsltr_20171007#comments +[33]:https://en.wikipedia.org/wiki/Web_crawler +[34]:https://en.wikipedia.org/wiki/Web_scraping From ee349d7911eb7e0064c15b0b6503bd196c7cf36b Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:14:46 +0800 Subject: [PATCH 27/79] =?UTF-8?q?20171011-15=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...general purpose data structure in Redis.md | 168 ++++++++++++++++++ 1 file changed, 168 insertions(+) create mode 100644 sources/tech/20171003 Streams a new general purpose data structure in Redis.md diff --git a/sources/tech/20171003 Streams a new general purpose data structure in Redis.md b/sources/tech/20171003 Streams a new general purpose data structure in Redis.md new file mode 100644 index 0000000000..a5ac0a965c --- /dev/null +++ b/sources/tech/20171003 Streams a new general purpose data structure in Redis.md @@ -0,0 +1,168 @@ +[Streams: a new general purpose data structure in Redis.][1] +================================== + + +Until a few months 
ago, for me streams were no more than an interesting and relatively straightforward concept in the context of messaging. After Kafka popularized the concept, I mostly investigated their usefulness in the case of Disque, a message queue that is now headed to be translated into a Redis 4.2 module. Later I decided that Disque was all about AP messaging, that is, fault tolerance and guarantees of delivery without much effort from the client, so I decided that the concept of streams was not a good match in that case.
+
+However, at the same time, there was a problem in Redis that did not let me feel relaxed about the data structures it exports by default. There is some kind of gap between Redis lists, sorted sets, and Pub/Sub capabilities. You can kind of use all these tools in order to model a sequence of messages or events, but with different tradeoffs. Sorted sets are memory hungry, can't naturally model the same message being delivered again and again, and clients can't block for new messages. Because a sorted set is not a sequential data structure, it's a set where elements can be moved around by changing their scores: no wonder it was not a good match for things like time series. Lists have different problems creating similar applicability issues in certain use cases: you cannot explore what is in the middle of a list because the access time in that case is linear. Moreover no fan-out is possible: blocking operations on lists serve a single element to a single client. Nor was there a fixed element identifier in lists, in order to say: give me things starting from that element. For one-to-many workloads there is Pub/Sub, which is great in many cases, but for certain things you do not want fire-and-forget: retaining a history is important, not just to refetch messages after a disconnection, but also because certain lists of messages, like time series, are very important to explore with range queries: what were my temperature readings in this 10 second range?
+
+The way I tried to address the above problems was by planning a generalization of sorted sets and lists into a unique, more flexible data structure, however my design attempts almost always ended up making the resulting data structure way more artificial than the current ones. One good thing about Redis is that the exported data structures resemble the natural computer science data structures more than "this API that Salvatore invented". So in the end I stopped my attempts and said, ok, that's what we can provide so far, maybe I'll add some history to Pub/Sub, or some more flexibility to list access patterns in the future. However, every time a user approached me during a conference saying "how would you model time series in Redis?" or asked similar related questions, my face turned green.
+
+Genesis
+=======
+
+After the introduction of modules in Redis 4.0, users started to see how to fix this problem themselves. One of them, Timothy Downs, wrote me the following over IRC:
+
+    the module I'm planning on doing is to add a transaction log style data type - meaning that a very large number of subscribers can do something like pub sub without a lot of redis memory growth
+    subscribers keeping their position in a message queue rather than having redis maintain where each consumer is up to and duplicating messages per subscriber
+
+This captured my imagination. I thought about it for a few days, and realized that this could be the moment when we could solve all the above problems at once. What I needed was to re-imagine the concept of "log".
It is a basic programming element, everybody is used to it, because it’s just as simple as opening a file in append mode and writing data to it in some format. However Redis data structures must be abstract. They are in memory, and we use RAM not just because we are lazy, but because using a few pointers, we can conceptualize data structures and make them abstract, to allow them to break free from the obvious limits. For instance normally a log has several problems: the offset is not logical, but is an actual bytes offset, what if we want logical offsets that are related to the time an entry was inserted? We have range queries for free. Similarly, a log is often hard to garbage collect: how to remove old elements in an append only data structure? Well, in our idealized log, we just say we want at max this number of entries, and the old ones will go away, and so forth. + +While I was trying to write a specification starting from the seed idea of Timothy, I was working to a radix tree implementation that I was using for Redis Cluster, to optimize certain parts of its internals. This provided the ground in order to implement a very space efficient log, that was still accessible in logarithmic time to get ranges. At the same time I started reading about Kafka streams to get other interesting ideas that could fit well into my design, and this resulted into getting the concept of Kafka consumer groups, and idealizing it again for Redis and the in-memory use case. However the specification remained just a specification for months, at the point that after some time I rewrote it almost from scratch in order to upgrade it with many hints that I accumulated talking with people about this upcoming addition to Redis. I wanted Redis streams to be a very good use case for time series especially, not just for other kind of events and messaging applications. + +Let’s write some code +===================== + +Back from Redis Conf, during the summertime, I was implementing a library called “listpack”. This library is just the successor of ziplist.c, that is, a data structure that can represent a list of string elements inside a single allocation. It’s just a very specialized serialization format, with the peculiarity of being parsable also in reverse order, from right to left: something needed in order to substitute ziplists in all the use cases. + +Mixing radix trees + listpacks, it is possible to easily build a log that is at the same time very space efficient, and indexed, that means, allowing for random access by IDs and time. Once this was ready, I started to write the code in order to implement the stream data structure. I’m still finishing the implementation, however at this point, inside the Redis “streams” branch at Github, there is enough to start playing and having fun. I don’t claim that the API is 100% final, but there are two interesting facts: one is that at this point, only the consumer groups are missing, plus a number of less important commands to manipulate the stream, but all the big things are implemented already. The second is the decision to backport all the stream work back into the 4.0 branch in about two months, once everything looks stable. It means that Redis users will not have to wait for Redis 4.2 in order to use streams, they will be available ASAP for production usage. This is possible because being a new data structure, almost all the code changes are self-contained into the new code. 
With the exception of the blocking list operations: the code was refactored so that we share the same code for streams and lists blocking operations, with a great simplification of the Redis internals. + +Tutorial: welcome to Redis Streams +================================== + +In some way, you can think at streams as a supercharged version of Redis lists. Streams elements are not just a single string, they are more objects composed of fields and values. Range queries are possible and fast. Each entry in a stream has an ID, which is a logical offset. Different clients can blocking-wait for elements with IDs greater than a specified one. A fundamental command of Redis streams is XADD. Yes, all the Redis stream commands are prefixed by an “X”. + +> XADD mystream * sensor-id 1234 temperature 10.5 +1506871964177.0 + +The XADD command will append the specified entry as a new element to the specified stream “mystream”. The entry, in the example above, has two fields: sensor-id and temperature, however each entry in the same stream can have different fields. Using the same field names will just lead to better memory usage. An interesting thing is also that the fields order is guaranteed to be retained. XADD returns the ID of the just inserted entry, because with the asterisk in the third argument, we asked the command to auto-generate the ID. This is almost always what you want, but it is possible also to force a specific ID, for instance in order to replicate the command to slaves and AOF files. + +The ID is composed of two parts: a millisecond time and a sequence number. 1506871964177 is the millisecond time, and is just a Unix time with millisecond resolution. The number after the dot, 0, is the sequence number, and is used in order to distinguish entries added in the same millisecond. Both numbers are 64 bit unsigned integers. This means that we can add all the entries we want in a stream, even in the same millisecond. The millisecond part of the ID is obtained using the maximum between the current local time of the Redis server generating the ID, and the last entry inside the stream. So even if, for instance, the computer clock jumps backward, the IDs will continue to be incremental. In some way you can think stream entry IDs as whole 128 bit numbers. However the fact that they have a correlation with the local time of the instance where they are added, means that we have millisecond precision range queries for free. + +As you can guess, adding two entries in a very fast way, will result in only the sequence number to be incremented. We can simulate the “fast insertion” simply with a MULTI/EXEC block: + +> MULTI +OK +> XADD mystream * foo 10 +QUEUED +> XADD mystream * bar 20 +QUEUED +> EXEC +1) 1506872463535.0 +2) 1506872463535.1 + +The above example also shows how we can use different fields for different entries without having to specifying any schema initially. What happens however is that every first message of every block (that usually contains something in the range of 50-150 messages) is used as reference, and successive entries having the same fields are compressed with a single flag saying “same fields of the first entry in this block”. So indeed using the same fields for successive messages saves a lot of memory, even when the set of fields slowly change over time. + +In order to retrieve data from the stream there are two ways: range queries, that are implemented by the XRANGE command, and streaming, implemented by the XREAD command. 
XRANGE just fetches a range of items from start to stop, inclusive. So for instance I can fetch a single item, if I know its ID, with:
+
+> XRANGE mystream 1506871964177.0 1506871964177.0
+1) 1) 1506871964177.0
+   2) 1) "sensor-id"
+      2) "1234"
+      3) "temperature"
+      4) "10.5"
+
+However you can use the special start symbol "-" and the special stop symbol "+" to signify the minimum and maximum ID possible. It's also possible to use the COUNT option in order to limit the number of entries returned. A more complex XRANGE example is the following:
+
+> XRANGE mystream - + COUNT 2
+1) 1) 1506871964177.0
+   2) 1) "sensor-id"
+      2) "1234"
+      3) "temperature"
+      4) "10.5"
+2) 1) 1506872463535.0
+   2) 1) "foo"
+      2) "10"
+
+Here we are reasoning in terms of ranges of IDs, however you can use XRANGE in order to get a specific range of elements in a given time range, because you can omit the "sequence" part of the IDs. So what you can do is to just specify times in milliseconds. The following means: "Give me 10 entries starting from the Unix time 1506872463":
+
+127.0.0.1:6379> XRANGE mystream 1506872463000 + COUNT 10
+1) 1) 1506872463535.0
+   2) 1) "foo"
+      2) "10"
+2) 1) 1506872463535.1
+   2) 1) "bar"
+      2) "20"
+
+A final important thing to note about XRANGE is that, given that we receive the IDs in the reply, and the immediately successive ID is trivially obtained by just incrementing the sequence part of the ID, it is possible to use XRANGE to incrementally iterate the whole stream, receiving for every call the specified number of elements. After the *SCAN family of commands in Redis, which allowed iteration of Redis data structures *despite* the fact they were not designed for being iterated, I avoided making the same error again.
+
+Streaming with XREAD: blocking for new data
+===========================================
+
+XRANGE is perfect when we want to access our stream to get ranges by ID or time, or single elements by ID. However in the case of streams that different clients must consume as data arrives, this is not good enough and would require some form of polling (which could be a good idea for *certain* applications that just connect from time to time to get data).
+
+The XREAD command is designed in order to read, at the same time, from multiple streams by just specifying the ID of the last entry we got from each stream. Moreover we can request to block if no data is available, and to be unblocked when data arrives. This is similar to what happens with blocking list operations, but here data is not consumed from the stream, and multiple clients can access the same data at the same time.
+
+This is a canonical example of an XREAD call:
+
+> XREAD BLOCK 5000 STREAMS mystream otherstream $ $
+
+And it means: get data from "mystream" and "otherstream". If no data is available, block the client, with a timeout of 5000 milliseconds. After the STREAMS option we specify the keys we want to listen to, and the last ID we have. However the special ID "$" means: assume I already have all the elements that are in the stream right now, so give me only what arrives starting from the next element.
+
+If, from another client, I send the command:
+
+> XADD otherstream * message "Hi There"
+
+This is what happens on the XREAD side:
+
+1) 1) "otherstream"
+   2) 1) 1) 1506935385635.0
+         2) 1) "message"
+            2) "Hi There"
+
+We get the key that received data, together with the data received.
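+To give an idea of how this looks from application code, here is a rough Python sketch of a producer and of the blocking consumer loop just described. Since streams only exist in the "streams" branch for now, client libraries have no dedicated helpers for these commands, so the sketch goes through redis-py's generic execute_command() escape hatch; take it as an illustration rather than as a finished consumer.
+
+```
+import redis
+
+r = redis.StrictRedis(host='localhost', port=6379)
+
+# Producer: append an entry, letting the server auto-generate the ID.
+r.execute_command('XADD', 'mystream', '*',
+                  'sensor-id', '1234', 'temperature', '10.5')
+
+# Consumer: block up to 5000 ms for entries newer than the last ID we saw.
+last_id = '$'  # start from "new entries only"
+while True:
+    reply = r.execute_command('XREAD', 'BLOCK', '5000',
+                              'STREAMS', 'mystream', last_id)
+    if reply is None:
+        continue  # timed out, ask again
+    for key, entries in reply:
+        for entry_id, fields in entries:
+            print(key, entry_id, fields)
+            last_id = entry_id  # remember our position in the stream
+```
+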
In the next call, we’ll likely use the ID of the last message received: + +> XREAD BLOCK 5000 STREAMS mystream otherstream $ 1506935385635.0 + +And so forth. However note that with this usage pattern, it is possible that the client will connect again after a very big delay (because it took time to process messages, or for any other reason). In such a case, in the meantime, a lot of messages could pile up, so it is wise to always use the COUNT option with XREAD, in order to make sure the client will not be flooded with messages and the server will not have to lose too much time just serving tons of messages to a single client. + +Capped streams +============== + +So far so good… however streams at some point have to remove old messages. Fortunately this is possible with the MAXLEN option of the XADD command: + +> XADD mystream MAXLEN 1000000 * field1 value1 field2 value2 + +This basically means, if the stream, after adding the new element is found to have more than 1 million messages, remove old messages so that the length returns back to 1 million elements. It’s just like using RPUSH + LTRIM with lists, but this time we have a built-in mechanism to do so. However note that the above means that every time we add a new message, we have also to incur in the work needed in order to remove a message from the other side of the stream. This takes some CPU, so it is possible to use the “~” symbol before the count in MAXLEN, in order to specify that we are not really demanding *exactly* 1 million messages, but if there are a few more it’s not a big problem: + +> XADD mystream MAXLEN ~ 1000000 * foo bar + +This way XADD will remove messages only when it can remove a whole node. This will make having the capped stream almost for free compared to vanilla XADD. + +Consumer groups (work in progress) +================================== + +This is the first of the features that is not already implemented in Redis, but is a work in progress. It is also the idea more clearly inspired by Kafka, even if implemented here in a pretty different way. The gist is that with XREAD, clients can also add a “GROUP ” option. Automatically all the clients in the same group will get *different* messages. Of course there could be multiple groups reading from the same stream, in such cases all groups will receive duplicates of the same messages arriving in the stream, but within each group, messages will not be repeated. + +An extension to groups is that it will be possible to specify a “RETRY ” option when groups are specified: in this case, if messages are not acknowledged for processing with XACK, they will be delivered again after the specified amount of milliseconds. This provides some best effort reliability to the delivering of the messages, in case the client has no private means to mark messages as processed. This part is a work in progress as well. + +Memory usage and saving loading times +===================================== + +Because of the design used to model Redis streams, the memory usage is remarkably low. It depends on the number of fields, values, and their lengths, but for simple messages we are at a few millions of messages for every 100 MB of used memory. Moreover, the format is conceived to need very minimal serialization: the listpack blocks that are stored as radix tree nodes, have the same representation on disk and in memory, so they are trivially stored and read. For instance Redis can read 5 million entries from the RDB file in 0.3 seconds. 
+This makes replication and persistence of streams very efficient.
+
+It is planned to also allow deletion of items in the middle. This is only partially implemented, but the strategy is to mark entries as deleted in the entry flag, and when a given ratio between entries and deleted entries is reached, the block is rewritten to collect the garbage, and if needed it is glued to another adjacent block in order to avoid fragmentation.
+
+Conclusions and ETA
+===================
+
+Redis streams will be part of Redis stable in the 4.0 series before the end of the year. I think that this general purpose data structure is going to provide a huge patch in order for Redis to cover a lot of use cases that were hard to cover so far: you had to be creative in order to abuse the current data structures to fix certain problems. One very important use case is time series, but my feeling is that the streaming of messages for other use cases via XREAD is going to be very interesting too, both as a replacement for Pub/Sub applications that need more reliability than fire-and-forget, and for completely new use cases. For now, if you want to start to evaluate the new capabilities in the context of your problems, just fetch the "streams" branch at GitHub and start playing. After all, bug reports are welcome :-)
+
+If you like videos, a real-time session showing streams is here: https://www.youtube.com/watch?v=ELDzy9lCFHQ
+
+
+--------------------------------------------------------------------------------
+
+via: http://antirez.com/news/114
+
+作者:[antirez ][a]
+译者:[译者ID](https://github.com/译者ID)
+校对:[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]:http://antirez.com/
+[1]:http://antirez.com/news/114
+[2]:http://antirez.com/user/antirez
+[3]:https://www.youtube.com/watch?v=ELDzy9lCFHQ

From 51d1f45e55809da2b690feafd84d4909bb682fef Mon Sep 17 00:00:00 2001
From: Ezio
Date: Wed, 11 Oct 2017 21:16:36 +0800
Subject: [PATCH 28/79] =?UTF-8?q?20171011-16=20=E9=80=89=E9=A2=98?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../20171002 Scaling the GitLab database.md | 256 ++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100644 sources/tech/20171002 Scaling the GitLab database.md

diff --git a/sources/tech/20171002 Scaling the GitLab database.md b/sources/tech/20171002 Scaling the GitLab database.md
new file mode 100644
index 0000000000..d22f811bf1
--- /dev/null
+++ b/sources/tech/20171002 Scaling the GitLab database.md
@@ -0,0 +1,256 @@
+Scaling the GitLab database
+============================================================
+
+An in-depth look at the challenges faced when scaling the GitLab database and the solutions we applied to help solve the problems with our database setup.
+
+For a long time GitLab.com used a single PostgreSQL database server and a single replica for disaster recovery purposes. This worked reasonably well for the first few years of GitLab.com's existence, but over time we began seeing more and more problems with this setup. In this article we'll take a look at what we did to help solve these problems for both GitLab.com and self-hosted GitLab instances.
+
+For example, the database was under constant pressure, with CPU utilization hovering around 70 percent almost all the time. Not because we used all available resources in the best way possible, but because we were bombarding the server with too many (badly optimized) queries.
We realized we needed a better setup that would allow us to balance the load and make GitLab.com more resilient to any problems that may occur on the primary database server. + +When tackling these problems using PostgreSQL there are essentially four techniques you can apply: + +1. Optimize your application code so the queries are more efficient (and ideally use fewer resources). + +2. Use a connection pooler to reduce the number of database connections (and associated resources) necessary. + +3. Balance the load across multiple database servers. + +4. Shard your database. + +Optimizing the application code is something we have been working on actively for the past two years, but it's not a final solution. Even if you improve performance, when traffic also increases you may still need to apply the other two techniques. For the sake of this article we'll skip over this particular subject and instead focus on the other techniques. + +### Connection pooling + +In PostgreSQL a connection is handled by starting an OS process which in turn needs a number of resources. The more connections (and thus processes), the more resources your database will use. PostgreSQL also enforces a maximum number of connections as defined in the [max_connections][5] setting. Once you hit this limit PostgreSQL will reject new connections. Such a setup can be illustrated using the following diagram: + +![PostgreSQL Diagram](https://about.gitlab.com/images/scaling-the-gitlab-database/postgresql.svg) + +Here our clients connect directly to PostgreSQL, thus requiring one connection per client. + +By pooling connections we can have multiple client-side connections reuse PostgreSQL connections. For example, without pooling we'd need 100 PostgreSQL connections to handle 100 client connections; with connection pooling we may only need 10 or so PostgreSQL connections depending on our configuration. This means our connection diagram will instead look something like the following: + +![Connection Pooling Diagram](https://about.gitlab.com/images/scaling-the-gitlab-database/pooler.svg) + +Here we show an example where four clients connect to pgbouncer but instead of using four PostgreSQL connections we only need two of them. + +For PostgreSQL there are two connection poolers that are most commonly used: + +* [pgbouncer][1] + +* [pgpool-II][2] + +pgpool is a bit special because it does much more than just connection pooling: it has a built-in query caching mechanism, can balance load across multiple databases, manage replication, and more. + +On the other hand pgbouncer is much simpler: all it does is connection pooling. + +### Database load balancing + +Load balancing on the database level is typically done by making use of PostgreSQL's "[hot standby][6]" feature. A hot-standby is a PostgreSQL replica that allows you to run read-only SQL queries, contrary to a regular standby that does not allow any SQL queries to be executed. To balance load you'd set up one or more hot-standby servers and somehow balance read-only queries across these hosts while sending all other operations to the primary. Scaling such a setup is fairly easy: simply add more hot-standby servers (if necessary) as your read-only traffic increases. + +Another benefit of this approach is having a more resilient database cluster. Web requests that only use a secondary can continue to operate even if the primary server is experiencing issues; though of course you may still run into errors should those requests end up using the primary. 
+ +This approach however can be quite difficult to implement. For example, explicit transactions must be executed on the primary since they may contain writes. Furthermore, after a write we want to continue using the primary for a little while because the changes may not yet be available on the hot-standby servers when using asynchronous replication. + +### Sharding + +Sharding is the act of horizontally partitioning your data. This means that data resides on specific servers and is retrieved using a shard key. For example, you may partition data per project and use the project ID as the shard key. Sharding a database is interesting when you have a very high write load (as there's no other easy way of balancing writes other than perhaps a multi-master setup), or when you have  _a lot_  of data and you can no longer store it in a conventional manner (e.g. you simply can't fit it all on a single disk). + +Unfortunately the process of setting up a sharded database is a massive undertaking, even when using software such as [Citus][7]. Not only do you need to set up the infrastructure (which varies in complexity depending on whether you run it yourself or use a hosted solution), but you also need to adjust large portions of your application to support sharding. + +### Cases against sharding + +On GitLab.com the write load is typically very low, with most of the database queries being read-only queries. In very exceptional cases we may spike to 1500 tuple writes per second, but most of the time we barely make it past 200 tuple writes per second. On the other hand we can easily read up to 10 million tuples per second on any given secondary. + +Storage-wise, we also don't use that much data: only about 800 GB. A large portion of this data is data that is being migrated in the background. Once those migrations are done we expect our database to shrink in size quite a bit. + +Then there's the amount of work required to adjust the application so all queries use the right shard keys. While quite a few of our queries usually include a project ID which we could use as a shard key, there are also many queries where this isn't the case. Sharding would also affect the process of contributing changes to GitLab as every contributor would now have to make sure a shard key is present in their queries. + +Finally, there is the infrastructure that's necessary to make all of this work. Servers have to be set up, monitoring has to be added, engineers have to be trained so they are familiar with this new setup, the list goes on. While hosted solutions may remove the need for managing your own servers it doesn't solve all problems. Engineers still have to be trained and (most likely very expensive) bills have to be paid. At GitLab we also highly prefer to ship the tools we need so the community can make use of them. This means that if we were going to shard the database we'd have to ship it (or at least parts of it) in our Omnibus packages. The only way you can make sure something you ship works is by running it yourself, meaning we wouldn't be able to use a hosted solution. + +Ultimately we decided against sharding the database because we felt it was an expensive, time-consuming, and complex solution to a problem we do not have. + +### Connection pooling for GitLab + +For connection pooling we had two main requirements: + +1. It has to work well (obviously). + +2. It has to be easy to ship in our Omnibus packages so our users can also take advantage of the connection pooler. 
Reviewing the two solutions (pgpool and pgbouncer) was done in two steps:
+
+1. Perform various technical tests (does it work, how easy is it to configure, etc).
+
+2. Find out what the experiences are of other users of the solution, what problems they ran into and how they dealt with them, etc.
+
+pgpool was the first solution we looked into, mostly because it seemed quite attractive based on all the features it offered. Some of the data from our tests can be found in [this][8] comment.
+
+Ultimately we decided against using pgpool based on a number of factors. For example, pgpool does not support sticky connections. This is problematic when performing a write and (trying to) display the results right away. Imagine creating an issue and being redirected to the page, only to run into an HTTP 404 error because the server used for any read-only queries did not yet have the data. One way to work around this would be to use synchronous replication, but this brings many other problems to the table; problems we prefer to avoid.
+
+Another problem is that pgpool's load balancing logic is decoupled from your application and operates by parsing SQL queries and sending them to the right server. Because this happens outside of your application you have very little control over which query runs where. This may actually be beneficial to some because you don't need additional application logic, but it also prevents you from adjusting the routing logic if necessary.
+
+Configuring pgpool also proved quite difficult due to the sheer number of configuration options. Perhaps the final nail in the coffin was the feedback we got on pgpool from those who had used it in the past. The feedback we received regarding pgpool was usually negative, though not very detailed in most cases. While most of the complaints appeared to be related to earlier versions of pgpool, it still made us doubt whether using it was the right choice.
+
+The feedback combined with the issues described above ultimately led to us deciding against using pgpool and using pgbouncer instead. We performed a similar set of tests with pgbouncer and were very satisfied with it. It's fairly easy to configure (and doesn't have that much that needs configuring in the first place), relatively easy to ship, focuses only on connection pooling (and does it really well), and had very little (if any) noticeable overhead. Perhaps my only complaint would be that the pgbouncer website can be a little bit hard to navigate.
+
+Using pgbouncer we were able to drop the number of active PostgreSQL connections from a few hundred to only 10-20 by using transaction pooling. We opted for using transaction pooling since Rails database connections are persistent. In such a setup, using session pooling would prevent us from being able to reduce the number of PostgreSQL connections, thus bringing few (if any) benefits. By using transaction pooling we were able to drop PostgreSQL's `max_connections` setting from 3000 (the reason for this particular value was never really clear) to 300\. pgbouncer is configured in such a way that even at peak capacity we will only need 200 connections; giving us some room for additional connections such as `psql` consoles and maintenance tasks.
+
+A side effect of using transaction pooling is that you cannot use prepared statements, as the `PREPARE` and `EXECUTE` commands may end up running in different connections; producing errors as a result.
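+To make that failure mode concrete, here is a small hypothetical sketch (Python with psycopg2, with a made-up DSN and table) of what happens when every statement runs as its own transaction through a transaction-pooled pgbouncer: the `EXECUTE` is not guaranteed to reach the PostgreSQL backend that ran the `PREPARE`, at which point the server reports that the prepared statement does not exist.
+
+```
+import psycopg2
+
+# Connect through pgbouncer (the port and database name are assumptions)
+# rather than directly to PostgreSQL.
+conn = psycopg2.connect("host=127.0.0.1 port=6432 dbname=gitlabhq_production")
+conn.autocommit = True  # every statement becomes its own transaction
+cur = conn.cursor()
+
+# In transaction pooling mode the server connection can be handed to another
+# client as soon as this statement's transaction ends...
+cur.execute("PREPARE issue_by_id AS SELECT * FROM issues WHERE id = $1")
+
+# ...so this may run on a different backend and fail with:
+#   prepared statement "issue_by_id" does not exist
+cur.execute("EXECUTE issue_by_id(%s)", (1,))
+```
+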
Fortunately we did not measure any increase in response timings when disabling prepared statements, but we  _did_  measure a reduction of roughly 20 GB in memory usage on our database servers. + +To ensure both web requests and background jobs have connections available we set up two separate pools: one pool of 150 connections for background processing, and a pool of 50 connections for web requests. For web requests we rarely need more than 20 connections, but for background processing we can easily spike to a 100 connections simply due to the large number of background processes running on GitLab.com. + +Today we ship pgbouncer as part of GitLab EE's High Availability package. For more information you can refer to ["Omnibus GitLab PostgreSQL High Availability."][9] + +### Database load balancing for GitLab + +With pgpool and its load balancing feature out of the picture we needed something else to spread load across multiple hot-standby servers. + +For (but not limited to) Rails applications there is a library called [Makara][10] which implements load balancing logic and includes a default implementation for ActiveRecord. Makara however has some problems that were a deal-breaker for us. For example, its support for sticky connections is very limited: when you perform a write the connection will stick to the primary using a cookie, with a fixed TTL. This means that if replication lag is greater than the TTL you may still end up running a query on a host that doesn't have the data you need. + +Makara also requires you to configure quite a lot, such as all the database hosts and their roles, with no service discovery mechanism (our current solution does not yet support this either, though it's planned for the near future). Makara also [does not appear to be thread-safe][11], which is problematic since Sidekiq (the background processing system we use) is multi-threaded. Finally, we wanted to have control over the load balancing logic as much as possible. + +Besides Makara there's also [Octopus][12] which has some load balancing mechanisms built in. Octopus however is geared towards database sharding and not just balancing of read-only queries. As a result we did not consider using Octopus. + +Ultimately this led to us building our own solution directly into GitLab EE. The merge request adding the initial implementation can be found [here][13], though some changes, improvements, and fixes were applied later on. + +Our solution essentially works by replacing `ActiveRecord::Base.connection` with a proxy object that handles routing of queries. This ensures we can load balance as many queries as possible, even queries that don't originate directly from our own code. This proxy object in turn determines what host a query is sent to based on the methods called, removing the need for parsing SQL queries. + +### Sticky connections + +Sticky connections are supported by storing a pointer to the current PostgreSQL WAL position the moment a write is performed. This pointer is then stored in Redis for a short duration at the end of a request. Each user is given their own key so that the actions of one user won't lead to all other users being affected. In the next request we get the pointer and compare this with all the secondaries. If all secondaries have a WAL pointer that exceeds our pointer we know they are in sync and we can safely use a secondary for our read-only queries. If one or more secondaries are not yet in sync we will continue using the primary until they are in sync. 
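+Stripped of the Rails plumbing, that bookkeeping boils down to something like the sketch below. This is not the GitLab EE code; the key name, expiry and caught_up() helper are illustrative assumptions, and it only shows the shape of the logic: record the primary's WAL position after a write, and route reads to the secondaries only once all of them have replayed past that position.
+
+```
+import redis
+
+STICKY_TTL = 30  # seconds; an assumed expiry for the stored pointer
+
+sessions = redis.StrictRedis()
+
+def record_write(user_id, wal_location):
+    # After a write, remember the primary's WAL position (as returned by
+    # pg_current_xlog_insert_location()) for this user, with a short expiry.
+    sessions.setex('load_balancing:wal:%d' % user_id, STICKY_TTL, wal_location)
+
+def can_use_secondary(user_id, secondaries):
+    # Reads may go to a secondary only if every secondary has replayed past
+    # the position stored for this user, or if no position is stored at all.
+    location = sessions.get('load_balancing:wal:%d' % user_id)
+    if location is None:
+        return True
+    # caught_up() stands in for the SQL check shown in the Ruby method below.
+    return all(host.caught_up(location) for host in secondaries)
+```
+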
If no write is performed for 30 seconds and all the secondaries are still not in sync we'll revert to using the secondaries in order to prevent somebody from ending up running queries on the primary forever. + +Checking if a secondary has caught up is quite simple and is implemented in `Gitlab::Database::LoadBalancing::Host#caught_up?` as follows: + +``` +def caught_up?(location) + string = connection.quote(location) + + query = "SELECT NOT pg_is_in_recovery() OR " \ + "pg_xlog_location_diff(pg_last_xlog_replay_location(), #{string}) >= 0 AS result" + + row = connection.select_all(query).first + + row && row['result'] == 't' +ensure + release_connection +end + +``` + +Most of the code here is standard Rails code to run raw queries and grab the results. The most interesting part is the query itself, which is as follows: + +``` +SELECT NOT pg_is_in_recovery() +OR pg_xlog_location_diff(pg_last_xlog_replay_location(), WAL-POINTER) >= 0 AS result" + +``` + +Here `WAL-POINTER` is the WAL pointer as returned by the PostgreSQL function `pg_current_xlog_insert_location()`, which is executed on the primary. In the above code snippet the pointer is passed as an argument, which is then quoted/escaped and passed to the query. + +Using the function `pg_last_xlog_replay_location()` we can get the WAL pointer of a secondary, which we can then compare to our primary pointer using `pg_xlog_location_diff()`. If the result is greater than 0 we know the secondary is in sync. + +The check `NOT pg_is_in_recovery()` is added to ensure the query won't fail when a secondary that we're checking was  _just_  promoted to a primary and our GitLab process is not yet aware of this. In such a case we simply return `true` since the primary is always in sync with itself. + +### Background processing + +Our background processing code  _always_  uses the primary since most of the work performed in the background consists of writes. Furthermore we can't reliably use a hot-standby as we have no way of knowing whether a job should use the primary or not as many jobs are not directly tied into a user. + +### Connection errors + +To deal with connection errors our load balancer will not use a secondary if it is deemed to be offline, plus connection errors on any host (including the primary) will result in the load balancer retrying the operation a few times. This ensures that we don't immediately display an error page in the event of a hiccup or a database failover. While we also deal with [hot standby conflicts][14] on the load balancer level we ended up enabling `hot_standby_feedback` on our secondaries as doing so solved all hot-standby conflicts without having any negative impact on table bloat. + +The procedure we use is quite simple: for a secondary we'll retry a few times with no delay in between. For a primary we'll retry the operation a few times using an exponential backoff. + +For more information you can refer to the source code in GitLab EE: + +* [https://gitlab.com/gitlab-org/gitlab-ee/tree/master/lib/gitlab/database/load_balancing.rb][3] + +* [https://gitlab.com/gitlab-org/gitlab-ee/tree/master/lib/gitlab/database/load_balancing][4] + +Database load balancing was first introduced in GitLab 9.0 and  _only_  supports PostgreSQL. More information can be found in the [9.0 release post][15] and the [documentation][16]. + +### Crunchy Data + +In parallel to working on implementing connection pooling and load balancing we were working with [Crunchy Data][17]. 
Until very recently I was the only [database specialist][18] which meant I had a lot of work on my plate. Furthermore my knowledge of PostgreSQL internals and its wide range of settings is limited (or at least was at the time), meaning there's only so much I could do. Because of this we hired Crunchy to help us out with identifying problems, investigating slow queries, proposing schema optimisations, optimising PostgreSQL settings, and much more. + +For the duration of this cooperation most work was performed in confidential issues so we could share private data such as log files. With the cooperation coming to an end we have removed sensitive information from some of these issues and opened them up to the public. The primary issue was [gitlab-com/infrastructure#1448][19], which in turn led to many separate issues being created and resolved. + +The benefit of this cooperation was immense as it helped us identify and solve many problems, something that would have taken me months to identify and solve if I had to do this all by myself. + +Fortunately we recently managed to hire our [second database specialist][20] and we hope to grow the team more in the coming months. + +### Combining connection pooling and database load balancing + +Combining connection pooling and database load balancing allowed us to drastically reduce the number of resources necessary to run our database cluster as well as spread load across our hot-standby servers. For example, instead of our primary having a near constant CPU utilisation of 70 percent today it usually hovers between 10 percent and 20 percent, while our two hot-standby servers hover around 20 percent most of the time: + +![CPU Percentage](https://about.gitlab.com/images/scaling-the-gitlab-database/cpu-percentage.png) + +Here `db3.cluster.gitlab.com` is our primary while the other two hosts are our secondaries. + +Other load-related factors such as load averages, disk usage, and memory usage were also drastically improved. For example, instead of the primary having a load average of around 20 it barely goes above an average of 10: + +![CPU Percentage](https://about.gitlab.com/images/scaling-the-gitlab-database/load-averages.png) + +During the busiest hours our secondaries serve around 12 000 transactions per second (roughly 740 000 per minute), while the primary serves around 6 000 transactions per second (roughly 340 000 per minute): + +![Transactions Per Second](https://about.gitlab.com/images/scaling-the-gitlab-database/transactions.png) + +Unfortunately we don't have any data on the transaction rates prior to deploying pgbouncer and our database load balancer. + +An up-to-date overview of our PostgreSQL statistics can be found at our [public Grafana dashboard][21]. + +Some of the settings we have set for pgbouncer are as follows: + +| Setting | Value | +| --- | --- | +| default_pool_size | 100 | +| reserve_pool_size | 5 | +| reserve_pool_timeout | 3 | +| max_client_conn | 2048 | +| pool_mode | transaction | +| server_idle_timeout | 30 | + +With that all said there is still some work left to be done such as: implementing service discovery ([#2042][22]), improving how we check if a secondary is available ([#2866][23]), and ignoring secondaries that are too far behind the primary ([#2197][24]). + +It's worth mentioning that we currently do not have any plans of turning our load balancing solution into a standalone library that you can use outside of GitLab, instead our focus is on providing a solid load balancing solution for GitLab EE. 
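+
+If you want to experiment with a comparable setup, a minimal `pgbouncer.ini` reflecting the settings listed above might look something like the following. The database entry, listen address, and authentication details are illustrative placeholders rather than our production configuration:
+
+```
+; Illustrative example only.
+[databases]
+; Route the application's database through pgbouncer on this host.
+gitlabhq_production = host=127.0.0.1 port=5432 dbname=gitlabhq_production
+
+[pgbouncer]
+listen_addr = 0.0.0.0
+listen_port = 6432
+auth_type = md5
+auth_file = /etc/pgbouncer/userlist.txt
+pool_mode = transaction
+default_pool_size = 100
+reserve_pool_size = 5
+reserve_pool_timeout = 3
+max_client_conn = 2048
+server_idle_timeout = 30
+```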
+ +If this has gotten you interested and you enjoy working with databases, improving application performance, and adding database-related features to GitLab (such as [service discovery][25]) you should definitely check out the [job opening][26] and the [database specialist handbook entry][27] for more information. + +-------------------------------------------------------------------------------- + +via: https://about.gitlab.com/2017/10/02/scaling-the-gitlab-database/ + +作者:[Yorick Peterse ][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://about.gitlab.com/team/#yorickpeterse +[1]:https://pgbouncer.github.io/ +[2]:http://pgpool.net/mediawiki/index.php/Main_Page +[3]:https://gitlab.com/gitlab-org/gitlab-ee/tree/master/lib/gitlab/database/load_balancing.rb +[4]:https://gitlab.com/gitlab-org/gitlab-ee/tree/master/lib/gitlab/database/load_balancing +[5]:https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS +[6]:https://www.postgresql.org/docs/9.6/static/hot-standby.html +[7]:https://www.citusdata.com/ +[8]:https://gitlab.com/gitlab-com/infrastructure/issues/259#note_23464570 +[9]:https://docs.gitlab.com/ee/administration/high_availability/alpha_database.html +[10]:https://github.com/taskrabbit/makara +[11]:https://github.com/taskrabbit/makara/issues/151 +[12]:https://github.com/thiagopradi/octopus +[13]:https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1283 +[14]:https://www.postgresql.org/docs/current/static/hot-standby.html#HOT-STANDBY-CONFLICT +[15]:https://about.gitlab.com/2017/03/22/gitlab-9-0-released/ +[16]:https://docs.gitlab.com/ee/administration/database_load_balancing.html +[17]:https://www.crunchydata.com/ +[18]:https://about.gitlab.com/handbook/infrastructure/database/ +[19]:https://gitlab.com/gitlab-com/infrastructure/issues/1448 +[20]:https://gitlab.com/_stark +[21]:http://monitor.gitlab.net/dashboard/db/postgres-stats?refresh=5m&orgId=1 +[22]:https://gitlab.com/gitlab-org/gitlab-ee/issues/2042 +[23]:https://gitlab.com/gitlab-org/gitlab-ee/issues/2866 +[24]:https://gitlab.com/gitlab-org/gitlab-ee/issues/2197 +[25]:https://gitlab.com/gitlab-org/gitlab-ee/issues/2042 +[26]:https://about.gitlab.com/jobs/specialist/database/ +[27]:https://about.gitlab.com/handbook/infrastructure/database/ From 9ecb10eb9d19481e4d708835a97c1c00f629045b Mon Sep 17 00:00:00 2001 From: Ezio Date: Wed, 11 Oct 2017 21:18:48 +0800 Subject: [PATCH 29/79] =?UTF-8?q?20171011-17=20=E9=80=89=E9=A2=98?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ... Redis 3 while not taking the site down.md | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 sources/tech/20170925 Our journey from Redis 2 to Redis 3 while not taking the site down.md diff --git a/sources/tech/20170925 Our journey from Redis 2 to Redis 3 while not taking the site down.md b/sources/tech/20170925 Our journey from Redis 2 to Redis 3 while not taking the site down.md new file mode 100644 index 0000000000..e3f531e0f8 --- /dev/null +++ b/sources/tech/20170925 Our journey from Redis 2 to Redis 3 while not taking the site down.md @@ -0,0 +1,57 @@ +Our journey from Redis 2 to Redis 3 while not taking the site down. 
+============================================================ + +We use [Redis][2] within Sky Betting & Gaming as a shared in-memory cache for things like identity tokens that need to be known across API servers, or web servers. Within the Core Tribe this is used to help deal with the huge number of logins we have to handle day to day and particularly at busy times when we could have more than 20,000 people logging in within a single minute. This works well in so far as the data is readily available to a large number of servers (in the case of SSO tokens 70 Apache HTTPD servers). We’ve recently embarked upon a process of upgrading our Redis servers, and this upgrade is intended to enable the use of the native clustering features available from Redis 3.2\. This blog post hopes to explain why we’re using clustering, what problems we have encountered along the way, and what our solutions have been. + +### In the beginning (or at least before the upgrade) + +Our legacy caches consisted of a pair of Redis servers for each cache that we had, with keepalived running to ensure that there was always a master node, listening on a floating IP address. These failover pairs required considerable effort to manage when things went wrong, and the failure modes were sometimes quite interesting. On occasion the slave node, that would only allow reads of the data it held, and not writes, would end up with the floating IP address, which was relatively easy to diagnose, but broke whichever application was trying to use that cache at the time in painful ways. + +### The new application + +So whilst in this situation we needed to build a new application, one that used a shared in-memory cache, but that we didn’t want to be at the mercy of a dodgy failover process for that cache. So our requirements were a shared in-memory cache, with no single point of failure, that could cope with multiple different failure modes using as little human intervention as possible, and also recover after those events cleanly, also with little human intervention, an additional ask was to improve the security of the cache to reduce the scope for data exfiltration (more on that later). At the time Redis Sentinel was looking promising, and there were a number of applications floating about to allow proxying of Redis connections such as [twemproxy][3]. This would have lead to a setup with many moving parts, it should have worked, with minimal human interaction, but it was complex and needed a large number of servers and services running and communicating with each other. + +![Redis Sentinel and TwemProxy](http://engineering.skybettingandgaming.com/images/Redis-Sentinel-and-TwemProxy.svg) + +There would be a number of application servers talking to twemproxy, which would route their calls to an appropriate Redis master, and twemproxy would get the information on the masters from a sentinal cluster, which would control which Redis instances were master and which were slave. This setup, as well as being complex, still had a single point of failure, it relied on twemproxy to handle sharding, and connections to the correct Redis instance. It had the advantage of being transparent to the application so we could in theory, having built this, have moved existing applications over to this Redis configuration without changing the application. But we were building an application from scratch, so migration of an application wasn’t a requirement, yet. 
+
+Fortunately it was at this time that Redis 3.2 came out, and that had native clustering built in, removing the need for a separate sentinel cluster.
+
+![Redis3 Cluster and Twemproxy](http://engineering.skybettingandgaming.com/images/Redis3-Cluster-and-Twemproxy.svg)
+
+This allowed for a simpler setup, but alas twemproxy didn't support Redis cluster sharding: it could shard data for you, but if it tried to do so in a manner inconsistent with the cluster sharding it would cause problems. There were guides available to make it match up, but the cluster could change shape automatically and change the way the sharding was set up. And it still had a single point of failure. It is at this point that I will be forever grateful to one of my colleagues who found a Redis cluster aware driver for Node.js, allowing us to drop twemproxy altogether.
+
+![Redis3 Cluster](http://engineering.skybettingandgaming.com/images/Redis3-Cluster.svg)
+
+With this we were able to shard data automatically, and failovers and failbacks were largely automatic. The application knew which nodes existed, and when writing data, if it wrote to the wrong node the cluster would redirect that write automatically. This was the configuration that was chosen, and it worked: we had a shared in-memory cache that was reasonably robust, and could cope with basic failure modes without intervention. During testing we did find some flaws. Replication was on a node-by-node basis, so if we lost a master node, then its slave became a single point of failure until the dead node was restored into service; also, only the masters voted on the cluster health, so if we lost too many masters too quickly the cluster wouldn't self-heal. But this was better than what we had.
+
+### Moving forward
+
+With a new application using a clustered Redis configuration, we became increasingly uncomfortable with the state of the legacy Redis instances, but the new application simply wasn't of the same scale as the existing applications (over 30GB of memory is dedicated to the database of our largest legacy Redis instance). So with Redis cluster proven at a low level, we decided to migrate off the legacy Redis instances to new Redis clusters.
+
+As we had a Node.js Redis driver that supported Redis cluster natively, we started with the migration of our Node.js applications onto Redis cluster. But how do you go about moving tens of gigabytes of constantly shifting data from one place to another, without causing major problems? Especially given these bits of data are things like authentication tokens, so if they were wrong our end users would be logged out. One option was to ask for a full site outage, point everything over to the new Redis cluster and migrate the data into it and hope for the best. Another option was to switch over to the new cluster and force all our users to log in again. Neither of these proved to be terribly palatable, for obvious reasons. The alternative that was decided upon was to write the data to both the legacy Redis instance and the cluster that was replacing it at the same time; we would then read from the cluster increasingly often as time went on. As the data has a limited shelf life (tokens expire after a few hours) this approach should result in zero downtime, and no risk of data loss. And so it was. The migration was a success.
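+
+A stripped-down sketch of that dual-write, ramp-up-the-reads approach looks something like the following. Treat it as illustrative: the host names, the environment variable driving the read split, and the use of ioredis as the cluster aware driver are assumptions for the example, not the exact code we ran.
+
+```
+// Illustrative sketch: write every token to both the legacy Redis instance and
+// the new Redis cluster, while serving a growing share of reads from the cluster.
+const Redis = require('ioredis')
+
+const legacy = new Redis({ host: 'legacy-redis.example', port: 6379 })
+const cluster = new Redis.Cluster([{ host: 'redis-cluster-node1.example', port: 6379 }])
+
+// Raised gradually from 0 to 1 as confidence in the cluster grows.
+const CLUSTER_READ_RATIO = parseFloat(process.env.CLUSTER_READ_RATIO || '0.1')
+
+async function setToken (key, value, ttlSeconds) {
+  // Dual write, so neither store is ever missing fresh data.
+  await Promise.all([
+    legacy.set(key, value, 'EX', ttlSeconds),
+    cluster.set(key, value, 'EX', ttlSeconds)
+  ])
+}
+
+async function getToken (key) {
+  // Route a configurable share of reads to the cluster, the rest to the legacy instance.
+  if (Math.random() < CLUSTER_READ_RATIO) {
+    return cluster.get(key)
+  }
+  return legacy.get(key)
+}
+
+module.exports = { setToken, getToken }
+```
+
+Because the tokens expire on their own after a few hours, the legacy instance can simply be retired once the read ratio reaches 100% and the remaining keys have aged out.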
+
+All that remained was the Redis instances that served our PHP code (well, one of them anyway; the other turned out to be unnecessary in the end), and we hit a thorn, actually a couple, in the road. First, and most pressing, was finding a Redis cluster aware driver that we could use in PHP, and with the version of PHP we were using. This proved to be something that was doable, because we had upgraded to a recent version of PHP. Alas the driver we chose did not like using Redis auth, something we had decided to use with Redis cluster as an extra security step (I told you there would be more on that security thing). As we were replacing each legacy Redis instance with its own Redis cluster the fix seemed straightforward: turn Redis auth off, and all would be well with the world. However this did not prove to be true; for some reason, having done this, the Redis cluster wouldn't accept connections from the web servers. A new security feature introduced by Redis in version 3, called protected mode, would stop Redis listening to connections from external IP addresses when Redis was bound to any interface and no Redis auth password was configured. This proved reasonably easy to fix, but caught us off guard.
+
+### And now?
+
+So this is where we find ourselves. We have migrated off some of our legacy Redis instances, and are migrating off the rest. We have, by way of doing this, solved some of our technical debt, and improved our platform's stability. With Redis cluster we can also scale out the in-memory databases as well as scale them up. Redis is single-threaded, so just throwing more memory at a single instance is only ever going to allow so much growth, and we are already nipping at the heels of that limit. We're expecting improved performance from the new cluster, as well as it giving us more options for expansion and load balancing.
+
+### What about the Future?
+
+So we have solved some technical debt, and made our services easier to support, and more stable. That doesn't mean the job is done; indeed, Redis 4 appears to have some features that we may want to look into. And Redis isn't the only software we use. We will continue to work to improve the platform, and reduce the time spent dealing with technical debt, but as our customer base expands, and we strive to offer ever richer services, we are always going to end up with things that need improving. The next challenge is likely to be related to scaling from more than 20,000 logins a minute to more than 40,000, and even beyond.
+ +-------------------------------------------------------------------------------- + +via: http://engineering.skybettingandgaming.com/2017/09/25/redis-2-to-redis-3/ + +作者:[ Craig Stewart][a] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:http://engineering.skybettingandgaming.com/authors#craig_stewart +[1]:http://engineering.skybettingandgaming.com/category/devops/ +[2]:https://redis.io/ +[3]:https://github.com/twitter/twemproxy From 07e38732380443a071a714d197686fa44fa2c5ad Mon Sep 17 00:00:00 2001 From: sugarfillet <18705174754@163.com> Date: Wed, 11 Oct 2017 22:01:55 +0800 Subject: [PATCH 30/79] " 21070711 Functional testing Gtk+ applications in C " translated by sugarfillet (#6122) * translated by sugarfillet 20170711 * Update 20170711 Functional testing Gtk applications in C.md --- ...unctional testing Gtk applications in C.md | 48 +++++++++---------- 1 file changed, 24 insertions(+), 24 deletions(-) rename {sources => translated}/tech/20170711 Functional testing Gtk applications in C.md (68%) diff --git a/sources/tech/20170711 Functional testing Gtk applications in C.md b/translated/tech/20170711 Functional testing Gtk applications in C.md similarity index 68% rename from sources/tech/20170711 Functional testing Gtk applications in C.md rename to translated/tech/20170711 Functional testing Gtk applications in C.md index 717796f8c4..55a35956a7 100644 --- a/sources/tech/20170711 Functional testing Gtk applications in C.md +++ b/translated/tech/20170711 Functional testing Gtk applications in C.md @@ -1,23 +1,22 @@ -translating by sugarfillet -Functional testing Gtk+ applications in C -============================================================ + C 语言对 Gtk+ 应用进行功能测试 +======== +### 这个简单教程教你如何测试你应用的功能 -### Learn how to test your application's function with this simple tutorial. +![Functional testing Gtk+ applications in C](https://opensource.com/sites/default/files/styles/image-full-size/public/images/business/cube_innovation_block_collaboration.png?itok=CbG3Mpqi "Functional testing Gtk+ applications in C ") - -![Functional testing Gtk+ applications in C ](https://opensource.com/sites/default/files/styles/image-full-size/public/images/business/cube_innovation_block_collaboration.png?itok=CbG3Mpqi "Functional testing Gtk+ applications in C ") -Image by :  +图片源自 :  opensource.com -Automated tests are required to ensure your program's quality and that it works as expected. Unit tests examine only certain parts of your algorithm, but don't look at how each component fits together. That's where functional testing, sometimes referred as integration testing, comes in. -A functional test basically interacts with your user interface, whether through a website or a desktop application. To show you how that works, let's look at how to test a Gtk+ application. For simplicity, in this tutorial let's use the [Tictactoe][6] example from the Gtk+ 2.0 tutorial. +自动化测试用来保证你程序的质量以及让它以预想的运行。单元测试只是检测你算法的某一部分,但是并不注重各组件间的适应性。这就是为什么会有功能测试,有时也称为集成测试。 -### Basic setup -For every functional test, you usually define some global variables, such as "user interaction delay" or "timeout until a failure is indicated" (i.e., when an event doesn't occur until the specified time and the application is doomed). 
+一个功能测试简单地与你的用户界面交互,可通过一个网站或者一个桌面应用。为了展示功能测试如何工作,我们以测试一个 Gtk+ 应用为例。为了简单,这个教程里,我们使用 Gtk+ 2.0 教程的示例。 +### 基础设置 + +每一个功能测试,你通常需要定义一些全局变量,比如 “用户交互时延” 或者 “失败的超时时间”(也就是说,如果在指定的时间内一个时间没有发生,程序就要中断)。 ``` #define TTT_FUNCTIONAL_TEST_UTIL_IDLE_CONDITION(f) ((TttFunctionalTestUtilIdleCondition)(f)) #define TTT_FUNCTIONAL_TEST_UTIL_REACTION_TIME (125000) @@ -29,8 +28,7 @@ struct timespec ttt_functional_test_util_default_timeout = { }; ``` -Now we can implement our dead-time functions. Here, we'll use the **usleep** function in order to get the desired delay. - +现在我们可以实现我们自己的超时函数。这里,为了能够得到期望的延迟,我们采用 **usleep** 函数。 ``` void ttt_functional_test_util_reaction_time() @@ -45,7 +43,7 @@ ttt_functional_test_util_reaction_time_long() } ``` -The timeout function delays execution until a state of a control is applied. It is useful for actions that are applied asynchronously, and that is why it delays for a longer period of time. +直到控制状态被执行,超时函数才会推迟执行。这对于一个异步执行的动作很有帮助,这也是为什么采用这么长的时延。 ``` void @@ -74,9 +72,10 @@ ttt_functional_test_util_idle_condition_and_timeout( } ``` -### Interacting with the graphical user interface +### 与图形化用户界面交互 -In order to simulate user interaction, the [**Gdk library**][7] provides the functions we need. To do our work here, we need only these three functions: + +为了模拟用户交互的操作, [**Gdk library**][7] 提供一些我们需要的函数。为了完成我们的工作,我们只需要如下 3 个函数。 * gdk_display_warp_pointer() @@ -84,8 +83,8 @@ In order to simulate user interaction, the [**Gdk library**][7] provides the f * gdk_test_simulate_key() -For instance, to test a button click, we do the following: +举个例子,为了测试按钮点击,我们可以这么做: ``` gboolean ttt_functional_test_util_button_click(GtkButton *button) @@ -151,7 +150,8 @@ ttt_functional_test_util_button_click(GtkButton *button) } ``` -We want to ensure the button has an active state, so we provide an idle-condition function: + +我们想要保证按钮处于激活状态,因此我们提供一个空闲条件函数: ``` gboolean @@ -176,12 +176,12 @@ ttt_functional_test_util_idle_test_toggle_active( } ``` -### The test scenario -Since the Tictactoe program is very simple, we just need to ensure that a [**GtkToggleButton**][8] was clicked. The functional test can proceed once it asserts the button entered the active state. To click the buttons, we use the handy **util** function provided above. +### 测试场景 -For illustration, let's assume player A wins immediately by filling the very first row, because player B is not paying attention and just filled the second row: +因为 Tictactoe 程序非常简单,我们只需要保证一个 [**GtkToggleButton**][8] 被点击。一旦说按钮肯定进入了激活状态,功能测试就可以执行。为了点击按钮,我们使用上面提到的手动的 **工具** 。 +如图所示,我们假设,填满第一行,玩家 A 就赢,因为玩家 B 没有注意,只填充了第二行。 ``` GtkWindow *window; Tictactoe *ttt; @@ -269,13 +269,13 @@ main(int argc, char **argv) 作者简介: -Joël Krähemann - Free software enthusiast with a strong knowledge about the C programming language. I don't fear any code complexity as long it is written in a simple manner. 
As developer of Advanced Gtk+ Sequencer I know how challenging multi-threaded applications can be and with it we have a great basis for future demands.my personal website +Joël Krähemann - 精通 C 语言编程的自由软件爱好者。我不怕代码有多复杂,它只是以一种简单的方法去编码。作为高级的 Gtk+ 序的开发者,我知道多线程编程有着多大的挑战性,有了多线程编程,我们就有了未来需求的良好基础。 -via: https://opensource.com/article/17/7/functional-testing +摘自: https://opensource.com/article/17/7/functional-testing 作者:[Joël Krähemann][a] -译者:[译者ID](https://github.com/译者ID) +译者:[sugarfillet](https://github.com/sugarfillet) 校对:[校对者ID](https://github.com/校对者ID) 本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 From 770d3f2da5fd00b84a071bbe8c2de602425ee492 Mon Sep 17 00:00:00 2001 From: geekpi Date: Thu, 12 Oct 2017 08:42:14 +0800 Subject: [PATCH 31/79] translated --- ... UP – deploy serverless apps in seconds.md | 539 ------------------ ... UP – deploy serverless apps in seconds.md | 539 ++++++++++++++++++ 2 files changed, 539 insertions(+), 539 deletions(-) delete mode 100644 sources/tech/20170811 UP – deploy serverless apps in seconds.md create mode 100644 translated/tech/20170811 UP – deploy serverless apps in seconds.md diff --git a/sources/tech/20170811 UP – deploy serverless apps in seconds.md b/sources/tech/20170811 UP – deploy serverless apps in seconds.md deleted file mode 100644 index ae0e48fb83..0000000000 --- a/sources/tech/20170811 UP – deploy serverless apps in seconds.md +++ /dev/null @@ -1,539 +0,0 @@ -translating----geekpi - -UP – deploy serverless apps in seconds -============================================================ - -![](https://cdn-images-1.medium.com/max/2000/1*8KijrYCm1j0_XvrACQD_fQ.png) - -Last year I wrote [Blueprints for Up][1], describing how most of the building blocks are available to create a great serverless experience on AWS with minimal effort. This post talks about the initial alpha release of [Up][2]. - -Why focus on serverless? For starters it’s cost-effective since you pay on-demand, only for what you use. Serverless options are self-healing, as each request is isolated and considered to be “stateless.” And finally it scales indefinitely with ease — there are no machines or clusters to manage. Deploy your code and you’re done. - -Roughly a month ago I decided to start working on it over at [apex/up][3], and wrote the first small serverless sample application [tj/gh-polls][4] for live SVG GitHub user polls. It worked well and costs less than $1/month to serve millions of polls, so I thought I’d go ahead with the project and see if I can offer open-source and commercial variants. - -The long-term goal is to provide a “Bring your own Heroku” of sorts, supporting many platforms. While Platform-as-a-Service is nothing new, the serverless ecosystem is making this kind of program increasingly trivial. This said, AWS and others often suffer in terms of UX due to the flexibility they provide. Up abstracts the complexity away, while still providing you with a virtually ops-free solution. - -### Installation - -You can install Up with the following command, and view the [temporary documentation][5] to get started. Or if you’re sketched out by install scripts, grab a [binary release][6]. (Keep in mind that this project is still early on.) 
- -``` -curl -sfL https://raw.githubusercontent.com/apex/up/master/install.sh | sh -``` - -To upgrade to the latest version at any time just run: - -``` -up upgrade -``` - -You may also install via NPM: - -``` -npm install -g up -``` - -### Features - -What features does the early alpha provide? Let’s take a look! Keep in mind that Up is not a hosted service, so you’ll need an AWS account and [AWS credentials][8]. If you’re not familiar at all with AWS you may want to hold off until that process is streamlined. - -The first question I always get is: how does up(1) differ from [apex(1)][9]? Apex focuses on deploying functions, for pipelines and event processing, while Up focuses on apps, apis, and static sites, aka single deployable units. Apex does not provision API Gateway, SSL certs, or DNS for you, nor does it provide URL rewriting, script injection and so on. - -#### Single command serverless apps - -Up lets you deploy apps, apis, and static sites with a single command. To create an application all you need is a single file, in the case of Node.js, an `./app.js` listening on `PORT` which is provided by Up. Note that if you’re using a `package.json` Up will detect and utilize the `start` and `build`scripts. - -``` -const http = require('http') -const { PORT = 3000 } = process.env -``` - -``` -http.createServer((req, res) => { - res.end('Hello World\n') -}).listen(PORT) -``` - -Additional [runtimes][10] are supported out of the box, such as `main.go` for Golang, so you can deploy Golang, Python, Crystal, or Node.js applications in seconds. - -``` -package main -``` - -``` -import ( - "fmt" - "log" - "net/http" - "os" -) -``` - -``` -func main() { - addr := ":" + os.Getenv("PORT") - http.HandleFunc("/", hello) - log.Fatal(http.ListenAndServe(addr, nil)) -} -``` - -``` -func hello(w http.ResponseWriter, r *http.Request) { - fmt.Fprintln(w, "Hello World from Go") -} -``` - -To deploy the application type `up` to create the resources required, and deploy the application itself. There are no smoke and mirrors here, once it says “complete”, you’re done, the app is immediately available — there is no remote build process. - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/2000/1*tBYR5HXeDDVkb_Pv2MCj1A.png) - -The subsequent deploys will be even quicker since the stack is already provisioned: - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/2000/1*2w2WHDTfTT-7GsMtNPklXw.png) - -Test out your app with `up url --open` to view it in the browser, `up url --copy` to save the URL to the clipboard, or try it with curl: - -``` -curl `up url` -Hello World -``` - -To delete the app and its resources just type `up stack delete`: - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/2000/1*FUdhBTtDHaZ2CEPHR7PGqg.png) - -Deploy to the staging or production environments using `up staging` or `up production` , and `up url --open production` for example. Note that custom domains are not yet available, [they will be shortly][11]. Later you’ll also be able to “promote” a release to other stages. - -#### Reverse proxy - -One feature which makes Up unique is that it doesn’t just simply deploy your code, it places a Golang reverse proxy in front of your application. This provides many features such as URL rewriting, redirection, script injection and more, which we’ll look at further in the post. 
- -#### Infrastructure as code - -Up follows modern best practices in terms of configuration, as all changes to the infrastructure can be previewed before applying, and the use of IAM policies can also restrict developer access to prevent mishaps. A side benefit is that it helps self-document your infrastructure as well. - -Here’s an example of configuring some (dummy) DNS records and free SSL certificates via AWS ACM which utilizes LetsEncrypt. - -``` -{ - "name": "app", - "dns": { - "myapp.com": [ - { - "name": "myapp.com", - "type": "A", - "ttl": 300, - "value": ["35.161.83.243"] - }, - { - "name": "blog.myapp.com", - "type": "CNAME", - "ttl": 300, - "value": ["34.209.172.67"] - }, - { - "name": "api.myapp.com", - "type": "A", - "ttl": 300, - "value": ["54.187.185.18"] - } - ] - }, - "certs": [ - { - "domains": ["myapp.com", "*.myapp.com"] - } - ] -} -``` - -When you deploy the application the first time via `up` all the permissions required, API Gateway, Lambda function, ACM certs, Route53 DNS records and others are created for you. - -[ChangeSets][12] are not yet implemented but you will be able to preview further changes with `up stack plan` and commit them with `up stack apply`, much like you would with Terraform. - -Check out the [configuration documentation][13] for more information. - -#### Global deploys - -The `regions` array allows you to specify target regions for your app. For example if you’re only interested in a single region you’d use: - -``` -{ - "regions": ["us-west-2"] -} -``` - -If your customers are concentrated in North America, you may want to use all of the US and CA regions: - -``` -{ - "regions": ["us-*", "ca-*"] -} -``` - -Lastly of course you can target all 14 regions currently supported: - -``` -{ - "regions": ["*"] -} -``` - -Multi-region support is still a work-in-progress as a few new AWS features are required to tie things together. - -#### Static file serving - -Up supports static file serving out of the box, with HTTP cache support, so you can use CloudFront or any other CDN in front of your application to dramatically reduce latency. - -By default the working directory is served (`.`) when `type` is “static”, however you may provide a `static.dir` as well: - -``` -{ "name": "app", "type": "static", "static": { "dir": "public" }} -``` - -#### Build hooks - -The build hooks allow you to define custom actions when deploying or performing other operations. A common example would be to bundle Node.js apps using Webpack or Browserify, greatly reducing the file size, as node_modules is  _huge_ . - -``` -{ - "name": "app", - "hooks": { - "build": "browserify --node server.js > app.js", - "clean": "rm app.js" - } -} -``` - -#### Script and stylesheet injection - -Up allows you to inject scripts and styles, either inline or paths in a declarative manner. It even supports a number of “canned” scripts for Google Analytics and [Segment][14], just copy & paste your write key. - -``` -{ - "name": "site", - "type": "static", - "inject": { - "head": [ - { - "type": "segment", - "value": "API_KEY" - }, - { - "type": "inline style", - "file": "/css/primer.css" - } - ], - "body": [ - { - "type": "script", - "value": "/app.js" - } - ] - } -} -``` - -#### Rewrites and redirects - -Up supports redirects and URL rewriting via the `redirects` object, which maps path patterns to a new location. If `status` is omitted (or 200) then it is a rewrite, otherwise it is a redirect. 
- -``` -{ - "name": "app", - "type": "static", - "redirects": { - "/blog": { - "location": "https://blog.apex.sh/", - "status": 301 - }, - "/docs/:section/guides/:guide": { - "location": "/help/:section/:guide", - "status": 302 - }, - "/store/*": { - "location": "/shop/:splat" - } - } -} -``` - -A common use-case for rewrites is for SPAs (Single Page Apps), where you want to serve the `index.html` file regardless of the path. Unless of course the file exists. - -``` -{ - "name": "app", - "type": "static", - "redirects": { - "/*": { - "location": "/", - "status": 200 - } - } -} -``` - -If you want to force the rule regardless of a file existing, just add `"force": true` . - -#### Environment variables - -Secrets will be in the next release, however for now plain-text environment variables are supported: - -``` -{ - "name": "api", - "environment": { - "API_FEATURE_FOO": "1", - "API_FEATURE_BAR": "0" - } -} -``` - -#### CORS support - -The [CORS][16] support allows you to to specify which (if any) domains can access your API from the browser. If you wish to allow any site to access your API, just enable it: - -``` -{ - "cors": { - "enable": true - } -} -``` - -You can also customize access, for example restricting API access to your front-end or SPA only. - -``` -{ - "cors": { - "allowed_origins": ["https://myapp.com"], - "allowed_methods": ["HEAD", "GET", "POST", "PUT", "DELETE"], - "allowed_headers": ["Content-Type", "Authorization"] - } -} -``` - -#### Logging - -For the low price of $0.5/GB you can utilize CloudWatch logs for structured log querying and tailing. Up implements a custom [query language][18] used to improve upon what CloudWatch provides, purpose-built for querying structured JSON logs. - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/2000/1*hrON4pH_WzN6CajaiU-ZYw.png) - -You can query existing logs: - -``` -up logs -``` - -Tail live logs: - -``` -up logs -f -``` - -Or filter on either of them, for example only showing 200 GET / HEAD requests that take more than 5 milliseconds to complete: - -``` -up logs 'method in ("GET", "HEAD") status = 200 duration >= 5' -``` - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/1600/1*Nhc5eiMM24gbiICFW7kBLg.png) - -The query language is quite flexible, here are some more examples from `up help logs` - -``` -Show logs from the past 5 minutes. -$ up logs -``` - -``` -Show logs from the past 30 minutes. -$ up logs -s 30m -``` - -``` -Show logs from the past 5 hours. -$ up logs -s 5h -``` - -``` -Show live log output. -$ up logs -f -``` - -``` -Show error logs. -$ up logs error -``` - -``` -Show error and fatal logs. -$ up logs 'error or fatal' -``` - -``` -Show non-info logs. -$ up logs 'not info' -``` - -``` -Show logs with a specific message. -$ up logs 'message = "user login"' -``` - -``` -Show 200 responses with latency above 150ms. -$ up logs 'status = 200 duration > 150' -``` - -``` -Show 4xx and 5xx responses. -$ up logs 'status >= 400' -``` - -``` -Show emails containing @apex.sh. -$ up logs 'user.email contains "@apex.sh"' -``` - -``` -Show emails ending with @apex.sh. -$ up logs 'user.email = "*@apex.sh"' -``` - -``` -Show emails starting with tj@. -$ up logs 'user.email = "tj@*"' -``` - -``` -Show errors from /tobi and /loki -$ up logs 'error and (path = "/tobi" or path = "/loki")' -``` - -``` -Show the same as above with 'in' -$ up logs 'error and path in ("/tobi", "/loki")' -``` - -``` -Show logs with a more complex query. 
-$ up logs 'method in ("POST", "PUT") ip = "207.*" status = 200 duration >= 50' -``` - -``` -Pipe JSON error logs to the jq tool. -$ up logs error | jq -``` - -Note that the `and` keyword is implied, though you can use it if you prefer. - -#### Cold start times - -This is a property of AWS Lambda as a platform, but the cold start times are typically well below 1 second, and in the future I plan on providing an option to keep them warm. - -#### Config validation - -The `up config` command outputs the resolved configuration, complete with defaults and inferred runtime settings – it also serves the dual purpose of validating configuration, as any error will result in exit > 0. - -#### Crash recovery - -Another benefit of using Up as a reverse proxy is performing crash recovery — restarting your server upon crashes and re-attempting the request before responding to the client with an error. - -For example suppose your Node.js application crashes with an uncaught exception due to an intermittent database issue, Up can retry this request before ever responding to the client. Later this behaviour will be more customizable. - -#### Continuous integration friendly - -It’s hard to call this a feature, but thanks to Golang’s relatively small and isolated binaries, you can install Up in a CI in a second or two. - -#### HTTP/2 - -Up supports HTTP/2 out of the box via API Gateway, reducing the latency for serving apps and sites with with many assets. I’ll do more comprehensive testing against many platforms in the future, but Up’s latency is already favourable: - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/1600/1*psg0kJND1UCryXEa0D3VBA.jpeg) - -#### Error pages - -Up provides a default error page which you may customize with `error_pages` if you’d like to provide a support email or tweak the color. - -``` -{ "name": "site", "type": "static", "error_pages": { "variables": { "support_email": "support@apex.sh", "color": "#228ae6" } }} -``` - -By default it looks like this: - - ** 此处有Canvas,请手动处理 ** - -![](https://cdn-images-1.medium.com/max/2000/1*_Mdj6uTCGvYTCoXsNOSD6w.png) - -If you’d like to provide custom templates you may create one or more of the following files. The most specific file takes precedence. - -* `error.html` – Matches any 4xx or 5xx - -* `5xx.html` – Matches any 5xx error - -* `4xx.html` – Matches any 4xx error - -* `CODE.html` – Matches a specific code such as 404.html - -Check out the [docs][22] to read more about templating. - -### Scaling and cost - -So you’ve made it this far, but how well does Up scale? Currently API Gateway and AWS are the target platform, so you’re not required to make any changes in order to scale, just deploy your code and it’s done. You pay only for what you actually use, on-demand, and no manual intervention is required for scaling. - -AWS offers 1,000,000 requests per month for free, but you can use [http://serverlesscalc.com][23] to plug in your expected traffic. In the future Up will provide additional platforms, so that if one becomes prohibitively expensive, you can migrate to another! - -### The Future - -That’s all for now! It may not look like much, but it’s clocking-in above 10,000 lines of code already, and I’ve just begun development. Take a look at the issue queue for a small look at what to expect in the future, assuming the project becomes sustainable. - -If you find the free version useful please consider donating on [OpenCollective][24], as I do not make any money working on it. 
I will be working on early access to the Pro version shortly, with a discounted annual price for early adopters. Either the Pro or Enterprise editions will provide the source as well, so internal hotfixes and customizations can be made. - --------------------------------------------------------------------------------- - -via: https://medium.freecodecamp.org/up-b3db1ca930ee - -作者:[TJ Holowaychuk ][a] -译者:[译者ID](https://github.com/译者ID) -校对:[校对者ID](https://github.com/校对者ID) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]:https://medium.freecodecamp.org/@tjholowaychuk?source=post_header_lockup -[1]:https://medium.com/@tjholowaychuk/blueprints-for-up-1-5f8197179275 -[2]:https://github.com/apex/up -[3]:https://github.com/apex/up -[4]:https://github.com/tj/gh-polls -[5]:https://github.com/apex/up/tree/master/docs -[6]:https://github.com/apex/up/releases -[7]:https://raw.githubusercontent.com/apex/up/master/install.sh -[8]:https://github.com/apex/up/blob/master/docs/aws-credentials.md -[9]:https://github.com/apex/apex -[10]:https://github.com/apex/up/blob/master/docs/runtimes.md -[11]:https://github.com/apex/up/issues/166 -[12]:https://github.com/apex/up/issues/115 -[13]:https://github.com/apex/up/blob/master/docs/configuration.md -[14]:https://segment.com/ -[15]:https://blog.apex.sh/ -[16]:https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS -[17]:https://myapp.com/ -[18]:https://github.com/apex/up/blob/master/internal/logs/parser/grammar.peg -[19]:http://twitter.com/apex -[20]:http://twitter.com/apex -[21]:http://twitter.com/apex -[22]:https://github.com/apex/up/blob/master/docs/configuration.md#error-pages -[23]:http://serverlesscalc.com/ -[24]:https://opencollective.com/apex-up diff --git a/translated/tech/20170811 UP – deploy serverless apps in seconds.md b/translated/tech/20170811 UP – deploy serverless apps in seconds.md new file mode 100644 index 0000000000..0bc8ad8846 --- /dev/null +++ b/translated/tech/20170811 UP – deploy serverless apps in seconds.md @@ -0,0 +1,539 @@ +UP - 在几秒钟内部署无服务器应用程序 +============================================================ + +![](https://cdn-images-1.medium.com/max/2000/1*8KijrYCm1j0_XvrACQD_fQ.png) + +去年,我[为 Up 写了一份蓝图][1],其中描述了大多数构建块是如何以最小的成本在 AWS 上创建一个很棒的无服务器体验。这篇文章谈到了 [Up][2] 的初始 alpha 版本。 + +为什么专注于无服务器?对于初学者来说,它可以节省成本,因为你可以按需付费,且只为你使用的付费。无服务器选项是自我修复的,因为每个请求被隔离并被认为是“无状态的”。最后,它可以无限轻松扩展 - 没有机器或集群要管理。部署你的代码就完成了。 + +大约一个月前,我决定使用 [apex/up][3],并为在线 SVG GitHub 用户调查写了第一个小型无服务器示例程序 [tj/gh-polls][4]。它运行良好,成本低于每月 1 美元,为数百万调查服务,因此我会继续这个项目,看看我是否可以提供开源和商业的变体。 + +长期的目标是提供“你自己即 Heroku” 的版本,支持许多平台。虽然平台即服务并不新鲜,但无服务器生态系统正在使这种方案日益微不足道。据说,AWS 和其他的经常因为 UX 提供的灵活性而被人诟病。Up 将复杂性抽象出来,同时为你提供一个几乎无需操作的解决方案。 + +### 安装 + +你可以使用以下命令安装 Up,查看[临时文档][5]开始使用。或者如果你使用安装脚本,请下载[二进制版本][6]。(请记住,这个项目还在早期。) + +``` +curl -sfL https://raw.githubusercontent.com/apex/up/master/install.sh | sh +``` + +只需运行以下命令随时升级到最新版本: + +``` +up upgrade +``` + +你也可以通过NPM进行安装: + +``` +npm install -g up +``` + +### 功能 + +早期 alpha 提供什么功能?让我们来看看!请记住,Up 不是托管服务,因此你需要一个 AWS 帐户和[ AWS 凭证][8]。如果你对 AWS 不熟悉,你可能需要先停下直到熟悉流程。 + +我的第一个问题是:up(1) 与 [apex(1)][9] 有何不同?Apex 专注于部署功能,用于管道和事件处理,而 Up 则侧重于应用程序、apis 和静态站点,也就是单个可部署单元。Apex 不为你提供 API 网关、SSL 证书或 DNS,也不提供 URL 重写,脚本注入等。 + +#### 单命令无服务器应用程序 + +Up 可以让你使用单条命令部署应用程序、apis 和静态站点。要创建一个应用程序,你需要的是一个文件,在 Node.js 的情况下,`./app.js` 监听由 Up 提供的 `PORT'。请注意,如果你使用的是 `package.json` ,则会检测并使用 `start`和 `build` 脚本。 + +``` +const http = require('http') +const { PORT = 3000 } = process.env +``` + +``` +http.createServer((req, res) => { + 
res.end('Hello World\n') +}).listen(PORT) +``` + +额外的[运行时][10]支持开箱即用,例如 Golang 的“main.go”,所以你可以在几秒钟内部署 Golang、Python、Crystal 或 Node.js 应用程序。 + +``` +package main +``` + +``` +import ( + "fmt" + "log" + "net/http" + "os" +) +``` + +``` +func main() { + addr := ":" + os.Getenv("PORT") + http.HandleFunc("/", hello) + log.Fatal(http.ListenAndServe(addr, nil)) +} +``` + +``` +func hello(w http.ResponseWriter, r *http.Request) { + fmt.Fprintln(w, "Hello World from Go") +} +``` + +要部署应用程序输入 `up` 来创建所需的资源,并部署应用程序本身。这里没有迷雾,一旦它说“完成”了,你就完成了,该应用程序立即可用 - 没有远程构建过程。 + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/2000/1*tBYR5HXeDDVkb_Pv2MCj1A.png) + +后续的部署将会更快,因为栈已被配置: + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/2000/1*2w2WHDTfTT-7GsMtNPklXw.png) + +使用 `up url --open` 测试你的程序,以在浏览器中浏览它,`up url --copy` 将 URL 保存到剪贴板,或者尝试使用 curl: + +``` +curl `up url` +Hello World +``` + +To delete the app and its resources just type `up stack delete`: +要删除应用程序及其资源,只需输入 `up stack delete`: + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/2000/1*FUdhBTtDHaZ2CEPHR7PGqg.png) + +例如,使用 `up staging` 或 `up production` 和 `up url --open production` 部署到预发布或生产环境。请注意,自定义域名尚不可用,[它们将很快可用][11]。之后,你还可以将版本“推广”到其他环境。 + +#### 反向代理 + +一个使 Up 独特的功能是,它不仅仅是简单地部署代码,它将一个 Golang 反向代理放在应用程序的前面。这提供了许多功能,如 URL 重写、重定向、脚本注入等等,我们将在后面进一步介绍。 + +#### 基础设施即代码 + +在配置方面,Up 遵循现代最佳实践,因此多有对基础设施的更改都可以在部署之前预览,并且 IAM 策略的使用还可以限制开发人员访问以防止事故发生。一个好处是它有助于自动记录你的基础设施。 + +以下是使用 LetsEncrypt 通过 AWS ACM 配置一些(虚拟)DNS 记录和免费 SSL 证书的示例。 + +``` +{ + "name": "app", + "dns": { + "myapp.com": [ + { + "name": "myapp.com", + "type": "A", + "ttl": 300, + "value": ["35.161.83.243"] + }, + { + "name": "blog.myapp.com", + "type": "CNAME", + "ttl": 300, + "value": ["34.209.172.67"] + }, + { + "name": "api.myapp.com", + "type": "A", + "ttl": 300, + "value": ["54.187.185.18"] + } + ] + }, + "certs": [ + { + "domains": ["myapp.com", "*.myapp.com"] + } + ] +} +``` + +当你首次通过 `up` 部署应用程序时,需要所有的权限,它为你创建 API 网关、Lambda 函数、ACM 证书、Route53 DNS 记录等。 + +[ChangeSets][12] 尚未实现,但你能使用 `up stack plan` 预览进一步的更改,并使用 `up stack apply` 提交,这与 Terraform 非常相似。 + +详细信息请参阅[配置文档][13]。 + +#### 全球部署 + +`regions` 数组可以指定应用程序的目标区域。例如,如果你只对单个地区感兴趣,请使用: + +``` +{ + "regions": ["us-west-2"] +} +``` + +如果你的客户集中在北美,你可能需要使用美国和加拿大所有地区: + +``` +{ + "regions": ["us-*", "ca-*"] +} +``` + +最后,你可以使用目前支持的所有 14 个地区: + +``` +{ + "regions": ["*"] +} +``` + +多区域支持仍然是一个正在进行的工作,因为需要一些新的 AWS 功能来将它们结合在一起。 + +#### 静态文件服务 + +Up 开箱即支持静态文件服务,支持 HTTP 缓存,因此你可以在应用程序前使用 CloudFront 或任何其他 CDN 来大大减少延迟。 + +当 `type` 为 “static” 时,默认情况下的工作目录是(`.`),但是你也可以提供一个`static.dir`: + +``` +{ "name": "app", "type": "static", "static": { "dir": "public" }} +``` + +#### 构建钩子 + +构建钩子允许你在部署或执行其他操作时定义自定义操作。一个常见的例子是使用 Webpack 或 Browserify 捆绑 Node.js 应用程序,这大大减少了文件大小,因为 node_modules 是_很大_的。 + +``` +{ + "name": "app", + "hooks": { + "build": "browserify --node server.js > app.js", + "clean": "rm app.js" + } +} +``` + +#### 脚本和样式表注入 + +Up 允许你插入脚本和样式,它可以内联或声明路径。它甚至支持一些“罐头”脚本,用于 Google Analytics(分析)和 [Segment][14],只需复制并粘贴你的写入密钥即可。 + +``` +{ + "name": "site", + "type": "static", + "inject": { + "head": [ + { + "type": "segment", + "value": "API_KEY" + }, + { + "type": "inline style", + "file": "/css/primer.css" + } + ], + "body": [ + { + "type": "script", + "value": "/app.js" + } + ] + } +} +``` + +#### 重写和重定向 + +Up通过 `redirects` 对象支持重定向和 URL 重写,该对象将路径模式映射到新位置。如果省略 `status`(或200),那么它是重写,否则是重定向。 + +``` +{ + "name": "app", + "type": "static", + "redirects": { + "/blog": { + "location": "https://blog.apex.sh/", + "status": 301 
+ }, + "/docs/:section/guides/:guide": { + "location": "/help/:section/:guide", + "status": 302 + }, + "/store/*": { + "location": "/shop/:splat" + } + } +} +``` + +用于重写的常见情况是 SPA(单页面应用程序),你希望为 `index.html` 提供服务,而不管路径如何。当然除非文件存在。 + +``` +{ + "name": "app", + "type": "static", + "redirects": { + "/*": { + "location": "/", + "status": 200 + } + } +} +``` + +如果要强制规则,无论文件是否存在,只需添加 `"force": true` 。 + +#### 环境变量 + +密码将在下一个版本中有,但是现在支持纯文本环境变量: + +``` +{ + "name": "api", + "environment": { + "API_FEATURE_FOO": "1", + "API_FEATURE_BAR": "0" + } +} +``` + +#### CORS 支持 + +[CORS][16] 支持允许你指定哪些(如果有的话)域可以从浏览器访问你的 API。如果你希望允许任何网站访问你的 API,只需启用它: + +``` +{ + "cors": { + "enable": true + } +} +``` + +你还可以自定义访问,例如仅限制 API 访问你的前端或 SPA。 + +``` +{ + "cors": { + "allowed_origins": ["https://myapp.com"], + "allowed_methods": ["HEAD", "GET", "POST", "PUT", "DELETE"], + "allowed_headers": ["Content-Type", "Authorization"] + } +} +``` + +#### 日志 + +对于 $0.5/GB 的低价格,你可以使用 CloudWatch 日志进行结构化日志查询和跟踪。Up 实现了一种用于改进 CloudWatch 提供的自定义[查询语言][18],专门用于查询结构化 JSON 日志。 + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/2000/1*hrON4pH_WzN6CajaiU-ZYw.png) + +你可以查询现有日志: + +``` +up logs +``` + +跟踪在线日志: + +``` +up logs -f +``` + +或者对其中任一个进行过滤,例如只显示耗时超过 5 毫秒的 200 个 GET/HEAD 请求: + +``` +up logs 'method in ("GET", "HEAD") status = 200 duration >= 5' +``` + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/1600/1*Nhc5eiMM24gbiICFW7kBLg.png) + +查询语言是非常灵活的,这里有更多来自于 `up help logs` 的例子 + +``` +Show logs from the past 5 minutes. +$ up logs +``` + +``` +Show logs from the past 30 minutes. +$ up logs -s 30m +``` + +``` +Show logs from the past 5 hours. +$ up logs -s 5h +``` + +``` +Show live log output. +$ up logs -f +``` + +``` +Show error logs. +$ up logs error +``` + +``` +Show error and fatal logs. +$ up logs 'error or fatal' +``` + +``` +Show non-info logs. +$ up logs 'not info' +``` + +``` +Show logs with a specific message. +$ up logs 'message = "user login"' +``` + +``` +Show 200 responses with latency above 150ms. +$ up logs 'status = 200 duration > 150' +``` + +``` +Show 4xx and 5xx responses. +$ up logs 'status >= 400' +``` + +``` +Show emails containing @apex.sh. +$ up logs 'user.email contains "@apex.sh"' +``` + +``` +Show emails ending with @apex.sh. +$ up logs 'user.email = "*@apex.sh"' +``` + +``` +Show emails starting with tj@. +$ up logs 'user.email = "tj@*"' +``` + +``` +Show errors from /tobi and /loki +$ up logs 'error and (path = "/tobi" or path = "/loki")' +``` + +``` +Show the same as above with 'in' +$ up logs 'error and path in ("/tobi", "/loki")' +``` + +``` +Show logs with a more complex query. +$ up logs 'method in ("POST", "PUT") ip = "207.*" status = 200 duration >= 50' +``` + +``` +Pipe JSON error logs to the jq tool. +$ up logs error | jq +``` + +请注意,`and` 关键字是暗含的,虽然你也可以使用它。 + +#### 冷启动时间 + +这是 AWS Lambda 平台的特性, 但冷启动时间通常远远低于 1 秒, 在未来, 我计划提供一个选项来保持它们在线。 + +#### 配置验证 + +The `up config` command outputs the resolved configuration, complete with defaults and inferred runtime settings – it also serves the dual purpose of validating configuration, as any error will result in exit > 0. 
+`up config` 命令输出解析后的配置,有默认值和推断的运行时设置 - 它也起到验证配置的双重目的,因为任何错误都会导致 exit > 0。 + +#### 崩溃恢复 + +使用 Up 作为反向代理的另一个好处是执行崩溃恢复 - 在崩溃后重新启动服务器,并在响应客户端发生错误之前重新尝试该请求。 + +例如,假设你的 Node.js 程序由于间歇性数据库问题而导致未捕获的异常崩溃,Up 可以在响应客户端之前重试该请求。之后这个行为会更加可定制。 + +#### 持续集成友好 + +很难说这是一个功能,但是感谢 Golang 相对较小和独立的二进制文件,你可以在一两秒中在 CI 中安装 Up。 + +#### HTTP/2 + +Up 通过 API 网关支持 HTTP/2,对服务有很多资源的应用和站点减少延迟。我将来会对许多平台进行更全面的测试,但是 Up 的延迟已经很好了: + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/1600/1*psg0kJND1UCryXEa0D3VBA.jpeg) + +#### 错误页面 + +Up 提供了一个默认错误页面,如果你要提供支持电子邮件或调整颜色,你可以使用 `error_pages` 自定义。 + +``` +{ "name": "site", "type": "static", "error_pages": { "variables": { "support_email": "support@apex.sh", "color": "#228ae6" } }} +``` + +默认情况下,它看上去像这样: + + ** 此处有Canvas,请手动处理 ** + +![](https://cdn-images-1.medium.com/max/2000/1*_Mdj6uTCGvYTCoXsNOSD6w.png) + +如果你想提供自定义模板,你可以创建以下一个或多个文件。特定文件优先。 + +* `error.html` – 匹配任何 4xx 或 5xx + +* `5xx.html` – 匹配任何 5xx 错误 + +* `4xx.html` – 匹配任何 4xx 错误 + +* `CODE.html` – 匹配一个特定的代码,如 404.html + +查看[文档][22]阅读更多有关模板的信息。 + +### 伸缩和成本 + +你已经做了这么多,但是 Up 规模如何?目前,API 网关和 AWS 是目标平台,因此你无需进行任何更改即可扩展,只需部署代码即可完成。你只需支付实际使用的数量、按需并且无需人工干预。 + +AWS 每月免费提供 1,000,000 个请求,但你可以使用 [http://serverlesscalc.com][23] 来插入预期流量。在未来 Up 将提供额外的平台,所以如果一个成本过高,你可以迁移到另一个! + +### 未来 + +目前为止就这样了!它可能看起来不是很多,但它已经超过 10,000 行代码,并且我刚刚开始开发。看看这个问题队列,假设项目可持续发展,看看未来会有什么期待。 + +如果你发现免费版本有用,请考虑在 [OpenCollective][24] 上捐赠 ,因为我没有任何工作。我将在短期内开发早期专业版,早期用户的年费优惠。专业或企业版也将提供源码,因此可以进行内部修复和自定义。 + +-------------------------------------------------------------------------------- + +via: https://medium.freecodecamp.org/up-b3db1ca930ee + +作者:[TJ Holowaychuk ][a] +译者:[geekpi](https://github.com/geekpi) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://medium.freecodecamp.org/@tjholowaychuk?source=post_header_lockup +[1]:https://medium.com/@tjholowaychuk/blueprints-for-up-1-5f8197179275 +[2]:https://github.com/apex/up +[3]:https://github.com/apex/up +[4]:https://github.com/tj/gh-polls +[5]:https://github.com/apex/up/tree/master/docs +[6]:https://github.com/apex/up/releases +[7]:https://raw.githubusercontent.com/apex/up/master/install.sh +[8]:https://github.com/apex/up/blob/master/docs/aws-credentials.md +[9]:https://github.com/apex/apex +[10]:https://github.com/apex/up/blob/master/docs/runtimes.md +[11]:https://github.com/apex/up/issues/166 +[12]:https://github.com/apex/up/issues/115 +[13]:https://github.com/apex/up/blob/master/docs/configuration.md +[14]:https://segment.com/ +[15]:https://blog.apex.sh/ +[16]:https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS +[17]:https://myapp.com/ +[18]:https://github.com/apex/up/blob/master/internal/logs/parser/grammar.peg +[19]:http://twitter.com/apex +[20]:http://twitter.com/apex +[21]:http://twitter.com/apex +[22]:https://github.com/apex/up/blob/master/docs/configuration.md#error-pages +[23]:http://serverlesscalc.com/ +[24]:https://opencollective.com/apex-up From 588af5f593ff1b7d008c593e28a126f10180ac52 Mon Sep 17 00:00:00 2001 From: geekpi Date: Thu, 12 Oct 2017 08:44:41 +0800 Subject: [PATCH 32/79] translating --- ...08 How to Install Multiple Linux Distributions on One USB.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md b/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md index 215be4e4b7..35c9302c1d 100644 --- a/sources/tech/20171008 How to Install Multiple Linux Distributions on 
+#### 持续集成友好
+
+这很难算是一个功能,但得益于 Golang 编译出的相对小巧且自包含的二进制文件,你可以在一两秒内就在 CI 环境中安装好 Up。
+
+#### HTTP/2
+
+Up 通过 API 网关支持 HTTP/2,可以为需要加载大量资源的应用和站点减少延迟。我将来会在更多平台上做更全面的测试,但 Up 目前的延迟已经很不错了:
+
+![](https://cdn-images-1.medium.com/max/1600/1*psg0kJND1UCryXEa0D3VBA.jpeg)
+
+#### 错误页面
+
+Up 提供了一个默认的错误页面;如果你想提供支持邮箱或调整配色,可以通过 `error_pages` 来自定义它。
+
+```
+{
+  "name": "site",
+  "type": "static",
+  "error_pages": {
+    "variables": {
+      "support_email": "support@apex.sh",
+      "color": "#228ae6"
+    }
+  }
+}
+```
+
+默认情况下,它看上去像这样:
+
+![](https://cdn-images-1.medium.com/max/2000/1*_Mdj6uTCGvYTCoXsNOSD6w.png)
+
+如果你想提供自定义模板,可以创建以下一个或多个文件,更具体的文件优先生效:
+
+* `error.html` – 匹配任何 4xx 或 5xx 错误
+
+* `5xx.html` – 匹配任何 5xx 错误
+
+* `4xx.html` – 匹配任何 4xx 错误
+
+* `CODE.html` – 匹配特定的状态码,如 404.html
+
+查看[文档][22]以了解更多有关模板的信息。
+
+### 伸缩和成本
+
+说了这么多,Up 的伸缩性如何?目前它以 API 网关和 AWS 为目标平台,因此你无需做任何更改就能实现伸缩,只要部署代码就可以了;你也只需按需为实际用量付费,无需人工干预。
+
+AWS 每月免费提供 1,000,000 个请求,你也可以在 [http://serverlesscalc.com][23] 中填入预期流量来估算成本。将来 Up 会支持更多平台,所以如果某个平台的成本过高,你可以迁移到另一个!
+
+### 未来
+
+目前就介绍到这里!它看起来可能不算多,但代码量已经超过 10,000 行,而且我才刚刚开始开发。看看问题队列就可以了解,假设这个项目能持续发展下去,未来还会有哪些值得期待的东西。
+
+如果你觉得这个免费版本有用,请考虑在 [OpenCollective][24] 上捐助我,因为我目前没有工作。我会在近期开发早期的专业版,并为早期用户提供优惠的年费。专业版或企业版也会提供源码,因此可以在企业内部进行修复和自定义。
+
+--------------------------------------------------------------------------------
+
+via: https://medium.freecodecamp.org/up-b3db1ca930ee
+
+作者:[TJ Holowaychuk][a]
+译者:[geekpi](https://github.com/geekpi)
+校对:[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]:https://medium.freecodecamp.org/@tjholowaychuk?source=post_header_lockup
+[1]:https://medium.com/@tjholowaychuk/blueprints-for-up-1-5f8197179275
+[2]:https://github.com/apex/up
+[3]:https://github.com/apex/up
+[4]:https://github.com/tj/gh-polls
+[5]:https://github.com/apex/up/tree/master/docs
+[6]:https://github.com/apex/up/releases
+[7]:https://raw.githubusercontent.com/apex/up/master/install.sh
+[8]:https://github.com/apex/up/blob/master/docs/aws-credentials.md
+[9]:https://github.com/apex/apex
+[10]:https://github.com/apex/up/blob/master/docs/runtimes.md
+[11]:https://github.com/apex/up/issues/166
+[12]:https://github.com/apex/up/issues/115
+[13]:https://github.com/apex/up/blob/master/docs/configuration.md
+[14]:https://segment.com/
+[15]:https://blog.apex.sh/
+[16]:https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
+[17]:https://myapp.com/
+[18]:https://github.com/apex/up/blob/master/internal/logs/parser/grammar.peg
+[19]:http://twitter.com/apex
+[20]:http://twitter.com/apex
+[21]:http://twitter.com/apex
+[22]:https://github.com/apex/up/blob/master/docs/configuration.md#error-pages
+[23]:http://serverlesscalc.com/
+[24]:https://opencollective.com/apex-up

From 588af5f593ff1b7d008c593e28a126f10180ac52 Mon Sep 17 00:00:00 2001
From: geekpi
Date: Thu, 12 Oct 2017 08:44:41 +0800
Subject: [PATCH 32/79] translating

---
 ...08 How to Install Multiple Linux Distributions on One USB.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md b/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md
index 215be4e4b7..35c9302c1d 100644
--- a/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md
+++ b/sources/tech/20171008 How to Install Multiple Linux Distributions on One USB.md
@@ -1,3 +1,5 @@
+translating---geekpi
+
 How to Install Multiple Linux Distributions on One USB
 ============================================================

From 64694bbb26f99509f841a91e39b3a0021031185c Mon Sep 17 00:00:00 2001
From: wxy
Date: Thu, 12 Oct 2017 09:09:31 +0800
Subject: [PATCH 33/79] PRF&PUB:20170617 What all you need to know about HTML5.md

@geekpi
---
 ...7 What all you need to know about HTML5.md | 311 ++++++++++++++++++
 ...7 What all you need to know about HTML5.md | 272 ---------------
 2 files changed, 311 insertions(+), 272 deletions(-)
 create mode 100644 published/20170617 What all you need to know about HTML5.md
 delete mode 100644 translated/tech/20170617 What all you need to know about HTML5.md

diff --git a/published/20170617 What all you need to know about HTML5.md b/published/20170617 What all you need to know about HTML5.md
new file mode 100644
index 0000000000..cd3851589e
--- /dev/null
+++ b/published/20170617 What all you need to know about HTML5.md
@@ -0,0 +1,311 @@
+关于 HTML5 你需要了解的基础知识
+============================================================
+
+![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/handwritten-html5-peter-booth-e-plus-getty-images-56a6faec5f9b58b7d0e5d1cf.jpg?resize=700%2C467)
+
+> HTML5 是第五个也是当前的 HTML 版本,它是用于在万维网上构建和呈现内容的标记语言。本文将帮助读者了解它。
+
+HTML5 是通过 W3C 与 Web 超文本应用技术工作组(Web Hypertext Application Technology Working Group)之间的合作发展起来的。它是 HTML 的更新版本,其许多新元素可以让你的页面更加语义化、更加动态。它是为了给所有人提供更好的 Web 体验而开发的。HTML5 提供了很多功能,让 Web 更加动态、更具交互性。
+
+HTML5 的新功能有:
+
+* 新标签,如 `<header>` 和 `<section>`
+* 用于 2D 绘图的 `<canvas>` 元素
+* 本地存储
+* 新的表单控件,如日历、日期和时间
+* 新媒体功能
+* 地理位置
+
+HTML5 还不是正式标准(LCTT 译注:HTML5 已于 2014 年成为“推荐标准”),因此,并不是所有的浏览器都支持它或其中的一些功能。开发 HTML5 背后最重要的原因之一,是避免用户需要下载并安装像 Silverlight 和 Flash 这样的多种插件。
+
+**新标签和元素**
+
+- **语义化元素:** 图 1 展示了一些有用的语义化元素。
+- **表单元素:** HTML5 中的表单元素如图 2 所示。
+- **图形元素:** HTML5 中的图形元素如图 3 所示。
+- **媒体元素:** HTML5 中的新媒体元素如图 4 所示。
+
+[![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/Figure-1-7.jpg?resize=350%2C277)][3]
+
+*图 1:语义化元素*
+
+[![](https://i1.wp.com/opensourceforu.com/wp-content/uploads/2017/05/Figure-2-5.jpg?resize=350%2C108)][4]
+
+*图 2:表单元素*
+
+[![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/Figure-3-2.jpg?resize=350%2C72)][5]
+
+*图 3:图形元素*
+
+[![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/Figure-4-2.jpg?resize=350%2C144)][6]
+
+*图 4:媒体元素*
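+下面是一个简单的调用示意(回调函数沿用上面语法中的 `showLocation` 和 `ErrorHandler` 这两个名字,输出方式和选项取值只是为了演示而假设的):
+
+```
+// 成功回调:接收一个位置对象,从 coords 中读取测地信息
+function showLocation(position) {
+  var lat = position.coords.latitude;
+  var lon = position.coords.longitude;
+  console.log('纬度:' + lat + ',经度:' + lon);
+}
+
+// 错误回调:异步调用出错(例如用户拒绝授权)时被调用
+function ErrorHandler(error) {
+  console.log('获取位置失败:' + error.message);
+}
+
+// 第三个参数是可选的 options 对象
+if (navigator.geolocation) {
+  navigator.geolocation.getCurrentPosition(showLocation, ErrorHandler, {
+    enableHighAccuracy: true, // 尽量使用 GPS 等高精度来源
+    timeout: 10000            // 最多等待 10 秒
+  });
+}
+```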
+#### 网络存储
+
+在过去的 HTML 中,为了在本机存储用户数据,我们需要使用 JavaScript 的 cookie。为了避免这种做法,HTML5 引入了 Web 存储,网站可以利用它把用户数据存储在本机上。
+
+与 cookie 相比,Web 存储的优点是:
+
+* 更安全
+* 更快
+* 能存储更多的数据
+* 存储的数据不会随每个服务器请求一起发送,只有在需要时才会用到。这是 HTML5 Web 存储胜过 cookie 的一大优势。
+
+Web 存储对象有两种类型:
+
+1. 本地(local):存储没有过期日期的数据。
+2. 会话(session):只为一个会话存储数据。
+
+**如何工作:** `localStorage` 和 `sessionStorage` 对象创建 `key=value` 对。比如:`key="Name"`、`value="Palak"`。
+
+这些数据以字符串形式存储,如果需要,可以使用 JavaScript 函数(如 `parseInt()` 和 `parseFloat()`)进行转换。
+
+下面给出了使用 Web 存储对象的语法,随后再给出一个完整的小例子:
+
+- 存储一个值:
+  - `localStorage.setItem("key1", "value1");`
+  - `localStorage["key1"] = "value1";`
+- 获取一个值:
+  - `alert(localStorage.getItem("key1"));`
+  - `alert(localStorage["key1"]);`
+- 删除一个值:
+  - `localStorage.removeItem("key1");`
+- 删除所有值:
+  - `localStorage.clear();`
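+下面把上述语法串成一个小示例(键名和取值只是演示用的假设):
+
+```
+// 本地存储:数据没有过期日期
+localStorage.setItem("Name", "Palak");
+localStorage["key1"] = "value1";
+
+// 存储的都是字符串,需要时可用 parseInt() 等函数转换
+var visits = parseInt(localStorage.getItem("visits") || "0", 10) + 1;
+localStorage.setItem("visits", visits);
+console.log("这是第 " + visits + " 次访问");
+
+// 会话存储:只保留当前会话的数据
+sessionStorage.setItem("sessionKey", "temp");
+
+// 删除一个值;clear() 则会清空所有值
+localStorage.removeItem("key1");
+// localStorage.clear();
+```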
+#### 应用缓存(AppCache)
+
+使用 HTML5 AppCache,我们可以让 Web 应用程序在没有互联网连接的情况下离线工作。到目前为止,除 IE 之外的所有浏览器都可以使用 AppCache。
+
+应用缓存的优点是:
+
+* 可以离线浏览网页
+* 页面加载速度更快
+* 服务器负载更小
+
+`cache manifest` 是一个简单的文本文件,其中列出了浏览器应该缓存以便离线访问的资源。`manifest` 属性可以包含在文档的 `<html>` 标签中,如下所示:
+
+```
+<!DOCTYPE HTML>
+<html manifest="demo.appcache">
+...
+</html>
+```
+
+你想要缓存的每个页面都应该包含它。
+
+缓存的应用程序页面会一直保留,除非:
+
+1. 用户清除了它们
+2. `manifest` 文件被修改
+3. 缓存被更新
+
+#### 视频
+
+在 HTML5 发布之前,并没有在网页上显示视频的统一标准,大多数视频是通过 Flash 等不同的插件显示的。而 HTML5 规定了用 `video` 元素在网页上显示视频的标准方式。
+
+目前,`video` 元素支持三种视频格式,如表 2 所示。
+
+[![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/table-2.jpg?resize=350%2C115)][9]
+
+下面的例子展示了 `video` 元素的使用:
+
+```
+<video width="320" height="240" controls>
+  <source src="movie.ogg" type="video/ogg">
+  您的浏览器不支持 video 元素。
+</video>
+```
+
+这个例子使用了一个 Ogg 文件,可以在 Firefox、Opera 和 Chrome 中工作。要让视频在 Safari 和未来版本的 Chrome 中工作,我们必须再添加 MPEG4 和 WebM 文件。
+
+`video` 元素允许包含多个 `source` 元素,`source` 元素可以链接到不同的视频文件,浏览器将使用第一个它能识别的格式,如下所示:
+
+```
+<video width="320" height="240" controls>
+  <source src="movie.ogg" type="video/ogg">
+  <source src="movie.mp4" type="video/mp4">
+  <source src="movie.webm" type="video/webm">
+  您的浏览器不支持 video 元素。
+</video>
+```
+
+[![](https://i0.wp.com/opensourceforu.com/wp-content/uploads/2017/05/Figure6-1.jpg?resize=350%2C253)][10]
+
+*图 6:Canvas 的输出*
+
+#### 音频
+
+音频的情况与视频类似。在 HTML5 发布之前,在网页上播放音频同样没有统一的标准,大多数音频也是通过 Flash 等不同的插件播放的。而 HTML5 规定了用 `audio` 元素在网页上播放音频的标准方式。`audio` 元素用于播放声音文件和音频流。
+
+目前,HTML5 `audio` 元素支持三种音频格式,如表 3 所示。
+
+[![](https://i1.wp.com/opensourceforu.com/wp-content/uploads/2017/05/table-3.jpg?resize=350%2C123)][11]
+
+`audio` 元素的使用如下所示:
+
+```
+<audio controls>
+  <source src="song.ogg" type="audio/ogg">
+  <source src="song.mp3" type="audio/mpeg">
+  您的浏览器不支持 audio 元素。