Merge pull request #8 from LCTT/master

Update  from LCTT
heguangzhi 2020-02-25 20:17:55 +08:00 committed by GitHub
commit bd2ba47281
16 changed files with 2278 additions and 1331 deletions


@ -0,0 +1,113 @@
[#]: collector: (lujun9972)
[#]: translator: (zhangxiangping)
[#]: reviewer: (wxy)
[#]: publisher: (wxy)
[#]: url: (https://linux.cn/article-11927-1.html)
[#]: subject: (12 open source tools for natural language processing)
[#]: via: (https://opensource.com/article/19/3/natural-language-processing-tools)
[#]: author: (Dan Barker https://opensource.com/users/barkerd427)
12 open source tools for natural language processing
======
> Take a look at a dozen options for your next NLP application.
![](https://img.linux.net.cn/data/attachment/album/202002/25/103230j77i7zx8uyymj7y3.jpg)
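Natural language processing (NLP), the technology that powers the chatbots, voice assistants, predictive text, and other speech and text applications that permeate our lives, has evolved significantly in the last few years. There is a wide variety of open source NLP tools out there, so I decided to survey the landscape to help you plan your next voice- or text-based application.
For this review, I focused on tools that use languages I'm familiar with, even though I'm not familiar with all the tools. (I didn't find a great selection of tools in the languages I'm not familiar with anyway.) That said, I excluded tools in three languages I am familiar with, for various reasons.
The most obvious language I didn't include might be R, because most of the libraries I found hadn't been updated in over a year. That doesn't necessarily mean they aren't well maintained, but I think they should be updated more often to compete with other tools in the same space. I also chose languages and tools that are most likely to be used in production scenarios (rather than in academia and research), and I have mostly used R as a research and discovery tool.
I was also surprised to see that many of the Scala libraries are no longer updated. It has been a couple of years since I last used Scala, when it was very popular. Most of the libraries haven't been updated since then, or have had only a few updates.
Finally, I excluded C++. This is mostly because it has been many years since I last wrote in C++, and the organizations I've worked in have not used C++ for NLP or any data science work.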
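### Python tools
#### Natural Language Toolkit (NLTK)
It would be easy to argue that the [Natural Language Toolkit (NLTK)][2] is the most full-featured of the tools I surveyed. It implements pretty much every component of NLP you might need, such as classification, tokenization, stemming, tagging, parsing, and semantic reasoning. There is often more than one implementation of each, so you can choose the exact algorithm or methodology you'd like to use. It also supports many languages. However, it represents all data as strings, which is fine for simple constructs but makes it hard to use some advanced functionality. The documentation is quite dense, but there is a lot of material written by others, such as [this great book][3]. The library is also a bit slow compared to other tools. Overall, this is a great toolkit for experimentation, exploration, and applications that need a particular combination of algorithms.
#### SpaCy
[SpaCy][4] is probably the main competitor to NLTK. It is faster in most cases, but it has only a single implementation of each NLP component. It represents everything as an object rather than a string, which simplifies the interface for building applications. This also helps it integrate with many other frameworks and data science tools, so you can better understand your text data. However, SpaCy doesn't support as many languages as NLTK. It does have a simple interface with a simplified set of choices and great documentation, as well as multiple neural models for various components of language processing and analysis. Overall, this is a great tool for new applications that need to perform well in production and don't require a specific algorithm.
#### TextBlob
[TextBlob][5] is an extension of NLTK. You can access many of NLTK's functions in a simplified manner through TextBlob, and TextBlob also includes functionality from the Pattern library. If you're just starting out, this might be a good tool to use while learning, and it can be used in production for applications that don't need to be highly performant. Overall, TextBlob can be used almost anywhere and is especially good for smaller projects.
#### Textacy
This tool may have the best name of any library I've used. Say "[Textacy][6]" a few times while emphasizing the "ex" and drawing out the "cy." Not only is it great to say, but it's also a great tool. It uses SpaCy for its core NLP functionality, but it handles a lot of the work before and after the processing. If you were planning to use SpaCy, you might as well use Textacy so you can bring in many types of data without having to write extra helper code.
#### PyTorch-NLP
[PyTorch-NLP][7] has been out for just a little over a year, but it has already gained a large community. It is a great tool for rapid prototyping. It is also updated often with the latest research, and top companies and researchers have released many other tools to do all sorts of novel processing, such as image transformations. Overall, PyTorch is targeted at researchers, but it can also be used for prototypes and initial production workloads with the most advanced algorithms available. The libraries being created on top of it are also worth looking into.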
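### Node.js tools
#### Retext
[Retext][8] is part of the [unified collective][9]. Unified is an interface that allows multiple tools and plugins to integrate and work together effectively. Retext is one of three syntaxes used by the unified tool; the others are Remark for Markdown and Rehype for HTML. This is a very interesting idea, and I'm excited to see this community grow. Retext doesn't expose a lot of its underlying techniques, but instead uses plugins to achieve the results you might be aiming for with NLP. Checking spelling, fixing typography, detecting sentiment, and improving readability can all be done with simple plugins. Overall, this is an excellent tool and community if you just need to get something done without having to understand everything in the underlying process.
#### Compromise
[Compromise][10] certainly isn't the most sophisticated tool. If you're looking for the most advanced algorithms or the most complete system, it probably isn't the right tool for you. However, if you want a performant tool that has a wide breadth of features and can run on the client side, you should take a look at Compromise. Overall, its name is accurate: the creators compromised on functionality and accuracy by focusing on a small package with much more specific functionality, one that benefits from the user understanding the context in which it is used.
#### Natural
[Natural][11] includes most of the functions you might expect in a general NLP library. It is mostly focused on English, but some other languages have been contributed, and the community is open to additional contributions. It supports tokenizing, stemming, classification, phonetics, term frequency-inverse document frequency (TF-IDF), WordNet, string similarity, and some inflections. It might be most comparable to NLTK, in that it tries to include everything in one package, but it is easier to use and isn't necessarily focused on research. Overall, this is a pretty complete library that is still in active development, but it may require additional knowledge of the underlying implementations to be fully effective.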
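Since TF-IDF comes up in several of these libraries, here is a minimal sketch of the idea in plain Python. It is not how Natural or any other library listed here implements it: the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions chosen to keep the example short, and real libraries usually apply smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Hypothetical helper: uses relative term frequency and an unsmoothed
    inverse document frequency, idf = log(N / df).
    """
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
for doc, weights in zip(docs, scores):
    print(doc, "->", weights)
```

A word such as "the", which appears in every document, gets a weight of zero, while a word that appears in only one document gets the highest weight in that document.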
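#### Nlp.js
[Nlp.js][12] is built on top of several other NLP libraries, including Franc and Brain.js. It provides a nice interface to many components of NLP, such as classification, sentiment analysis, stemming, named entity recognition, and natural language generation. It also supports several other languages, which helps if you plan to work in something other than English. Overall, this is a great general tool with a simplified interface to several other tools. It will likely take you a long way in your applications before you need something more powerful or more flexible.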
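### Java tools
#### OpenNLP
[OpenNLP][13] is hosted by the Apache Foundation, so it is easy to integrate into other Apache projects, such as Apache Flink, Apache NiFi, and Apache Spark. It is a general NLP tool that covers the common processing components of NLP, and it can be used from the command line or within an application as a library. It also supports many languages. Overall, OpenNLP is a powerful tool with a lot of features, and it is a good choice for production workloads if you're using Java.
#### Stanford CoreNLP
[Stanford CoreNLP][14] is a set of tools that provides statistical NLP, deep learning NLP, and rule-based NLP functionality. Bindings exist for many other programming languages, so this tool can be used outside of Java. It is a very powerful tool created by an elite research institution, but it may not be the best choice for production workloads. The tool is dual-licensed, with a special license for commercial purposes. Overall, this is a great tool for research and experimentation, but it may incur additional costs in a production system. Readers may be more interested in its Python version than in the Java version. Also, one of the best machine learning courses on Coursera is taught by a Stanford professor; [check it out][15] along with other great resources.
#### CogCompNLP
[CogCompNLP][16], developed by the University of Illinois, also has a Python library with similar functionality. It can be used to process text either locally or on remote systems, which can remove a tremendous burden from your local device. It provides processing functions such as tokenization, part-of-speech tagging, chunking, named-entity tagging, lemmatization, dependency parsing, and semantic role labeling. Overall, this is a great tool for research, and it has a lot of components that you can explore. I'm not sure it's well suited to production workloads, but it's worth trying if you plan to use Java.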
* * *
What are your favorite open source tools and libraries for NLP? Please share any that I didn't mention in the comments.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/3/natural-language-processing-tools
Author: [Dan Barker][a]
Topic selection: [lujun9972][b]
Translator: [zxp](https://github.com/zhangxiangping)
Proofreader: [wxy](https://github.com/wxy)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://opensource.com/users/barkerd427
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/talk_chat_communication_team.png?itok=CYfZ_gE7 (Chat bubbles)
[2]: http://www.nltk.org/
[3]: http://www.nltk.org/book_1ed/
[4]: https://spacy.io/
[5]: https://textblob.readthedocs.io/en/dev/
[6]: https://readthedocs.org/projects/textacy/
[7]: https://pytorchnlp.readthedocs.io/en/latest/
[8]: https://www.npmjs.com/package/retext
[9]: https://unified.js.org/
[10]: https://www.npmjs.com/package/compromise
[11]: https://www.npmjs.com/package/natural
[12]: https://www.npmjs.com/package/node-nlp
[13]: https://opennlp.apache.org/
[14]: https://stanfordnlp.github.io/CoreNLP/
[15]: https://opensource.com/article/19/2/learn-data-science-ai
[16]: https://github.com/CogComp/cogcomp-nlp


@ -0,0 +1,158 @@
[#]: collector: (lujun9972)
[#]: translator: (guevaraya)
[#]: reviewer: (wxy)
[#]: publisher: (wxy)
[#]: url: (https://linux.cn/article-11930-1.html)
[#]: subject: (Troubleshoot Kubernetes with the power of tmux and kubectl)
[#]: via: (https://opensource.com/article/20/2/kubernetes-tmux-kubectl)
[#]: author: (Abhishek Tamrakar https://opensource.com/users/tamrakar)
Troubleshoot Kubernetes with the power of tmux and kubectl
======
> A kubectl plugin that uses tmux can make troubleshooting Kubernetes much simpler.
![](https://img.linux.net.cn/data/attachment/album/202002/25/125435a4v3vpss3s4w3sks.jpg)
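[Kubernetes][2] is a thriving open source container orchestration platform that offers scalability, high availability, robustness, and resiliency for applications. One of its many features is support for running custom scripts or binaries through its primary client binary, [kubectl][3]. Kubectl is very powerful and lets users do a great deal directly on a Kubernetes cluster.
### Troubleshooting Kubernetes with aliases
Anyone who uses Kubernetes for container orchestration is aware of the complexity its design brings with it. For example, there is a pressing need to simplify troubleshooting in Kubernetes with something that is quicker and needs little manual intervention (except in critical situations).
There are many scenarios to consider when it comes to troubleshooting. In one scenario, you know what you need to run, but the command's syntax, even when it can run as a single command, is excessively complex, or it needs one or two inputs to work.
For example, if you frequently need to jump into a running container in the system namespace, you may find yourself repeatedly typing: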
```
kubectl --namespace=kube-system exec -i -t <your-pod-name>
```
To simplify troubleshooting, you can use command-line aliases for these commands. For example, you could add the following to your dotfiles (`.bashrc` or `.zshrc`):
```
alias ksysex='kubectl --namespace=kube-system exec -i -t'
```
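This is one example from a [repository of common Kubernetes aliases][4] that shows one way to simplify functions in `kubectl`. For something as simple as this scenario, an alias is sufficient.
### Switching to a kubectl plugin
A more complex troubleshooting scenario involves running many commands, one after the other, to investigate an environment and come to a conclusion. Aliases alone are not sufficient for this use case; you need to know the logic and correlations between the parts of your Kubernetes deployment. What you really need is automation that delivers the desired output in less time.
Suppose your cluster has 10 to 20, or even 50 to 100, namespaces holding different microservices. What would help you when you start troubleshooting?
* You would need something that can quickly tell which pod in which namespace is throwing errors.
* You would need something that can watch the logs of all the pods in a namespace.
* You might also need to watch the logs of certain pods in a specific namespace that have shown errors.
Any solution that covers these points would be very useful for investigating production issues, as well as during development and testing cycles.
To create something more powerful than a simple alias, you can use [kubectl plugins][5]. Plugins are like standalone scripts written in any language, but they are designed to extend the main command a Kubernetes admin works with.
To create a plugin, you must name the script using the `kubectl-<your-plugin-name>` syntax, copy it to one of the exported directories in your `$PATH`, and give it executable permissions (`chmod +x`).
After creating a plugin and moving it into your path, you can run it immediately. For example, I have `kubectl-krawl` and `kubectl-kmux` in my path: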
```
$ kubectl plugin list
The following compatible plugins are available:
/usr/local/bin/kubectl-krawl
/usr/local/bin/kubectl-kmux
$ kubectl kmux
```
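The naming rule can be illustrated with a short Python sketch that looks for executables whose names start with `kubectl-` in a list of directories. This is only an approximation of the idea behind `kubectl plugin list`, not kubectl's actual implementation; the function name `find_kubectl_plugins` and the demo files are hypothetical.

```python
import os
import tempfile


def find_kubectl_plugins(path_dirs):
    """Return executables named kubectl-* found in the given directories."""
    plugins = []
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            # A plugin candidate needs the prefix AND the executable bit.
            if (name.startswith("kubectl-")
                    and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                plugins.append(full)
    return plugins


# Demo in a throwaway directory standing in for one $PATH entry.
demo_dir = tempfile.mkdtemp()
for name, mode in [("kubectl-kmux", 0o755),
                   ("kubectl-notexec", 0o644),
                   ("other", 0o755)]:
    path = os.path.join(demo_dir, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")
    os.chmod(path, mode)

found = find_kubectl_plugins([demo_dir])
print([os.path.basename(p) for p in found])
```

In the demo, only `kubectl-kmux` qualifies: `kubectl-notexec` lacks the executable bit, and `other` lacks the prefix. To scan a real environment, you could pass `os.environ["PATH"].split(os.pathsep)` instead of the demo directory.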
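Now let's see how powerful Kubernetes becomes when combined with tmux.
### Harnessing the power of tmux
[Tmux][6] is a very powerful tool that many sysadmins and ops teams rely on to troubleshoot issues related to ease of operability, from splitting windows into panes for running parallel debugging on multiple machines to monitoring logs. One of its major advantages is that it can be used on the command line or in automation scripts.
I created [a kubectl plugin][7] that uses tmux to make troubleshooting much simpler. I will use annotations to walk through the logic behind the plugin (and leave it to you to go through the plugin's full code):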
```
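# NAMESPACE is the namespace to monitor.
# POD is the pod name.
# CONTAINERS is the comma-separated list of container names.

# Initialize a counter n to count loop iterations;
# tmux uses it later to split panes.
n=0

# Loop over the list of pods and their containers.
while IFS=' ' read -r POD CONTAINERS
do
    # Create a new tmux window for each pod.
    tmux neww $COMMAND -n $POD 2>/dev/null

    # Loop over all containers inside the running pod.
    for CONTAINER in ${CONTAINERS//,/ }
    do
        if [ x$POD = x -o x$CONTAINER = x ]; then
            # If either value is empty, stop.
            warn "Looks like there is a problem getting pods data."
            break
        fi

        # Set the command to execute.
        COMMAND="kubectl logs -f $POD -c $CONTAINER -n $NAMESPACE"
        # Check for the tmux session.
        if tmux has-session -t <session name> 2>/dev/null;
        then
            <set session exists>
        else
            <create session>
        fi

        # Split panes in the current window for each container.
        tmux selectp -t $n \; \
            splitw $COMMAND \; \
            select-layout tiled \;

    # End of the container loop.
    done

    # Rename the window after the pod so it can be identified.
    tmux renamew $POD 2>/dev/null

    # Increment the counter.
    ((n+=1))

# End of the pod loop.
done < <(<fetch the list of pods and containers from the Kubernetes cluster>)

# Finally, select the window and attach to the session.
tmux selectw -t <session name>:1 \; \
    attach-session -t <session name>\;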
```
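After the plugin script runs, it produces output similar to the image below. Each pod has its own window, and each container (if there is more than one) gets its own pane within its pod's window, streaming logs as they arrive. The beauty of tmux can be seen below; with the proper configuration, you can even see which window has activity (see the white tabs).
![Output of the kmux plugin][8]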
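To make the data flow of the annotated script easier to follow, here is a small Python sketch of its parsing step only: it turns "POD CONTAINER1,CONTAINER2" lines into the `kubectl logs` commands that the script hands to tmux. It is an illustration, not part of the plugin; the function name `build_log_commands` and the sample pod names are made up, and unlike the shell loop it skips malformed lines instead of breaking out.

```python
import shlex


def build_log_commands(pod_lines, namespace):
    """Map each pod to the kubectl logs commands for its containers.

    pod_lines: iterable of "POD CONTAINER1,CONTAINER2" strings, the same
    shape the script's while-read loop consumes.
    """
    commands = {}
    for line in pod_lines:
        parts = line.split()
        if len(parts) != 2:
            # Loosely mirrors the script's guard against missing data.
            continue
        pod, containers = parts
        commands[pod] = [
            shlex.join(["kubectl", "logs", "-f", pod,
                        "-c", container, "-n", namespace])
            for container in containers.split(",") if container
        ]
    return commands


sample = ["web-1 nginx,sidecar", "db-0 postgres", "broken-line"]
plan = build_log_commands(sample, "demo")
for pod, cmds in plan.items():
    print(pod, cmds)
```

Each key corresponds to one tmux window and each command to one pane, which is the layout the script builds.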
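### Conclusion
Aliases are a common and useful way to simplify basic troubleshooting in Kubernetes environments. When the environment gets more complex, a kubectl plugin built with more advanced scripting is a more powerful option. There are no limits on which programming language you can use to write kubectl plugins. The only requirements are that the plugin is executable and in your path, and that its name does not clash with an existing kubectl command.
To read the complete code or try the plugins I created, check my [kube-plugins-github][7] repository. Issues and pull requests are welcome.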
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/kubernetes-tmux-kubectl
Author: [Abhishek Tamrakar][a]
Topic selection: [lujun9972][b]
Translator: [guevaraya](https://github.com/guevaraya)
Proofreader: [wxy](https://github.com/wxy)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://opensource.com/users/tamrakar
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/OSDC_women_computing_4.png?itok=VGZO8CxT (Woman sitting in front of her laptop)
[2]: https://opensource.com/resources/what-is-kubernetes
[3]: https://kubernetes.io/docs/reference/kubectl/overview/
[4]: https://github.com/ahmetb/kubectl-aliases/blob/master/.kubectl_aliases
[5]: https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/
[6]: https://opensource.com/article/19/6/tmux-terminal-joy
[7]: https://github.com/abhiTamrakar/kube-plugins
[8]: https://raw.githubusercontent.com/abhiTamrakar/kube-plugins/master/kmux/kmux.png


@ -1,8 +1,8 @@
[#]: collector: (lujun9972)
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: reviewer: (wxy)
[#]: publisher: (wxy)
[#]: url: (https://linux.cn/article-11929-1.html)
[#]: subject: (How to Install Latest Git Version on Ubuntu)
[#]: via: (https://itsfoss.com/install-git-ubuntu/)
[#]: author: (Abhishek Prakash https://itsfoss.com/author/abhishek/)
@ -25,7 +25,7 @@ sudo apt install git
That is why, when you check the Git version, the installed version is older than the [latest Git version available on the Git website][4]:
```
[email protected]:~$ git --version
$ git --version
git version 2.17.1
```
@ -43,35 +43,35 @@ sudo apt update
sudo apt install git
```
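Even if you previously installed Git using apt, it will be updated to the latest stable version.
Even if you previously installed Git using `apt`, it will be updated to the latest stable version.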
```
[email protected]:~$ git --version
$ git --version
git version 2.25.0
```
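The benefit of [using a PPA][8] is that if a new stable version of Git is released, you get it with the system updates. [Just update Ubuntu][9]to get the latest stable version of Git.
The benefit of [using a PPA][8] is that if a new stable version of Git is released, you get it with the system updates. [Just update Ubuntu][9] to get the latest stable version of Git.
### ConfigureGit (recommended for developers)
### Configure Git (recommended for developers)
If you installed Git for development purposes, you will soon start cloning repositories, making changes, and committing them.
If you try to commit code, you may see a "Please tell me who you are" error like this: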
```
[email protected]:~/compress-pdf$ git commit -m "update readme"
$ git commit -m "update readme"
*** Please tell me who you are.
Run
git config --global user.email "[email protected]"
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
to set your account's default identity.
Omit --global to set the identity only in this repository.
fatal: unable to auto-detect email address (got '[email protected](none)')
fatal: unable to auto-detect email address (got 'abhishek@itsfoss.(none)')
```
This is because you have not yet configured the required personal information.
@ -80,7 +80,7 @@ fatal: unable to auto-detect email address (got '[email protected](none)')
```
git config --global user.name "Your Name"
git config --global user.email "[email protected]"
git config --global user.email "you@example.com"
```
You can check the Git configuration with the following command:
@ -92,15 +92,13 @@ git config --list
It should show output like this:
```
[email protected]
user.name=abhishek
user.email=you@example.com
user.name=Your Name
```
The configuration is saved in \~/.gitconfig. You can modify it manually.
The configuration is saved in `~/.gitconfig`. You can modify it manually.
* * *
**Conclusion**
### Conclusion
I hope this short tutorial helps you install Git on Ubuntu. With the PPA, you can easily get the latest Git version.
@ -113,7 +111,7 @@ via: https://itsfoss.com/install-git-ubuntu/
Author: [Abhishek Prakash][a]
Topic selection: [lujun9972][b]
Translator: [geekpi](https://github.com/geekpi)
Proofreader: [proofreader ID](https://github.com/校对者ID)
Proofreader: [wxy](https://github.com/wxy)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
@ -127,4 +125,4 @@ via: https://itsfoss.com/install-git-ubuntu/
[6]: https://itsfoss.com/install-software-from-source-code/
[7]: https://launchpad.net/~git-core/+archive/ubuntu/ppa
[8]: https://itsfoss.com/ppa-guide/
[9]: https://itsfoss.com/update-ubuntu/
[9]: https://itsfoss.com/update-ubuntu/


@ -1,152 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Send email and check your calendar with Emacs)
[#]: via: (https://opensource.com/article/20/1/emacs-mail-calendar)
[#]: author: (Kevin Sonney https://opensource.com/users/ksonney)
Send email and check your calendar with Emacs
======
Manage your email and view your schedule with the Emacs text editor in
the eighteenth in our series on 20 ways to be more productive with open
source in 2020.
![Document sending][1]
Last year, I brought you 19 days of new (to you) productivity tools for 2019. This year, I'm taking a different approach: building an environment that will allow you to be more productive in the new year, using tools you may or may not already be using.
### Doing (almost) all the things with Emacs, part 1
Two days ago, I shared that I use both [Vim][2] and [Emacs][3] regularly, and on days [16][4] and [17][5] of this series, I explained how to do almost everything in Vim. Now, it's time for Emacs!
![Mail and calendar in Emacs][6]
Before I get too far, I should explain two things. First, I'm doing everything here using the default Emacs configuration, not [Spacemacs][7], which I have [written about][8]. Why? Because I will be using the default keyboard mappings so that you can refer back to the documentation and not have to translate things from "native Emacs" to Spacemacs. Second, I'm not setting up Org mode in this series. Org mode almost needs an entire series on its own, and, while it is very powerful, the setup can be quite complex.
#### Configure Emacs
Configuring Emacs is a little bit more complicated than configuring Vim, but in my opinion, it is worth it in the long run. Start by creating a configuration file and opening it in Emacs:
```
mkdir ~/.emacs.d
emacs ~/.emacs.d/init.el
```
Next, add some additional package sources to the built-in package manager. Add the following to **init.el**:
```
(package-initialize)
(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/"))
(add-to-list 'package-archives '("org" . "http://orgmode.org/elpa/") t)
(add-to-list 'package-archives '("gnu" . "https://elpa.gnu.org/packages/"))
(package-refresh-contents)
```
Save the file with **Ctrl**+**x** **Ctrl**+**s**, exit with **Ctrl**+**x** **Ctrl**+**c**, and restart Emacs. It will download all the package lists at startup, and then you should be ready to install things with the built-in package manager. Start by typing **Meta**+**x** to bring up a command prompt (the **Meta** key is the **Alt** key on most keyboards or **Option** on MacOS). At the command prompt, type **package-list-packages** to bring up a list of packages you can install. Go through the list and select the following packages with the **i** key:
```
bbdb
bbdb-vcard
calfw
calfw-ical
notmuch
```
Once the packages are selected, press **x** to install them. Depending on your internet connection, this could take a while. You may see some compile errors, but it's safe to ignore them. Once it completes, open **~/.emacs.d/init.el** with the key combination **Ctrl**+**x** **Ctrl**+**f**, and add the following lines to the file after **(package-refresh-contents)** and before **(custom-set-variables**. Emacs uses the **(custom-set-variables** line internally, and you should never, ever modify anything below it. Lines beginning with **;;** are comments.
```
;; Set up bbdb
(require 'bbdb)
(bbdb-initialize 'message)
(bbdb-insinuate-message)
(add-hook 'message-setup-hook 'bbdb-insinuate-mail)
;; set up calendar
(require 'calfw)
(require 'calfw-ical)
;; Set this to the URL of your calendar. Google users will use
;; the Secret Address in iCalendar Format from the calendar settings
(cfw:open-ical-calendar "https://path/to/my/ics/file.ics")
;; Set up notmuch
(require 'notmuch)
;; set up mail sending using sendmail
(setq send-mail-function (quote sendmail-send-it))
(setq user-mail-address "myemail@mydomain.com"
      user-full-name "My Name")
```
Now you are ready to start Emacs with your setup! Save the **init.el** file (**Ctrl**+**x** **Ctrl**+**s**), exit Emacs (**Ctrl**+**x** **Ctrl**+**c**), and then restart it. It will take a little longer to start this time.
#### Read and write email in Emacs with Notmuch
Once you are at the Emacs splash screen, you can start reading your email with [Notmuch][10]. Type **Meta**+**x notmuch**, and you'll get Notmuch's Emacs interface.
![Reading mail with Notmuch][11]
All the items in bold type are links to email views. You can access them with either a mouse click or by tabbing between them and pressing **Return** or **Enter**. You can use the search bar to search Notmuch's database using the [same syntax][12] as you use on Notmuch's command line. If you want, you can save any searches for later use with the **[save]** button, and they will be added to the list at the top of the screen. If you follow one of the links, you will get a list of the relevant email messages. You can navigate the list with the **Arrow** keys, and press **Enter** on the message you want to read. Pressing **r** will reply to a message, **f** will forward the message, and **q** will exit the current screen.
You can write a new message by typing **Meta**+**x compose-mail**. Composing, replying, and forwarding all bring up the mail writing interface. When you are done writing your email, press **Ctrl**+**c Ctrl**+**c** to send it. If you decide you don't want to send it, press **Ctrl**+**c Ctrl**+**k** to kill the message compose buffer (window).
#### Autocomplete email addresses in Emacs with BBDB
![Composing a message with BBDB addressing][13]
But what about your address book? That's where [BBDB][14] comes in. But first, import all your addresses from [abook][15] by opening a command line and running the following export command:
```
abook --convert --outformat vcard --outfile ~/all-my-addresses.vcf --infile ~/.abook/addresses
```
Once Emacs starts, run **Meta**+**x bbdb-vcard-import-file**. It will prompt you for the file name you want to import, which is **~/all-my-addresses.vcf**. After the import finishes, when you compose a message, you can start typing a name and use **Tab** to search and autocomplete the "To" field. BBDB will also open a buffer for the contact so you can make sure it's the correct one.
Why do it this way when you already have each address as a **.vcf** file from [vdirsyncer][16]? If you are like me, you have a LOT of addresses, and doing them one at a time is a lot of work. This way, you can take everything you have in abook and make one big file.
#### View your calendar in Emacs with calfw
![calfw calendar][17]
Finally, you can use Emacs to look at your calendar. In the configuration section above, you installed the [calfw][18] package and added lines to tell it where to find the calendars to load. Calfw is short for the Calendar Framework for Emacs, and it supports many calendar formats. Since I use Google calendar, that is the link I put into my config. Your calendar will auto-load at startup, and you can view it by switching to the **cfw-calendar** buffer with the **Ctrl**+**x** **b** command.
Calfw offers views by the day, week, two weeks, and month. You can select the view from the top of the calendar and navigate your calendar with the **Arrow** keys. Unfortunately, calfw can only view calendars, so you'll still need to use something like [khal][19] or a web interface to add, delete, and modify events.
So there you have it: mail, calendars, and addresses in Emacs. Tomorrow I'll do even more.
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/1/emacs-mail-calendar
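Author: [Kevin Sonney][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)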
[a]: https://opensource.com/users/ksonney
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/email_paper_envelope_document.png?itok=uPj_kouJ (Document sending)
[2]: https://www.vim.org/
[3]: https://www.gnu.org/software/emacs/
[4]: https://opensource.com/article/20/1/vim-email-calendar
[5]: https://opensource.com/article/20/1/vim-task-list-reddit-twitter
[6]: https://opensource.com/sites/default/files/uploads/productivity_18-1.png (Mail and calendar in Emacs)
[7]: https://www.spacemacs.org/
[8]: https://opensource.com/article/19/12/spacemacs
[9]: mailto:myemail@mydomain.com
[10]: https://notmuchmail.org/
[11]: https://opensource.com/sites/default/files/uploads/productivity_18-2.png (Reading mail with Notmuch)
[12]: https://opensource.com/article/20/1/organize-email-notmuch
[13]: https://opensource.com/sites/default/files/uploads/productivity_18-3.png (Composing a message with BBDB addressing)
[14]: https://www.jwz.org/bbdb/
[15]: https://opensource.com/article/20/1/sync-contacts-locally
[16]: https://opensource.com/article/20/1/open-source-calendar
[17]: https://opensource.com/sites/default/files/uploads/productivity_18-4.png (calfw calendar)
[18]: https://github.com/kiwanami/emacs-calfw
[19]: https://khal.readthedocs.io/en/v0.9.2/index.html


@ -1,5 +1,5 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )


@ -1,132 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to get MongoDB Server on Fedora)
[#]: via: (https://fedoramagazine.org/how-to-get-mongodb-server-on-fedora/)
[#]: author: (Honza Horak https://fedoramagazine.org/author/hhorak/)
How to get MongoDB Server on Fedora
======
![][1]
Mongo (from “humongous”) is a high-performance, open source, schema-free, document-oriented database and one of the most popular so-called [NoSQL][2] databases. It uses JSON as a document format, and it is designed to be scalable and replicable across multiple server nodes.
### Story about license change
It has been more than a year since upstream MongoDB decided to change the license of the Server code. The previous license was the GNU Affero General Public License v3 (AGPLv3). However, upstream wrote a new license designed to make companies that run MongoDB as a service contribute back to the community. The new license is called the Server Side Public License (SSPLv1); more about this step and its rationale can be found in the [MongoDB SSPL FAQ][3].
Fedora has always included only free (as in “freedom”) software. When SSPL was released, Fedora [determined][4] that it is not a free software license in this sense. All versions of MongoDB released before the license change date (October 2018) could potentially have been kept in Fedora, but never updating the packages would bring security issues. Hence the Fedora community decided to [remove the MongoDB server][5] entirely, starting with Fedora 30.
### What options are left to developers?
Well, alternatives exist. For example, PostgreSQL also supports JSON in recent versions, and it can be used in cases where MongoDB can no longer be used. With the JSONB type, indexing works very well in PostgreSQL, with performance comparable to MongoDB and without any compromises on ACID.
The technical reasons that a developer may have chosen MongoDB did not change with the license, so many still want to use it. What is important to realize is that the license changed to SSPL only for the MongoDB server. There are other projects that MongoDB upstream develops, like MongoDB tools, C and C++ client libraries, and connectors for various dynamic languages, that are used on the client side (in applications that want to communicate with the server over the network). Since the license is kept free (mostly Apache License) for those packages, they are staying in Fedora repositories, so users can use them for application development.
The only real change is the server package itself, which was removed entirely from the Fedora repos. Let's see what a Fedora user can do to get the non-free packages.
### How to install MongoDB server from the upstream
When Fedora users want to install a MongoDB server, they need to approach MongoDB upstream directly. However, upstream does not ship RPM packages for Fedora itself. Instead, the MongoDB server is available either as a source tarball that users need to compile themselves (which requires some developer knowledge), or Fedora users can use compatible packages. Of the compatible options, the best choice at this point is the RHEL-8 RPMs. The following steps describe how to install them and how to start the daemon.
#### 1\. Create a repository with upstream RPMs (RHEL-8 builds)
```
$ sudo tee /etc/yum.repos.d/mongodb.repo > /dev/null <<EOF
[mongodb-upstream]
name=MongoDB Upstream Repository
baseurl=https://repo.mongodb.org/yum/redhat/8Server/mongodb-org/4.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc
EOF
```
#### 2\. Install the meta-package that pulls in the server and tools packages
```
$ sudo dnf install mongodb-org
<snipped>
Installed:
  mongodb-org-4.2.3-1.el8.x86_64           mongodb-org-mongos-4.2.3-1.el8.x86_64
  mongodb-org-server-4.2.3-1.el8.x86_64    mongodb-org-shell-4.2.3-1.el8.x86_64
  mongodb-org-tools-4.2.3-1.el8.x86_64
Complete!
```
#### 3\. Start the MongoDB daemon
```
$ sudo systemctl start mongod
$ sudo systemctl status mongod
● mongod.service - MongoDB Database Server
   Loaded: loaded (/usr/lib/systemd/system/mongod.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2020-02-08 12:33:45 EST; 2s ago
     Docs: https://docs.mongodb.org/manual
  Process: 15768 ExecStartPre=/usr/bin/mkdir -p /var/run/mongodb (code=exited, status=0/SUCCESS)
  Process: 15769 ExecStartPre=/usr/bin/chown mongod:mongod /var/run/mongodb (code=exited, status=0/SUCCESS)
  Process: 15770 ExecStartPre=/usr/bin/chmod 0755 /var/run/mongodb (code=exited, status=0/SUCCESS)
  Process: 15771 ExecStart=/usr/bin/mongod $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 15773 (mongod)
   Memory: 70.4M
      CPU: 611ms
   CGroup: /system.slice/mongod.service
           └─15773 /usr/bin/mongod -f /etc/mongod.conf
```
#### 4\. Verify that the server runs by connecting to it from the mongo shell
```
$ mongo
MongoDB shell version v4.2.3
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("20b6e61f-c7cc-4e9b-a25e-5e306d60482f") }
MongoDB server version: 4.2.3
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
    http://docs.mongodb.org/
---
> _
```
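If you would rather script this check than open the shell, a plain TCP connection attempt is a rough substitute. The sketch below uses only the Python standard library and assumes the address and port shown in the shell output above (127.0.0.1:27017); the helper name `is_listening` is hypothetical. It only confirms that something accepts connections on that port, not that it is a healthy MongoDB server.

```python
import socket


def is_listening(host="127.0.0.1", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("mongod reachable:", is_listening())
```

For a real health check, a MongoDB client driver that issues a command to the server would be a better choice than a bare socket.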
That's all. As you can see, the RHEL-8 packages are quite compatible, and that should remain the case for as long as the Fedora packages remain compatible with what's in RHEL-8. Just be careful to comply with the SSPLv1 license in your use.
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/how-to-get-mongodb-server-on-fedora/
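Author: [Honza Horak][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)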
[a]: https://fedoramagazine.org/author/hhorak/
[b]: https://github.com/lujun9972
[1]: https://fedoramagazine.org/wp-content/uploads/2020/02/mongodb-816x348.png
[2]: https://en.wikipedia.org/wiki/NoSQL
[3]: https://www.mongodb.com/licensing/server-side-public-license/faq
[4]: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org/thread/IQIOBOGWJ247JGKX2WD6N27TZNZZNM6C/
[5]: https://fedoraproject.org/wiki/Changes/MongoDB_Removal


@ -1,720 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (heguangzhi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Using Python and GNU Octave to plot data)
[#]: via: (https://opensource.com/article/20/2/python-gnu-octave-data-science)
[#]: author: (Cristiano L. Fontana https://opensource.com/users/cristianofontana)
Using Python and GNU Octave to plot data
======
Learn how to do a common data science task with Python and GNU Octave.
![Analytics: Charts and Graphs][1]
Data science is a domain of knowledge that spans programming languages. Some are well-known for solving problems in this space, while others are lesser-known. This article will help you become familiar with doing data science with some popular languages.
### Choosing Python and GNU Octave for data science
Every so often, I try to learn a new programming language. Why? It is mostly a combination of boredom with the old ways and curiosity about the new ways. When I started programming, the only language I knew was C. Life was hard and dangerous in those years, as I had to manually allocate memory, manage pointers, and remember to free memory.
Then a friend suggested I try Python, and life became much easier. Programs became much slower, but I did not have to suffer through writing analysis software. However, I soon realized that each language was more suitable than others for some applications. I later studied some other languages, and each one brought some new bit of enlightenment. Discovering new programming styles let me backport some solutions to other languages, and everything became much more interesting.
To get a feeling for a new programming language (and its documentation), I always start by writing some example programs that perform a task I know well. To that end, I will explain how to write a program in Python and GNU Octave for a particular task you could classify as data science. If you are already familiar with one of the languages, start with that one and go through the others to look for similarities and differences. It is not intended to be an exhaustive comparison of the languages, just a little showcase.
All of the programs are meant to be run on the [command line][2], not with a [graphical user interface][3] (GUI). The full examples are available in the [polyglot_fit repository][4].
### The programming task
The program you will write in this series:
* Reads data from a [CSV file][5]
* Fits the data with a straight line (i.e., _f(x) = m ⋅ x + q_)
* Plots the result to an image file
This is a common situation that many data scientists have encountered. The example data is the first set of [Anscombe's quartet][6], shown in the table below. The quartet consists of four artificially constructed data sets that give the same results when fitted with a straight line, even though their plots look very different. The data file is a text file with tabs as column separators and a few lines as a header. This task will use only the first set (i.e., the first two columns).
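[**Anscombe's quartet**][6]

| x (I) | y (I) | x (II) | y (II) | x (III) | y (III) | x (IV) | y (IV) |
|-------|-------|--------|--------|---------|---------|--------|--------|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
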
### The Python way
[Python][7] is a general-purpose programming language that is among the most popular languages in use today (as evidenced by findings from the [TIOBE index][8], [RedMonk Programming Language Rankings][9], [Popularity of Programming Language Index][10], [State of the Octoverse of GitHub][11], and other sources). It is an [interpreted language][12]; therefore, the source code is read and evaluated by a program that executes the instructions. It has a comprehensive [standard library][13] and is generally very pleasant to use (I have no reference for this last statement; it is just my humble opinion).
#### Installation
To develop with Python, you need the interpreter and a few libraries. The minimum requirements are:
  * [NumPy][14] for convenient array and matrix manipulation
* [SciPy][15] for scientific calculations
* [Matplotlib][16] for plotting
Installing them in [Fedora][17] is easy:
```
sudo dnf install python3 python3-numpy python3-scipy python3-matplotlib
```
#### Commenting code
In Python, [comments][18] are achieved by putting a **#** at the beginning of the line, and the rest of the line will be discarded by the interpreter:
```
# This is a comment ignored by the interpreter.
```
The [fitting_python.py][19] example uses comments to insert licensing information in the source code, and the first line is a [special comment][20] that enables the script to be executed on the command line:
```
#! /usr/bin/env python3
```
This line informs the command-line interpreter that the script needs to be executed by the program **python3**.
#### Required libraries
Libraries and modules can be imported in Python as an object (as in the first line in the example) with all the functions and members of the library. There is a convenient option to rename them with a custom label by using the **as** specification:
```
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
```
You may also decide to import only a submodule (as in the second and third lines). The syntax has two (more or less) equivalent options: **import module.submodule** and **from module import submodule**.
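As a minimal sketch of these import forms, the snippet below uses only standard-library modules. The choice of **math** and **os.path** is an assumption made for illustration; they are not part of the fitting example.

```python
# Import a whole module under a custom label.
import math as m

# Two ways to reach a submodule.
import os.path
from os import path

# Both names refer to the same module object.
print(path is os.path)

# Functions are reached through the label.
print(m.sqrt(16.0))
```

The practical difference is the name that ends up in your namespace: the first form needs the full **os.path** prefix, while the second binds **path** directly.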
#### Defining variables
Python's variables are declared the first time a value is assigned to them:
```
input_file_name = "anscombe.csv"
delimiter = "\t"
skip_header = 3
column_x = 0
column_y = 1
```
The variable types are inferred from the value that is assigned to the variable. Python has no true constants: any name can be reassigned, and a value defined in a module is protected only by convention. Idiomatically, variables that should not be modified are named in uppercase.
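A short sketch of how the inferred types can be inspected. The names **scale** and **MAX_POINTS** are hypothetical and only serve the illustration.

```python
# Types are inferred from the assigned values.
input_file_name = "anscombe.csv"
skip_header = 3
scale = 1.5  # hypothetical extra variable, only to show a float

print(type(input_file_name).__name__)
print(type(skip_header).__name__)
print(type(scale).__name__)

# A name can later be bound to a value of a different type.
skip_header = "three"
print(type(skip_header).__name__)

# Uppercase marks a value as constant by convention only;
# the interpreter does not prevent reassignment.
MAX_POINTS = 100
```

Because nothing stops the reassignment of **skip_header**, the uppercase convention is a hint to the reader rather than a guarantee.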
#### Printing output
Running the programs through the command line means that the output is just printed on the terminal. Python has the [**print()**][21] function that, by default, prints its argument and adds a newline at the end of the output:
```
print("#### Anscombe's first set with Python ####")
```
It is possible to combine the **print()** function with the [formatting power][22] of the [string class][23] in Python. Strings have the **format** method that can be used to add some formatted text to the string itself. For instance, it is possible to add a formatted float number, e.g.:
```
print("Slope: {:f}".format(slope))
```
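The following sketch shows a few variations of the same format specification. The value of **slope** is an example number chosen to be close to the fit result, and the f-string form is an alternative that the original script does not use.

```python
slope = 0.5000909  # example value, close to the fit result shown later

# Default float formatting uses six decimal places.
line_default = "Slope: {:f}".format(slope)

# A precision can be given after the dot.
line_short = "Slope: {:.3f}".format(slope)

# f-strings (Python 3.6+) accept the same format specification.
line_fstring = f"Slope: {slope:.3f}"

print(line_default)
print(line_short)
print(line_fstring)
```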
#### Reading data
Reading CSV files is very easy with NumPy and the function [**genfromtxt()**][24], which generates a [NumPy array][25]:
```
data = np.genfromtxt(input_file_name, delimiter = delimiter, skip_header = skip_header)
```
In Python, a function can have optional arguments with default values, and you can pass only a subset of them by naming the desired ones, as done here with **delimiter** and **skip_header**. Arrays are very powerful matrix-like objects that can be easily sliced into smaller arrays:
```
x = data[:, column_x]
y = data[:, column_y]
```
The colons select the whole range, and they can also be used to select a subrange. The end index of a range is excluded, so to select the first two rows of the array, you would use:
```
first_two_rows = data[0:2, :]
```
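To see roughly what the reading and slicing steps do, here is a sketch that relies only on the standard library. It is not how NumPy works internally, and the header text in **text** is invented for illustration; the numbers are the first rows of the first set.

```python
import csv
import io

# Stand-in for the data file: three header lines, then tab-separated values.
# The header text is invented for illustration; the numbers are the first
# rows of the first set of Anscombe's quartet.
text = (
    "# header line 1\n"
    "# header line 2\n"
    "# header line 3\n"
    "10.0\t8.04\n"
    "8.0\t6.95\n"
    "13.0\t7.58\n"
)

skip_header = 3
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

# Drop the header lines, then convert every field to a float.
data = [[float(field) for field in row] for row in rows[skip_header:]]

# Column selection, the list equivalent of data[:, 0] and data[:, 1].
x = [row[0] for row in data]
y = [row[1] for row in data]

# The end index of a slice is excluded, so 0:2 yields two rows.
first_two_rows = data[0:2]

print(x)
print(first_two_rows)
```

With NumPy, the list comprehensions collapse into the slicing expressions shown above, which is a large part of its appeal.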
#### Fitting data
SciPy provides convenient functions for data fitting, such as the [**linregress()**][26] function. This function provides some significant values related to the fit, such as the slope, intercept, and the correlation coefficient of the two datasets:
```
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
```
Since **linregress()** provides several pieces of information, the result can be saved to several variables at the same time.
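For readers who want to see what these quantities are, the sketch below computes the slope, intercept, and correlation coefficient with the textbook least-squares formulas in plain Python. It is an illustration of the math under that assumption, not a description of SciPy's implementation, and it skips the p-value and standard error.

```python
from math import sqrt

# First set of Anscombe's quartet.
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross products.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
r_value = s_xy / sqrt(s_xx * s_yy)

print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
```

The printed values should agree with the ones listed in the Results section of this article.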
#### Plotting
The Matplotlib library plots only data points; therefore, you should define the points you want to plot. The **x** and **y** arrays were already defined, so you can directly plot them, but you also need data points that will represent the straight line.
```
fit_x = np.linspace(x.min() - 1, x.max() + 1, 100)
```
The [**linspace()**][27] function conveniently generates a set of equally spaced values between two values. The ordinates can be easily calculated by exploiting the powerful NumPy arrays, which can be used in a formula as if they were ordinary numeric variables:
```
fit_y = slope * fit_x + intercept
```
The formula is applied element-by-element on the array; therefore, the result has the same number of entries as the initial array.
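The sketch below mimics both steps with plain lists to make the element-by-element behavior explicit. The helper **linspace()** defined here is a simplified stand-in written for this illustration, not NumPy's function, and the slope and intercept are the values reported in the Results section.

```python
# Pure-Python stand-in for np.linspace(start, stop, count):
# count equally spaced values, both end points included (count >= 2).
def linspace(start, stop, count):
    step = (stop - start) / (count - 1)
    return [start + index * step for index in range(count)]

slope = 0.500091      # values from the fit reported in the Results section
intercept = 3.000091

# For the first set, min(x) - 1 is 3.0 and max(x) + 1 is 15.0.
fit_x = linspace(3.0, 15.0, 100)

# NumPy applies the formula to the whole array at once; with plain lists
# the same element-by-element operation needs an explicit loop.
fit_y = [slope * value + intercept for value in fit_x]

print(len(fit_x), len(fit_y))
```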
To create the plot, first, define a [figure object][28] that will contain all the graphics:
```
fig_width = 7 #inch
fig_height = fig_width / 16 * 9 #inch
fig_dpi = 100
fig = plt.figure(figsize = (fig_width, fig_height), dpi = fig_dpi)
```
Several plots can be drawn on a figure; in Matplotlib, the plots are called [axes][29]. This example defines a single axes object to plot the data points:
```
ax = fig.add_subplot(111)
ax.plot(fit_x, fit_y, label = "Fit", linestyle = '-')
ax.plot(x, y, label = "Data", marker = '.', linestyle = '')
ax.legend()
ax.set_xlim(min(x) - 1, max(x) + 1)
ax.set_ylim(min(y) - 1, max(y) + 1)
ax.set_xlabel('x')
ax.set_ylabel('y')
```
Save the figure to a [PNG image file][30] with:
```
fig.savefig('fit_python.png')
```
If you want to display (instead of saving) the plot, call:
```
plt.show()
```
This example references all the objects used in the plotting section: it defines the object **fig** and the object **ax**. This technicality is not necessary, as the **plt** object can be used directly to plot the datasets. The [Matplotlib tutorial][31] shows an interface such as:
```
plt.plot(fit_x, fit_y)
```
Frankly, I do not like this approach because it hides the non-trivial interactions that happen between the various objects. Unfortunately, sometimes the [official examples][32] are a bit confusing because they tend to use different approaches. Referencing graphical objects is not necessary in this simple example, but it becomes important in more complex ones (such as when embedding plots in GUIs).
#### Results
The output on the command line is:
```
#### Anscombe's first set with Python ####
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
```
Here is the image Matplotlib generates.
![Plot and fit of the dataset obtained with Python][33]
### The GNU Octave way
The [GNU Octave][34] language is primarily intended for numerical computations. It offers a simple syntax for manipulating vectors and matrices and has some powerful plotting facilities. It is an interpreted language like Python. Since Octave's syntax is [mostly compatible][35] with [MATLAB][36], it is often described as a free alternative to MATLAB. Octave is not listed among the most popular programming languages, but MATLAB is, so Octave is rather popular in a sense. MATLAB predates NumPy, and I have the feeling that NumPy was inspired by it. While you go through the example, you will see the analogies.
#### Installation
The [fitting_octave.m][37] example only needs the basic Octave package, making the installation in Fedora rather simple:
```
sudo dnf install octave
```
#### Commenting code
In Octave, you can add comments to code with the percent symbol (**%**), and you can also use **#** if MATLAB compatibility is not needed. The option to use **#** allows you to write the same special comment line from the Python example to execute the script directly on the command line.
#### Necessary libraries
Everything used in this example is contained in the basic package, so you do not need to load any new libraries. If you need a library, the [syntax][38] is **pkg load module**. This command adds the module's functions to the list of available functions. In this regard, Python has more flexibility.
#### Defining variables
Variables are defined with pretty much the same syntax as Python:
```
input_file_name = "anscombe.csv";
delimiter = "\t";
skip_header = 3;
column_x = 1;
column_y = 2;
```
Note that the end of the line has a semicolon; this is not necessary, but it suppresses the output of the results of the line. Without a semicolon, the interpreter would print the result of the expression:
```
octave:1> input_file_name = "anscombe.csv"
input_file_name = anscombe.csv
octave:2> sqrt(2)
ans =  1.4142
```
#### Printing output
The powerful function [**printf()**][39] is used to print on the terminal. Unlike in Python, the **printf()** function does not automatically add a newline at the end of the printed string, so you have to add it. The first argument is a string that can contain format information for the other arguments to be passed to the function, such as:
```
printf("Slope: %f\n", slope);
```
In Python, the formatting is built into the string itself, but in Octave, it is specific to the **printf()** function.
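For comparison, Python strings also accept printf-style templates through the **%** operator, which makes the two approaches easy to put side by side. This is a small sketch; the value of **slope** is taken from the results reported in this article.

```python
slope = 0.500091  # value taken from the results reported in this article

# printf-style formatting with the % operator, close to Octave's template.
octave_like = "Slope: %f\n" % slope

# The str.format() version used in the Python section; print() would
# append the newline itself, so it is added here by hand for comparison.
python_like = "Slope: {:f}".format(slope) + "\n"

print(octave_like == python_like)
```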
#### Reading data
The [**dlmread()**][40] function can read text files structured like CSV files:
```
data = dlmread(input_file_name, delimiter, skip_header, 0);
```
The result is a [matrix][41] object, which is one of the fundamental data types in Octave. Matrices may be sliced with a syntax similar to Python:
```
x = data(:, column_x);
y = data(:, column_y);
```
The fundamental difference is that the indexes start at one instead of zero. Therefore, in the example, the **x** column is column number one.
#### Fitting data
To fit the data with a straight line, you can use the [**polyfit()**][42] function. It fits the input data with a polynomial, so you just need to use a polynomial of order one:
```
p = polyfit(x, y, 1);
slope = p(1);
intercept = p(2);
```
The result is a matrix with the polynomial coefficients, ordered from the highest power to the lowest; therefore, the example takes the first element as the slope and the second as the intercept. To determine the correlation coefficient, use the [**corr()**][43] function:
```
r_value = corr(x, y);
```
Finally, print the results with the **printf()** function:
```
printf("Slope: %f\n", slope);
printf("Intercept: %f\n", intercept);
printf("Correlation coefficient: %f\n", r_value);
```
#### Plotting
As in the Matplotlib example, you first need to create a dataset that represents the fitted line:
```
fit_x = linspace(min(x) - 1, max(x) + 1, 100);
fit_y = slope * fit_x + intercept;
```
The analogy with NumPy is also evident here, as it uses the [**linspace()**][44] function, which behaves just like its NumPy equivalent.
Again, as with Matplotlib, create a [figure][45] object first, then create an [axes][46] object to hold the plots:
```
fig_width = 7; %inch
fig_height = fig_width / 16 * 9; %inch
fig_dpi = 100;
fig = figure("units", "inches",
             "position", [1, 1, fig_width, fig_height]);
ax = axes("parent", fig);
set(ax, "fontsize", 14);
set(ax, "linewidth", 2);
```
To set properties of the axes object, use the [**set()**][47] function. The interface is rather confusing, though, as the function expects a comma-separated list of property and value pairs. These pairs are just a succession of a string representing the property name and a second object representing the value for that property. There are also other functions to set various properties:
```
xlim(ax, [min(x) - 1, max(x) + 1]);
ylim(ax, [min(y) - 1, max(y) + 1]);
xlabel(ax, 'x');
ylabel(ax, 'y');
```
Plotting is achieved with the [**plot()**][48] function. The default behavior is that each call resets the axes, so you need to use the function [**hold()**][49].
```
hold(ax, "on");
plot(ax, fit_x, fit_y,
     "marker", "none",
     "linestyle", "-",
     "linewidth", 2);
plot(ax, x, y,
     "marker", ".",
     "markersize", 20,
     "linestyle", "none");
hold(ax, "off");
```
The **plot()** function also accepts property and value pairs. The [legend][50] must be created separately, and the labels must be stated manually:
```
lg = legend(ax, "Fit", "Data");
set(lg, "location", "northwest");
```
Finally, save the output to a PNG image:
```
image_size = sprintf("-S%f,%f", fig_width * fig_dpi, fig_height * fig_dpi);
image_resolution = sprintf("-r%d", fig_dpi);
print(fig, 'fit_octave.png',
      '-dpng',
      image_size,
      image_resolution);
```
Confusingly, in this case, each option is passed as a single string that combines the option name and its value. Since Octave strings do not have the formatting facilities of Python, you must use the [**sprintf()**][51] function. It behaves just like the **printf()** function, but its result is returned as a string rather than printed.
In this example, as in the Python one, the graphical objects are referenced to keep their interactions evident. If Python's documentation in this regard is a little bit confusing, [Octave's documentation][52] is even worse. Most of the examples I found did not care about referencing the objects; instead, they rely on the fact that the plotting commands act on the currently active figure. A global [root graphics object][53] keeps track of the existing figures and axes.
#### Results
The resulting output on the command line is:
```
#### Anscombe's first set with Octave ####
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
```
And this shows the resulting image generated with Octave.
![Plot and fit of the dataset obtained with Octave][54]
### Next up
Both Python and GNU Octave can plot the same information, though they differ in how they get there. If you're looking to explore other languages to complete similar tasks, I highly recommend looking at [Rosetta Code][55]. It's a marvelous resource to see how to solve the same problems in many languages. 
What language do you like to plot data in? Share your thoughts in the comments.
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/python-gnu-octave-data-science
Author: [Cristiano L. Fontana][a]
Topic selection: [lujun9972][b]
Translator: [heguangzhi](https://github.com/heguangzhi)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://opensource.com/users/cristianofontana
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/analytics-graphs-charts.png?itok=sersoqbV (Analytics: Charts and Graphs)
[2]: https://en.wikipedia.org/wiki/Command-line_interface
[3]: https://en.wikipedia.org/wiki/Graphical_user_interface
[4]: https://gitlab.com/cristiano.fontana/polyglot_fit
[5]: https://en.wikipedia.org/wiki/Comma-separated_values
[6]: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
[7]: https://www.python.org/
[8]: https://www.tiobe.com/tiobe-index/
[9]: https://redmonk.com/sogrady/2019/07/18/language-rankings-6-19/
[10]: http://pypl.github.io/PYPL.html
[11]: https://octoverse.github.com/
[12]: https://en.wikipedia.org/wiki/Interpreted_language
[13]: https://docs.python.org/3/library/
[14]: https://numpy.org/
[15]: https://www.scipy.org/
[16]: https://matplotlib.org/
[17]: https://getfedora.org/
[18]: https://en.wikipedia.org/wiki/Comment_(computer_programming)
[19]: https://gitlab.com/cristiano.fontana/polyglot_fit/-/blob/master/fitting_python.py
[20]: https://en.wikipedia.org/wiki/Shebang_(Unix)
[21]: https://docs.python.org/3/library/functions.html#print
[22]: https://docs.python.org/3/library/string.html#string-formatting
[23]: https://docs.python.org/3/library/string.html
[24]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
[25]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
[26]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
[27]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
[28]: https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure
[29]: https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes
[30]: https://en.wikipedia.org/wiki/Portable_Network_Graphics
[31]: https://matplotlib.org/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py
[32]: https://matplotlib.org/gallery/index.html
[33]: https://opensource.com/sites/default/files/uploads/fit_python.png (Plot and fit of the dataset obtained with Python)
[34]: https://www.gnu.org/software/octave/
[35]: https://wiki.octave.org/FAQ#Differences_between_Octave_and_Matlab
[36]: https://en.wikipedia.org/wiki/MATLAB
[37]: https://gitlab.com/cristiano.fontana/polyglot_fit/-/blob/master/fitting_octave.m
[38]: https://octave.org/doc/v5.1.0/Using-Packages.html#Using-Packages
[39]: https://octave.org/doc/v5.1.0/Formatted-Output.html#XREFprintf
[40]: https://octave.org/doc/v5.1.0/Simple-File-I_002fO.html#XREFdlmread
[41]: https://octave.org/doc/v5.1.0/Matrices.html
[42]: https://octave.org/doc/v5.1.0/Polynomial-Interpolation.html
[43]: https://octave.org/doc/v5.1.0/Correlation-and-Regression-Analysis.html#XREFcorr
[44]: https://octave.sourceforge.io/octave/function/linspace.html
[45]: https://octave.org/doc/v5.1.0/Multiple-Plot-Windows.html
[46]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFaxes
[47]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFset
[48]: https://octave.org/doc/v5.1.0/Two_002dDimensional-Plots.html#XREFplot
[49]: https://octave.org/doc/v5.1.0/Manipulation-of-Plot-Windows.html#XREFhold
[50]: https://octave.org/doc/v5.1.0/Plot-Annotations.html#XREFlegend
[51]: https://octave.org/doc/v5.1.0/Formatted-Output.html#XREFsprintf
[52]: https://octave.org/doc/v5.1.0/Two_002dDimensional-Plots.html#Two_002dDimensional-Plots
[53]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFgroot
[54]: https://opensource.com/sites/default/files/uploads/fit_octave.png (Plot and fit of the dataset obtained with Octave)
[55]: http://www.rosettacode.org/


@ -0,0 +1,237 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Make free encrypted backups to the cloud on Fedora)
[#]: via: (https://fedoramagazine.org/make-free-encrypted-backups-to-the-cloud-on-fedora/)
[#]: author: (Curt Warfield https://fedoramagazine.org/author/rcurtiswarfield/)
Make free encrypted backups to the cloud on Fedora
======
![][1]
Most free cloud storage is limited to 5GB or less. Even Google Drive is limited to 15GB. While not heavily advertised, IBM offers free accounts with a whopping **25GB** of cloud storage. This is not a limited-time offer, and you don't have to provide a credit card. It's absolutely free! Better yet, since it's S3 compatible, most of the S3 tools available for backups should work fine.
This article will show you how to use restic for encrypted backups onto this free storage. Please also refer to [this previous Magazine article about installing and configuring restic][2]. Let's get started!
### Creating your free IBM account and storage
Head over to the IBM cloud services site and follow the steps to sign up for a free account here: <https://cloud.ibm.com/registration>. You'll need to verify your account from the email confirmation that IBM sends to you.
Then log in to your account to bring up your dashboard, at <https://cloud.ibm.com/>.
Click on the **Create resource** button.
![][3]
Click on **Storage** and then **Object Storage**.
![][4]
Next click on the **Create Bucket** button.
![][5]
This brings up the **Configure your resource** section.
![][6]
Next, click on the **Create** button to use the default settings.
![][7]
Under **Predefined buckets** click on the **Standard** box:
![][8]
A unique bucket name is automatically created, but it's suggested that you change this.
![][9]
In this example, the bucket name is changed to **freecloudstorage**.
Click on the **Next** button after choosing a bucket name:
![][10]
Continue to click on the **Next** button until you get to the **Summary** page:
![][11]
Scroll down to the **Endpoints** section.
![][12]
The information in the **Public** section is the location of your bucket. This is what you need to specify in restic when you create your backups. In this example, the location is **s3.us-south.cloud-object-storage.appdomain.cloud**.
### Making your credentials
The last thing that you need to do is create an access ID and secret key. To start, click on **Service credentials**.
![][13]
Click on the **New credential** button.
![][14]
Choose a name for your credential, make sure you check the **Include HMAC Credential** box, and then click on the **Add** button. In this example, I'm using the name **resticbackup**.
![][15]
Click on **View credentials**.
![][16]
The _access_key_id_ and _secret_access_key_ are what you are looking for. (For obvious reasons, the author's details here are obscured.)
You will need to make these available to restic by exporting them as environment variables with the shell's _export_ command, or by setting them in a backup script.
![][17]
### Preparing a new repository
Restic refers to your backup as a _repository_, and can make backups to any bucket on your IBM cloud account. First, set up the following environment variables using the _access_key_id_ and _secret_access_key_ that you retrieved from your IBM cloud bucket. These can also be set in any backup script you may create.
```
$ export AWS_ACCESS_KEY_ID=<MY_ACCESS_KEY>
$ export AWS_SECRET_ACCESS_KEY=<MY_SECRET_ACCESS_KEY>
```
Even though you are using IBM Cloud and not AWS, as previously mentioned, IBM Cloud storage is S3 compatible, and restic uses its internal AWS settings for any S3-compatible storage. So these AWS keys really refer to the keys from your IBM bucket.
Create the repository by initializing it. A prompt appears for you to type a password for the repository. _**Do not lose this password because your data is irrecoverable without it!**_
```
restic -r s3:http://PUBLIC_ENDPOINT_LOCATION/BUCKET init
```
The _PUBLIC_ENDPOINT_LOCATION_ was specified in the Endpoint section of your Bucket summary.
![][18]
For example:
```
$ restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage init
```
### Creating backups
Now it's time to back up some data. Backups are called _snapshots_. Run the following command and enter the repository password when prompted.
```
restic -r s3:http://PUBLIC_ENDPOINT_LOCATION/BUCKET backup files_to_backup
```
For example:
```
$ restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage backup Documents/
Enter password for repository:
repository 106a2eb4 opened successfully, password is correct
Files: 51 new, 0 changed, 0 unmodified
Dirs: 0 new, 0 changed, 0 unmodified
Added to the repo: 11.451 MiB
processed 51 files, 11.451 MiB in 0:06
snapshot 611e9577 saved
```
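If you prefer a backup script over typing the commands, the pieces above could be combined along the following lines. This is only a sketch in Python: the helper names **repository_url()**, **restic_command()**, and **restic_environment()** are hypothetical, the keys are the placeholders used earlier, and the script prints the command instead of running it.

```python
import os
import shlex

def repository_url(endpoint, bucket):
    """Build the repository argument in the s3:http://ENDPOINT/BUCKET form."""
    return "s3:http://{}/{}".format(endpoint, bucket)

def restic_command(endpoint, bucket, *arguments):
    """Return the restic command line as a list suitable for subprocess.run()."""
    return ["restic", "-r", repository_url(endpoint, bucket)] + list(arguments)

def restic_environment(access_key_id, secret_access_key):
    """Copy the current environment and add the two keys restic reads."""
    environment = dict(os.environ)
    environment["AWS_ACCESS_KEY_ID"] = access_key_id
    environment["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    return environment

endpoint = "s3.us-south.cloud-object-storage.appdomain.cloud"
bucket = "freecloudstorage"

command = restic_command(endpoint, bucket, "backup", "Documents/")
environment = restic_environment("<MY_ACCESS_KEY>", "<MY_SECRET_ACCESS_KEY>")

# Show the command instead of running it; to execute it, pass both values
# to subprocess.run(command, env=environment, check=True).
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list avoids shell quoting problems with paths that contain spaces. When the command is actually run this way, restic would still ask for the repository password.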
### Restoring from backups
Now that you've backed up some files, it's time to make sure you know how to restore them. To get a list of all of your backup snapshots, use this command:
```
restic -r s3:http://PUBLIC_ENDPOINT_LOCATION/BUCKET snapshots
```
For example:
```
$ restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage snapshots
Enter password for repository:
ID Date Host Tags Directory
-------------------------------------------------------------------
106a2eb4 2020-01-15 15:20:42 client /home/curt/Documents
```
To restore an entire snapshot, run a command like this:
```
restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage restore snapshotID --target restoreDirectory
```
For example:
```
$ restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage restore 106a2eb4 --target ~
Enter password for repository:
repository 106a2eb4 opened successfully, password is correct
restoring <Snapshot 106a2eb4 of [/home/curt/Documents]
```
If the directory still exists on your system, be sure to specify a different location for the _restoreDirectory_. For example:
```
restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage restore 106a2eb4 --target /tmp
```
To restore an individual file, run a command like this:
```
restic -r s3:http://PUBLIC_ENDPOINT_LOCATION/BUCKET restore snapshotID --target restoreDirectory --include filename
```
For example:
```
$ restic -r s3:http://s3.us-south.cloud-object-storage.appdomain.cloud/freecloudstorage restore 106a2eb4 --target /tmp --include file1.txt
Enter password for repository:
restoring <Snapshot 106a2eb4 of [/home/curt/Documents] at 2020-01-16 15:20:42.833131988 -0400 EDT by curt@client> to /tmp
```
* * *
_Photo by [Alex Machado][19] on [Unsplash][20]._
[EDITOR'S NOTE: The Fedora Project is [sponsored][21] by [Red Hat][22], which is owned by [IBM][23].]
[EDITOR'S NOTE: Updated at 1647 UTC on 24 February 2020 to correct a broken link.]
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/make-free-encrypted-backups-to-the-cloud-on-fedora/
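Author: [Curt Warfield][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)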
[a]: https://fedoramagazine.org/author/rcurtiswarfield/
[b]: https://github.com/lujun9972
[1]: https://fedoramagazine.org/wp-content/uploads/2020/01/encrypted-backups-ibm-cloud-816x345.jpg
[2]: https://fedoramagazine.org/use-restic-encrypted-backups/
[3]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmclouddash-3-e1579713553261.png
[4]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmcloudresourcestorage-3.png
[5]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmcloudbucket-3.png
[6]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmcloudbucket2.png
[7]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmcloudbucket3-e1579713758635.png
[8]: https://fedoramagazine.org/wp-content/uploads/2020/01/ibmcloudbucket4.png
[9]: https://fedoramagazine.org/wp-content/uploads/2020/01/createbucket1.png
[10]: https://fedoramagazine.org/wp-content/uploads/2020/01/next.png
[11]: https://fedoramagazine.org/wp-content/uploads/2020/01/bucketsummary-1024x368.png
[12]: https://fedoramagazine.org/wp-content/uploads/2020/01/endpoints-1024x272.png
[13]: https://fedoramagazine.org/wp-content/uploads/2020/01/servicecreds.png
[14]: https://fedoramagazine.org/wp-content/uploads/2020/01/newcred.png
[15]: https://fedoramagazine.org/wp-content/uploads/2020/01/addnewcred.png
[16]: https://fedoramagazine.org/wp-content/uploads/2020/01/keys-1024x298.png
[17]: https://fedoramagazine.org/wp-content/uploads/2020/01/keys2.png
[18]: https://fedoramagazine.org/wp-content/uploads/2020/01/publicendpoint.png
[19]: https://unsplash.com/@alexmachado?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText
[20]: https://unsplash.com/s/photos/backups-to-cloud?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText
[21]: https://getfedora.org/sponsors/
[22]: https://redhat.com
[23]: https://www.ibm.com/cloud/redhat


@ -0,0 +1,747 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Using C and C++ for data science)
[#]: via: (https://opensource.com/article/20/2/c-data-science)
[#]: author: (Cristiano L. Fontana https://opensource.com/users/cristianofontana)
Using C and C++ for data science
======
Let's work through a common data science task with C99 and C++11.
![metrics and data shown on a computer screen][1]
While languages like [Python][2] and [R][3] are increasingly popular for data science, C and C++ can be a strong choice for efficient and effective data science. In this article, we will use [C99][4] and [C++11][5] to write a program that uses the [Anscombe's quartet][6] dataset, which I'll explain next.
I wrote about my motivation for continually learning languages in an article covering [Python and GNU Octave][7], which is worth reviewing. All of the programs are meant to be run on the [command line][8], not with a [graphical user interface][9] (GUI). The full examples are available in the [polyglot_fit repository][10].
### The programming task
The program you will write in this series:
* Reads data from a [CSV file][11]
  * Fits the data with a straight line (i.e., _f(x)=m ⋅ x + q_)
* Plots the result to an image file
This is a common situation that many data scientists have encountered. The example data is the first set of [Anscombe's quartet][6], shown in the table below. The quartet consists of four artificially constructed datasets that give nearly the same results when fitted with a straight line, even though their plots look very different. The data file is a text file with tabs as column separators and a few lines as a header. This task will use only the first set (i.e., the first two columns).
[**Anscombe's quartet**][6]
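| x (I) | y (I) | x (II) | y (II) | x (III) | y (III) | x (IV) | y (IV) |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |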
### The C way
[C][12] is a general-purpose programming language that is among the most popular languages in use today (according to data from the [TIOBE Index][13], [RedMonk Programming Language Rankings][14], [Popularity of Programming Language Index][15], and [State of the Octoverse of GitHub][16]). It is quite an old language (circa 1973), and many successful programs were written in it (e.g., the Linux kernel and Git, to name just two). It is also one of the closest languages to the inner workings of the computer, as it lets you manipulate memory directly. It is a [compiled language][17]; therefore, the source code has to be translated by a [compiler][18] into [machine code][19]. Its [standard library][20] is small and light on features, so other libraries have been developed to provide missing functionalities.
It is the language I use the most for [number crunching][21], mostly because of its performance. I find it rather tedious to use, as it needs a lot of [boilerplate code][22], but it is well supported in various environments. The C99 standard is a recent revision that adds some nifty features and is well supported by compilers.
I will cover the necessary background of C and C++ programming along the way so both beginners and advanced users can follow along.  
#### Installation
To develop with C99, you need a compiler. I normally use [Clang][23], but [GCC][24] is another valid open source compiler. For linear fitting, I chose to use the [GNU Scientific Library][25]. For plotting, I could not find any sensible library, and therefore this program relies on an external program: [Gnuplot][26]. The example also uses a dynamic data structure to store data, which is defined in the [Berkeley Software Distribution][27] (BSD).
Installing in [Fedora][28] is as easy as running:
```
sudo dnf install clang gnuplot gsl gsl-devel
```
#### Commenting code
In C99, [comments][29] are formatted by putting **//** at the beginning of the line, and the rest of the line will be discarded by the interpreter. Alternatively, anything between **/*** and ***/** is discarded, as well.
```
// This is a comment ignored by the compiler.
/* Also this is ignored */
```
#### Necessary libraries
Libraries are composed of two parts:
* A [header file][30] that contains a description of the functions
* A source file that contains the functions' definitions
Header files are included in the source, while the compiled libraries are [linked][31] into the executable. Therefore, the header files needed for this example are:
```
// Input/Output utilities
#include <stdio.h>
// The standard library
#include <stdlib.h>
// String manipulation utilities
#include <string.h>
// BSD queue
#include <sys/queue.h>
// GSL scientific utilities
#include <gsl/gsl_fit.h>
#include <gsl/gsl_statistics_double.h>
```
#### Main function
In C, the program must be inside a special function called **[main()][32]:**
```
int main(void) {
    ...
}
```
This differs from Python, as covered in the last tutorial, which will run whatever code it finds in the source files.
#### Defining variables
In C, variables have to be declared before they are used, and they have to be associated with a type. Whenever you want to use a variable, you have to decide what kind of data to store in it. You can also specify if you intend to use a variable as a constant value, which is not necessary, but the compiler can benefit from this information. From the [fitting_C99.c program][33] in the repository:
```
const char *input_file_name = "anscombe.csv";
const char *delimiter = "\t";
const unsigned int skip_header = 3;
const unsigned int column_x = 0;
const unsigned int column_y = 1;
const char *output_file_name = "fit_C99.csv";
const unsigned int N = 100;
```
Arrays in C are not dynamic, in the sense that their length has to be decided in advance (i.e., before compilation):
```
int data_array[1024];
```
Since you normally do not know how many data points are in a file, use a [singly linked list][34]. This is a dynamic data structure that can grow indefinitely. Luckily, the BSD [provides linked lists][35]. Here is an example definition:
```
struct data_point {
    double x;
    double y;
    SLIST_ENTRY(data_point) entries;
};
SLIST_HEAD(data_list, data_point) head = SLIST_HEAD_INITIALIZER(head);
SLIST_INIT(&head);
```
This example defines a **data_point** list comprised of structured values that contain both an **x** value and a **y** value. The syntax is rather complicated but intuitive, and describing it in detail would be too wordy.
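To make the list macros a little less mysterious, here is a minimal sketch of how such a list could be filled and walked. The helper names `push_point()` and `count_points()` are my own inventions for illustration and are not taken from the repository; only the `SLIST_*` macros come from `sys/queue.h`.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* One measurement; the SLIST_ENTRY field links it to the next element. */
struct data_point {
    double x;
    double y;
    SLIST_ENTRY(data_point) entries;
};

/* Declares the list-head type `struct data_list`. */
SLIST_HEAD(data_list, data_point);

/* Hypothetical helper: allocate a point and push it onto the front of the list.
 * Returns 0 on success, -1 if the allocation fails. */
static int push_point(struct data_list *head, double x, double y) {
    struct data_point *datum = malloc(sizeof *datum);
    if (datum == NULL) {
        return -1;
    }
    datum->x = x;
    datum->y = y;
    SLIST_INSERT_HEAD(head, datum, entries);
    return 0;
}

/* Hypothetical helper: count the elements by walking the list. */
static size_t count_points(struct data_list *head) {
    size_t count = 0;
    struct data_point *datum;
    SLIST_FOREACH(datum, head, entries) {
        count += 1;
    }
    return count;
}
```

Because `SLIST_INSERT_HEAD` always inserts at the front, the list ends up holding the points in the reverse of the order in which they were read. That does not matter for a linear fit, but it is worth knowing if you print the data back.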
#### Printing output
To print on the terminal, you can use the [**printf()**][36] function, which works like Octave's **printf()** function (described in the first article):
```
printf("#### Anscombe's first set with C99 ####\n");
```
The **printf()** function does not automatically add a newline at the end of the printed string, so you have to add it. The first argument is a string that can contain format information for the other arguments that can be passed to the function, such as:
```
printf("Slope: %f\n", slope);
```
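If you want to see exactly what a format string produces, one option is to format into a buffer with `snprintf()`, which takes the same format strings as `printf()`. The following is only a small sketch; the helper name `format_result()` is made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: format a labelled value the way printf() would print it,
 * but into a caller-supplied buffer. Returns the number of characters that the
 * full output needs (excluding the terminating NUL), or a negative value on error. */
static int format_result(char *buffer, size_t size, const char *label, double value) {
    /* %s inserts a string; %f inserts a double with six decimal places by default. */
    return snprintf(buffer, size, "%s: %f\n", label, value);
}
```

Calling `format_result(buffer, sizeof buffer, "Slope", slope)` fills `buffer` with the same text that the `printf()` call above would send to the terminal.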
#### Reading data
Now comes the hard part… There are some libraries for CSV file parsing in C, but none seemed stable or popular enough to be in the Fedora packages repository. Instead of adding a dependency for this tutorial, I decided to write this part on my own. Again, going into details would be too wordy, so I will only explain the general idea. Some lines in the source will be ignored for the sake of brevity, but you can find the complete example in the repository.
First, open the input file:
```
FILE* input_file = fopen(input_file_name, "r");
```
Then read the file line-by-line until there is an error or the file ends:
```
while (!ferror(input_file) && !feof(input_file)) {
    size_t buffer_size = 0;
    char *buffer = NULL;

    getline(&buffer, &buffer_size, input_file);
    ...
}
```
The [**getline()**][39] function is a nice recent addition from the [POSIX.1-2008 standard][40]. It can read a whole line in a file and take care of allocating the necessary memory. Each line is then split into [tokens][41] with the [**strtok()**][42] function. Looping over the tokens, select the columns that you want:
```
char *token = strtok(buffer, delimiter);
while (token != NULL)
{
    double value;
    sscanf(token, "%lf", &value);
    if (column == column_x) {
        x = value;
    } else if (column == column_y) {
        y = value;
    }
    column += 1;
    token = strtok(NULL, delimiter);
}
}
```
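The token loop can also be wrapped in a function that you can try on a single line of text. This is a sketch under my own assumptions rather than the code from the repository: the name `parse_columns()` is hypothetical, and I swapped `sscanf()` for `strtod()` because it makes it easy to tell whether a token was a number at all (handy for skipping header lines).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `line` on any character in `delimiter` and store
 * the values found in columns `column_x` and `column_y`.
 * Note that strtok() modifies `line` in place, so it must be writable.
 * Returns the number of requested columns that were parsed (0, 1 or 2). */
static int parse_columns(char *line, const char *delimiter,
                         unsigned int column_x, unsigned int column_y,
                         double *x, double *y) {
    unsigned int column = 0;
    int found = 0;

    for (char *token = strtok(line, delimiter); token != NULL;
         token = strtok(NULL, delimiter)) {
        char *end = NULL;
        const double value = strtod(token, &end);

        /* If strtod() consumed nothing, the token is not a number (e.g. a header). */
        if (end != token) {
            if (column == column_x) {
                *x = value;
                found += 1;
            } else if (column == column_y) {
                *y = value;
                found += 1;
            }
        }
        column += 1;
    }
    return found;
}
```

A caller would accept a line only when the function returns 2, which is one way to skip header rows without counting them by hand.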
Finally, when the **x** and **y** values are selected, insert the new data point in the linked list:
```
struct data_point *datum = malloc(sizeof(struct data_point));
datum->x = x;
datum->y = y;
SLIST_INSERT_HEAD(&head, datum, entries);
```
The [**malloc()**][46] function dynamically allocates (reserves) some persistent memory for the new data point.
#### Fitting data
The GSL linear fitting function [**gsl_fit_linear()**][47] expects simple arrays for its input. Therefore, since you won't know in advance the size of the arrays you create, you must manually allocate their memory:
```
const size_t entries_number = row - skip_header - 1;
double *x = malloc(sizeof(double) * entries_number);
double *y = malloc(sizeof(double) * entries_number);
```
Then, loop over the linked list to save the relevant data to the arrays:
```
SLIST_FOREACH(datum, &head, entries) {
    const double current_x = datum->x;
    const double current_y = datum->y;
    x[i] = current_x;
    y[i] = current_y;
    i += 1;
}
```
Now that you are done with the linked list, clean it up. _Always_ release the memory that has been manually allocated to prevent a [memory leak][48]. Memory leaks are bad, bad, bad. Every time memory is not released, a garden gnome loses its head:
```
while (!SLIST_EMPTY(&head)) {
    struct data_point *datum = SLIST_FIRST(&head);
    SLIST_REMOVE_HEAD(&head, entries);
    free(datum);
}
```
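The copy and the cleanup can also be done in a single pass. The sketch below assumes a hypothetical helper called `drain_list()`, which is not in the repository, and repeats the structure definition so it compiles on its own.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct data_point {
    double x;
    double y;
    SLIST_ENTRY(data_point) entries;
};

SLIST_HEAD(data_list, data_point);

/* Hypothetical helper: copy up to `capacity` points into x[] and y[], then
 * release every node. Returns the number of points copied. */
static size_t drain_list(struct data_list *head, double *x, double *y, size_t capacity) {
    size_t i = 0;

    while (!SLIST_EMPTY(head)) {
        struct data_point *datum = SLIST_FIRST(head);

        if (i < capacity) {
            x[i] = datum->x;
            y[i] = datum->y;
            i += 1;
        }
        /* Unlink the node before freeing it, so the list never points to freed memory. */
        SLIST_REMOVE_HEAD(head, entries);
        free(datum);
    }
    return i;
}
```

The trade-off is that the list is gone after the call, so this only fits when the arrays are the last consumer of the data. Remember that the arrays you allocated with `malloc()` still need their own `free()` once the fit is done.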
Finally, finally(!), you can fit your data:
```
gsl_fit_linear(x, 1, y, 1, entries_number,
               &intercept, &slope,
               &cov00, &cov01, &cov11, &chi_squared);
const double r_value = gsl_stats_correlation(x, 1, y, 1, entries_number);
printf("Slope: %f\n", slope);
printf("Intercept: %f\n", intercept);
printf("Correlation coefficient: %f\n", r_value);
```
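If you are curious about what a linear fit computes, the textbook least-squares formulas are short enough to write by hand. The function below is a hypothetical stand-in that I named `fit_line()`; it is not how GSL is implemented, it skips the covariance and chi-squared outputs, and it returns the square of the correlation coefficient so that the sketch does not need the math library.

```c
#include <stddef.h>

/* Hypothetical stand-in for the GSL calls: ordinary least squares for
 * y = intercept + slope * x, plus the squared Pearson correlation coefficient.
 * Returns 0 on success, -1 if the fit is undefined (fewer than two points,
 * or all x values or all y values identical). */
static int fit_line(const double *x, const double *y, size_t n,
                    double *intercept, double *slope, double *r_squared) {
    if (n < 2) {
        return -1;
    }

    /* Means of both columns. */
    double mean_x = 0.0;
    double mean_y = 0.0;
    for (size_t i = 0; i < n; i += 1) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Sums of squared deviations and of cross products. */
    double sxx = 0.0;
    double syy = 0.0;
    double sxy = 0.0;
    for (size_t i = 0; i < n; i += 1) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if (sxx == 0.0 || syy == 0.0) {
        return -1;
    }

    *slope = sxy / sxx;
    *intercept = mean_y - *slope * mean_x;
    *r_squared = (sxy * sxy) / (sxx * syy);
    return 0;
}
```

For real work I would still reach for GSL, since it also reports the uncertainties of the fitted parameters. A hand-written version like this one is mostly useful for checking that the numbers make sense.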
#### Plotting
You must use an external program for the plotting. Therefore, save the fitting function to an external file:
```
const double step_x = ((max_x + 1) - (min_x - 1)) / N;
for (unsigned int i = 0; i < N; i += 1) {
    const double current_x = (min_x - 1) + step_x * i;
    const double current_y = intercept + slope * current_x;
    fprintf(output_file, "%f\t%f\n", current_x, current_y);
}
```
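The snippet above leaves out how the output stream is handled, so here is a sketch of the same loop as a function that writes to any `FILE *` and reports failures. The name `write_fit()` and its parameters are assumptions of mine for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write n samples of the line y = intercept + slope * x
 * to `output`, covering the range from (min_x - 1) up to, but not including,
 * (max_x + 1). Returns 0 on success, -1 on bad arguments or a write error. */
static int write_fit(FILE *output, double min_x, double max_x, unsigned int n,
                     double intercept, double slope) {
    if (output == NULL || n == 0) {
        return -1;
    }

    const double step_x = ((max_x + 1) - (min_x - 1)) / n;
    for (unsigned int i = 0; i < n; i += 1) {
        const double current_x = (min_x - 1) + step_x * i;
        const double current_y = intercept + slope * current_x;
        /* fprintf() returns a negative value if the write fails. */
        if (fprintf(output, "%f\t%f\n", current_x, current_y) < 0) {
            return -1;
        }
    }
    return 0;
}
```

You would pass it a stream obtained with `fopen(output_file_name, "w")` and close the stream with `fclose()` afterwards. Since the loop stops at `n - 1`, the last sample falls one step short of `max_x + 1`, which is fine for drawing a straight line.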
The Gnuplot command for plotting both files is:
```
plot 'fit_C99.csv' using 1:2 with lines title 'Fit', 'anscombe.csv' using 1:2 with points pointtype 7 title 'Data'
```
#### Results
Before running the program, you must compile it:
```
clang -std=c99 -I/usr/include/ fitting_C99.c -L/usr/lib/ -L/usr/lib64/ -lgsl -lgslcblas -o fitting_C99
```
This command tells the compiler to use the C99 standard, read the **fitting_C99.c** file, link against the **gsl** and **gslcblas** libraries, and save the result to **fitting_C99**. The output of the program on the command line is:
```
#### Anscombe's first set with C99 ####
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
```
Here is the resulting image generated with Gnuplot.
![Plot and fit of the dataset obtained with C99][52]
### The C++11 way
[C++][53] is a general-purpose programming language that is also among the most popular languages in use today. It was created as a [successor of C][54] (in 1983) with an emphasis on [object-oriented programming][55] (OOP). C++ is commonly regarded as a superset of C, so a C program should be able to be compiled with a C++ compiler. This is not exactly true, as there are some corner cases where they behave differently. In my experience, C++ needs less boilerplate than C, but the syntax is more difficult if you want to develop objects. The C++11 standard is a recent revision that adds some nifty features and is more or less supported by compilers.
Since C++ is largely compatible with C, I will just highlight the differences between the two. If I do not cover a section in this part, it means that it is the same as in C.
#### Installation
The dependencies for the C++ example are the same as the C example. On Fedora, run:
```
sudo dnf install clang gnuplot gsl gsl-devel
```
#### Necessary libraries
Libraries work in the same way as in C, but the **include** directives are slightly different:
```
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
extern "C" {
#include <gsl/gsl_fit.h>
#include <gsl/gsl_statistics_double.h>
}
```
Since the GSL libraries are written in C, you must inform the compiler about this peculiarity.
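Seen from the C side, a library author can spare C++ users this step by guarding the declarations in the header. The following is a sketch of that common pattern with a made-up header and function (`hypothetical_stats.h` and `hypothetical_mean()` do not exist in GSL); whether a particular library already does this is something to check in its headers.

```c
/* --- hypothetical_stats.h (made-up header, not part of GSL) --- */
#ifndef HYPOTHETICAL_STATS_H
#define HYPOTHETICAL_STATS_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" { /* only a C++ compiler defines __cplusplus and sees this line */
#endif

/* Arithmetic mean of n values; returns 0.0 for an empty array. */
double hypothetical_mean(const double *values, size_t n);

#ifdef __cplusplus
}
#endif

#endif /* HYPOTHETICAL_STATS_H */

/* --- hypothetical_stats.c (the implementation, compiled as C) --- */
double hypothetical_mean(const double *values, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i += 1) {
        sum += values[i];
    }
    return n > 0 ? sum / (double)n : 0.0;
}
```

With such a guard in place, the same header works unchanged in C and in C++, because a C compiler never sees the `extern "C"` lines.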
#### Defining variables
C++ supports more data types (classes) than C, such as a **string** type that has many more features than its C counterpart. Update the definition of the variables accordingly:
```
const std::string input_file_name("anscombe.csv");
```
For structured objects like strings, you can define the variable without using the **=** sign.
#### Printing output
You can use the **printf()** function, but the **cout** object is more idiomatic. Use the operator **&lt;&lt;** to indicate the string (or objects) that you want to print with **cout**:
```
std::cout << "#### Anscombe's first set with C++11 ####" << std::endl;
...
std::cout << "Slope: " << slope << std::endl;
std::cout << "Intercept: " << intercept << std::endl;
std::cout << "Correlation coefficient: " << r_value << std::endl;
```
#### Reading data
The scheme is the same as before. The file is opened and read line-by-line, but with a different syntax:
```
std::ifstream input_file(input_file_name);
while (input_file.good()) {
    std::string line;
    getline(input_file, line);
    ...
}
```
The line tokens are extracted with the same function as in the C99 example. Instead of using standard C arrays, use two [vectors][56]. Vectors are an extension of C arrays in the [C++ standard library][57] that allows dynamic management of memory without explicitly calling **malloc()**:
```
std::vector<double> x;
std::vector<double> y;
// Adding an element to x and y:
x.emplace_back(value);
y.emplace_back(value);
```
#### Fitting data
For fitting in C++, you do not have to loop over the list, as vectors are guaranteed to have contiguous memory. You can directly pass to the fitting function the pointers to the vectors' buffers:
```
gsl_fit_linear(x.data(), 1, y.data(), 1, entries_number,
               &intercept, &slope,
               &cov00, &cov01, &cov11, &chi_squared);
const double r_value = gsl_stats_correlation(x.data(), 1, y.data(), 1, entries_number);
std::cout << "Slope: " << slope << std::endl;
std::cout << "Intercept: " << intercept << std::endl;
std::cout << "Correlation coefficient: " << r_value << std::endl;
```
#### Plotting
Plotting is done with the same approach as before. Write to a file:
```
const double step_x = ((max_x + 1) - (min_x - 1)) / N;
for (unsigned int i = 0; i < N; i += 1) {
    const double current_x = (min_x - 1) + step_x * i;
    const double current_y = intercept + slope * current_x;
    output_file << current_x << "\t" << current_y << std::endl;
}
output_file.close();
```
And then use Gnuplot for the plotting.
#### Results
Before running the program, it must be compiled with a similar command:
```
clang++ -std=c++11 -I/usr/include/ fitting_Cpp11.cpp -L/usr/lib/ -L/usr/lib64/ -lgsl -lgslcblas -o fitting_Cpp11
```
The resulting output on the command line is:
```
#### Anscombe's first set with C++11 ####
Slope: 0.500091
Intercept: 3.00009
Correlation coefficient: 0.816421
```
And this is the resulting image generated with Gnuplot.
![Plot and fit of the dataset obtained with C++11][58]
### Conclusion
This article provides examples for a data fitting and plotting task in C99 and C++11. Since C++ is largely compatible with C, this article exploited their similarities for writing the second example. In some aspects, C++ is easier to use because it partially relieves the burden of explicitly managing memory. But the syntax is more complex because it introduces the possibility of writing classes for OOP. However, it is still possible to write software in C with the OOP approach. Since OOP is a style of programming, it can be used in any language. There are some great examples of OOP in C, such as the [GObject][59] and [Jansson][60] libraries.
For number crunching, I prefer working in C99 due to its simpler syntax and widespread support. Until recently, C++11 was not as widely supported, and I tended to avoid the rough edges in the previous versions. For more complex software, C++ could be a good choice.
Do you use C or C++ for data science as well? Share your experiences in the comments.
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/c-data-science
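Author: [Cristiano L. Fontana][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)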
[a]: https://opensource.com/users/cristianofontana
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/metrics_data_dashboard_system_computer_analytics.png?itok=oxAeIEI- (metrics and data shown on a computer screen)
[2]: https://opensource.com/article/18/9/top-3-python-libraries-data-science
[3]: https://opensource.com/article/19/5/learn-python-r-data-science
[4]: https://en.wikipedia.org/wiki/C99
[5]: https://en.wikipedia.org/wiki/C%2B%2B11
[6]: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
[7]: https://opensource.com/article/20/2/python-gnu-octave-data-science
[8]: https://en.wikipedia.org/wiki/Command-line_interface
[9]: https://en.wikipedia.org/wiki/Graphical_user_interface
[10]: https://gitlab.com/cristiano.fontana/polyglot_fit
[11]: https://en.wikipedia.org/wiki/Comma-separated_values
[12]: https://en.wikipedia.org/wiki/C_%28programming_language%29
[13]: https://www.tiobe.com/tiobe-index/
[14]: https://redmonk.com/sogrady/2019/07/18/language-rankings-6-19/
[15]: http://pypl.github.io/PYPL.html
[16]: https://octoverse.github.com/
[17]: https://en.wikipedia.org/wiki/Compiled_language
[18]: https://en.wikipedia.org/wiki/Compiler
[19]: https://en.wikipedia.org/wiki/Machine_code
[20]: https://en.wikipedia.org/wiki/C_standard_library
[21]: https://en.wiktionary.org/wiki/number-crunching
[22]: https://en.wikipedia.org/wiki/Boilerplate_code
[23]: https://clang.llvm.org/
[24]: https://gcc.gnu.org/
[25]: https://www.gnu.org/software/gsl/
[26]: http://www.gnuplot.info/
[27]: https://en.wikipedia.org/wiki/Berkeley_Software_Distribution
[28]: https://getfedora.org/
[29]: https://en.wikipedia.org/wiki/Comment_(computer_programming)
[30]: https://en.wikipedia.org/wiki/Include_directive
[31]: https://en.wikipedia.org/wiki/Linker_%28computing%29
[32]: https://en.wikipedia.org/wiki/Entry_point#C_and_C++
[33]: https://gitlab.com/cristiano.fontana/polyglot_fit/-/blob/master/fitting_C99.c
[34]: https://en.wikipedia.org/wiki/Linked_list#Singly_linked_list
[35]: http://man7.org/linux/man-pages/man3/queue.3.html
[36]: https://en.wikipedia.org/wiki/Printf_format_string
[37]: http://www.opengroup.org/onlinepubs/009695399/functions/ferror.html
[38]: http://www.opengroup.org/onlinepubs/009695399/functions/feof.html
[39]: http://man7.org/linux/man-pages/man3/getline.3.html
[40]: https://en.wikipedia.org/wiki/POSIX
[41]: https://en.wikipedia.org/wiki/Lexical_analysis#Token
[42]: http://man7.org/linux/man-pages/man3/strtok.3.html
[43]: http://www.opengroup.org/onlinepubs/009695399/functions/strtok.html
[44]: http://www.opengroup.org/onlinepubs/009695399/functions/sscanf.html
[45]: http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html
[46]: http://man7.org/linux/man-pages/man3/malloc.3.html
[47]: https://www.gnu.org/software/gsl/doc/html/lls.html
[48]: https://en.wikipedia.org/wiki/Memory_leak
[49]: http://www.opengroup.org/onlinepubs/009695399/functions/free.html
[50]: http://www.opengroup.org/onlinepubs/009695399/functions/printf.html
[51]: http://www.opengroup.org/onlinepubs/009695399/functions/fprintf.html
[52]: https://opensource.com/sites/default/files/uploads/fit_c99.png (Plot and fit of the dataset obtained with C99)
[53]: https://en.wikipedia.org/wiki/C%2B%2B
[54]: http://www.cplusplus.com/info/history/
[55]: https://en.wikipedia.org/wiki/Object-oriented_programming
[56]: https://en.wikipedia.org/wiki/Sequence_container_%28C%2B%2B%29#Vector
[57]: https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library
[58]: https://opensource.com/sites/default/files/uploads/fit_cpp11.png (Plot and fit of the dataset obtained with C++11)
[59]: https://en.wikipedia.org/wiki/GObject
[60]: http://www.digip.org/jansson/


@ -0,0 +1,130 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (What developers need to know about domain-specific languages)
[#]: via: (https://opensource.com/article/20/2/domain-specific-languages)
[#]: author: (Girish Managoli https://opensource.com/users/gammay)
What developers need to know about domain-specific languages
======
DSLs are used for a specific context in a particular domain. Learn more about what they are and why you might want to use one.
![Various programming languages in use][1]
A [domain-specific language][2] (DSL) is a language meant for use in the context of a particular domain. A domain could be a business context (e.g., banking or insurance) or an application context (e.g., a web application or database). In contrast, a general-purpose language (GPL) can be used for a wide range of business problems and applications.
A DSL does not attempt to please all. Instead, it is created for a limited sphere of applicability and use, but it's powerful enough to represent and address the problems and solutions in that sphere. A good example of a DSL is HTML. It is a language for the web application domain. It can't be used for, say, number crunching, but it is clear how widely used HTML is on the web.
A GPL creator does not know where the language might be used or the problems the user intends to solve with it. So, a GPL is created with generic constructs that potentially are usable for any problem, solution, business, or need. Java is a GPL: it is used on desktops and mobile devices and embedded in the web, across banking, finance, insurance, manufacturing, and more.
### Classifying DSLs
In the DSL world, there are two types of languages:
* **Domain-specific language (DSL):** The language in which a DSL is written or presented
* **Host language:** The language in which a DSL is executed or processed
A DSL written in a distinct language and processed by another host language is called an **external** DSL.
This is a DSL in SQL that can be processed in a host language:
```
SELECT account
FROM accounts
WHERE account = '123' AND branch = 'abc' AND amount >= 1000
```
For that matter, a DSL could be written in English with a defined vocabulary and form that can be processed in another host language using a parser generator like ANTLR:
```
if smokes then increase premium by 10%
```
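To give a feel for what "processed in a host language" means, here is a deliberately tiny sketch in C that reads one rule of this form and applies it. It is hand-written with `sscanf()` rather than generated by a parser generator such as ANTLR, and every name in it (`premium_rule`, `parse_rule()`, `apply_rule()`) is hypothetical. A real DSL would need a proper grammar.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one rule of the DSL. */
struct premium_rule {
    char attribute[32]; /* e.g. "smokes" */
    double percent;     /* e.g. 10.0 for "10%" */
};

/* Parse "if <attribute> then increase premium by <number>%".
 * Returns 0 on success, -1 if the text does not match this exact form. */
static int parse_rule(const char *text, struct premium_rule *rule) {
    int consumed = -1;
    /* %n records how many characters were read; it is only stored if the
     * literal '%' before it matched, which lets us detect a missing '%'. */
    const int matched = sscanf(text, "if %31s then increase premium by %lf%%%n",
                               rule->attribute, &rule->percent, &consumed);
    if (matched != 2 || consumed < 0 || text[consumed] != '\0') {
        return -1;
    }
    return 0;
}

/* Apply the rule to a premium if the customer has the attribute it names. */
static double apply_rule(const struct premium_rule *rule,
                         const char *customer_attribute, double premium) {
    if (strcmp(rule->attribute, customer_attribute) == 0) {
        return premium * (1.0 + rule->percent / 100.0);
    }
    return premium;
}
```

The rule text stays readable for a domain expert, while the host program decides how customers and premiums are actually represented.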
If the DSL and host language are the same, then the DSL type is **internal**, where the DSL is written in the language's semantics and processed by it. These are also referred to as **embedded** DSLs. Here are two examples.
* A Bash DSL that can be executed in a Bash engine: `if today_is_christmas; then apply_christmas_discount; fi`. This is valid Bash that is written like English.
* A DSL written in a GPL like Java: `orderValue = orderValue.applyFestivalDiscount().applyCustomerLoyalityDiscount().applyCustomerAgeDiscount();`. This uses a fluent style and is readable like English.
Yes, the boundaries between DSL and GPL sometimes blur.
### DSL examples
Some languages used for DSLs include:
* Web: HTML
* Shell: sh, Bash, CSH, and the like for *nix; MS-DOS, Windows Terminal, PowerShell for Windows
* Markup languages: XML
* Modeling: UML
* Data management: SQL and its variants
* Business rules: Drools
* Hardware: Verilog, VHDL
* Build tools: Maven, Gradle
* Numerical computation and simulation: MATLAB (commercial), GNU Octave, Scilab
* Various types of parsers and generators: Lex, YACC, GNU Bison, ANTLR
### Why DSL?
The purpose of a DSL is to capture or document the requirements and behavior of one domain. A DSL's usage might be even narrower for particular aspects within the domain (e.g., commodities trading in finance). DSLs bring business and technical teams together. This does not imply a DSL is for business use alone. For example, designers and developers can use a DSL to represent or design an application.
A DSL can also be used to generate source code for an addressed domain or problem. Code generation from a DSL is not considered mandatory, as its primary purpose is domain knowledge. When it is used, however, code generation is a serious advantage in domain engineering.
### DSL pros and cons
On the plus side, DSLs are powerful for capturing a domain's attributes. Also, since DSLs are small, they are easy to learn and use. Finally, a DSL offers a common language among domain experts and between domain experts and developers.
On the downside, a DSL is narrowly used within the intended domain and purpose. Also, a DSL has a learning curve, although it may not be very high. Additionally, although there may be advantages to using tools for DSL capture, they are not essential, and the development or configuration of such tools is an added effort. Finally, DSL creators need domain knowledge as well as language-development knowledge, and individuals rarely have both.
### DSL software options
Open source DSL software options include:
* **Xtext:** Xtext enables the development of DSLs and is integrated with Eclipse. It makes code generation possible and has been used by several open source and commercial products to provide specific functions. [MADS][3] (Multipurpose Agricultural Data System) is an interesting idea based on Xtext for "modeling and analysis of agricultural activities" (however, the project seems to be no longer active).
* **JetBrains MPS:** JetBrains MPS is an integrated development environment (IDE) to create DSLs. It calls itself a projectional editor that stores a document as its underlying abstract tree structure. (This concept is also used by programs such as Microsoft Word.) JetBrains MPS also supports code generation to Java, C, JavaScript, or XML.
### DSL best practices
Want to use a DSL? Here are a few tips:
* DSLs are not GPLs. Try to address a limited range of problems in the defined domain.
* You do not need to define your own DSL. That would be tedious. Look for an existing DSL that solves your need on sites like [DSLFIN][4], which lists DSLs for the finance domain. If you are unable to find a suitable DSL, you could define your own.
* It is better to make DSLs "like English" rather than too technical.
* Code generation from a DSL is not mandatory, but it offers significant and productive advantages when it is done.
* DSLs are called languages but, unlike GPLs, they need not be executable. Being executable is not the intent of a DSL.
* DSLs can be written with word processors. However, using a DSL editor makes syntax and semantics checks easier.
If you are using a DSL now or plan to do so in the future, please share your experience in the comments.
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/domain-specific-languages
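Author: [Girish Managoli][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)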
[a]: https://opensource.com/users/gammay
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/programming_language_c.png?itok=mPwqDAD9 (Various programming languages in use)
[2]: https://en.wikipedia.org/wiki/Domain-specific_language
[3]: http://mads.sourceforge.net/
[4]: http://www.dslfin.org/resources.html


@ -1,111 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (zhangxiangping)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (12 open source tools for natural language processing)
[#]: via: (https://opensource.com/article/19/3/natural-language-processing-tools)
[#]: author: (Dan Barker https://opensource.com/users/barkerd427)
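12 open source tools for natural language processing
======
Take a look at a dozen tools you can use in your own NLP applications.
![Chat bubbles][1]
Over the past few years, natural language processing (NLP) has driven the development of chatbots, voice assistants, text prediction, and other speech and text applications that we use in our daily lives. There is a wide variety of open source NLP tools available, so I decided to survey them to help you plan your next voice- or text-based application.
I organize this review around the programming languages I am familiar with, although I am not familiar with all of the tools themselves (I did not look for tools in languages I don't know). That said, I excluded the tools in three languages I am familiar with, for various reasons.
R was not included because most of the libraries I found had not been updated in over a year. That does not always mean they are poorly maintained, but I think they should be updated more often to compete with other tools in the same space. I also chose the languages and tools most likely to be used in production scenarios (rather than in academia and research), although I mostly use R as a research and discovery tool.
I found that many Scala libraries are no longer updated. It has been several years since I last used Scala, when it was very popular. Most of the libraries have not been updated since then, or only a few of them have.
Finally, I excluded C++. This is mainly because the company I work for has not used C++ for NLP or any data science work for a long time.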
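### Python tools
#### Natural Language Toolkit (NLTK)
The [Natural Language Toolkit (NLTK)][2] is the most full-featured tool of all those I surveyed. It implements nearly every component of natural language processing, such as classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Each has several implementations, so you can choose the exact algorithm or approach to use. It also supports multiple languages. However, it represents all data as strings, which is convenient for simple constructs but makes some advanced functionality harder to use. Its documentation is somewhat dense, but there is plenty of documentation written by others, such as [a great book][3]. Compared to other tools, the library is also a bit slow. Overall, it is a very good toolkit for experimentation, exploration, and applications that need a particular combination of algorithms.
#### SpaCy
[SpaCy][4] is NLTK's main competitor. It is faster than NLTK in most cases, but it has only a single implementation of each NLP component. SpaCy represents everything as an object rather than a string, which simplifies the interface for building applications. This also helps it integrate with many frameworks and data science tools, so you can better understand your text data. However, SpaCy does not support as many languages as NLTK. It has a simple interface with a simplified set of choices and good documentation, as well as multiple neural models for various components of language processing and analysis. Overall, it is a great tool for new applications that do not require a specific algorithm in production.
#### TextBlob
[TextBlob][5] is an extension of NLTK. You can access NLTK's functions in a simpler way through TextBlob, and TextBlob also includes functionality from the Pattern library. If you are just starting out, this is a good tool to learn with, and it can be used in production for applications that do not need high performance. TextBlob can be used anywhere, but it is best suited to smaller projects.
#### Textacy
This tool may have the best name of any I have used. To say "[Textacy][6]," sound the "ex" first and then the "cy." It is not just a great name; it is a great tool as well. It uses SpaCy for its core NLP functionality, but it handles a lot of the work before and after the processing. If you were planning to use SpaCy, you might as well use Textacy, so you can handle many kinds of data without writing extra helper code.
#### PyTorch-NLP
[PyTorch-NLP][7] has been out for just about a year, but it already has a large community. It is well suited to rapid prototyping. It is also updated when companies or researchers release tools for novel processing tasks, such as image transformations. PyTorch is targeted at researchers, but it can also be used for prototypes and initial production workloads with the best algorithms available. The libraries built on top of it are also worth looking into.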
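### Node.js tools
#### Retext
[Retext][8] is part of the [unified collective][9]. Unified is an interface that allows multiple tools and plugins to integrate and work together effectively. Retext is one of three syntaxes used by the unified tool; the other two are Remark for Markdown and Rehype for HTML. This is a very interesting idea, and I am excited to see this community grow. Retext does not expose much of its underlying techniques; instead, it uses plugins to do what you want in an NLP task. Spell checking, typography fixing, sentiment detection, and readability analysis can all be done with simple plugins. Overall, this tool and community are a good choice if you need to get something done without understanding the underlying processing.
#### Compromise
If you are looking for the tool with the most advanced features and the most sophisticated system, [Compromise][10] is not the one for you. However, if you want a performant tool with a wide range of features that can also run on the client side, Compromise is worth a look. Its name is in fact accurate: the creators compromised on functionality and accuracy by focusing on a small package with more specific functionality, which benefits from the user understanding the context in which it is used.
#### Natural
[Natural][11] includes most of the functions you would expect in a general NLP library. It is mostly focused on English, but it includes some other languages, and its community supports adding more. It supports tokenizing, stemming, classification, phonetics, term frequency-inverse document frequency (TF-IDF), WordNet, string similarity, and some inflections. It is comparable to NLTK in that it tries to include everything in one package; it is easy to use but may be less suited to focused research. Overall, it is a good, full-featured library that is still in active development, but it may require more knowledge of the underlying implementations to be fully effective.
#### Nlp.js
[Nlp.js][12] is built on top of several other NLP libraries, including Franc and Brain.js. It provides a nice interface to many NLP components, such as classification, sentiment analysis, stemming, named entity recognition, and natural language generation. It also supports quite a few languages, which helps if you work in something other than English. Overall, it is a good general-purpose tool with a simple interface to several other tools. It will likely take your application a long way before you need something more powerful or more flexible.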
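### Java tools
#### OpenNLP
[OpenNLP][13] is maintained by the Apache Foundation, so it is easy to integrate into other Apache projects, such as Apache Flink, Apache NiFi, and Apache Spark. It is a general NLP tool that covers the common processing components of NLP, and it can be used from the command line or imported into an application as a package. It also supports many languages. Overall, OpenNLP is a powerful tool with many features, and it is a good choice if you are building for production in Java.
#### StanfordNLP
[Stanford CoreNLP][14] is a set of tools that provides statistical NLP, deep learning NLP, and rule-based NLP functionality. Bindings exist for many other programming languages, so the tool can be used outside of Java. It is a very powerful tool created by an elite research institution, but it may not be the best fit for production environments. The tool is dual-licensed, with a special license for commercial purposes. Overall, it is a great tool for research and experimentation, but it may incur additional costs in a production system. Readers may be more interested in its Python version than in the Java version. Also, one of the best machine learning courses on Coursera is taught by a Stanford professor; [click here][15] for that and other great resources.
#### CogCompNLP
[CogCompNLP][16], developed by the University of Illinois, also has a Python library with similar functionality. It can be used to process text either locally or on remote systems, which can remove a tremendous burden from your local device. It provides processing functions such as tokenization, part-of-speech tagging, chunking, named-entity tagging, lemmatization, dependency parsing, and semantic role labeling. Overall, it is a great tool for research, and it has many components you can explore. I am not sure it is suited to production environments, but it is worth trying if you use Java.
* * *
What are your favorite open source NLP tools and libraries? Please share any tools not mentioned here in the comments.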
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/3/natural-language-processing-tools
Author: [Dan Barker (Community Moderator)][a]
Topic selection: [lujun9972][b]
Translator: [zxp](https://github.com/zhangxiangping)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/barkerd427
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/talk_chat_communication_team.png?itok=CYfZ_gE7 (Chat bubbles)
[2]: http://www.nltk.org/
[3]: http://www.nltk.org/book_1ed/
[4]: https://spacy.io/
[5]: https://textblob.readthedocs.io/en/dev/
[6]: https://readthedocs.org/projects/textacy/
[7]: https://pytorchnlp.readthedocs.io/en/latest/
[8]: https://www.npmjs.com/package/retext
[9]: https://unified.js.org/
[10]: https://www.npmjs.com/package/compromise
[11]: https://www.npmjs.com/package/natural
[12]: https://www.npmjs.com/package/node-nlp
[13]: https://opennlp.apache.org/
[14]: https://stanfordnlp.github.io/CoreNLP/
[15]: https://opensource.com/article/19/2/learn-data-science-ai
[16]: https://github.com/CogComp/cogcomp-nlp


@ -0,0 +1,154 @@
[#]: collector: (lujun9972)
[#]: translator: (lujun9972)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Send email and check your calendar with Emacs)
[#]: via: (https://opensource.com/article/20/1/emacs-mail-calendar)
[#]: author: (Kevin Sonney https://opensource.com/users/ksonney)
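Send email and check your calendar with Emacs
======
Manage your email and view your schedule with the Emacs text editor in the eighteenth article in our series on 20 ways to be more productive with open source in 2020.
![Document sending][1]
Last year, I brought you 19 days of new productivity tools for 2019. This year, I'm taking a different approach: building an environment that will allow you to be more productive in the new year, using tools you may or may not already be using.
### Doing (almost) all the things with Emacs, part 1
Two days ago, I mentioned that I use both [Vim][2] and [Emacs][3] regularly, and on days [16][4] and [17][5] of this series, I explained how to do almost everything in Vim. Now, it's time for Emacs!
[Mail and calendar in Emacs][6]
Before I get too far into this, I should explain two things. First, I'm using the default Emacs configuration here, not [Spacemacs][7], which I have [written about][8] before. Why? Because this way I am using the default keyboard shortcuts, so you can refer to the documentation without having to translate things from "native Emacs" to Spacemacs. Second, I'm not setting up Org mode in this series. Org mode could almost be a complete series on its own; it is very powerful, but its setup can be quite complex.
#### Configure Emacs
Configuring Emacs is a little more complicated than configuring Vim, but in my opinion, it is worth it in the long run. Start by creating a configuration file and opening it in Emacs: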
```
mkdir ~/.emacs.d
emacs ~/.emacs.d/init.el
```
Next, add some additional package sources to the built-in package manager. Add the following to **init.el**:
```
(package-initialize)
(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/"))
(add-to-list 'package-archives '("org" . "http://orgmode.org/elpa/") t)
(add-to-list 'package-archives '("gnu" . "https://elpa.gnu.org/packages/"))
(package-refresh-contents)
```
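Save the file with `Ctrl+x Ctrl+s`, exit with `Ctrl+x Ctrl+c`, and restart Emacs. Emacs will download all the package lists at startup, after which you can install packages with the built-in package manager.
Type `Meta+x` to bring up a command prompt (the **Meta** key is the **Alt** key on most keyboards, or **Option** on macOS). At the command prompt, type **package-list-packages** to show a list of packages you can install. Go through the list and select the following packages with the **i** key: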
```
bbdb
bbdb-vcard
calfw
calfw-ical
notmuch
```
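After selecting the packages, press **x** to install them. Depending on your internet connection, this could take a while. You may see some compile errors, but you can ignore them.
Once the installation is complete, open `~/.emacs.d/init.el` with the key combination `Ctrl+x Ctrl+f`, and add the following lines to the file after `(package-refresh-contents)` and before `(custom-set-variables`.
The `(custom-set-variables` line is maintained by Emacs itself, and you should never change anything after it. Lines beginning with **;;** are comments.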
```
;; Set up bbdb
(require 'bbdb)
(bbdb-initialize 'message)
(bbdb-insinuate-message)
(add-hook 'message-setup-hook 'bbdb-insinuate-mail)
;; set up calendar
(require 'calfw)
(require 'calfw-ical)
;; Set this to the URL of your calendar. Google users will use
;; the Secret Address in iCalendar Format from the calendar settings
(cfw:open-ical-calendar "https://path/to/my/ics/file.ics")
;; Set up notmuch
(require 'notmuch)
;; set up mail sending using sendmail
(setq send-mail-function (quote sendmail-send-it))
(setq user-mail-address "myemail@mydomain.com"
      user-full-name "My Name")
```
现在,您已经准备好使用自己的配置启动 Emacs 了!保存 `init.el` 文件 (`Ctrl+x Ctrl+s`),退出 Emacs(`Ctrl+x Ctrl+c`),然后重启之。这次重启要多花些时间。
#### 使用 Notmuch 在 Emacs 中读写电子邮件
一旦你看到了 Emacs 启动屏幕,你就可以使用 [Notmuch][10] 来阅读电子邮件了。键入 `Meta+x notmuch`,您将看到 notmuch 的 Emacs 接口。
![使用 notmuch 阅读邮件 ][11]
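All the items in bold are links to email views. You can move between them by clicking with the mouse or by pressing the Tab key, and open one by pressing **Return** or **Enter**. You can use the search bar to search Notmuch's database with the same [syntax][12] you use on the Notmuch command line. If you like, you can use the **[save]** button to save a search for future use, and saved searches are added to the list at the top of the screen. Following a link shows a list of the related email messages. You can navigate the list with the **arrow** keys and press **Enter** on the message you want to read. Press **r** to reply to a message, **f** to forward it, and **q** to leave the current screen.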
You can write a new message by typing `Meta+x compose-mail`. Composing, replying, and forwarding all bring up the mail writing interface. When you are done writing your email, press `Ctrl+c Ctrl+c` to send it. If you decide you don't want to send it, press `Ctrl+c Ctrl+k` to kill the message compose buffer (window).
#### 使用 BBDB 在 Emacs 中自动补完电子邮件地址
![Composing a message with BBDB addressing][13]
那么通讯录怎么办?这就是 [BBDB][14] 发挥作用的地方。但首先我们需要从 [abook][15] 导入所有地址,方法是打开命令行并运行以下导出命令:
```
abook --convert --outformat vcard --outfile ~/all-my-addresses.vcf --infile ~/.abook/addresses
```
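After Emacs starts, run `Meta+x bbdb-vcard-import-file`. It prompts you for the name of the file to import, which is `~/all-my-addresses.vcf`. Once the import finishes, when you compose a message you can start typing a name and use **Tab** to search and autocomplete the "to" field. BBDB also opens a buffer for the contact so you can make sure it is the right one.
Why do it this way when [vdirsyncer][16] has already produced a .vcf file for each address? If you are like me and have a lot of addresses, handling them one at a time is tedious. This way, you can take everything in abook and make one big file.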
#### 使用 calfw 在 Emacs 中浏览日历
![calfw 日历 ][17]
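Finally, you can use Emacs to look at your calendar. In the configuration above, you installed the [calfw][18] package and added lines telling it where to find the calendars to load. Calfw is short for the Calendar Framework for Emacs, and it supports many calendar formats. I use Google Calendar, so that is the link I put in my configuration. Calendars load automatically at startup, and you can view them by switching to the **cfw-calendar** buffer with the `Ctrl+x b` command.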
Calfw 提供日、周、双周和月视图。您可以在日历顶部选择视图,并使用**箭头**键导航日历。不幸的是calfw 只能查看日历,所以您仍然需要使用 [khal][19] 之类的工具或通过 web 界面来添加、删除和修改事件。
这就是 Emacs 中的邮件、日历和邮件地址。明天我会展示更多。
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/1/emacs-mail-calendar
作者:[Kevin Sonney][a]
选题:[lujun9972][b]
译者:[lujun9972](https://github.com/lujun9972)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/ksonney
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/email_paper_envelope_document.png?itok=uPj_kouJ (Document sending)
[2]: https://www.vim.org/
[3]: https://www.gnu.org/software/emacs/
[4]: https://opensource.com/article/20/1/vim-email-calendar
[5]: https://opensource.com/article/20/1/vim-task-list-reddit-twitter
[6]: https://opensource.com/sites/default/files/uploads/productivity_18-1.png (Mail and calendar in Emacs)
[7]: https://www.spacemacs.org/
[8]: https://opensource.com/article/19/12/spacemacs
[9]: mailto:myemail@mydomain.com
[10]: https://notmuchmail.org/
[11]: https://opensource.com/sites/default/files/uploads/productivity_18-2.png (Reading mail with Notmuch)
[12]: https://opensource.com/article/20/1/organize-email-notmuch
[13]: https://opensource.com/sites/default/files/uploads/productivity_18-3.png (Composing a message with BBDB addressing)
[14]: https://www.jwz.org/bbdb/
[15]: https://opensource.com/article/20/1/sync-contacts-locally
[16]: https://opensource.com/article/20/1/open-source-calendar
[17]: https://opensource.com/sites/default/files/uploads/productivity_18-4.png (calfw calendar)
[18]: https://github.com/kiwanami/emacs-calfw
[19]: https://khal.readthedocs.io/en/v0.9.2/index.html

View File

@ -1,161 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: ( guevaraya)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Troubleshoot Kubernetes with the power of tmux and kubectl)
[#]: via: (https://opensource.com/article/20/2/kubernetes-tmux-kubectl)
[#]: author: (Abhishek Tamrakar https://opensource.com/users/tamrakar)
解决 Kubernetes 问题的利器 Tmux 和 kubectl
======
一个 kubectl 插件 用 tmux 使 Kubernetes 疑难问题变得更简单。
![一个坐在笔记本面前的妇女][1]
[Kubernetes][2] 是一个活跃的开源容器管理平台,它提供了可扩展性,高可用性,健壮性和富有弹性的应用程序管理。它的众多特性之一是支持通过原生的客户端程序 [kubectl][3] 运行定制脚本或可执行程序Kubectl 很强大的,允许用户在 Kubernetes 集群上用它直接做很多事情。
### 使用别名进行 Kubernetes 的故障排查
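Anyone who uses Kubernetes for container orchestration knows its features, as well as the complexity its design brings. As a result, there is a pressing need for a way to troubleshoot that is quicker and needs almost no manual intervention (except in special situations).
There are many scenarios to consider when it comes to troubleshooting. In one of them, you know what you need to run, but the command's syntax, even when it can run as a single command, is excessively complex, or it needs one or two rounds of interaction to work.
For example, if you frequently need to access a running container in the system namespace, you may find yourself repeatedly typing: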
```
kubectl --namespace=kube-system exec -i -t <your-pod-name>
```
To simplify troubleshooting, you can use command-line aliases for these commands. For example, you can add the following to your dotfiles (.bashrc or .zshrc):
```
alias ksysex='kubectl --namespace=kube-system exec -i -t'
```
这是来自于常见的 [Kubernetes 别名仓][4]的一个例子,它展示了一个 kubectl 简化的功能的方法。像这个场景的简化情况,使用别名很有用。
### 切换到 kubectl 插件
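A more complex troubleshooting scenario is one where you need to run many commands, one after another, investigate the environment, and finally reach a conclusion. Aliases alone cannot handle this case: you need to know the logic and the relationships between the parts of your Kubernetes deployment, and what you really need is automation that delivers the output you want in a short time.
Suppose your cluster has 10 to 20, or even 50 to 100, namespaces hosting different microservices. What would help you when troubleshooting in that situation?
  * You need something that can quickly tell you which pod in which namespace is throwing errors.
  * You need something that can watch the logs of all pods in a namespace.
  * You may also need to watch the logs of specific pods in the particular namespace that has shown errors.
Any solution that covers these points would be a great help in investigating production issues, as well as during development and testing cycles.
To create something more powerful than a simple alias, you can use [kubectl plugins][5]. Plugins are like any other standalone script, written in any language, but they are designed to extend the main command of a Kubernetes admin.
To create a plugin, you must name the script with the proper syntax, `kubectl-<your-plugin-name>`, copy it to one of the directories exported in your **$PATH**, and give it executable permissions (**chmod +x**).
After you create a plugin and move it into your path, you can run it immediately. For example, I have kubectl-krawl and kubectl-kmux in my path: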
```
$ kubectl plugin list
The following compatible plugins are available:
/usr/local/bin/kubectl-krawl
/usr/local/bin/kubectl-kmux
$ kubectl kmux
```
现在让我们见识下带有 tmux 的 Kubernetes 的有多强大。
### 驾驭强大的 tmux
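[Tmux][6] is a very powerful tool that many sysadmins and ops teams rely on to troubleshoot issues; its easy screen splitting lets them debug multiple machines in parallel and manage logs. One of its major advantages is that it can be used from the command line or in automation scripts.
I created [a kubectl plugin][7] that uses tmux to make troubleshooting much simpler. I will use comments to walk through the logic behind the plugin (the full source of the plugin is left for you to read):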
```
# NAMESPACE is the namespace to monitor.
# POD is the pod name.
# CONTAINERS is the list of container names.
# Initialize a counter n to count the loop iterations; tmux uses it later to split panes.
n=0;
# start a loop on a list of pod and containers
while IFS=' ' read -r POD CONTAINERS
do
    # tmux create the new window for each pod
    tmux neww $COMMAND -n $POD 2>/dev/null
    # start a loop for all containers inside a running pod
    for CONTAINER in ${CONTAINERS//,/ }
    do
        if [ x$POD = x -o x$CONTAINER = x ]; then
            # if any of the values is null, exit.
            warn "Looks like there is a problem getting pods data."
            break
        fi

        # set the command to execute
        COMMAND="kubectl logs -f $POD -c $CONTAINER -n $NAMESPACE"
        # check tmux session
        if tmux has-session -t <session name> 2>/dev/null;
        then
            <set session exists>
        else
            <create session>
        fi
        # split panes in the current window for each container
        tmux selectp -t $n \; \
        splitw $COMMAND \; \
        select-layout tiled \;
    # end loop for containers
    done
    # rename the window to identify by pod name
    tmux renamew $POD 2>/dev/null

    # increment the counter
    ((n+=1))
# end loop for pods
done < <(<fetch list of pod and containers from kubernetes cluster>)
# finally select the window and attach session
tmux selectw -t <session name>:1 \; \
  attach-session -t <session name>\;
```
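The core of the loop above is turning each "pod plus comma-separated containers" line into one `kubectl logs` command per container. Here is a minimal Python sketch of that step only, not the plugin's actual code. The input format is inferred from the `read -r POD CONTAINERS` and `${CONTAINERS//,/ }` lines, and the function name and the sample pod and container names are hypothetical.

```python
# Sketch: build one `kubectl logs` command per container.
# Assumption: each input line is "<pod> <container1>,<container2>,...".


def build_log_commands(lines, namespace):
    """Return a list of (pod, command) pairs, one pair per container."""
    commands = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # The shell version warns and breaks on missing data;
            # this sketch simply skips the malformed line.
            continue
        pod, containers = fields
        for container in containers.split(","):
            if not container:
                continue
            command = f"kubectl logs -f {pod} -c {container} -n {namespace}"
            commands.append((pod, command))
    return commands


# Hypothetical pod and container names, for illustration only.
sample_lines = ["web-0 nginx,sidecar", "db-0 postgres"]
for pod, command in build_log_commands(sample_lines, "kube-system"):
    print(pod, "->", command)
```

Keeping the command construction separate from the tmux calls makes it easy to check what would run before any panes are created.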
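Once the plugin script runs, it produces output similar to the image below. Each pod gets its own window, and each container (if there is more than one) is split into panes within its pod's window, streaming logs as they arrive. The beauty of tmux can be seen below; with the proper configuration, you can even see which windows have activity going on (see the white tabs).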
![kmux 插件的输出][8]
### 总结
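Aliases are always helpful for simple troubleshooting in Kubernetes environments. When the environment gets more complex, a kubectl plugin built with more advanced scripting is a powerful option. There are no limits on which programming language you use to write kubectl plugins. The only requirements are that the file name in the path is executable and that it does not duplicate an existing kubectl command.
To read the complete code of the plugin, or to try the plugins I created, check my [kube-plugins-github][7] repository. Issues and pull requests are welcome.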
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/kubernetes-tmux-kubectl
作者:[Abhishek Tamrakar][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/guevaraya)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/tamrakar
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/OSDC_women_computing_4.png?itok=VGZO8CxT (一个坐在笔记本面前的妇女)
[2]: https://opensource.com/resources/what-is-kubernetes
[3]: https://kubernetes.io/docs/reference/kubectl/overview/
[4]: https://github.com/ahmetb/kubectl-aliases/blob/master/.kubectl_aliases
[5]: https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/
[6]: https://opensource.com/article/19/6/tmux-terminal-joy
[7]: https://github.com/abhiTamrakar/kube-plugins
[8]: https://opensource.com/sites/default/files/uploads/kmux-output.png (Output of kmux plugin)

View File

@ -1,5 +1,5 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: translator: (HankChow)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
@ -7,27 +7,26 @@
[#]: via: (https://www.networkworld.com/article/3527430/digging-up-ip-addresses-with-the-dig-command.html)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)
Digging up IP addresses with the Linux dig command
使用 dig 命令查询 IP 地址
======
The dig command is extremely versatile both for retrieving information from domain name servers and for troubleshooting.
Thinkstock
命令行工具 `dig` 是用于解析域名和故障排查的一个利器。
Not unlike **nslookup** in function, but with a lot more options, the **dig** command provides information that name servers manage and can be very useful for troubleshooting problems. It's both simple to use and has lots of useful options.
从主要功能上来说,`dig` 和 `nslookup` 之间差异不大,但 `dig` 更像一个加强版的 `nslookup`,可以查询到一些由域名服务器管理的信息,这在排查某些问题的时候非常有用。总的来说,`dig` 是一个既简单易用又功能强大的命令行工具。
The name “dig” stands for “domain information groper” since domain groping is basically what it does. The amount of information that it provides depends on a series of options that you can use to tailor its output to your needs. Dig can provide a lot of detail or be surprisingly terse.
`dig` 最基本的功能就是查询域名信息,因此它的名称实际上是“<ruby>域名信息查询工具<rt>Domain Information Groper</rt></ruby>”的缩写。`dig` 向用户返回的内容可以非常详尽,也可以非常简洁,展现内容的多少完全由用户在查询时使用的选项来决定。
[[Get regularly scheduled insights by signing up for Network World newsletters.]][1]
### Just the IP, please
### 我只需要查询 IP 地址
To get _just_ the IP address for a system, add the **+short** option to your dig command like this:
如果只需要查询某个域名指向的 IP 地址,可以使用 `+short` 选项:
```
$ dig facebook.com +short
31.13.66.35
```
Don't be surprised, however, if some domains are tied to multiple IP addresses to make the sites they support more reliable.
在查询的时候发现有的域名会指向多个 IP 地址?这其实是网站提高其可用性的一种措施。
```
$ dig networkworld.com +short
@ -37,7 +36,7 @@ $ dig networkworld.com +short
151.101.194.165
```
Also, don't be surprised if the order of the IP addresses changes from one query to the next. This is a side effect of load balancing.
也正是由于这些网站通过负载均衡实现高可用,在下一次查询的时候,或许会发现这几个 IP 地址的排序有所不同。
```
$ dig networkworld.com +short
@ -47,9 +46,9 @@ $ dig networkworld.com +short
151.101.66.165
```
### Standard dig output
### 标准返回
The standard dig display provides details on dig itself along with the response from the name server.
`dig` 的标准返回内容则包括这个工具本身的一些信息,以及请求域名服务器时返回的响应内容:
```
$ dig networkworld.com
@ -77,7 +76,7 @@ networkworld.com. 300 IN A 151.101.2.165
;; MSG SIZE rcvd: 109
```
Since name servers generally cache collected data for a while, the query time shown at the bottom of dig output might sometimes say "0 msec":
由于域名服务器有缓存机制,返回的内容可能是之前缓存好的信息。在这种情况下,`dig` 最后显示的<ruby>查询时间<rt>Query time</rt></ruby>会是 0 毫秒0 msec
@ -88,11 +87,11 @@ Since name servers generally cache collected data for a while, the query time sh
;; MSG SIZE rcvd: 109
```
### Who you gonna ask?
### 向谁查询?
By default, dig will refer to your **/etc/resolv.conf** file to determine what name server to query, but you can refer queries to other DNS servers by adding an **@** option.
在默认情况下,`dig` 会根据 `/etc/resolv.conf` 这个文件的内容决定向哪个域名服务器获取查询结果。你也可以使用 `@` 来指定 `dig` 请求的域名服务器。
In the example below, for example, the query is being sent to Google's name server (i.e., 8.8.8.8).
在下面的例子中,就指定了 `dig` 向 Google 的域名服务器 8.8.8.8 查询域名信息。
```
$ dig @8.8.8.8 networkworld.com
@ -121,21 +120,21 @@ networkworld.com. 299 IN A 151.101.2.165
;; MSG SIZE rcvd: 109
```
To determine what version of dig you're using, use the **-v** option. You should see something like this:
想要知道正在使用的 `dig` 工具的版本,可以使用 `-v` 选项。你会看到类似这样:
```
$ dig -v
DiG 9.11.5-P4-5.1ubuntu2.1-Ubuntu
```
or this:
或者这样的返回信息:
```
$ dig -v
DiG 9.11.4-P2-RedHat-9.11.4-22.P2.el8
```
To get just the answer portion of this response, you can omit name server details, but still get the answer you're looking for by using both a **+noall** (don't show everything) and a **+answer** (but show the answer section) like this:
如果你觉得 `dig` 返回的内容过于详细,可以使用 `+noall`(不显示所有内容)和 `+answer`(仅显示域名服务器的响应内容)选项,域名服务器的详细信息就会被忽略,只保留域名解析结果。
```
$ dig networkworld.com +noall +answer
@ -148,9 +147,9 @@ networkworld.com. 300 IN A 151.101.66.165
networkworld.com. 300 IN A 151.101.2.165
```
### Looking up a batch of systems
### 批量查询域名
If you want to dig for a series of domain names, you can list the domain names in a file and then use a command like this one to have dig run through the list and provide the information.
如果你要查询多个域名,可以把这些域名写入到一个文件内,然后使用下面的 `dig` 命令遍历整个文件并给出所有查询结果。
```
$ dig +noall +answer -f domains
@ -165,7 +164,7 @@ amazon.com. 18 IN A 176.32.98.166
amazon.com. 18 IN A 205.251.242.103
```
You could add +short to the command above but, with some sites having multiple IP addresses, this might not be very useful. To cut down on the detail but be sure that you can tell which IP belongs to which domain, you could instead pass the output to **awk** to display just the first and last columns of data:
你也可以在上面的命令中使用 `+short` 选项,但如果其中有些域名指向多个 IP 地址,就无法看出哪些 IP 地址对应哪个域名了。在这种情况下,更好地做法应该是让 `awk` 对返回内容进行处理,只留下第一列和最后一列:
```
$ dig +noall +answer -f domains | awk '{print $1,$NF}'
@ -179,15 +178,13 @@ amazon.com. 205.251.242.103
amazon.com. 176.32.103.205
```
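If the answers need further processing, the same grouping that **awk** does can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the sample text reuses answer lines shown in this article instead of calling dig, and only A records are collected.

```python
from collections import defaultdict


def addresses_by_domain(dig_output):
    """Map each domain to the list of A-record addresses found in the text."""
    result = defaultdict(list)
    for line in dig_output.splitlines():
        fields = line.split()
        # An answer line has five fields: name, TTL, class, type, data.
        if len(fields) == 5 and fields[3] == "A":
            result[fields[0]].append(fields[4])
    return dict(result)


# Sample text in the shape printed by `dig +noall +answer -f domains`.
sample_output = """\
networkworld.com. 300 IN A 151.101.66.165
networkworld.com. 300 IN A 151.101.2.165
amazon.com. 18 IN A 176.32.98.166
amazon.com. 18 IN A 205.251.242.103
"""

for domain, addresses in addresses_by_domain(sample_output).items():
    print(domain, ", ".join(addresses))
```

Grouping by the first column keeps every address tied to its domain, which is what the `+short` output loses.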
Join the Network World communities on [Facebook][3] and [LinkedIn][4] to comment on topics that are top of mind.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3527430/digging-up-ip-addresses-with-the-dig-command.html
作者:[Sandra Henry-Stocker][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
译者:[HankChow](https://github.com/HankChow)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出

View File

@ -7,27 +7,27 @@
[#]: via: (https://opensource.com/article/20/2/no-ide-script)
[#]: author: (Yedidyah Bar David https://opensource.com/users/didib)
Don't like IDEs? Try grepgitvi
不喜欢 IDE 么?试试看 grepgitvi
======
A simple and primitive script to open Vim with your file of choice.
一个简单又原始的脚本来用 Vim 打开你选择的文件。
![Files in a folder][1]
Like most developers, I search and read source code all day long. Personally, I've never gotten used to integrated development environments (IDEs), and for years, I mainly used **grep** and copy/pasted file names to open Vi(m).
像大多数开发者一样,我整天都在搜索和阅读源码。就我个人而言,我从来没有习惯集成开发环境 IDE多年来我主要使用 **grep** 并复制/粘贴的文件名来打开 Vim
Eventually, I came up with this script, slowly refining it as needed.
最终,我写了这个脚本,并根据需要缓慢地对其进行了完善。
Its dependencies are [Vim][2] and [rlwrap][3], and it is open source under the Apache 2.0 license. To use the script, [put it in your PATH][4], and run it inside a directory of text files with:
它依赖 [Vim][2] 和 [rlwrap][3],并使用 Apache 2.0 许可开源。要使用该脚本,请[将它放到 PATH 中][4],然后在文本目录下运行:
```
grepgitvi <grep options> <grep/vim search pattern>
```
It will return a numbered list of search results, prompt you for the number of the result you want to use, and open Vim with that result. After you exit Vim, it will show the list again in a loop until you enter anything other than a result number. You can also use the Up and Down arrow keys to select a file; this makes it easier (for me) to find which results I've already looked at.
它将返回搜索结果的编号列表,并提示你输入结果编号并打开 Vim。退出 Vim 后,它将再次显示列表,直到你输入除结果编号以外的任何内容。你也可以使用向上和向下箭头键选择一个文件。(这对我来说)更容易找到我已经看过的结果。
It's simple and primitive compared to modern IDEs, or even to more sophisticated uses of Vim, but that's what does the job for me.
与现代 IDE 甚至与 Vim 的更复杂的用法相比,它简单而原始,但它对我有用。
### The script
### 脚本
```
@ -90,7 +90,7 @@ via: https://opensource.com/article/20/2/no-ide-script
作者:[Yedidyah Bar David][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出

View File

@ -0,0 +1,689 @@
[#]: collector: (lujun9972)
[#]: translator: (heguangzhi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Using Python and GNU Octave to plot data)
[#]: via: (https://opensource.com/article/20/2/python-gnu-octave-data-science)
[#]: author: (Cristiano L. Fontana https://opensource.com/users/cristianofontana)
使用 Python 和 GNU Octave 绘制数据
======
了解如何使用 Python 和 GNU Octave 完成一项常见的数据科学任务。
![Analytics: Charts and Graphs][1]
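Data science is a domain of knowledge that spans programming languages. Some languages are well known for solving problems in this space, while others are lesser known. This article will help you become familiar with doing data science in some popular languages.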
### 为数据科学选择 Python 和 GNU Octave
我经常尝试学习一种新的编程语言。为什么?这主要是对旧方式的厌倦和对新方式的好奇的结合。当我开始编程时,我唯一知道的语言是 C 语言。那些年的编程生涯既艰难又危险,因为我不得不手动分配内存,管理指针,并记得释放内存。
然后一个朋友建议我试试 Python现在编程生活变得简单多了。虽然程序运行变得慢多了但我不必通过编写分析软件来受苦了。然而我很快就意识到每种语言都有比其他语言更适合自己应用场景。后来我学习了其他一些语言每种语言都给我带来了一些新的启发。发现新的编程风格让我可以将一些解决方案移植到其他语言中这样一切都变得有趣多了。
为了对一种新的编程语言(及其文档)有所了解,我总是从编写一些执行我熟悉的任务的示例程序开始。为此,我将解释如何用 Python 和 GNU Octave 编写一个程序来完成一个你可以归类为数据科学的特殊任务。如果你已经熟悉其中一种语言,从中开始,浏览其他语言,寻找相似之处和不同之处。这并不是对编程语言的详尽比较,只是一个小小的展示。
所有的程序都应该在[命令行][2]上运行,而不是用[图形用户界面][3](GUI)。完整的例子可以在[多语种知识库][4]中找到。
### 编程任务
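The program you will write in this series:
  * Reads data from a [CSV file][5]
  * Fits the data with a straight line (i.e., _f(x) = m ⋅ x + q_)
  * Saves the result to an image file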
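This is a common situation that many data scientists have encountered. The example data is the first set of [Anscombe's quartet][6], shown in the table below. This is a set of artificially constructed data that gives the same results when fitted with a straight line, but their plots are very different. The data file is a text file with tabs as column separators and a few lines as a header. This task will use only the first set (i.e., the first two columns).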
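| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|------|------|-------|-------|--------|--------|-------|-------|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |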
### Python 方式
[Python][7]是一种通用编程语言,是当今最流行的语言之一(从[TIOBE index][8]、[RedMonk编程语言排名][9]、[编程语言流行指数][10]、[State of the Octoverse of GitHub][11]和其他来源的调查结果可以看出)。这是一种[解释的语言][12];因此,源代码由执行指令的程序读取和评估。它有一个全面的[标准库][13]并且总体上非常好用(我没有参考这最后一句话;这只是我的拙见)。
#### 安装
要使用 Python 开发,你需要解释器和一些库。最低要求是:
* [NumPy][14]用于合适的数组和矩阵操作
* [SciPy][15]进行数据科学
* [Matplotlib][16]绘图
在 [Fedora][17] 安装它们是很容易的:
```
sudo dnf install python3 python3-numpy python3-scipy python3-matplotlib
```
#### 注释代码
在 Python中[注释][18]是通过在行首添加一个 **#** 来实现的,该行的其余部分将被解释器丢弃:
```
# This is a comment ignored by the interpreter.
```
[fitting_python.py][19]示例使用注释在源代码中插入许可信息,第一行是[特殊注释][20],它允许在命令行上执行脚本:
```
#! /usr/bin/env python3
```
这一行通知命令行解释器,脚本需要由程序**python3**执行。
#### Required libraries
在 Python 中,库和模块可以作为一个对象导入(如示例中的第一行),其中包含库的所有函数和成员。通过使用 **as** 规范可以用于定义标签并重命名它们:
```
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
```
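You can also decide to import only a submodule (as in the second and third lines). The syntax has two (more or less) equivalent options: **import module.submodule** and **from module import submodule**.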
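To see how the import forms relate, here is a small sketch. It uses the standard library's `os.path` instead of NumPy or SciPy so that it runs anywhere; that substitution and the `osp` label are choices made for illustration, not part of the original example.

```python
import os.path            # binds the top-level name `os`; use it as os.path
from os import path       # binds only the submodule, under the name `path`
import os.path as osp     # binds the submodule under a label chosen with `as`

# All three names refer to the very same module object.
print(path is os.path)    # True
print(osp is os.path)     # True

# So the calls are interchangeable; only the spelling at the call site differs.
print(path.basename("data/anscombe.csv"))     # anscombe.csv
print(os.path.basename("data/anscombe.csv"))  # anscombe.csv
```

The practical difference is which names end up in your namespace, which is why the two options are only "more or less" equivalent.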
#### 定义变量
Python 的变量是在第一次赋值时被声明的:
```
input_file_name = "anscombe.csv"
delimiter = "\t"
skip_header = 3
column_x = 0
column_y = 1
```
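Variable types are inferred from the value assigned to the variable. There are no variables with constant values, unless they are declared in a module and can only be read. By convention, variables that should not be modified are named in uppercase.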
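A short sketch of what type inference and the uppercase convention look like in practice. The name `SKIP_HEADER` is hypothetical and only illustrates the convention; Python itself does not enforce it.

```python
input_file_name = "anscombe.csv"
skip_header = 3

# The type follows from the assigned value; nothing is declared up front.
print(type(input_file_name).__name__)  # str
print(type(skip_header).__name__)      # int

# Rebinding a name to a value of another type is allowed.
skip_header = "three"
print(type(skip_header).__name__)      # str

# Uppercase only signals "treat this as a constant" to human readers.
SKIP_HEADER = 3
```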
#### 打印输出
通过命令行运行程序意味着输出只能打印在终端上。Python 有[**print()**][21]函数,默认情况下,该函数打印其参数,并在输出的末尾添加一个换行符:
```
print("#### Anscombe's first set with Python ####")
```
在 Python 中,可以将**print()**函数与[字符串类][23]的[格式化能力][22]相结合。字符串具有**format**方法,可用于向字符串本身添加一些格式化文本。例如,可以添加格式化的浮点数,例如:
```
print("Slope: {:f}".format(slope))
```
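A few variations on the format specification show what **format** controls. The value of `slope` below is simply assumed for illustration; in the script it comes from the fit.

```python
slope = 0.5000909  # assumed value, for illustration only

# {:f} prints a float with six digits after the decimal point.
print("Slope: {:f}".format(slope))    # Slope: 0.500091

# A precision can be given explicitly.
print("Slope: {:.2f}".format(slope))  # Slope: 0.50

# f-strings (Python 3.6 and later) accept the same format specification.
print(f"Slope: {slope:f}")            # Slope: 0.500091
```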
#### 读取数据
使用 NumPy 和 函数[**genfromtxt()**][24]读取CSV文件非常容易该函数生成[NumPy数组][25]:
```
data = np.genfromtxt(input_file_name, delimiter = delimiter, skip_header = skip_header)
```
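To make the roles of `delimiter` and `skip_header` concrete, here is a rough standard-library sketch of the same idea. It is not how **genfromtxt()** is implemented; the function name is hypothetical, and the three header lines in the sample text are an assumed layout standing in for the real file.

```python
import csv
import io


def read_columns(text, delimiter="\t", skip_header=3):
    """Parse delimited text into rows of floats, skipping header lines."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for index, fields in enumerate(reader):
        if index < skip_header or not fields:
            continue  # header line or empty line
        rows.append([float(field) for field in fields])
    return rows


# Assumed layout: three header lines, then tab-separated numbers.
sample_text = (
    "Anscombe's quartet\n"
    "I\t\tII\t\n"
    "x\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\n"
    "8.0\t6.95\t8.0\t8.14\n"
)

rows = read_columns(sample_text)
print(rows)  # [[10.0, 8.04, 10.0, 9.14], [8.0, 6.95, 8.0, 8.14]]
```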
在 Python中一个函数可以有可变数量的参数您可以通过指定所需的参数来让它传递一个子集。数组是非常强大的矩阵状对象可以很容易地分割成更小的数组:
```
x = data[:, column_x]
y = data[:, column_y]
```
冒号选择整个范围,也可以用来选择子范围。例如,要选择数组的前两行,可以使用:
```
first_two_rows = data[0:2, :]
```
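Slice ranges in Python are half-open: the start index is included and the stop index is excluded, so selecting two rows needs `0:2`. The sketch below uses plain nested lists as a stand-in for a NumPy array (an assumption made so that it runs without NumPy); NumPy row slices follow the same convention.

```python
# First three rows of the first set, as plain nested lists.
rows = [
    [10.0, 8.04],
    [8.0, 6.95],
    [13.0, 7.58],
]

first_two_rows = rows[0:2]   # rows 0 and 1; the stop index 2 is excluded
only_first_row = rows[0:1]   # just row 0

print(len(first_two_rows))   # 2
print(len(only_first_row))   # 1

# Plain-list equivalent of the NumPy column selection data[:, 0].
x = [row[0] for row in rows]
print(x)                     # [10.0, 8.0, 13.0]
```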
#### 拟合数据
SciPy提供了方便的数据拟合功能例如[**linregress()**][26]功能。该函数提供了一些与拟合相关的重要值,如斜率、截距和两个数据集的相关系数:
```
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
```
因为**linregress()**提供了几条信息,所以结果可以同时保存到几个变量中。
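For readers curious about what the fit computes, here is a pure-Python sketch of the ordinary least-squares formulas for the slope, the intercept, and the correlation coefficient, applied to the first set from the table above. It is an illustration of the math, not SciPy's implementation, and the function name `fit_line` is hypothetical.

```python
from math import sqrt

# First set of Anscombe's quartet (columns "I: x" and "I: y").
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def fit_line(x, y):
    """Return (slope, intercept, r_value) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_value = s_xy / sqrt(s_xx * s_yy)
    return slope, intercept, r_value


slope, intercept, r_value = fit_line(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
```

The function returns a tuple, and unpacking it into three names mirrors how the several results of **linregress()** are stored in separate variables.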
#### 绘图
Matplotlib 库仅仅绘制数据点,因此,你应该定义要绘制的点的坐标。已经定义了**x** 和 **y** 数组,所以你可以直接绘制它们,但是你还需要代表直线的数据点。
```
fit_x = np.linspace(x.min() - 1, x.max() + 1, 100)
```
[**linspace()**][27]函数可以方便地在两个值之间生成一组等距值。利用强大的 NumPy 数组可以轻松计算纵坐标,该数组可以像普通数值变量一样在公式中使用:
```
fit_y = slope * fit_x + intercept
```
The formula is applied element by element on the array; therefore, the result has the same number of entries as the initial array.
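The two steps can be sketched without NumPy to show what happens underneath. The `linspace` function below is a simplified stand-in written for this illustration (NumPy's version has more options), and the slope and intercept are rounded, assumed values.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop, both included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# For the first set, x.min() - 1 is 3 and x.max() + 1 is 15.
fit_x = linspace(3.0, 15.0, 5)
print(fit_x)  # [3.0, 6.0, 9.0, 12.0, 15.0]

# Element-by-element formula: one output entry per input entry.
slope, intercept = 0.5, 3.0  # rounded, assumed values
fit_y = [slope * value + intercept for value in fit_x]
print(fit_y)  # [4.5, 6.0, 7.5, 9.0, 10.5]
```

With NumPy arrays the list comprehension is unnecessary, because arithmetic on an array already works entry by entry.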
要绘图,首先,定义一个包含所有图形的[图形对象][28]:
```
fig_width = 7 #inch
fig_height = fig_width / 16 * 9 #inch
fig_dpi = 100
fig = plt.figure(figsize = (fig_width, fig_height), dpi = fig_dpi)
```
一个图形可以画几个图;在 Matplotlib 中,这些图块被称为[轴][29]。本示例定义一个单轴对象来绘制数据点:
```
ax = fig.add_subplot(111)
ax.plot(fit_x, fit_y, label = "Fit", linestyle = '-')
ax.plot(x, y, label = "Data", marker = '.', linestyle = '')
ax.legend()
ax.set_xlim(min(x) - 1, max(x) + 1)
ax.set_ylim(min(y) - 1, max(y) + 1)
ax.set_xlabel('x')
ax.set_ylabel('y')
```
将该图保存到[PNG image file][30]中,有:
```
fig.savefig('fit_python.png')
```
如果要显示(而不是保存)绘图,请调用:
```
plt.show()
```
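This example references all the objects used in the plotting section: it defines the object **fig** and the object **ax**. This level of detail is not strictly necessary, as the **plt** object can be used directly to plot the datasets. The [Matplotlib tutorial][31] shows an interface such as: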
```
plt.plot(fit_x, fit_y)
```
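Frankly, I do not like this approach, because it hides the important interactions that happen between the various objects. Unfortunately, the [official examples][32] are sometimes a bit confusing because they tend to use different approaches. Referencing graphical objects is not necessary in this simple example, but it becomes important in more complex ones (such as when embedding plots in a GUI).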
#### 结果
命令行输入:
```
#### Anscombe's first set with Python ####
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
```
这是 Matplotlib 产生的图像:
![Plot and fit of the dataset obtained with Python][33]
### GNU Octave 方式
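The [GNU Octave][34] language is primarily intended for numerical computations. It offers a simple syntax for manipulating vectors and matrices, and it has some powerful plotting facilities. It is an interpreted language, like Python. Since Octave's syntax is [mostly compatible][35] with [MATLAB][36], it is often described as a free alternative to MATLAB. Octave is not listed among the most popular programming languages, but MATLAB is, so Octave is rather popular in a sense. MATLAB predates NumPy, and I have the feeling that NumPy was inspired by it. As you go through the example, you will see the similarities.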
#### 安装
[fitting_octave.m][37]的例子只需要基本的 Octave 包,在 Fedora 中安装相当简单:
```
sudo dnf install octave
```
#### 注释代码
在Octave中你可以用百分比符号(**%**)为代码添加注释,如果不需要与 MATLAB 兼容,你也可以使用 **#**。使用 **#** 的选项允许你从 Python 示例中编写相同的特殊注释行,以便直接在命令行上执行脚本。
#### 必要的库
本例中使用的所有内容都包含在基本包中,因此你不需要加载任何新的库。如果你需要一个库,[语法][38]是 **pkg load module**。该命令将模块的功能添加到可用功能列表中。在这方面Python 具有更大的灵活性。
#### 定义变量
变量的定义与 Python 的语法基本相同:
```
input_file_name = "anscombe.csv";
delimiter = "\t";
skip_header = 3;
column_x = 1;
column_y = 2;
```
请注意,行尾有一个分号;这不是必需的,但是它会抑制行结果的输出。如果没有分号,解释器将打印表达式的结果:
```
octave:1> input_file_name = "anscombe.csv"
input_file_name = anscombe.csv
octave:2> sqrt(2)
ans =  1.4142
```
#### 打印输出结果
强大的功能[**printf()**][39]是用来在终端上打印的。与 Python 不同,**printf()** 函数不会自动在打印字符串的末尾添加换行,因此你必须添加它。第一个参数是一个字符串,可以包含要传递给函数的其他参数的格式信息,例如:
```
printf("Slope: %f\n", slope);
```
在 Python 中,格式是内置在字符串本身中的,但是在 Octave 中,它是特定于 **printf()** 函数。
#### 读取数据
[**dlmread()**][40]函数可以读取类似CSV文件的文本内容:
```
data = dlmread(input_file_name, delimiter, skip_header, 0);
```
结果是一个[矩阵][41]对象,这是 Octave 中的基本数据类型之一。矩阵可以用类似于 Python 的语法进行切片:
```
x = data(:, column_x);
y = data(:, column_y);
```
The fundamental difference is that indexes start at one instead of zero. Therefore, in this example, the **x** column is the first one.
#### 拟合数据
要用直线拟合数据,可以使用[**polyfit()**][42]函数。它用一个多项式拟合输入数据,所以你只需要使用一阶多项式:
```
p = polyfit(x, y, 1);
slope = p(1);
intercept = p(2);
```
结果是具有多项式系数的矩阵;因此,它选择前两个索引。要确定相关系数,请使用[**corr()**][43]函数:
```
r_value = corr(x, y);
```
最后,使用 **printf()** 函数打印结果:
```
printf("Slope: %f\n", slope);
printf("Intercept: %f\n", intercept);
printf("Correlation coefficient: %f\n", r_value);
```
#### 绘图
与 Matplotlib 示例一样,首先需要创建一个表示拟合直线的数据集:
```
fit_x = linspace(min(x) - 1, max(x) + 1, 100);
fit_y = slope * fit_x + intercept;
```
与 NumPy 的相似性也很明显,因为它使用了[**linspace()**][44]函数,其行为就像 Python 的等效版本一样。
同样,与 Matplotlib 一样,首先创建一个[图][45]对象,然后创建一个[轴][46]对象来保存这些图:
```
fig_width = 7; %inch
fig_height = fig_width / 16 * 9; %inch
fig_dpi = 100;
fig = figure("units", "inches",
             "position", [1, 1, fig_width, fig_height]);
ax = axes("parent", fig);
set(ax, "fontsize", 14);
set(ax, "linewidth", 2);
```
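To set the properties of the axes object, use the [**set()**][47] function. The interface is rather confusing, though, as the function expects a comma-separated list of property and value pairs. Each pair is just a string with the property name followed by a second object holding the value for that property. There are also other functions to set various properties: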
```
xlim(ax, [min(x) - 1, max(x) + 1]);
ylim(ax, [min(y) - 1, max(y) + 1]);
xlabel(ax, 'x');
ylabel(ax, 'y');
```
Plotting is done with the [**plot()**][48] function. The default behavior is that each call resets the axes, so you need to use the [**hold()**][49] function.
```
hold(ax, "on");
plot(ax, fit_x, fit_y,
     "marker", "none",
     "linestyle", "-",
     "linewidth", 2);
plot(ax, x, y,
     "marker", ".",
     "markersize", 20,
     "linestyle", "none");
hold(ax, "off");
```
此外,还可以在 **plot()** 函数中添加属性和值对。[legend][50]必须单独创建,标签应手动声明:
```
lg = legend(ax, "Fit", "Data");
set(lg, "location", "northwest");
```
最后将输出保存到PNG图像:
```
image_size = sprintf("-S%f,%f", fig_width * fig_dpi, fig_height * fig_dpi);
image_resolution = sprintf("-r%d", fig_dpi);
print(fig, 'fit_octave.png',
      '-dpng',
      image_size,
      image_resolution);
```
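Confusingly, in this case the options are passed as a single string containing both the property name and the value. Since Octave strings do not have Python's formatting facilities, you must use the [**sprintf()**][51] function. It behaves just like **printf()**, but its result is returned as a string instead of being printed.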
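For comparison, this is roughly how the same size option string could be built with Python's string formatting. It only illustrates the difference between the two languages' formatting tools; the resulting string is an argument for Octave's print function and is not something Matplotlib uses.

```python
fig_width = 7                    # inch
fig_height = fig_width / 16 * 9  # inch
fig_dpi = 100

# Python's counterpart of sprintf("-S%f,%f", ...): the format method
# returns the finished string instead of printing it.
image_size = "-S{:f},{:f}".format(fig_width * fig_dpi, fig_height * fig_dpi)
print(image_size)  # -S700.000000,393.750000
```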
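In this example, as in the Python one, the graphical objects are referenced explicitly to keep their interactions evident. If Python's documentation on this point is a little confusing, [Octave's documentation][52] is even worse. Most of the examples I found did not care about referencing the objects; instead, they rely on the plotting commands acting on the currently active figure. A global [root graphics object][53] keeps track of the existing figures and axes.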
#### 结果
命令行上的结果输出是:
```
#### Anscombe's first set with Octave ####
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
```
And this shows the resulting image generated with Octave.
![Plot and fit of the dataset obtained with Octave][54]
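### Up next
Both Python and GNU Octave can plot the same information, though they differ in how they get there. If you're looking to explore other languages to complete similar tasks, I highly recommend looking at [Rosetta Code][55]. It's a marvelous resource to see how to solve the same problems in many languages.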
你喜欢用什么语言绘制数据?在评论中分享你的想法。
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/2/python-gnu-octave-data-science
作者:[Cristiano L. Fontana][a]
选题:[lujun9972][b]
译者:[heguangzhi](https://github.com/heguangzhi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/cristianofontana
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/analytics-graphs-charts.png?itok=sersoqbV (Analytics: Charts and Graphs)
[2]: https://en.wikipedia.org/wiki/Command-line_interface
[3]: https://en.wikipedia.org/wiki/Graphical_user_interface
[4]: https://gitlab.com/cristiano.fontana/polyglot_fit
[5]: https://en.wikipedia.org/wiki/Comma-separated_values
[6]: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
[7]: https://www.python.org/
[8]: https://www.tiobe.com/tiobe-index/
[9]: https://redmonk.com/sogrady/2019/07/18/language-rankings-6-19/
[10]: http://pypl.github.io/PYPL.html
[11]: https://octoverse.github.com/
[12]: https://en.wikipedia.org/wiki/Interpreted_language
[13]: https://docs.python.org/3/library/
[14]: https://numpy.org/
[15]: https://www.scipy.org/
[16]: https://matplotlib.org/
[17]: https://getfedora.org/
[18]: https://en.wikipedia.org/wiki/Comment_(computer_programming)
[19]: https://gitlab.com/cristiano.fontana/polyglot_fit/-/blob/master/fitting_python.py
[20]: https://en.wikipedia.org/wiki/Shebang_(Unix)
[21]: https://docs.python.org/3/library/functions.html#print
[22]: https://docs.python.org/3/library/string.html#string-formatting
[23]: https://docs.python.org/3/library/string.html
[24]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
[25]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
[26]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
[27]: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
[28]: https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure
[29]: https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes
[30]: https://en.wikipedia.org/wiki/Portable_Network_Graphics
[31]: https://matplotlib.org/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py
[32]: https://matplotlib.org/gallery/index.html
[33]: https://opensource.com/sites/default/files/uploads/fit_python.png (Plot and fit of the dataset obtained with Python)
[34]: https://www.gnu.org/software/octave/
[35]: https://wiki.octave.org/FAQ#Differences_between_Octave_and_Matlab
[36]: https://en.wikipedia.org/wiki/MATLAB
[37]: https://gitlab.com/cristiano.fontana/polyglot_fit/-/blob/master/fitting_octave.m
[38]: https://octave.org/doc/v5.1.0/Using-Packages.html#Using-Packages
[39]: https://octave.org/doc/v5.1.0/Formatted-Output.html#XREFprintf
[40]: https://octave.org/doc/v5.1.0/Simple-File-I_002fO.html#XREFdlmread
[41]: https://octave.org/doc/v5.1.0/Matrices.html
[42]: https://octave.org/doc/v5.1.0/Polynomial-Interpolation.html
[43]: https://octave.org/doc/v5.1.0/Correlation-and-Regression-Analysis.html#XREFcorr
[44]: https://octave.sourceforge.io/octave/function/linspace.html
[45]: https://octave.org/doc/v5.1.0/Multiple-Plot-Windows.html
[46]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFaxes
[47]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFset
[48]: https://octave.org/doc/v5.1.0/Two_002dDimensional-Plots.html#XREFplot
[49]: https://octave.org/doc/v5.1.0/Manipulation-of-Plot-Windows.html#XREFhold
[50]: https://octave.org/doc/v5.1.0/Plot-Annotations.html#XREFlegend
[51]: https://octave.org/doc/v5.1.0/Formatted-Output.html#XREFsprintf
[52]: https://octave.org/doc/v5.1.0/Two_002dDimensional-Plots.html#Two_002dDimensional-Plots
[53]: https://octave.org/doc/v5.1.0/Graphics-Objects.html#XREFgroot
[54]: https://opensource.com/sites/default/files/uploads/fit_octave.png (Plot and fit of the dataset obtained with Octave)
[55]: http://www.rosettacode.org/