Merge remote-tracking branch 'LCTT/master'

This commit is contained in:
Xingyu.Wang 2018-10-16 16:21:12 +08:00
commit 5adb0a00e5
19 changed files with 2305 additions and 591 deletions

View File

@ -3,7 +3,7 @@
![](https://www.ostechnix.com/wp-content/uploads/2016/11/kvm-720x340.jpg)
我们已经讲解了 [在 Ubuntu 18.04 上配置 Oracle VirtualBox][1] 无头服务器。在本教程中,我们将讨论如何使用 **KVM** 去配置无头虚拟化服务器以及如何从一个远程客户端去管理访客系统。正如你所知道的KVM**K** ernel-based **v** irtual **m** achine是开源的 Linux 的全虚拟化。使用 KVM我们可以在几分钟之内很轻松地将任意 Linux 服务器转换到一个完全的虚拟化环境中,以及部署不同种类的虚拟机,比如 GNU/Linux、*BSD、Windows 等等。
我们已经讲解了 [在 Ubuntu 18.04 无头服务器上配置 Oracle VirtualBox][1] 。在本教程中,我们将讨论如何使用 **KVM** 去配置无头虚拟化服务器以及如何从一个远程客户端去管理访客系统。正如你所知道的KVM**K**ernel-based **v**irtual **m**achine是开源的是 Linux 的全虚拟化。使用 KVM我们可以在几分钟之内很轻松地将任意 Linux 服务器转换到一个完全的虚拟化环境中,以及部署不同种类的虚拟机,比如 GNU/Linux、*BSD、Windows 等等。
### 使用 KVM 配置无头虚拟化服务器
@ -13,86 +13,81 @@
**KVM 虚拟化服务器:**
* **宿主机操作系统** 最小化安装的 Ubuntu 18.04 LTS没有 GUI
* **宿主机操作系统的 IP 地址**192.168.225.22/24
* **访客操作系统**(它将运行在 Ubuntu 18.04 的宿主机上Ubuntu 16.04 LTS server
* **宿主机操作系统** 最小化安装的 Ubuntu 18.04 LTS没有 GUI
* **宿主机操作系统的 IP 地址**192.168.225.22/24
* **访客操作系统**(它将运行在 Ubuntu 18.04 的宿主机上Ubuntu 16.04 LTS server
**远程桌面客户端:**
* **操作系统** Arch Linux
* **操作系统** Arch Linux
### 安装 KVM
首先,我们先检查一下我们的系统是否支持硬件虚拟化。为此,需要在终端中运行如下的命令:
```
$ egrep -c '(vmx|svm)' /proc/cpuinfo
```
假如结果是 **zero (0)**,说明系统不支持硬件虚拟化,或者在 BIOS 中禁用了虚拟化。进入你的系统 BIOS 并检查虚拟化选项,然后启用它。
假如结果是 `zero (0)`,说明系统不支持硬件虚拟化,或者在 BIOS 中禁用了虚拟化。进入你的系统 BIOS 并检查虚拟化选项,然后启用它。
假如结果是 **1** 或者 **更大的数**,说明系统将支持硬件虚拟化。然而,在你运行上面的命令之前,你需要始终保持 BIOS 中的虚拟化选项是启用的。
假如结果是 `1` 或者 **更大的数**,说明系统将支持硬件虚拟化。然而,在你运行上面的命令之前,你需要始终保持 BIOS 中的虚拟化选项是启用的。
或者,你也可以使用如下的命令去验证它。但是为了使用这个命令你需要先安装 KVM。
```
$ kvm-ok
```
**示例输出:**
示例输出:
```
INFO: /dev/kvm exists
KVM acceleration can be used
```
如果输出的是如下这样的错误,你仍然可以在 KVM 中运行访客虚拟机,但是它的性能将非常差。
```
INFO: Your CPU does not support KVM extensions
INFO: For more detailed results, you should run this as root
HINT: sudo /usr/sbin/kvm-ok
```
当然,还有其它的方法来检查你的 CPU 是否支持虚拟化。更多信息参考接下来的指南。
- [如何知道 CPU 是否支持虚拟技术VT](https://www.ostechnix.com/how-to-find-if-a-cpu-supports-virtualization-technology-vt/)
接下来,安装 KVM 和在 Linux 中配置虚拟化环境所需要的其它包。
在 Ubuntu 和其它基于 DEB 的系统上,运行如下命令:
```
$ sudo apt-get install qemu-kvm libvirt-bin virtinst bridge-utils cpu-checker
```
KVM 安装完成后,启动 libvertd 服务(如果它没有启动的话):
```
$ sudo systemctl enable libvirtd
$ sudo systemctl start libvirtd
```
### 创建虚拟机
所有的虚拟机文件和其它的相关文件都保存在 **/var/lib/libvirt/** 下。ISO 镜像的默认路径是 **/var/lib/libvirt/boot/**
所有的虚拟机文件和其它的相关文件都保存在 `/var/lib/libvirt/` 下。ISO 镜像的默认路径是 `/var/lib/libvirt/boot/`
首先,我们先检查一下是否有虚拟机。查看可用的虚拟机列表,运行如下的命令:
```
$ sudo virsh list --all
```
**示例输出:**
示例输出:
```
Id Name State
----------------------------------------------------
```
![][3]
@ -102,14 +97,14 @@ Id Name State
现在,我们来创建一个。
例如,我们来创建一个有 512 MB 内存、1 个 CPU 核心、8 GB 硬盘的 Ubuntu 16.04 虚拟机。
```
$ sudo virt-install --name Ubuntu-16.04 --ram=512 --vcpus=1 --cpu host --hvm --disk path=/var/lib/libvirt/images/ubuntu-16.04-vm1,size=8 --cdrom /var/lib/libvirt/boot/ubuntu-16.04-server-amd64.iso --graphics vnc
```
请确保在路径 **/var/lib/libvirt/boot/** 中有一个 Ubuntu 16.04 的 ISO 镜像文件,或者在上面命令中给定的其它路径中有相应的镜像文件。
请确保在路径 `/var/lib/libvirt/boot/` 中有一个 Ubuntu 16.04 的 ISO 镜像文件,或者在上面命令中给定的其它路径中有相应的镜像文件。
**示例输出:**
示例输出:
```
WARNING Graphics requested but DISPLAY is not set. Not running virt-viewer.
@ -121,37 +116,38 @@ Domain installation still in progress. Waiting for installation to complete.
Domain has shutdown. Continuing.
Domain creation completed.
Restarting guest.
```
![][4]
我们来分别讲解以上的命令和看到的每个选项的作用。
* **name** : 这个选项定义虚拟机名字。在我们的案例中,这个虚拟机的名字是 **Ubuntu-16.04**
* **ram=512** : 给虚拟机分配 512MB 内存。
* **vcpus=1** : 指明虚拟机中 CPU 核心的数量。
* **cpu host** : 通过暴露宿主机 CPU 的配置给访客系统来优化 CPU 属性。
* **hvm** : 要求完整的硬件虚拟化。
* **disk path** : 虚拟机硬盘的位置和大小。在我们的示例中,我分配了 8GB 的硬盘。
* **cdrom** : 安装 ISO 镜像的位置。请注意你必须在这个位置真的有一个 ISO 镜像。
* **graphics vnc** : 允许 VNC 从远程客户端访问虚拟机。
* `name`:这个选项定义虚拟机名字。在我们的案例中,这个虚拟机的名字是 `Ubuntu-16.04`
* `ram=512`:给虚拟机分配 512MB 内存。
* `vcpus=1`:指明虚拟机中 CPU 核心的数量。
* `cpu host`:通过暴露宿主机 CPU 的配置给访客系统来优化 CPU 属性。
* `hvm`:要求完整的硬件虚拟化。
* `disk path`:虚拟机硬盘的位置和大小。在我们的示例中,我分配了 8GB 的硬盘。
* `cdrom`:安装 ISO 镜像的位置。请注意你必须在这个位置真的有一个 ISO 镜像。
* `graphics vnc`:允许 VNC 从远程客户端访问虚拟机。
### 使用 VNC 客户端访问虚拟机
现在,我们在远程桌面系统上使用 SSH 登入到 Ubuntu 服务器上(虚拟化服务器),如下所示。
在这里,**sk** 是我的 Ubuntu 服务器的用户名,而 **192.168.225.22** 是它的 IP 地址。
```
$ ssh sk@192.168.225.22
```
在这里,`sk` 是我的 Ubuntu 服务器的用户名,而 `192.168.225.22` 是它的 IP 地址。
运行如下的命令找出 VNC 的端口号。我们从一个远程系统上访问虚拟机需要它。
```
$ sudo virsh dumpxml Ubuntu-16.04 | grep vnc
```
**示例输出:**
示例输出:
```
<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
@ -160,10 +156,10 @@ $ sudo virsh dumpxml Ubuntu-16.04 | grep vnc
![][5]
记下那个端口号 **5900**。安装任意的 VNC 客户端应用程序。在本指南中,我们将使用 TigerVnc。TigerVNC 是 Arch Linux 默认仓库中可用的客户端。在 Arch 上安装它,运行如下命令:
记下那个端口号 `5900`。安装任意的 VNC 客户端应用程序。在本指南中,我们将使用 TigerVnc。TigerVNC 是 Arch Linux 默认仓库中可用的客户端。在 Arch 上安装它,运行如下命令:
```
$ sudo pacman -S tigervnc
```
在安装有 VNC 客户端的远程客户端系统上输入如下的 SSH 端口转发命令。
@ -172,11 +168,11 @@ $ sudo pacman -S tigervnc
$ ssh sk@192.168.225.22 -L 5900:127.0.0.1:5900
```
再强调一次,**192.168.225.22** 是我的 Ubuntu 服务器(虚拟化服务器)的 IP 地址。
再强调一次,`192.168.225.22` 是我的 Ubuntu 服务器(虚拟化服务器)的 IP 地址。
然后,从你的 Arch Linux客户端打开 VNC 客户端。
在 VNC 服务器框中输入 **localhost:5900**,然后点击 **Connect** 按钮。
在 VNC 服务器框中输入 `localhost:5900`,然后点击 “Connect” 按钮。
![][6]
@ -188,43 +184,42 @@ $ ssh sk@192.168.225.22 -L 5900:127.0.0.1:5900
同样的,你可以根据你的服务器的硬件情况配置多个虚拟机。
或者,你可以使用 **virt-viewer** 实用程序在访客机器中安装操作系统。virt-viewer 在大多数 Linux 发行版的默认仓库中都可以找到。安装完 virt-viewer 之后,运行下列的命令去建立到虚拟机的访问连接。
或者,你可以使用 `virt-viewer` 实用程序在访客机器中安装操作系统。`virt-viewer` 在大多数 Linux 发行版的默认仓库中都可以找到。安装完 `virt-viewer` 之后,运行下列的命令去建立到虚拟机的访问连接。
```
$ sudo virt-viewer --connect=qemu+ssh://192.168.225.22/system --name Ubuntu-16.04
```
### 管理虚拟机
使用管理用户接口 virsh 从命令行去管理虚拟机是非常有趣的。命令非常容易记。我们来看一些例子。
使用管理用户接口 `virsh` 从命令行去管理虚拟机是非常有趣的。命令非常容易记。我们来看一些例子。
查看运行的虚拟机,运行如下命令:
```
$ sudo virsh list
```
或者,
```
$ sudo virsh list --all
```
**示例输出:**
示例输出:
```
Id Name State
----------------------------------------------------
2 Ubuntu-16.04 running
```
![][9]
启动一个虚拟机,运行如下命令:
```
$ sudo virsh start Ubuntu-16.04
```
或者,也可以使用虚拟机 id 去启动它。
@ -232,94 +227,85 @@ $ sudo virsh start Ubuntu-16.04
![][10]
正如在上面的截图所看到的Ubuntu 16.04 虚拟机的 Id 是 2。因此启动它时你也可以像下面一样只指定它的 ID。
```
$ sudo virsh start 2
```
重启动一个虚拟机,运行如下命令:
```
$ sudo virsh reboot Ubuntu-16.04
```
**示例输出:**
示例输出:
```
Domain Ubuntu-16.04 is being rebooted
```
![][11]
暂停一个运行中的虚拟机,运行如下命令:
```
$ sudo virsh suspend Ubuntu-16.04
```
**示例输出:**
示例输出:
```
Domain Ubuntu-16.04 suspended
```
让一个暂停的虚拟机重新运行,运行如下命令:
```
$ sudo virsh resume Ubuntu-16.04
```
**示例输出:**
示例输出:
```
Domain Ubuntu-16.04 resumed
```
关闭一个虚拟机,运行如下命令:
```
$ sudo virsh shutdown Ubuntu-16.04
```
**示例输出:**
示例输出:
```
Domain Ubuntu-16.04 is being shutdown
```
完全移除一个虚拟机,运行如下的命令:
```
$ sudo virsh undefine Ubuntu-16.04
$ sudo virsh destroy Ubuntu-16.04
```
**示例输出:**
示例输出:
```
Domain Ubuntu-16.04 destroyed
```
![][12]
关于它的更多选项,建议你去查看 man 手册页:
```
$ man virsh
```
今天就到这里吧。开始在你的新的虚拟化环境中玩吧。对于研究和开发者、以及测试目的KVM 虚拟化将是很好的选择,但它能做的远不止这些。如果你有充足的硬件资源,你可以将它用于大型的生产环境中。如果你还有其它好玩的发现,不要忘记在下面的评论区留下你的高见。
谢谢!
--------------------------------------------------------------------------------
via: https://www.ostechnix.com/setup-headless-virtualization-server-using-kvm-ubuntu/
@ -327,7 +313,7 @@ via: https://www.ostechnix.com/setup-headless-virtualization-server-using-kvm-ub
作者:[SK][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[qhwdw](https://github.com/qhwdw)
校对:[校对者ID](https://github.com/校对者ID)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出

View File

@ -128,7 +128,7 @@ CPU 任务优先级或类型:
* 蓝色:低优先级
* 绿色:正常优先级
* 红色:内核任务
* 蓝色:虚拟任务
* 蓝绿色:虚拟任务
* 条状图末尾的值是已用 CPU 的百分比
内存:

View File

@ -3,33 +3,38 @@
![](https://www.ostechnix.com/wp-content/uploads/2017/09/Lock-The-Keyboard-And-Mouse-720x340.jpg)
我四岁的侄女是个好奇的孩子,她非常喜爱“阿凡达”电影,当阿凡达电影在播放时,她是如此的专注,好似眼睛粘在了屏幕上。但问题是当她观看电影时,她经常会碰到键盘上的某个键或者移动了鼠标,又或者是点击了鼠标的按钮。有时她非常意外地按了键盘上的某个键,从而将电影关闭或者暂停了。所以我就想找个方法来将键盘和鼠标都锁住,但屏幕不会被锁住。幸运的是,我在 Ubuntu 论坛上找到了一个完美的解决方法。假如在你正看着屏幕上的某些重要的事情时,你不想让你的小猫或者小狗在你的键盘上行走,或者让你的孩子在键盘上瞎搞一气,那我建议你试试 **xtrlock** 这个工具。它很简单但非常实用,你可以锁定屏幕的显示直到用户在键盘上输入自己设定的密码(译注:就是用户自己的密码,例如用来打开屏保的那个密码,不需要单独设定)。在这篇简单的教程中,我将为你展示如何在 Linux 下锁住键盘和鼠标,而不锁掉屏幕。这个技巧几乎可以在所有的 Linux 操作系统中生效。
我四岁的侄女是个好奇的孩子,她非常喜爱“阿凡达”电影,当阿凡达电影在播放时,她是如此的专注,好似眼睛粘在了屏幕上。但问题是当她观看电影时,她经常会碰到键盘上的某个键或者移动了鼠标,又或者是点击了鼠标的按钮。有时她非常意外地按了键盘上的某个键,从而将电影关闭或者暂停了。所以我就想找个方法来将键盘和鼠标都锁住,但屏幕不会被锁住。幸运的是,我在 Ubuntu 论坛上找到了一个完美的解决方法。假如在你正看着屏幕上的某些重要的事情时,你不想让你的小猫或者小狗在你的键盘上行走,或者让你的孩子在键盘上瞎搞一气,那我建议你试试 **xtrlock** 这个工具。它很简单但非常实用,你可以锁定屏幕的显示直到用户在键盘上输入自己设定的密码(LCTT 译注:就是用户自己的密码,例如用来打开屏保的那个密码,不需要单独设定)。在这篇简单的教程中,我将为你展示如何在 Linux 下锁住键盘和鼠标,而不锁掉屏幕。这个技巧几乎可以在所有的 Linux 操作系统中生效。
### 安装 xtrlock
xtrlock 软件包在大多数 Linux 操作系统的默认软件仓库中都可以获取到。所以你可以使用你安装的发行版的包管理器来安装它。
**Arch Linux** 及其衍生发行版中,运行下面的命令来安装它:
```
$ sudo pacman -S xtrlock
```
**Fedora** 上使用:
```
$ sudo dnf install xtrlock
```
**RHEL, CentOS** 上使用:
**RHEL、CentOS** 上使用:
```
$ sudo yum install xtrlock
```
**SUSE/openSUSE** 上使用:
```
$ sudo zypper install xtrlock
```
**Debian, Ubuntu, Linux Mint** 上使用:
**Debian、Ubuntu、Linux Mint** 上使用:
```
$ sudo apt-get install xtrlock
```
@ -38,41 +43,50 @@ $ sudo apt-get install xtrlock
安装好 xtrlock 后,你需要根据你的选择来创建一个快捷键,通过这个快捷键来锁住键盘和鼠标。
**/usr/local/bin** 目录下创建一个名为 **lockkbmouse** 的新文件:
LCTT 译注译者在自己的系统Arch + Deepin中发现这里的到下面创建快捷键的部分可以不必做依然生效。
`/usr/local/bin` 目录下创建一个名为 `lockkbmouse` 的新文件:
```
$ sudo vi /usr/local/bin/lockkbmouse
```
然后将下面的命令添加到这个文件中:
```
#!/bin/bash
sleep 1 && xtrlock
```
保存并关闭这个文件。
然后使用下面的命令来使得它可以被执行:
```
$ sudo chmod a+x /usr/local/bin/lockkbmouse
```
接着,我们就需要创建快捷键了。
#### 创建快捷键
**在 Arch Linux MATE 桌面中**
依次点击 **System -> Preferences -> Hardware -> keyboard Shortcuts**
依次点击 “System -> Preferences -> Hardware -> keyboard Shortcuts”
然后点击 **Add** 来创建快捷键。
然后点击 “Add” 来创建快捷键。
![][2]
首先键入你的这个快捷键的名称,然后将下面的命令填入命令框中,最后点击 **Apply** 按钮。
首先键入你的这个快捷键的名称,然后将下面的命令填入命令框中,最后点击 “Apply” 按钮。
```
bash -c "sleep 1 && xtrlock"
```
![][3]
为了能够给这个快捷键赋予快捷方式,需要选中它或者双击它然后输入你选定的快捷键组合,例如我使用 **Alt+k** 这组快捷键。
为了能够给这个快捷键赋予快捷方式,需要选中它或者双击它然后输入你选定的快捷键组合,例如我使用 `Alt+k` 这组快捷键。
![][4]
@ -80,16 +94,17 @@ bash -c "sleep 1 && xtrlock"
**在 Ubuntu GNOME 桌面中**
依次进入 **System Settings -> Devices -> Keyboard**,然后点击 **+** 这个符号。
依次进入 “System Settings -> Devices -> Keyboard”然后点击 “+” 这个符号。
键入你快捷键的名称并将下面的命令加到命令框里面,然后点击 “Add” 按钮。
键入你快捷键的名称并将下面的命令加到命令框里面,然后点击 **Add** 按钮。
```
bash -c "sleep 1 && xtrlock"
```
![][5]
接下来为这个新建的快捷键赋予快捷方式。我们只需要选择或者双击 **“Set shortcut”** 这个按钮就可以了。
接下来为这个新建的快捷键赋予快捷方式。我们只需要选择或者双击 “Set shortcut” 这个按钮就可以了。
![][6]
@ -97,7 +112,7 @@ bash -c "sleep 1 && xtrlock"
![][7]
输入你选定的快捷键组合,例如我使用 **Alt+k**
输入你选定的快捷键组合,例如我使用 `Alt+k`
![][8]
@ -113,23 +128,26 @@ bash -c "sleep 1 && xtrlock"
### 将键盘和鼠标解锁
要将键盘和鼠标解锁,只需要输入你的密码然后敲击“Enter”键就可以了在输入的过程中你将看不到密码。只需要输入然后敲 `ENTER` 键就可以了。在你输入了正确的密码后,鼠标和键盘就可以再工作了。假如你输入了一个错误的密码,你将听到警告声。按 **ESC** 来清除输入的错误密码,然后重新输入正确的密码。要去掉未完全输入完的密码中的一个字符,只需要按 **BACKSPACE** 或者 **DELETE** 键就可以了。
要将键盘和鼠标解锁,只需要输入你的密码然后敲击回车键就可以了,在输入的过程中你将看不到密码。只需要输入然后敲回车键就可以了。在你输入了正确的密码后,鼠标和键盘就可以再工作了。假如你输入了一个错误的密码,你将听到警告声。按 `ESC` 来清除输入的错误密码,然后重新输入正确的密码。要去掉未完全输入完的密码中的一个字符,只需要按 `BACKSPACE` 或者 `DELETE` 键就可以了。
### 要是我被永久地锁住了怎么办?
以防你被永久地锁定了屏幕,切换至一个 TTY例如 CTRL+ALT+F2然后运行
以防你被永久地锁定了屏幕,切换至一个 TTY例如 `CTRL+ALT+F2`)然后运行:
```
$ sudo killall xtrlock
```
或者你还可以使用 **chvt** 命令来在 TTY 和 X 会话之间切换。
或者你还可以使用 `chvt` 命令来在 TTY 和 X 会话之间切换。
例如,如果要切换到 TTY1则运行
```
$ sudo chvt 1
```
要切换回 X 会话,则键入:
```
$ sudo chvt 7
```
@ -137,6 +155,7 @@ $ sudo chvt 7
不同的发行版使用了不同的快捷键组合来在不同的 TTY 间切换。请参考你安装的对应发行版的官方网站了解更多详情。
如果想知道更多 xtrlock 的信息,请参考 man 页:
```
$ man xtrlock
```
@ -145,7 +164,7 @@ $ man xtrlock
**资源:**
* [**Ubuntu 论坛**][10]
* [**Ubuntu 论坛**][10]
--------------------------------------------------------------------------------
@ -154,7 +173,7 @@ via: https://www.ostechnix.com/lock-keyboard-mouse-not-screen-linux/
作者:[SK][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[FSSlc](https://github.com/FSSlc)
校对:[校对者ID](https://github.com/校对者ID)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
@ -167,5 +186,5 @@ via: https://www.ostechnix.com/lock-keyboard-mouse-not-screen-linux/
[6]:http://www.ostechnix.com/wp-content/uploads/2018/01/set-shortcut-key-1.png
[7]:http://www.ostechnix.com/wp-content/uploads/2018/01/set-shortcut-key-2.png
[8]:http://www.ostechnix.com/wp-content/uploads/2018/01/set-shortcut-key-3.png
[9]:http://www.ostechnix.com/wp-content/uploads/2018/01/xtrlock-1.png
[9]:http://www.ostechnix.com/wp-content/uploads/2018/01/xtrclock-1.png
[10]:https://ubuntuforums.org/showthread.php?t=993800

View File

@ -1,7 +1,7 @@
Linux 拥有了新的行为准则,但是许多人都对此表示不满
=====
**Linux 内核有了新的<ruby>行为准则<rt>Code of Conduct</rt></ruby>(CoC)。但在这条行为准则被签署以及发布仅仅 30 分钟之后Linus Torvalds 就暂时离开了 Linux 内核的开发工作。因为新行为准则的作者那富有争议的过去,现在这件事成为了热点话题。许多人都对这新的行为准则表示不满。**
> Linux 内核有了新的<ruby>行为准则<rt>Code of Conduct</rt></ruby>(CoC)。但在这条行为准则被签署以及发布仅仅 30 分钟之后Linus Torvalds 就暂时离开了 Linux 内核的开发工作。因为新行为准则的作者那富有争议的过去,现在这件事成为了热点话题。许多人都对这新的行为准则表示不满。
如果你还不了解这件事,请参阅 [Linus Torvalds 对于自己之前的不良态度致歉并开始休假,以改善自己的行为态度][1]
@ -9,17 +9,15 @@ Linux 拥有了新的行为准则,但是许多人都对此表示不满
Linux 内核开发者并不是以前没有需要遵守的行为准则,但是之前的[<ruby>冲突准则<rt>code of conflict</rt></ruby>][2]现在被替换成了以“给内核开发社区营造更加热情,更方便他人参与的氛围”为目的的行为准则。
>“为营造一个开放并且热情的社区环境,我们,贡献者与维护者,许诺让每一个参与进我们项目和社区的人享受一个没有骚扰的体验。无关于他们的年纪,体型,身体残疾,种族,性别,性别认知与表达,社会经验,教育水平,社会或者经济地位,国籍,外表,人种,信仰,性认同和性取向。
> “为营造一个开放并且热情的社区环境,我们,贡献者与维护者,许诺让每一个参与进我们项目和社区的人享受一个没有骚扰的体验。无关于他们的年纪、体型、身体残疾、种族、性别、性别认知与表达、社会经验、教育水平、社会或者经济地位、国籍、外表、人种、信仰、性认同和性取向。
你可以在这里阅读整篇行为准则
[Linux 行为准则][33]
你可以在这里阅读整篇行为准则:[Linux 行为准则][33]。
### Linus Torvalds 是被迫道歉并且休假的吗?
![Linus Torvalds 的道歉][3]
这个新的行为准则由 Linus Torvalds 和 Greg Kroah-Hartman (仅次于 Torvalds 的二把手)签发。来自 Intel 的 Dan Williams 和来自 Facebook 的 Chris Mason 也是该准则的签署者。
这个新的行为准则由 Linus Torvalds 和 Greg Kroah-Hartman (仅次于 Torvalds 的二把手)签发。来自 Intel 的 Dan Williams 和来自 Facebook 的 Chris Mason 也是该准则的签署者之一
如果我正确地解读了时间线在签署这个行为准则的半小时之后Torvalds [发送了一封邮件,对自己之前的不良态度致歉][4]。他同时宣布会进行休假,以改善自己的行为态度。
@ -31,18 +29,19 @@ Linux 内核开发者并不是以前没有需要遵守的行为准则,但是
### 有关贡献者盟约作者 Coraline Ada Ehmke 的争议
Linux 的行为准则基于[<ruby>贡献者盟约<rt>Contributor Convenant</rt></ruby>1.4 版本][5]。贡献者盟约[被上百个开源项目所接纳][6],包括 Eclipse, Angular, Ruby, Kubernetes等项目。
Linux 的行为准则基于[<ruby>贡献者盟约<rt>Contributor Convenant</rt></ruby>1.4 版本][5]。贡献者盟约[被上百个开源项目所接纳][6],包括 Eclipse、Angular、Ruby、Kubernetes 等项目。
贡献者盟约由 [Coraline Ada Ehmke][7] 创作,她是一个软件工程师,开源支持者,以及 [LGBT][8] 活动家。她对于促进开源世界的多样性做了显著的贡献。
Coraline 对于唯才是用的反对立场同样十分鲜明。[<ruby>唯才是用<rt>meritocracy</rt></ruby>][9]这个词语源自拉丁文,本意为个人在系统内部的进步取决于他的“功绩”,例如智力水平,取得的证书以及教育程度。但[类似 Coraline 的活动家们认为][10]唯才是用是个糟糕的体系,因为他们只是通过人的智力产出来度量一个人,而并不重视他们的人性。
Coraline 对于精英主义的反对立场同样十分鲜明。[<ruby>精英主义<rt>meritocracy</rt></ruby>][9]这个词语源自拉丁文,本意为系统内的进步取决于“精英”,例如智力水平、取得的证书以及教育程度。但[类似 Coraline 的活动家们认为][10]唯才是用是个糟糕的体系,因为只是通过人的智力产出来度量一个人,而并不重视他们的人性。
[![croraline meritocracy][11]][12]
图片来源:推特用户@nickmon1112
*图片来源:推特用户@nickmon1112*
[Linus Torvalds 不止一次地说到,他在意的只是代码而并非写代码的人][13]。所以很明显,这忤逆了 Coraline 有关唯才是用体系的观点。
具体来说Coraline 那被人关注饱受争议的过去,是一个关于 [Opal 项目][14]的事件。那是一个发生[在推特上的讨论][15]Elia来自意大利的 Opal 项目核心开发者说“(那些变性人)不接受现实才是问题所在。”
具体来说Coraline 那被人关注饱受争议的过去,是一个关于 [Opal 项目][14]贡献者的事件。那是一个发生[在推特上的讨论][15]Elia来自意大利的 Opal 项目核心开发者说“(那些变性人)不接受现实才是问题所在。”
Coraline 并没有参加讨论,也不是 Opal 项目的贡献者。不过作为 LGBT 活动家,她以 Elia 发表“冒犯变性人群体的发言”为由,[要求他退出 Opal 项目][16]。 Coraline 和她的支持者——他们给这个项目做过贡献,通过在 GitHub 仓库平台上冗长且激烈的争论,试图将 Elia——此项目的核心开发者移出项目。
@ -50,11 +49,11 @@ Coraline 并没有参加讨论,也不是 Opal 项目的贡献者。不过作
不过故事到这里并没有结束。贡献者盟约稍后被更改,[加入了一些针对 Elia 的新条款][17]。这些新条款将行为准则的管束范围扩展到公共领域。不过这些更改稍后[被维护者们标记为恶意篡改][18]。最后 Opal 项目摆脱了贡献者盟约,并用自己的行为准则取而代之。
这个例子非常好的说明了,某些被冒犯的少数人群——他们并没有给这个项目哪怕一点贡献,是怎样试图去驱逐这个项目的核心开发者的。
这个例子非常好的说明了,某些被冒犯的少数人群——哪怕他们并没有给这个项目做过一点贡献,是怎样试图去驱逐这个项目的核心开发者的。
### 人们对于 Linux 新的行为准则的以及 Torvalds 道歉的反映。
Linux 行为准则以及 Torvalds 的道歉一发布,社交媒体与论坛上就开始盛传种种谣言与[推测][19]。虽然很多人对新的行为准则感到满意,但仍有些人认为这是 [SJW 尝试渗透 Linux 社区][20]的阴谋。
Linux 行为准则以及 Torvalds 的道歉一发布,社交媒体与论坛上就开始盛传种种谣言与[推测][19]。虽然很多人对新的行为准则感到满意,但仍有些人认为这是 [SJW 尝试渗透 Linux 社区][20]的阴谋。LCTT 译注SJW——Social Justice Warrior 所谓“为社会正义而战的人”。)
Caroline 发布的一个富有嘲讽意味的推特让争论愈发激烈。
@ -81,7 +80,7 @@ Nick Monroe一位自由记者宣称 Linux 行为准则远没有表面上
Nick 并不是唯一一个反对 Linux 新的行为准则的人。[SJW][26] 的参与引发了更多的阴谋论猜测。
>我猜今天关于 Linux 的大新闻就是现在Linux 内核被一个post meritocracy世界观下的行为准则给掌控了。
>我猜今天关于 Linux 的大新闻就是现在Linux 内核被一个<ruby>后精英政治<rt>post meritocracy</rt></ruby>世界观下的行为准则给掌控了。
>
>这个行为准则的宗旨看起来不错。不过在实际操作中,它们通常被当作 SJW 分子攻击他们不喜之人的工具。况且,很多人都被 SJW 分子所厌恶。
>
@ -107,31 +106,33 @@ Torvalds 的道歉引起了广泛关注 ;)
>
>— Verónica. (@maria_fibonacci) [9 月 17 日, 2018][30]
不继续开玩笑了。有关 Linus 道歉的关注是由 Sharp 挑起的。他因为“恶劣的社区环境”于 2015 年退出了 Linux 内核的开发。
不继续开玩笑了。有关 Linus 道歉的关注是由 Sharp 挑起的。她因为“恶劣的社区环境”于 2015 年[退出了 Linux 内核的开发][31]。LCTT 译注Sarah Sharp 现在改名为“Sage Sharp”并要求别人称其为“them”而不是“she”或“he”。
>现在我们要面对的问题是,这个成就 Linus给予他肆意辱骂特权的社区能否迎来改变。不仅仅是 Linus 个人Linux 内核开发社区也急需改变。<https://t.co/EG5KO43416>
>现在我们要面对的问题是,这个成就 Linus给予他肆意辱骂特权的社区能否迎来改变。不仅仅是 Linus 个人Linux 内核开发社区也急需改变。<https://t.co/EG5KO43416>
>
>— Sage Sharp (@sagesharp) 9 月 17 日, 2018
>— Sage Sharp (@sagesharp) [9 月 17 日, 2018][32]
### 你对于 Linux 行为准则怎么看?
如果你问我的观点,我认为目前社区的确是需要一个行为准则。它能指导人们尊重他人,不因为他人的种族,宗教信仰,国籍,政治观点(左派或者右派)而歧视,营造出一个积极向上的社区氛围。
如果你问我的观点,我认为目前社区的确是需要一个行为准则。它能指导人们尊重他人,不因为他人的种族、宗教信仰、国籍、政治观点(左派或者右派)而歧视,营造出一个积极向上的社区氛围。
对于这个事件,你怎么看?你认为这个行为准则能够帮助 Linux 内核的开发,或者说因为 SJW 成员们的加入,情况会变得更糟?
在 FOSS 里我们没有行为准则,不过我们都会持着文明友好的态度讨论问题。
-------
via: https://itsfoss.com/linux-code-of-conduct/
作者:[Abhishek Prakash][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[thecyanbird](https://github.com/thecyanbird)
校对:[校对者ID](https://github.com/校对者ID)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://itsfoss.com/author/abhishek/
[1]: https://itsfoss.com/torvalds-takes-a-break-from-linux/
[1]: https://linux.cn/article-10022-1.html
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/CodeOfConflict?id=ddbd2b7ad99a418c60397901a0f3c997d030c65e
[3]: https://4bds6hergc-flywheel.netdna-ssl.com/wp-content/uploads/2018/09/linus-torvalds-apologizes.jpeg
[4]: https://lkml.org/lkml/2018/9/16/167

View File

@ -1,34 +1,35 @@
如何在救援(单用户模式)/紧急模式下启动 Ubuntu 18.04/Debian 9 服务器
======
将 Linux 服务器引导到单用户模式或**救援模式**是 Linux 管理员在关键时刻恢复服务器时通常使用的重要故障排除方法之一。在 Ubuntu 18.04 和 Debian 9 中,单用户模式被称为救援模式。
除了救援模式外Linux 服务器可以在**紧急模式**下启动,它们之间的主要区别在于,紧急模式加载了带有只读根文件系统文件系统的最小环境,也没有启用任何网络或其他服务。但救援模式尝试挂载所有本地文件系统并尝试启动一些重要的服务,包括网络。
将 Linux 服务器引导到单用户模式或<ruby>救援模式<rt>rescue mode</rt></ruby>是 Linux 管理员在关键时刻恢复服务器时通常使用的重要故障排除方法之一。在 Ubuntu 18.04 和 Debian 9 中,单用户模式被称为救援模式。
除了救援模式外Linux 服务器可以在<ruby>紧急模式<rt>emergency mode</rt></ruby>下启动,它们之间的主要区别在于,紧急模式加载了带有只读根文件系统文件系统的最小环境,没有启用任何网络或其他服务。但救援模式尝试挂载所有本地文件系统并尝试启动一些重要的服务,包括网络。
在本文中,我们将讨论如何在救援模式和紧急模式下启动 Ubuntu 18.04 LTS/Debian 9 服务器。
#### 在单用户/救援模式下启动 Ubuntu 18.04 LTS 服务器:
重启服务器并进入启动加载程序 Grub 屏幕并选择 “**Ubuntu**”,启动加载器页面如下所示,
重启服务器并进入启动加载程序 Grub 屏幕并选择 “Ubuntu”启动加载器页面如下所示
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Bootloader-Screen-Ubuntu18-04-Server.jpg)
按下 “**e**”,然后移动到以 “**linux**” 开头的行尾,并添加 “**systemd.unit=rescue.target**”。如果存在单词 “**$vt_handoff**” 就删除它。
按下 `e`,然后移动到以 `linux` 开头的行尾,并添加 `systemd.unit=rescue.target`。如果存在单词 `$vt_handoff` 就删除它。
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/rescue-target-ubuntu18-04.jpg)
现在按 Ctrl-x 或 F10 启动,
现在按 `Ctrl-x``F10` 启动,
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/rescue-mode-ubuntu18-04.jpg)
现在按回车键,然后你将得到所有文件系统都以读写模式挂载的 shell 并进行故障排除。完成故障排除后,可以使用 “**reboot**” 命令重新启动服务器。
现在按回车键,然后你将得到所有文件系统都以读写模式挂载的 shell 并进行故障排除。完成故障排除后,可以使用 `reboot` 命令重新启动服务器。
#### 在紧急模式下启动 Ubuntu 18.04 LTS 服务器
重启服务器并进入启动加载程序页面并选择 “**Ubuntu**”,然后按 “**e**” 并移动到以 linux 开头的行尾,并添加 “**systemd.unit=emergency.target**“
重启服务器并进入启动加载程序页面并选择 “Ubuntu”然后按 `e` 并移动到以 `linux` 开头的行尾,并添加 `systemd.unit=emergency.target`
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Emergecny-target-ubuntu18-04-server.jpg)
现在按 Ctlr-x 或 F10 以紧急模式启动,你将获得一个 shell 并从那里进行故障排除。正如我们已经讨论过的那样,在紧急模式下,文件系统将以只读模式挂载,并且在这种模式下也不会有网络,
现在按 `Ctrl-x``F10` 以紧急模式启动,你将获得一个 shell 并从那里进行故障排除。正如我们已经讨论过的那样,在紧急模式下,文件系统将以只读模式挂载,并且在这种模式下也不会有网络,
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Emergency-prompt-debian9.jpg)
@ -43,17 +44,17 @@
#### 将 Debian 9 引导到救援和紧急模式
重启 Debian 9.x 服务器并进入 grub页面选择 “**Debian GNU/Linux**”。
重启 Debian 9.x 服务器并进入 grub页面选择 “Debian GNU/Linux”。
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Debian9-Grub-Screen.jpg)
按下 “**e**” 并移动到 linux 开头的行尾并添加 “**systemd.unit=rescue.target**” 以在救援模式下启动系统, 要在紧急模式下启动,那就添加 “**systemd.unit=emergency.target**“
按下 `e` 并移动到 linux 开头的行尾并添加 `systemd.unit=rescue.target` 以在救援模式下启动系统, 要在紧急模式下启动,那就添加 `systemd.unit=emergency.target`
#### 救援模式:
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Rescue-mode-Debian9.jpg)
现在按 Ctrl-x 或 F10 以救援模式启动
现在按 `Ctrl-x``F10` 以救援模式启动
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Rescue-Mode-Shell-Debian9.jpg)
@ -63,11 +64,11 @@
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Emergency-target-grub-debian9.jpg)
现在按下 ctrl-x 或 F10 以紧急模式启动系统
现在按下 `ctrl-x``F10` 以紧急模式启动系统
![](https://www.linuxtechi.com/wp-content/uploads/2018/09/Emergency-prompt-debian9.jpg)
按下回车获取 shell 并使用 “**mount -o remount,rw /**” 命令以读写模式挂载根文件系统。
按下回车获取 shell 并使用 `mount -o remount,rw /` 命令以读写模式挂载根文件系统。
**注意:**如果已经在 Ubuntu 18.04 和 Debian 9 Server 中设置了 root 密码,那么你必须输入 root 密码才能在救援和紧急模式下获得 shell
@ -81,7 +82,7 @@ via: https://www.linuxtechi.com/boot-ubuntu-18-04-debian-9-rescue-emergency-mode
作者:[Pradeep Kumar][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出

View File

@ -1,3 +1,4 @@
(translating by runningwater)
CPU Power Manager Control And Manage CPU Frequency In Linux
======
@ -64,7 +65,7 @@ via: https://www.ostechnix.com/cpu-power-manager-control-and-manage-cpu-frequenc
作者:[EDITOR][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[译者ID](https://github.com/译者ID)
译者:[runningwater](https://github.com/runningwater)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出

View File

@ -1,207 +0,0 @@
HankChow translating
Why is Python so slow?
============================================================
Python is booming in popularity. It is used in DevOps, Data Science, Web Development and Security.
It does not, however, win any medals for speed.
![](https://cdn-images-1.medium.com/max/1200/0*M2qZQsVnDS-4i5zc.jpg)
> How does Java compare in terms of speed to C or C++ or C# or Python? The answer depends greatly on the type of application youre running. No benchmark is perfect, but The Computer Language Benchmarks Game is [a good starting point][5].
Ive been referring to the Computer Language Benchmarks Game for over a decade; compared with other languages like Java, C#, Go, JavaScript, C++, Python is [one of the slowest][6]. This includes [JIT][7] (C#, Java) and [AOT][8] (C, C++) compilers, as well as interpreted languages like JavaScript.
_NB: When I say “Python”, Im talking about the reference implementation of the language, CPython. I will refer to other runtimes in this article._
> I want to answer this question: When Python completes a comparable application 210x slower than another language,  _why is it slow_  and cant we  _make it faster_ ?
Here are the top theories:
* “ _Its the GIL (Global Interpreter Lock)_
* “ _Its because its interpreted and not compiled_
* “ _Its because its a dynamically typed language_
Which one of these reasons has the biggest impact on performance?
### “Its the GIL”
Modern computers come with CPUs that have multiple cores, and sometimes multiple processors. In order to utilise all this extra processing power, the Operating System defines a low-level structure called a thread, where a process (e.g. Chrome Browser) can spawn multiple threads and have instructions for the system inside. That way if one process is particularly CPU-intensive, that load can be shared across the cores and this effectively makes most applications complete tasks faster.
My Chrome Browser, as Im writing this article, has 44 threads open. Keep in mind that the structure and API of threading are different between POSIX-based (e.g. Mac OS and Linux) and Windows OS. The operating system also handles the scheduling of threads.
IF you havent done multi-threaded programming before, a concept youll need to quickly become familiar with locks. Unlike a single-threaded process, you need to ensure that when changing variables in memory, multiple threads dont try and access/change the same memory address at the same time.
When CPython creates variables, it allocates the memory and then counts how many references to that variable exist, this is a concept known as reference counting. If the number of references is 0, then it frees that piece of memory from the system. This is why creating a “temporary” variable within say, the scope of a for loop, doesnt blow up the memory consumption of your application.
The challenge then becomes when variables are shared within multiple threads, how CPython locks the reference count. There is a “global interpreter lock” that carefully controls thread execution. The interpreter can only execute one operation at a time, regardless of how many threads it has.
#### What does this mean to the performance of Python application?
If you have a single-threaded, single interpreter application. It will make no difference to the speed. Removing the GIL would have no impact on the performance of your code.
If you wanted to implement concurrency within a single interpreter (Python process) by using threading, and your threads were IO intensive (e.g. Network IO or Disk IO), you would see the consequences of GIL-contention.
![](https://cdn-images-1.medium.com/max/1600/0*S_iSksY5oM5H1Qf_.png)
From David Beazleys GIL visualised post [http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html][1]
If you have a web-application (e.g. Django) and youre using WSGI, then each request to your web-app is a separate Python interpreter, so there is only 1 lock  _per_  request. Because the Python interpreter is slow to start, some WSGI implementations have a “Daemon Mode” [which keep Python process(es) on the go for you.][9]
#### What about other Python runtimes?
[PyPy has a GIL][10] and it is typically >3x faster than CPython.
[Jython does not have a GIL][11] because a Python thread in Jython is represented by a Java thread and benefits from the JVM memory-management system.
#### How does JavaScript do this?
Well, firstly all Javascript engines [use mark-and-sweep Garbage Collection][12]. As stated, the primary need for the GIL is CPythons memory-management algorithm.
JavaScript does not have a GIL, but its also single-threaded so it doesnt require one. JavaScripts event-loop and Promise/Callback pattern are how asynchronous-programming is achieved in place of concurrency. Python has a similar thing with the asyncio event-loop.
### “Its because its an interpreted language”
I hear this a lot and I find it a gross-simplification of the way CPython actually works. If at a terminal you wrote `python myscript.py` then CPython would start a long sequence of reading, lexing, parsing, compiling, interpreting and executing that code.
If youre interested in how that process works, Ive written about it before:
[Modifying the Python language in 6 minutes
This week I raised my first pull-request to the CPython core project, which was declined :-( but as to not completely…hackernoon.com][13][][14]
An important point in that process is the creation of a `.pyc` file, at the compiler stage, the bytecode sequence is written to a file inside `__pycache__/`on Python 3 or in the same directory in Python 2\. This doesnt just apply to your script, but all of the code you imported, including 3rd party modules.
So most of the time (unless you write code which you only ever run once?), Python is interpreting bytecode and executing it locally. Compare that with Java and C#.NET:
> Java compiles to an “Intermediate Language” and the Java Virtual Machine reads the bytecode and just-in-time compiles it to machine code. The .NET CIL is the same, the .NET Common-Language-Runtime, CLR, uses just-in-time compilation to machine code.
So, why is Python so much slower than both Java and C# in the benchmarks if they all use a virtual machine and some sort of Bytecode? Firstly, .NET and Java are JIT-Compiled.
JIT or Just-in-time compilation requires an intermediate language to allow the code to be split into chunks (or frames). Ahead of time (AOT) compilers are designed to ensure that the CPU can understand every line in the code before any interaction takes place.
The JIT itself does not make the execution any faster, because it is still executing the same bytecode sequences. However, JIT enables optimizations to be made at runtime. A good JIT optimizer will see which parts of the application are being executed a lot, call these “hot spots”. It will then make optimizations to those bits of code, by replacing them with more efficient versions.
This means that when your application does the same thing again and again, it can be significantly faster. Also, keep in mind that Java and C# are strongly-typed languages so the optimiser can make many more assumptions about the code.
PyPy has a JIT and as mentioned in the previous section, is significantly faster than CPython. This performance benchmark article goes into more detail —
[Which is the fastest version of Python?
Of course, “it depends”, but what does it depend on and how can you assess which is the fastest version of Python for…hackernoon.com][15][][16]
#### So why doesnt CPython use a JIT?
There are downsides to JITs: one of those is startup time. CPython startup time is already comparatively slow, PyPy is 23x slower to start than CPython. The Java Virtual Machine is notoriously slow to boot. The .NET CLR gets around this by starting at system-startup, but the developers of the CLR also develop the Operating System on which the CLR runs.
If you have a single Python process running for a long time, with code that can be optimized because it contains “hot spots”, then a JIT makes a lot of sense.
However, CPython is a general-purpose implementation. So if you were developing command-line applications using Python, having to wait for a JIT to start every time the CLI was called would be horribly slow.
CPython has to try and serve as many use cases as possible. There was the possibility of [plugging a JIT into CPython][17] but this project has largely stalled.
> If you want the benefits of a JIT and you have a workload that suits it, use PyPy.
### “Its because its a dynamically typed language”
In a “Statically-Typed” language, you have to specify the type of a variable when it is declared. Those would include C, C++, Java, C#, Go.
In a dynamically-typed language, there are still the concept of types, but the type of a variable is dynamic.
```
a = 1
a = "foo"
```
In this toy-example, Python creates a second variable with the same name and a type of `str` and deallocates the memory created for the first instance of `a`
Statically-typed languages arent designed as such to make your life hard, they are designed that way because of the way the CPU operates. If everything eventually needs to equate to a simple binary operation, you have to convert objects and types down to a low-level data structure.
Python does this for you, you just never see it, nor do you need to care.
Not having to declare the type isnt what makes Python slow, the design of the Python language enables you to make almost anything dynamic. You can replace the methods on objects at runtime, you can monkey-patch low-level system calls to a value declared at runtime. Almost anything is possible.
Its this design that makes it incredibly hard to optimise Python.
To illustrate my point, Im going to use a syscall tracing tool that works in Mac OS called Dtrace. CPython distributions do not come with DTrace builtin, so you have to recompile CPython. Im using 3.6.6 for my demo
```
wget https://github.com/python/cpython/archive/v3.6.6.zip
unzip v3.6.6.zip
cd v3.6.6
./configure --with-dtrace
make
```
Now `python.exe` will have Dtrace tracers throughout the code. [Paul Ross wrote an awesome Lightning Talk on Dtrace][19]. You can [download DTrace starter files][20] for Python to measure function calls, execution time, CPU time, syscalls, all sorts of fun. e.g.
`sudo dtrace -s toolkit/<tracer>.d -c ../cpython/python.exe script.py`
The `py_callflow` tracer shows all the function calls in your application
![](https://cdn-images-1.medium.com/max/1600/1*Lz4UdUi4EwknJ0IcpSJ52g.gif)
So, does Pythons dynamic typing make it slow?
* Comparing and converting types is costly, every time a variable is read, written to or referenced the type is checked
* It is hard to optimise a language that is so dynamic. The reason many alternatives to Python are so much faster is that they make compromises to flexibility in the name of performance
* Looking at [Cython][2], which combines C-Static Types and Python to optimise code where the types are known[ can provide ][3]an 84x performanceimprovement.
### Conclusion
> Python is primarily slow because of its dynamic nature and versatility. It can be used as a tool for all sorts of problems, where more optimised and faster alternatives are probably available.
There are, however, ways of optimising your Python applications by leveraging async, understanding the profiling tools, and consider using multiple-interpreters.
For applications where startup time is unimportant and the code would benefit a JIT, consider PyPy.
For parts of your code where performance is critical and you have more statically-typed variables, consider using [Cython][4].
#### Further reading
Jake VDPs excellent article (although slightly dated) [https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/][21]
Dave Beazleys talk on the GIL [http://www.dabeaz.com/python/GIL.pdf][22]
All about JIT compilers [https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/][23]
--------------------------------------------------------------------------------
via: https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b
作者:[Anthony Shaw][a]
选题:[oska874][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://hackernoon.com/@anthonypjshaw?source=post_header_lockup
[b]:https://github.com/oska874
[1]:http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html
[2]:http://cython.org/
[3]:http://notes-on-cython.readthedocs.io/en/latest/std_dev.html
[4]:http://cython.org/
[5]:http://algs4.cs.princeton.edu/faq/
[6]:https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/python.html
[7]:https://en.wikipedia.org/wiki/Just-in-time_compilation
[8]:https://en.wikipedia.org/wiki/Ahead-of-time_compilation
[9]:https://www.slideshare.net/GrahamDumpleton/secrets-of-a-wsgi-master
[10]:http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
[11]:http://www.jython.org/jythonbook/en/1.0/Concurrency.html#no-global-interpreter-lock
[12]:https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_Management
[13]:https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
[14]:https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
[15]:https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b
[16]:https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b
[17]:https://www.slideshare.net/AnthonyShaw5/pyjion-a-jit-extension-system-for-cpython
[18]:https://github.com/python/cpython/archive/v3.6.6.zip
[19]:https://github.com/paulross/dtrace-py#the-lightning-talk
[20]:https://github.com/paulross/dtrace-py/tree/master/toolkit
[21]:https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
[22]:http://www.dabeaz.com/python/GIL.pdf
[23]:https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/

View File

@ -1,4 +1,3 @@
translating by name1e5s
Know Your Storage: Block, File & Object
======

View File

@ -1,75 +0,0 @@
translating---geekpi
Tips for listing files with ls at the Linux command line
======
Learn some of the Linux 'ls' command's most useful variations.
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/button_push_open_keyboard_file_organize.png?itok=KlAsk1gx)
One of the first commands I learned in Linux was `ls`. Knowing whats in a directory where a file on your system resides is important. Being able to see and modify not just some but all of the files is also important.
My first LInux cheat sheet was the [One Page Linux Manual][1] , which was released in1999 and became my go-to reference. I taped it over my desk and referred to it often as I began to explore Linux. Listing files with `ls -l` is introduced on the first page, at the bottom of the first column.
Later, I would learn other iterations of this most basic command. Through the `ls` command, I began to learn about the complexity of the Linux file permissions and what was mine and what required root or sudo permission to change. I became very comfortable on the command line over time, and while I still use `ls -l` to find files in the directory, I frequently use `ls -al` so I can see hidden files that might need to be changed, like configuration files.
According to an article by Eric Fischer about the `ls` command in the [Linux Documentation Project][2], the command's roots go back to the `listf` command on MITs Compatible Time Sharing System in 1961. When CTSS was replaced by [Multics][3], the command became `list`, with switches like `list -all`. According to [Wikipedia][4], `ls` appeared in the original version of AT&T Unix. The `ls` command we use today on Linux systems comes from the [GNU Core Utilities][5].
Most of the time, I use only a couple of iterations of the command. Looking inside a directory with `ls` or `ls -al` is how I generally use the command, but there are many other options that you should be familiar with.
`$ ls -l` provides a simple list of the directory:
![](https://opensource.com/sites/default/files/uploads/linux_ls_1_0.png)
Using the man pages of my Fedora 28 system, I find that there are many other options to `ls`, all of which provide interesting and useful information about the Linux file system. By entering `man ls` at the command prompt, we can begin to explore some of the other options:
![](https://opensource.com/sites/default/files/uploads/linux_ls_2_0.png)
To sort the directory by file sizes, use `ls -lS`:
![](https://opensource.com/sites/default/files/uploads/linux_ls_3_0.png)
To list the contents in reverse order, use `ls -lr`:
![](https://opensource.com/sites/default/files/uploads/linux_ls_4.png)
To list contents by columns, use `ls -c`:
![](https://opensource.com/sites/default/files/uploads/linux_ls_5.png)
`ls -al` provides a list of all the files in the same directory:
![](https://opensource.com/sites/default/files/uploads/linux_ls_6.png)
Here are some additional options that I find useful and interesting:
* List only the .txt files in the directory: `ls *.txt`
* List by file size: `ls -s`
* Sort by time and date: `ls -d`
* Sort by extension: `ls -X`
* Sort by file size: `ls -S`
* Long format with file size: `ls -ls`
* List only the .txt files in a directory: `ls *.txt`
To generate a directory list in the specified format and send it to a file for later viewing, enter `ls -al > mydirectorylist`. Finally, one of the more exotic commands I found is `ls -R`, which provides a recursive list of all the directories on your computer and their contents.
For a complete list of the all the iterations of the `ls` command, refer to the [GNU Core Utilities][6].
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/10/ls-command
作者:[Don Watkins][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/don-watkins
[1]: http://hackerspace.cs.rutgers.edu/library/General/One_Page_Linux_Manual.pdf
[2]: http://www.tldp.org/LDP/LG/issue48/fischer.html
[3]: https://en.wikipedia.org/wiki/Multics
[4]: https://en.wikipedia.org/wiki/Ls
[5]: http://www.gnu.org/s/coreutils/
[6]: https://www.gnu.org/software/coreutils/manual/html_node/ls-invocation.html#ls-invocation

View File

@ -1,182 +0,0 @@
distant1219 is translating
PyTorch 1.0 Preview Release: Facebooks newest Open Source AI
======
Facebook already uses its own Open Source AI, PyTorch quite extensively in its own artificial intelligence projects. Recently, they have gone a league ahead by releasing a pre-release preview version 1.0.
For those who are not familiar, [PyTorch][1] is a Python-based library for Scientific Computing.
PyTorch harnesses the [superior computational power of Graphical Processing Units (GPUs)][2] for carrying out complex [Tensor][3] computations and implementing [deep neural networks][4]. So, it is used widely across the world by numerous researchers and developers.
This new ready-to-use [Preview Release][5] was announced at the [PyTorch Developer Conference][6] at [The Midway][7], San Francisco, CA on Tuesday, October 2, 2018.
### Highlights of PyTorch 1.0 Release Candidate
![PyTorhc is Python based open source AI framework from Facebook][8]
Some of the main new features in the release candidate are:
#### 1\. JIT
JIT is a set of compiler tools to bring research close to production. It includes a Python-based language called Torch Script and also ways to make existing code compatible with itself.
#### 2\. New torch.distributed library: “C10D”
“C10D” enables asynchronous operation on different backends with performance improvements on slower networks and more.
#### 3\. C++ frontend (experimental)
Though it has been specifically mentioned as an unstable API (expected in a pre-release), this is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend to enable research in high performance, low latency and C++ applications installed directly on hardware.
To know more, you can take a look at the complete [update notes][9] on GitHub.
The first stable version PyTorch 1.0 will be released in summer.
### Installing PyTorch on Linux
To install PyTorch v1.0rc0, the developers recommend using [conda][10] while there also other ways to do that as shown on their [local installation page][11] where they have documented everything necessary in detail.
#### Prerequisites
* Linux
* Pip
* Python
* [CUDA][12] (For Nvidia GPU owners)
As we recently showed you [how to install and use Pip][13], lets get to know how we can install PyTorch with it.
Note that PyTorch has GPU and CPU-only variants. You should install the one that suits your hardware.
#### Installing old and stable version of PyTorch
If you want the stable release (version 0.4) for your GPU, use:
```
pip install torch torchvision
```
Use these two commands in succession for a CPU-only stable release:
```
pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl
pip install torchvision
```
#### Installing PyTorch 1.0 Release Candidate
You install PyTorch 1.0 RC GPU version with this command:
```
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
```
If you do not have a GPU and would prefer a CPU-only version, use:
```
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```
#### Verifying your PyTorch installation
Startup the python console on a terminal with the following simple command:
```
python
```
Now enter the following sample code line by line to verify your installation:
```
from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)
```
You should get an output like:
```
tensor([[0.3380, 0.3845, 0.3217],
[0.8337, 0.9050, 0.2650],
[0.2979, 0.7141, 0.9069],
[0.1449, 0.1132, 0.1375],
[0.4675, 0.3947, 0.1426]])
```
To check whether you can use PyTorchs GPU capabilities, use the following sample code:
```
import torch
torch.cuda.is_available()
```
The resulting output should be:
```
True
```
Support for AMD GPUs for PyTorch is still under development, so complete test coverage is not yet provided as reported [here][14], suggesting this [resource][15] in case you have an AMD GPU.
Lets now look into some research projects that extensively use PyTorch:
### Ongoing Research Projects based on PyTorch
* [Detectron][16]: Facebook AI Researchs software system to intelligently detect and classify objects. It is based on Caffe2. Earlier this year, Caffe2 and PyTorch [joined forces][17] to create a Research + Production enabled PyTorch 1.0 we talk about.
* [Unsupervised Sentiment Discovery][18]: Such methods are extensively used with social media algorithms.
* [vid2vid][19]: Photorealistic video-to-video translation
* [DeepRecommender][20] (We covered how such systems work on our past [Netflix AI article][21])
Nvidia, leading GPU manufacturer covered more on this with their own [update][22] on this recent development where you can also read about ongoing collaborative research endeavours.
### How should we react to such PyTorch capabilities?
To think Facebook applies such amazingly innovative projects and more in its social media algorithms, should we appreciate all this or get alarmed? This is almost [Skynet][23]! This newly improved production-ready pre-release of PyTorch will certainly push things further ahead! Feel free to share your thoughts with us in the comments below!
--------------------------------------------------------------------------------
via: https://itsfoss.com/pytorch-open-source-ai-framework/
作者:[Avimanyu Bandyopadhyay][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://itsfoss.com/author/avimanyu/
[1]: https://pytorch.org/
[2]: https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
[3]: https://en.wikipedia.org/wiki/Tensor
[4]: https://www.techopedia.com/definition/32902/deep-neural-network
[5]: https://code.fb.com/ai-research/facebook-accelerates-ai-development-with-new-partners-and-production-capabilities-for-pytorch-1-0
[6]: https://pytorch.fbreg.com/
[7]: https://www.themidwaysf.com/
[8]: https://4bds6hergc-flywheel.netdna-ssl.com/wp-content/uploads/2018/10/pytorch.jpeg
[9]: https://github.com/pytorch/pytorch/releases/tag/v1.0rc0
[10]: https://conda.io/
[11]: https://pytorch.org/get-started/locally/
[12]: https://www.pugetsystems.com/labs/hpc/How-to-install-CUDA-9-2-on-Ubuntu-18-04-1184/
[13]: https://itsfoss.com/install-pip-ubuntu/
[14]: https://github.com/pytorch/pytorch/issues/10657#issuecomment-415067478
[15]: https://rocm.github.io/install.html#installing-from-amd-rocm-repositories
[16]: https://github.com/facebookresearch/Detectron
[17]: https://caffe2.ai/blog/2018/05/02/Caffe2_PyTorch_1_0.html
[18]: https://github.com/NVIDIA/sentiment-discovery
[19]: https://github.com/NVIDIA/vid2vid
[20]: https://github.com/NVIDIA/DeepRecommender/
[21]: https://itsfoss.com/netflix-open-source-ai/
[22]: https://news.developer.nvidia.com/pytorch-1-0-accelerated-on-nvidia-gpus/
[23]: https://en.wikipedia.org/wiki/Skynet_(Terminator)

View File

@ -1,3 +1,5 @@
translating---geekpi
How to Install GRUB on Arch Linux (UEFI)
======

View File

@ -0,0 +1,151 @@
How To Browse And Read Entire Arch Wiki As Linux Man Pages
======
![](https://www.ostechnix.com/wp-content/uploads/2018/10/arch-wiki-720x340.jpg)
A while ago, I wrote a guide that described how to browse the Arch Wiki from your Terminal using a command line script named [**arch-wiki-cli**][1]. Using this script, anyone can easily navigate through entire Arch Wiki website and read it with a text browser of your choice. Obviously, an active Internet connection is required to use this script. Today, I stumbled upon a similar utility named **“Arch-wiki-man”**. As the name says, it is also used to read the Arch Wiki from command line, but it doesnt require Internet connection. Arch-wiki-man program helps you to browse and read entire Arch Wiki as Linux man pages. It will display any article from Arch Wiki in man pages format. Also, you need not to be online to browse Arch Wiki. The entire Arch Wiki will be downloaded locally and the updates are pushed automatically every two days. So, you always have an up-to-date, local copy of the Arch Wiki on your system.
### Installing Arch-wiki-man
Arch-wiki-man is available in [**AUR**][2], so you can install it using any AUR helper programs, for example [**Yay**][3].
```
$ yay -S arch-wiki-man
```
Alternatively, it can be installed using NPM package manager like below. Make sure you have [**installed NodeJS**][4] and run the following command to install it:
```
$ npm install -g arch-wiki-man
```
### Browse And Read Entire Arch Wiki As Linux Man Pages
The typical syntax of Arch-wiki-man is:
```
$ awman <search-query>
```
Let me show you some examples.
**Search with one or more matches**
Let us search for a [**Arch Linux installation guide**][5]. To do so, simply run:
```
$ awman Installation guide
```
The above command will search for the matches that contains the search term “Installation guide” in the Arch Wiki. If there are multiple matches for the given search term, a selection menu will appear. Choose the guide you want to read using **UP/DOWN arrows** or Vim-style keybindings ( **j/k** ) and hit ENTER to open it. The resulting guide will open in man pages format like below.
![][6]
Here, awman refers **a** rch **w** iki **m** an.
All man command options are supported, so you can navigate through guide as the way you do when reading a man page. To view the help section, press **h**.
![][7]
To exit the selection menu without entering **man** , simply press **Ctrl+c**.
To go back and/or quit man, type **q**.
**Search matches in titles and descriptions**
By default, Awman will search for the matches in titles only. You can, however, direct it to search for the matches in both the titles and descriptions as well.
```
$ awman -d vim
```
Or,
```
$ awman --desc-search vim
```
**Search for matches in contents**
Apart from searching for matches in titles and descriptions, it is also possible to scan the contents for a match as well. Please note that this will significantly slower the search process.
```
$ awman -k emacs
```
Or,
```
$ awman --apropos emacs
```
**Open the search results in web browser**
If you dont want to view the arch wiki guides in man page format, you can open it in a web browser. To do so, run:
```
$ awman -w pacman
```
Or,
```
$ awman --web pacman
```
This command will open the resulting match in the default web browser rather than with **man** command. Please note that you need Internet connection to use this option.
**Search in other languages**
By default, Awman will open the Arch wiki pages in English. If you want to view the results in other languages, for example **Spanish** , simply do:
```
$ awman -l spanish codecs
```
![][8]
To view the list of available language options, run:
```
$ awman --list-languages
```
**Update the local copy of Arch Wiki**
Like I already said, the updates are pushed automatically every two days. If you want to update it manually, simply run:
```
$ awman-update
arch-wiki-man@1.3.0 /usr/lib/node_modules/arch-wiki-man
└── arch-wiki-md-repo@0.10.84
arch-wiki-md-repo has been successfully updated or reinstalled.
```
Cheers!
--------------------------------------------------------------------------------
via: https://www.ostechnix.com/how-to-browse-and-read-entire-arch-wiki-as-linux-man-pages/
作者:[SK][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://www.ostechnix.com/author/sk/
[b]: https://github.com/lujun9972
[1]: https://www.ostechnix.com/search-arch-wiki-website-commandline/
[2]: https://aur.archlinux.org/packages/arch-wiki-man/
[3]: https://www.ostechnix.com/yay-found-yet-another-reliable-aur-helper/
[4]: https://www.ostechnix.com/install-node-js-linux/
[5]: https://www.ostechnix.com/install-arch-linux-latest-version/
[6]: http://www.ostechnix.com/wp-content/uploads/2018/10/awman-1.gif
[7]: http://www.ostechnix.com/wp-content/uploads/2018/10/awman-2.png
[8]: https://www.ostechnix.com/wp-content/uploads/2018/10/awman-3-1.png

View File

@ -0,0 +1,119 @@
Final JOS project
======
Piazza Discussion Due, November 2, 2018 Proposals Due, November 8, 2018 Code repository Due, December 6, 2018 Check-off and in-class demos, Week of December 10, 2018
### Introduction
For the final project you have two options:
* Work on your own and do [lab 6][1], including one challenge exercise in lab 6\. (You are free, of course, to extend lab 6, or any part of JOS, further in interesting ways, but it isn't required.)
* Work in a team of one, two or three, on a project of your choice that involves your JOS. This project must be of the same scope as lab 6 or larger (if you are working in a team).
The goal is to have fun and explore more advanced O/S topics; you don't have to do novel research.
If you are doing your own project, we'll grade you on how much you got working, how elegant your design is, how well you can explain it, and how interesting and creative your solution is. We do realize that time is limited, so we don't expect you to re-write Linux by the end of the semester. Try to make sure your goals are reasonable; perhaps set a minimum goal that's definitely achievable (e.g., something of the scale of lab 6) and a more ambitious goal if things go well.
If you are doing lab 6, we will grade you on whether you pass the tests and the challenge exercise.
### Deliverables
```
Nov 3: Piazza discussion and form groups of 1, 2, or 3 (depending on which final project option you are choosing). Use the lab7 tag/folder on Piazza. Discuss ideas with others in comments on their Piazza posting. Use these postings to help find other students interested in similar ideas for forming a group. Course staff will provide feedback on project ideas on Piazza; if you'd like more detailed feedback, come chat with us in person.
```
```
Nov 9: Submit a proposal at [the submission website][19], just a paragraph or two. The proposal should include your group members list, the problem you want to address, how you plan to address it, and what are you proposing to specifically design and implement. (If you are doing lab 6, there is nothing to do for this deliverable.)
```
```
Dec 7: submit source code along with a brief write-up. Put the write-up under the top-level source directory with the name "README.pdf". Since some of you will be working in groups for this lab assignment, you may want to use git to share your project code between group members. You will need to decide on whose source code you will use as a starting point for your group project. Make sure to create a branch for your final project, and name it lab7\. (If you do lab 6, follow the lab 6 submission instructions.)
```
```
Week of Dec 11: short in-class demonstration. Prepare a short in-class demo of your JOS project. We will provide a projector that you can use to demonstrate your project. Depending on the number of groups and the kinds of projects that each group chooses, we may decide to limit the total number of presentations, and some groups might end up not presenting in class.
```
```
Week of Dec 11: check-off with TAs. Demo your project to the TAs so that we can ask you some questions and find out in more detail what you did.
```
### Project ideas
If you are not doing lab 6, here's a list of ideas to get you started thinking. But, you should feel free to pursue your own ideas. Some of the ideas are starting points and by themselves not of the scope of lab 6, and others are likely to be much of larger scope.
* Build a virtual machine monitor that can run multiple guests (for example, multiple instances of JOS), using [x86 VM support][2].
* Do something useful with the hardware protection of Intel SGX. [Here is a recent paper using Intel SGX][3].
* Make the JOS file system support writing, file creation, logging for durability, etc., perhaps taking ideas from Linux EXT3.
* Use file system ideas from [Soft updates][4], [WAFL][5], ZFS, or another advanced file system.
* Add snapshots to a file system, so that a user can look at the file system as it appeared at various points in the past. You'll probably want to use some kind of copy-on-write for disk storage to keep space consumption down.
* Build a [distributed shared memory][6] (DSM) system, so that you can run multi-threaded shared memory parallel programs on a cluster of machines, using paging to give the appearance of real shared memory. When a thread tries to access a page that's on another machine, the page fault will give the DSM system a chance to fetch the page over the network from whatever machine currently stores it.
* Allow processes to migrate from one machine to another over the network. You'll need to do something about the various pieces of a process's state, but since much state in JOS is in user-space it may be easier than process migration on Linux.
* Implement [paging][7] to disk in JOS, so that processes can be bigger than RAM. Extend your pager with swapping.
* Implement [mmap()][8] of files for JOS.
* Use [xfi][9] to sandbox code within a process.
* Support x86 [2MB or 4MB pages][10].
* Modify JOS to have kernel-supported threads inside processes. See [in-class uthread assignment][11] to get started. Implementing scheduler activations would be one way to do this project.
* Use fine-grained locking or lock-free concurrency in JOS in the kernel or in the file server (after making it multithreaded). The linux kernel uses [read copy update][12] to be able to perform read operations without holding locks. Explore RCU by implementing it in JOS and use it to support a name cache with lock-free reads.
* Implement ideas from the [Exokernel papers][13], for example the packet filter.
* Make JOS have soft real-time behavior. You will have to identify some application for which this is useful.
* Make JOS run on 64-bit CPUs. This includes redoing the virtual memory system to use 4-level pages tables. See [reference page][14] for some documentation.
* Port JOS to a different microprocessor. The [osdev wiki][15] may be helpful.
* A window system for JOS, including graphics driver and mouse. See [reference page][16] for some documentation. [sqrt(x)][17] is an example JOS window system (and writeup).
* Implement [dune][18] to export privileged hardware instructions to user-space applications in JOS.
* Write a user-level debugger; add strace-like functionality; hardware register profiling (e.g. Oprofile); call-traces
* Binary emulation for (static) Linux executables
--------------------------------------------------------------------------------
via: https://pdos.csail.mit.edu/6.828/2018/labs/lab7/
作者:[csail.mit][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://pdos.csail.mit.edu
[b]: https://github.com/lujun9972
[1]: https://pdos.csail.mit.edu/6.828/2018/labs/lab6/index.html
[2]: http://www.intel.com/technology/itj/2006/v10i3/1-hardware/3-software.htm
[3]: https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-baumann.pdf
[4]: http://www.ece.cmu.edu/~ganger/papers/osdi94.pdf
[5]: https://ng.gnunet.org/sites/default/files/10.1.1.40.3691.pdf
[6]: http://www.cdf.toronto.edu/~csc469h/fall/handouts/nitzberg91.pdf
[7]: http://en.wikipedia.org/wiki/Paging
[8]: http://en.wikipedia.org/wiki/Mmap
[9]: http://static.usenix.org/event/osdi06/tech/erlingsson.html
[10]: http://en.wikipedia.org/wiki/Page_(computer_memory)
[11]: http://pdos.csail.mit.edu/6.828/2018/homework/xv6-uthread.html
[12]: http://en.wikipedia.org/wiki/Read-copy-update
[13]: http://pdos.csail.mit.edu/6.828/2018/readings/engler95exokernel.pdf
[14]: http://pdos.csail.mit.edu/6.828/2018/reference.html
[15]: http://wiki.osdev.org/Main_Page
[16]: http://pdos.csail.mit.edu/6.828/2018/reference.html
[17]: http://web.mit.edu/amdragon/www/pubs/sqrtx-6.828.html
[18]: https://www.usenix.org/system/files/conference/osdi12/osdi12-final-117.pdf
[19]: https://6828.scripts.mit.edu/2018/handin.py/

View File

@ -0,0 +1,595 @@
Lab 4: Preemptive Multitasking
======
### Lab 4: Preemptive Multitasking
**Part A due Thursday, October 18, 2018
Part B due Thursday, October 25, 2018
Part C due Thursday, November 1, 2018**
#### Introduction
In this lab you will implement preemptive multitasking among multiple simultaneously active user-mode environments.
In part A you will add multiprocessor support to JOS, implement round-robin scheduling, and add basic environment management system calls (calls that create and destroy environments, and allocate/map memory).
In part B, you will implement a Unix-like `fork()`, which allows a user-mode environment to create copies of itself.
Finally, in part C you will add support for inter-process communication (IPC), allowing different user-mode environments to communicate and synchronize with each other explicitly. You will also add support for hardware clock interrupts and preemption.
##### Getting Started
Use Git to commit your Lab 3 source, fetch the latest version of the course repository, and then create a local branch called `lab4` based on our lab4 branch, `origin/lab4`:
```
athena% cd ~/6.828/lab
athena% add git
athena% git pull
Already up-to-date.
athena% git checkout -b lab4 origin/lab4
Branch lab4 set up to track remote branch refs/remotes/origin/lab4.
Switched to a new branch "lab4"
athena% git merge lab3
Merge made by recursive.
...
athena%
```
Lab 4 contains a number of new source files, some of which you should browse before you start:
| kern/cpu.h | Kernel-private definitions for multiprocessor support |
| kern/mpconfig.c | Code to read the multiprocessor configuration |
| kern/lapic.c | Kernel code driving the local APIC unit in each processor |
| kern/mpentry.S | Assembly-language entry code for non-boot CPUs |
| kern/spinlock.h | Kernel-private definitions for spin locks, including the big kernel lock |
| kern/spinlock.c | Kernel code implementing spin locks |
| kern/sched.c | Code skeleton of the scheduler that you are about to implement |
##### Lab Requirements
This lab is divided into three parts, A, B, and C. We have allocated one week in the schedule for each part.
As before, you will need to do all of the regular exercises described in the lab and _at least one_ challenge problem. (You do not need to do one challenge problem per part, just one for the whole lab.) Additionally, you will need to write up a brief description of the challenge problem that you implemented. If you implement more than one challenge problem, you only need to describe one of them in the write-up, though of course you are welcome to do more. Place the write-up in a file called `answers-lab4.txt` in the top level of your `lab` directory before handing in your work.
#### Part A: Multiprocessor Support and Cooperative Multitasking
In the first part of this lab, you will first extend JOS to run on a multiprocessor system, and then implement some new JOS kernel system calls to allow user-level environments to create additional new environments. You will also implement _cooperative_ round-robin scheduling, allowing the kernel to switch from one environment to another when the current environment voluntarily relinquishes the CPU (or exits). Later in part C you will implement _preemptive_ scheduling, which allows the kernel to re-take control of the CPU from an environment after a certain time has passed even if the environment does not cooperate.
##### Multiprocessor Support
We are going to make JOS support "symmetric multiprocessing" (SMP), a multiprocessor model in which all CPUs have equivalent access to system resources such as memory and I/O buses. While all CPUs are functionally identical in SMP, during the boot process they can be classified into two types: the bootstrap processor (BSP) is responsible for initializing the system and for booting the operating system; and the application processors (APs) are activated by the BSP only after the operating system is up and running. Which processor is the BSP is determined by the hardware and the BIOS. Up to this point, all your existing JOS code has been running on the BSP.
In an SMP system, each CPU has an accompanying local APIC (LAPIC) unit. The LAPIC units are responsible for delivering interrupts throughout the system. The LAPIC also provides its connected CPU with a unique identifier. In this lab, we make use of the following basic functionality of the LAPIC unit (in `kern/lapic.c`):
* Reading the LAPIC identifier (APIC ID) to tell which CPU our code is currently running on (see `cpunum()`).
* Sending the `STARTUP` interprocessor interrupt (IPI) from the BSP to the APs to bring up other CPUs (see `lapic_startap()`).
* In part C, we program LAPIC's built-in timer to trigger clock interrupts to support preemptive multitasking (see `apic_init()`).
A processor accesses its LAPIC using memory-mapped I/O (MMIO). In MMIO, a portion of _physical_ memory is hardwired to the registers of some I/O devices, so the same load/store instructions typically used to access memory can be used to access device registers. You've already seen one IO hole at physical address `0xA0000` (we use this to write to the VGA display buffer). The LAPIC lives in a hole starting at physical address `0xFE000000` (32MB short of 4GB), so it's too high for us to access using our usual direct map at KERNBASE. The JOS virtual memory map leaves a 4MB gap at `MMIOBASE` so we have a place to map devices like this. Since later labs introduce more MMIO regions, you'll write a simple function to allocate space from this region and map device memory to it.
```
Exercise 1. Implement `mmio_map_region` in `kern/pmap.c`. To see how this is used, look at the beginning of `lapic_init` in `kern/lapic.c`. You'll have to do the next exercise, too, before the tests for `mmio_map_region` will run.
```
###### Application Processor Bootstrap
Before booting up APs, the BSP should first collect information about the multiprocessor system, such as the total number of CPUs, their APIC IDs and the MMIO address of the LAPIC unit. The `mp_init()` function in `kern/mpconfig.c` retrieves this information by reading the MP configuration table that resides in the BIOS's region of memory.
The `boot_aps()` function (in `kern/init.c`) drives the AP bootstrap process. APs start in real mode, much like how the bootloader started in `boot/boot.S`, so `boot_aps()` copies the AP entry code (`kern/mpentry.S`) to a memory location that is addressable in the real mode. Unlike with the bootloader, we have some control over where the AP will start executing code; we copy the entry code to `0x7000` (`MPENTRY_PADDR`), but any unused, page-aligned physical address below 640KB would work.
After that, `boot_aps()` activates APs one after another, by sending `STARTUP` IPIs to the LAPIC unit of the corresponding AP, along with an initial `CS:IP` address at which the AP should start running its entry code (`MPENTRY_PADDR` in our case). The entry code in `kern/mpentry.S` is quite similar to that of `boot/boot.S`. After some brief setup, it puts the AP into protected mode with paging enabled, and then calls the C setup routine `mp_main()` (also in `kern/init.c`). `boot_aps()` waits for the AP to signal a `CPU_STARTED` flag in `cpu_status` field of its `struct CpuInfo` before going on to wake up the next one.
```
Exercise 2. Read `boot_aps()` and `mp_main()` in `kern/init.c`, and the assembly code in `kern/mpentry.S`. Make sure you understand the control flow transfer during the bootstrap of APs. Then modify your implementation of `page_init()` in `kern/pmap.c` to avoid adding the page at `MPENTRY_PADDR` to the free list, so that we can safely copy and run AP bootstrap code at that physical address. Your code should pass the updated `check_page_free_list()` test (but might fail the updated `check_kern_pgdir()` test, which we will fix soon).
```
```
Question
1. Compare `kern/mpentry.S` side by side with `boot/boot.S`. Bearing in mind that `kern/mpentry.S` is compiled and linked to run above `KERNBASE` just like everything else in the kernel, what is the purpose of macro `MPBOOTPHYS`? Why is it necessary in `kern/mpentry.S` but not in `boot/boot.S`? In other words, what could go wrong if it were omitted in `kern/mpentry.S`?
Hint: recall the differences between the link address and the load address that we have discussed in Lab 1.
```
###### Per-CPU State and Initialization
When writing a multiprocessor OS, it is important to distinguish between per-CPU state that is private to each processor, and global state that the whole system shares. `kern/cpu.h` defines most of the per-CPU state, including `struct CpuInfo`, which stores per-CPU variables. `cpunum()` always returns the ID of the CPU that calls it, which can be used as an index into arrays like `cpus`. Alternatively, the macro `thiscpu` is shorthand for the current CPU's `struct CpuInfo`.
Here is the per-CPU state you should be aware of:
* **Per-CPU kernel stack**.
Because multiple CPUs can trap into the kernel simultaneously, we need a separate kernel stack for each processor to prevent them from interfering with each other's execution. The array `percpu_kstacks[NCPU][KSTKSIZE]` reserves space for NCPU's worth of kernel stacks.
In Lab 2, you mapped the physical memory that `bootstack` refers to as the BSP's kernel stack just below `KSTACKTOP`. Similarly, in this lab, you will map each CPU's kernel stack into this region with guard pages acting as a buffer between them. CPU 0's stack will still grow down from `KSTACKTOP`; CPU 1's stack will start `KSTKGAP` bytes below the bottom of CPU 0's stack, and so on. `inc/memlayout.h` shows the mapping layout.
* **Per-CPU TSS and TSS descriptor**.
A per-CPU task state segment (TSS) is also needed in order to specify where each CPU's kernel stack lives. The TSS for CPU _i_ is stored in `cpus[i].cpu_ts`, and the corresponding TSS descriptor is defined in the GDT entry `gdt[(GD_TSS0 >> 3) + i]`. The global `ts` variable defined in `kern/trap.c` will no longer be useful.
* **Per-CPU current environment pointer**.
Since each CPU can run different user process simultaneously, we redefined the symbol `curenv` to refer to `cpus[cpunum()].cpu_env` (or `thiscpu->cpu_env`), which points to the environment _currently_ executing on the _current_ CPU (the CPU on which the code is running).
* **Per-CPU system registers**.
All registers, including system registers, are private to a CPU. Therefore, instructions that initialize these registers, such as `lcr3()`, `ltr()`, `lgdt()`, `lidt()`, etc., must be executed once on each CPU. Functions `env_init_percpu()` and `trap_init_percpu()` are defined for this purpose.
```
Exercise 3. Modify `mem_init_mp()` (in `kern/pmap.c`) to map per-CPU stacks starting at `KSTACKTOP`, as shown in `inc/memlayout.h`. The size of each stack is `KSTKSIZE` bytes plus `KSTKGAP` bytes of unmapped guard pages. Your code should pass the new check in `check_kern_pgdir()`.
```
```
Exercise 4. The code in `trap_init_percpu()` (`kern/trap.c`) initializes the TSS and TSS descriptor for the BSP. It worked in Lab 3, but is incorrect when running on other CPUs. Change the code so that it can work on all CPUs. (Note: your new code should not use the global `ts` variable any more.)
```
When you finish the above exercises, run JOS in QEMU with 4 CPUs using make qemu CPUS=4 (or make qemu-nox CPUS=4), you should see output like this:
```
...
Physical memory: 66556K available, base = 640K, extended = 65532K
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_installed_pgdir() succeeded!
SMP: CPU 0 found 4 CPU(s)
enabled interrupts: 1 2
SMP: CPU 1 starting
SMP: CPU 2 starting
SMP: CPU 3 starting
```
###### Locking
Our current code spins after initializing the AP in `mp_main()`. Before letting the AP get any further, we need to first address race conditions when multiple CPUs run kernel code simultaneously. The simplest way to achieve this is to use a _big kernel lock_. The big kernel lock is a single global lock that is held whenever an environment enters kernel mode, and is released when the environment returns to user mode. In this model, environments in user mode can run concurrently on any available CPUs, but no more than one environment can run in kernel mode; any other environments that try to enter kernel mode are forced to wait.
`kern/spinlock.h` declares the big kernel lock, namely `kernel_lock`. It also provides `lock_kernel()` and `unlock_kernel()`, shortcuts to acquire and release the lock. You should apply the big kernel lock at four locations:
* In `i386_init()`, acquire the lock before the BSP wakes up the other CPUs.
* In `mp_main()`, acquire the lock after initializing the AP, and then call `sched_yield()` to start running environments on this AP.
* In `trap()`, acquire the lock when trapped from user mode. To determine whether a trap happened in user mode or in kernel mode, check the low bits of the `tf_cs`.
* In `env_run()`, release the lock _right before_ switching to user mode. Do not do that too early or too late, otherwise you will experience races or deadlocks.
```
Exercise 5. Apply the big kernel lock as described above, by calling `lock_kernel()` and `unlock_kernel()` at the proper locations.
```
How to test if your locking is correct? You can't at this moment! But you will be able to after you implement the scheduler in the next exercise.
```
Question
2. It seems that using the big kernel lock guarantees that only one CPU can run the kernel code at a time. Why do we still need separate kernel stacks for each CPU? Describe a scenario in which using a shared kernel stack will go wrong, even with the protection of the big kernel lock.
```
```
Challenge! The big kernel lock is simple and easy to use. Nevertheless, it eliminates all concurrency in kernel mode. Most modern operating systems use different locks to protect different parts of their shared state, an approach called _fine-grained locking_. Fine-grained locking can increase performance significantly, but is more difficult to implement and error-prone. If you are brave enough, drop the big kernel lock and embrace concurrency in JOS!
It is up to you to decide the locking granularity (the amount of data that a lock protects). As a hint, you may consider using spin locks to ensure exclusive access to these shared components in the JOS kernel:
* The page allocator.
* The console driver.
* The scheduler.
* The inter-process communication (IPC) state that you will implement in the part C.
```
##### Round-Robin Scheduling
Your next task in this lab is to change the JOS kernel so that it can alternate between multiple environments in "round-robin" fashion. Round-robin scheduling in JOS works as follows:
* The function `sched_yield()` in the new `kern/sched.c` is responsible for selecting a new environment to run. It searches sequentially through the `envs[]` array in circular fashion, starting just after the previously running environment (or at the beginning of the array if there was no previously running environment), picks the first environment it finds with a status of `ENV_RUNNABLE` (see `inc/env.h`), and calls `env_run()` to jump into that environment.
* `sched_yield()` must never run the same environment on two CPUs at the same time. It can tell that an environment is currently running on some CPU (possibly the current CPU) because that environment's status will be `ENV_RUNNING`.
* We have implemented a new system call for you, `sys_yield()`, which user environments can call to invoke the kernel's `sched_yield()` function and thereby voluntarily give up the CPU to a different environment.
```
Exercise 6. Implement round-robin scheduling in `sched_yield()` as described above. Don't forget to modify `syscall()` to dispatch `sys_yield()`.
Make sure to invoke `sched_yield()` in `mp_main`.
Modify `kern/init.c` to create three (or more!) environments that all run the program `user/yield.c`.
Run make qemu. You should see the environments switch back and forth between each other five times before terminating, like below.
Test also with several CPUS: make qemu CPUS=2.
...
Hello, I am environment 00001000.
Hello, I am environment 00001001.
Hello, I am environment 00001002.
Back in environment 00001000, iteration 0.
Back in environment 00001001, iteration 0.
Back in environment 00001002, iteration 0.
Back in environment 00001000, iteration 1.
Back in environment 00001001, iteration 1.
Back in environment 00001002, iteration 1.
...
After the `yield` programs exit, there will be no runnable environment in the system, the scheduler should invoke the JOS kernel monitor. If any of this does not happen, then fix your code before proceeding.
```
```
Question
3. In your implementation of `env_run()` you should have called `lcr3()`. Before and after the call to `lcr3()`, your code makes references (at least it should) to the variable `e`, the argument to `env_run`. Upon loading the `%cr3` register, the addressing context used by the MMU is instantly changed. But a virtual address (namely `e`) has meaning relative to a given address context--the address context specifies the physical address to which the virtual address maps. Why can the pointer `e` be dereferenced both before and after the addressing switch?
4. Whenever the kernel switches from one environment to another, it must ensure the old environment's registers are saved so they can be restored properly later. Why? Where does this happen?
```
```
Challenge! Add a less trivial scheduling policy to the kernel, such as a fixed-priority scheduler that allows each environment to be assigned a priority and ensures that higher-priority environments are always chosen in preference to lower-priority environments. If you're feeling really adventurous, try implementing a Unix-style adjustable-priority scheduler or even a lottery or stride scheduler. (Look up "lottery scheduling" and "stride scheduling" in Google.)
Write a test program or two that verifies that your scheduling algorithm is working correctly (i.e., the right environments get run in the right order). It may be easier to write these test programs once you have implemented `fork()` and IPC in parts B and C of this lab.
```
```
Challenge! The JOS kernel currently does not allow applications to use the x86 processor's x87 floating-point unit (FPU), MMX instructions, or Streaming SIMD Extensions (SSE). Extend the `Env` structure to provide a save area for the processor's floating point state, and extend the context switching code to save and restore this state properly when switching from one environment to another. The `FXSAVE` and `FXRSTOR` instructions may be useful, but note that these are not in the old i386 user's manual because they were introduced in more recent processors. Write a user-level test program that does something cool with floating-point.
```
##### System Calls for Environment Creation
Although your kernel is now capable of running and switching between multiple user-level environments, it is still limited to running environments that the _kernel_ initially set up. You will now implement the necessary JOS system calls to allow _user_ environments to create and start other new user environments.
Unix provides the `fork()` system call as its process creation primitive. Unix `fork()` copies the entire address space of calling process (the parent) to create a new process (the child). The only differences between the two observable from user space are their process IDs and parent process IDs (as returned by `getpid` and `getppid`). In the parent, `fork()` returns the child's process ID, while in the child, `fork()` returns 0. By default, each process gets its own private address space, and neither process's modifications to memory are visible to the other.
You will provide a different, more primitive set of JOS system calls for creating new user-mode environments. With these system calls you will be able to implement a Unix-like `fork()` entirely in user space, in addition to other styles of environment creation. The new system calls you will write for JOS are as follows:
* `sys_exofork`:
This system call creates a new environment with an almost blank slate: nothing is mapped in the user portion of its address space, and it is not runnable. The new environment will have the same register state as the parent environment at the time of the `sys_exofork` call. In the parent, `sys_exofork` will return the `envid_t` of the newly created environment (or a negative error code if the environment allocation failed). In the child, however, it will return 0. (Since the child starts out marked as not runnable, `sys_exofork` will not actually return in the child until the parent has explicitly allowed this by marking the child runnable using....)
* `sys_env_set_status`:
Sets the status of a specified environment to `ENV_RUNNABLE` or `ENV_NOT_RUNNABLE`. This system call is typically used to mark a new environment ready to run, once its address space and register state has been fully initialized.
* `sys_page_alloc`:
Allocates a page of physical memory and maps it at a given virtual address in a given environment's address space.
* `sys_page_map`:
Copy a page mapping ( _not_ the contents of a page!) from one environment's address space to another, leaving a memory sharing arrangement in place so that the new and the old mappings both refer to the same page of physical memory.
* `sys_page_unmap`:
Unmap a page mapped at a given virtual address in a given environment.
For all of the system calls above that accept environment IDs, the JOS kernel supports the convention that a value of 0 means "the current environment." This convention is implemented by `envid2env()` in `kern/env.c`.
We have provided a very primitive implementation of a Unix-like `fork()` in the test program `user/dumbfork.c`. This test program uses the above system calls to create and run a child environment with a copy of its own address space. The two environments then switch back and forth using `sys_yield` as in the previous exercise. The parent exits after 10 iterations, whereas the child exits after 20.
```
Exercise 7. Implement the system calls described above in `kern/syscall.c` and make sure `syscall()` calls them. You will need to use various functions in `kern/pmap.c` and `kern/env.c`, particularly `envid2env()`. For now, whenever you call `envid2env()`, pass 1 in the `checkperm` parameter. Be sure you check for any invalid system call arguments, returning `-E_INVAL` in that case. Test your JOS kernel with `user/dumbfork` and make sure it works before proceeding.
```
```
Challenge! Add the additional system calls necessary to _read_ all of the vital state of an existing environment as well as set it up. Then implement a user mode program that forks off a child environment, runs it for a while (e.g., a few iterations of `sys_yield()`), then takes a complete snapshot or _checkpoint_ of the child environment, runs the child for a while longer, and finally restores the child environment to the state it was in at the checkpoint and continues it from there. Thus, you are effectively "replaying" the execution of the child environment from an intermediate state. Make the child environment perform some interaction with the user using `sys_cgetc()` or `readline()` so that the user can view and mutate its internal state, and verify that with your checkpoint/restart you can give the child environment a case of selective amnesia, making it "forget" everything that happened beyond a certain point.
```
This completes Part A of the lab; make sure it passes all of the Part A tests when you run make grade, and hand it in using make handin as usual. If you are trying to figure out why a particular test case is failing, run ./grade-lab4 -v, which will show you the output of the kernel builds and QEMU runs for each test, until a test fails. When a test fails, the script will stop, and then you can inspect `jos.out` to see what the kernel actually printed.
#### Part B: Copy-on-Write Fork
As mentioned earlier, Unix provides the `fork()` system call as its primary process creation primitive. The `fork()` system call copies the address space of the calling process (the parent) to create a new process (the child).
xv6 Unix implements `fork()` by copying all data from the parent's pages into new pages allocated for the child. This is essentially the same approach that `dumbfork()` takes. The copying of the parent's address space into the child is the most expensive part of the `fork()` operation.
However, a call to `fork()` is frequently followed almost immediately by a call to `exec()` in the child process, which replaces the child's memory with a new program. This is what the the shell typically does, for example. In this case, the time spent copying the parent's address space is largely wasted, because the child process will use very little of its memory before calling `exec()`.
For this reason, later versions of Unix took advantage of virtual memory hardware to allow the parent and child to _share_ the memory mapped into their respective address spaces until one of the processes actually modifies it. This technique is known as _copy-on-write_. To do this, on `fork()` the kernel would copy the address space _mappings_ from the parent to the child instead of the contents of the mapped pages, and at the same time mark the now-shared pages read-only. When one of the two processes tries to write to one of these shared pages, the process takes a page fault. At this point, the Unix kernel realizes that the page was really a "virtual" or "copy-on-write" copy, and so it makes a new, private, writable copy of the page for the faulting process. In this way, the contents of individual pages aren't actually copied until they are actually written to. This optimization makes a `fork()` followed by an `exec()` in the child much cheaper: the child will probably only need to copy one page (the current page of its stack) before it calls `exec()`.
In the next piece of this lab, you will implement a "proper" Unix-like `fork()` with copy-on-write, as a user space library routine. Implementing `fork()` and copy-on-write support in user space has the benefit that the kernel remains much simpler and thus more likely to be correct. It also lets individual user-mode programs define their own semantics for `fork()`. A program that wants a slightly different implementation (for example, the expensive always-copy version like `dumbfork()`, or one in which the parent and child actually share memory afterward) can easily provide its own.
##### User-level page fault handling
A user-level copy-on-write `fork()` needs to know about page faults on write-protected pages, so that's what you'll implement first. Copy-on-write is only one of many possible uses for user-level page fault handling.
It's common to set up an address space so that page faults indicate when some action needs to take place. For example, most Unix kernels initially map only a single page in a new process's stack region, and allocate and map additional stack pages later "on demand" as the process's stack consumption increases and causes page faults on stack addresses that are not yet mapped. A typical Unix kernel must keep track of what action to take when a page fault occurs in each region of a process's space. For example, a fault in the stack region will typically allocate and map new page of physical memory. A fault in the program's BSS region will typically allocate a new page, fill it with zeroes, and map it. In systems with demand-paged executables, a fault in the text region will read the corresponding page of the binary off of disk and then map it.
This is a lot of information for the kernel to keep track of. Instead of taking the traditional Unix approach, you will decide what to do about each page fault in user space, where bugs are less damaging. This design has the added benefit of allowing programs great flexibility in defining their memory regions; you'll use user-level page fault handling later for mapping and accessing files on a disk-based file system.
###### Setting the Page Fault Handler
In order to handle its own page faults, a user environment will need to register a _page fault handler entrypoint_ with the JOS kernel. The user environment registers its page fault entrypoint via the new `sys_env_set_pgfault_upcall` system call. We have added a new member to the `Env` structure, `env_pgfault_upcall`, to record this information.
```
Exercise 8. Implement the `sys_env_set_pgfault_upcall` system call. Be sure to enable permission checking when looking up the environment ID of the target environment, since this is a "dangerous" system call.
```
###### Normal and Exception Stacks in User Environments
During normal execution, a user environment in JOS will run on the _normal_ user stack: its `ESP` register starts out pointing at `USTACKTOP`, and the stack data it pushes resides on the page between `USTACKTOP-PGSIZE` and `USTACKTOP-1` inclusive. When a page fault occurs in user mode, however, the kernel will restart the user environment running a designated user-level page fault handler on a different stack, namely the _user exception_ stack. In essence, we will make the JOS kernel implement automatic "stack switching" on behalf of the user environment, in much the same way that the x86 _processor_ already implements stack switching on behalf of JOS when transferring from user mode to kernel mode!
The JOS user exception stack is also one page in size, and its top is defined to be at virtual address `UXSTACKTOP`, so the valid bytes of the user exception stack are from `UXSTACKTOP-PGSIZE` through `UXSTACKTOP-1` inclusive. While running on this exception stack, the user-level page fault handler can use JOS's regular system calls to map new pages or adjust mappings so as to fix whatever problem originally caused the page fault. Then the user-level page fault handler returns, via an assembly language stub, to the faulting code on the original stack.
Each user environment that wants to support user-level page fault handling will need to allocate memory for its own exception stack, using the `sys_page_alloc()` system call introduced in part A.
###### Invoking the User Page Fault Handler
You will now need to change the page fault handling code in `kern/trap.c` to handle page faults from user mode as follows. We will call the state of the user environment at the time of the fault the _trap-time_ state.
If there is no page fault handler registered, the JOS kernel destroys the user environment with a message as before. Otherwise, the kernel sets up a trap frame on the exception stack that looks like a `struct UTrapframe` from `inc/trap.h`:
```
<-- UXSTACKTOP
trap-time esp
trap-time eflags
trap-time eip
trap-time eax start of struct PushRegs
trap-time ecx
trap-time edx
trap-time ebx
trap-time esp
trap-time ebp
trap-time esi
trap-time edi end of struct PushRegs
tf_err (error code)
fault_va <-- %esp when handler is run
```
The kernel then arranges for the user environment to resume execution with the page fault handler running on the exception stack with this stack frame; you must figure out how to make this happen. The `fault_va` is the virtual address that caused the page fault.
If the user environment is _already_ running on the user exception stack when an exception occurs, then the page fault handler itself has faulted. In this case, you should start the new stack frame just under the current `tf->tf_esp` rather than at `UXSTACKTOP`. You should first push an empty 32-bit word, then a `struct UTrapframe`.
To test whether `tf->tf_esp` is already on the user exception stack, check whether it is in the range between `UXSTACKTOP-PGSIZE` and `UXSTACKTOP-1`, inclusive.
```
Exercise 9. Implement the code in `page_fault_handler` in `kern/trap.c` required to dispatch page faults to the user-mode handler. Be sure to take appropriate precautions when writing into the exception stack. (What happens if the user environment runs out of space on the exception stack?)
```
###### User-mode Page Fault Entrypoint
Next, you need to implement the assembly routine that will take care of calling the C page fault handler and resume execution at the original faulting instruction. This assembly routine is the handler that will be registered with the kernel using `sys_env_set_pgfault_upcall()`.
```
Exercise 10. Implement the `_pgfault_upcall` routine in `lib/pfentry.S`. The interesting part is returning to the original point in the user code that caused the page fault. You'll return directly there, without going back through the kernel. The hard part is simultaneously switching stacks and re-loading the EIP.
```
Finally, you need to implement the C user library side of the user-level page fault handling mechanism.
```
Exercise 11. Finish `set_pgfault_handler()` in `lib/pgfault.c`.
```
###### Testing
Run `user/faultread` (make run-faultread). You should see:
```
...
[00000000] new env 00001000
[00001000] user fault va 00000000 ip 0080003a
TRAP frame ...
[00001000] free env 00001000
```
Run `user/faultdie`. You should see:
```
...
[00000000] new env 00001000
i faulted at va deadbeef, err 6
[00001000] exiting gracefully
[00001000] free env 00001000
```
Run `user/faultalloc`. You should see:
```
...
[00000000] new env 00001000
fault deadbeef
this string was faulted in at deadbeef
fault cafebffe
fault cafec000
this string was faulted in at cafebffe
[00001000] exiting gracefully
[00001000] free env 00001000
```
If you see only the first "this string" line, it means you are not handling recursive page faults properly.
Run `user/faultallocbad`. You should see:
```
...
[00000000] new env 00001000
[00001000] user_mem_check assertion failure for va deadbeef
[00001000] free env 00001000
```
Make sure you understand why `user/faultalloc` and `user/faultallocbad` behave differently.
```
Challenge! Extend your kernel so that not only page faults, but _all_ types of processor exceptions that code running in user space can generate, can be redirected to a user-mode exception handler. Write user-mode test programs to test user-mode handling of various exceptions such as divide-by-zero, general protection fault, and illegal opcode.
```
##### Implementing Copy-on-Write Fork
You now have the kernel facilities to implement copy-on-write `fork()` entirely in user space.
We have provided a skeleton for your `fork()` in `lib/fork.c`. Like `dumbfork()`, `fork()` should create a new environment, then scan through the parent environment's entire address space and set up corresponding page mappings in the child. The key difference is that, while `dumbfork()` copied _pages_ , `fork()` will initially only copy page _mappings_. `fork()` will copy each page only when one of the environments tries to write it.
The basic control flow for `fork()` is as follows:
1. The parent installs `pgfault()` as the C-level page fault handler, using the `set_pgfault_handler()` function you implemented above.
2. The parent calls `sys_exofork()` to create a child environment.
3. For each writable or copy-on-write page in its address space below UTOP, the parent calls `duppage`, which should map the page copy-on-write into the address space of the child and then _remap_ the page copy-on-write in its own address space. [ Note: The ordering here (i.e., marking a page as COW in the child before marking it in the parent) actually matters! Can you see why? Try to think of a specific case where reversing the order could cause trouble. ] `duppage` sets both PTEs so that the page is not writeable, and to contain `PTE_COW` in the "avail" field to distinguish copy-on-write pages from genuine read-only pages.
The exception stack is _not_ remapped this way, however. Instead you need to allocate a fresh page in the child for the exception stack. Since the page fault handler will be doing the actual copying and the page fault handler runs on the exception stack, the exception stack cannot be made copy-on-write: who would copy it?
`fork()` also needs to handle pages that are present, but not writable or copy-on-write.
4. The parent sets the user page fault entrypoint for the child to look like its own.
5. The child is now ready to run, so the parent marks it runnable.
Each time one of the environments writes a copy-on-write page that it hasn't yet written, it will take a page fault. Here's the control flow for the user page fault handler:
1. The kernel propagates the page fault to `_pgfault_upcall`, which calls `fork()`'s `pgfault()` handler.
2. `pgfault()` checks that the fault is a write (check for `FEC_WR` in the error code) and that the PTE for the page is marked `PTE_COW`. If not, panic.
3. `pgfault()` allocates a new page mapped at a temporary location and copies the contents of the faulting page into it. Then the fault handler maps the new page at the appropriate address with read/write permissions, in place of the old read-only mapping.
The user-level `lib/fork.c` code must consult the environment's page tables for several of the operations above (e.g., that the PTE for a page is marked `PTE_COW`). The kernel maps the environment's page tables at `UVPT` exactly for this purpose. It uses a [clever mapping trick][1] to make it to make it easy to lookup PTEs for user code. `lib/entry.S` sets up `uvpt` and `uvpd` so that you can easily lookup page-table information in `lib/fork.c`.
``````
Exercise 12. Implement `fork`, `duppage` and `pgfault` in `lib/fork.c`.
Test your code with the `forktree` program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages. The messages may not appear in this order, and the environment IDs may be different.
1000: I am ''
1001: I am '0'
2000: I am '00'
2001: I am '000'
1002: I am '1'
3000: I am '11'
3001: I am '10'
4000: I am '100'
1003: I am '01'
5000: I am '010'
4001: I am '011'
2002: I am '110'
1004: I am '001'
1005: I am '111'
1006: I am '101'
```
```
Challenge! Implement a shared-memory `fork()` called `sfork()`. This version should have the parent and child _share_ all their memory pages (so writes in one environment appear in the other) except for pages in the stack area, which should be treated in the usual copy-on-write manner. Modify `user/forktree.c` to use `sfork()` instead of regular `fork()`. Also, once you have finished implementing IPC in part C, use your `sfork()` to run `user/pingpongs`. You will have to find a new way to provide the functionality of the global `thisenv` pointer.
```
```
Challenge! Your implementation of `fork` makes a huge number of system calls. On the x86, switching into the kernel using interrupts has non-trivial cost. Augment the system call interface so that it is possible to send a batch of system calls at once. Then change `fork` to use this interface.
How much faster is your new `fork`?
You can answer this (roughly) by using analytical arguments to estimate how much of an improvement batching system calls will make to the performance of your `fork`: How expensive is an `int 0x30` instruction? How many times do you execute `int 0x30` in your `fork`? Is accessing the `TSS` stack switch also expensive? And so on...
Alternatively, you can boot your kernel on real hardware and _really_ benchmark your code. See the `RDTSC` (read time-stamp counter) instruction, defined in the IA32 manual, which counts the number of clock cycles that have elapsed since the last processor reset. QEMU doesn't emulate this instruction faithfully (it can either count the number of virtual instructions executed or use the host TSC, neither of which reflects the number of cycles a real CPU would require).
```
This ends part B. Make sure you pass all of the Part B tests when you run make grade. As usual, you can hand in your submission with make handin.
#### Part C: Preemptive Multitasking and Inter-Process communication (IPC)
In the final part of lab 4 you will modify the kernel to preempt uncooperative environments and to allow environments to pass messages to each other explicitly.
##### Clock Interrupts and Preemption
Run the `user/spin` test program. This test program forks off a child environment, which simply spins forever in a tight loop once it receives control of the CPU. Neither the parent environment nor the kernel ever regains the CPU. This is obviously not an ideal situation in terms of protecting the system from bugs or malicious code in user-mode environments, because any user-mode environment can bring the whole system to a halt simply by getting into an infinite loop and never giving back the CPU. In order to allow the kernel to _preempt_ a running environment, forcefully retaking control of the CPU from it, we must extend the JOS kernel to support external hardware interrupts from the clock hardware.
###### Interrupt discipline
External interrupts (i.e., device interrupts) are referred to as IRQs. There are 16 possible IRQs, numbered 0 through 15. The mapping from IRQ number to IDT entry is not fixed. `pic_init` in `picirq.c` maps IRQs 0-15 to IDT entries `IRQ_OFFSET` through `IRQ_OFFSET+15`.
In `inc/trap.h`, `IRQ_OFFSET` is defined to be decimal 32. Thus the IDT entries 32-47 correspond to the IRQs 0-15. For example, the clock interrupt is IRQ 0. Thus, IDT[IRQ_OFFSET+0] (i.e., IDT[32]) contains the address of the clock's interrupt handler routine in the kernel. This `IRQ_OFFSET` is chosen so that the device interrupts do not overlap with the processor exceptions, which could obviously cause confusion. (In fact, in the early days of PCs running MS-DOS, the `IRQ_OFFSET` effectively _was_ zero, which indeed caused massive confusion between handling hardware interrupts and handling processor exceptions!)
In JOS, we make a key simplification compared to xv6 Unix. External device interrupts are _always_ disabled when in the kernel (and, like xv6, enabled when in user space). External interrupts are controlled by the `FL_IF` flag bit of the `%eflags` register (see `inc/mmu.h`). When this bit is set, external interrupts are enabled. While the bit can be modified in several ways, because of our simplification, we will handle it solely through the process of saving and restoring `%eflags` register as we enter and leave user mode.
You will have to ensure that the `FL_IF` flag is set in user environments when they run so that when an interrupt arrives, it gets passed through to the processor and handled by your interrupt code. Otherwise, interrupts are _masked_ , or ignored until interrupts are re-enabled. We masked interrupts with the very first instruction of the bootloader, and so far we have never gotten around to re-enabling them.
```
Exercise 13. Modify `kern/trapentry.S` and `kern/trap.c` to initialize the appropriate entries in the IDT and provide handlers for IRQs 0 through 15. Then modify the code in `env_alloc()` in `kern/env.c` to ensure that user environments are always run with interrupts enabled.
Also uncomment the `sti` instruction in `sched_halt()` so that idle CPUs unmask interrupts.
The processor never pushes an error code when invoking a hardware interrupt handler. You might want to re-read section 9.2 of the [80386 Reference Manual][2], or section 5.8 of the [IA-32 Intel Architecture Software Developer's Manual, Volume 3][3], at this time.
After doing this exercise, if you run your kernel with any test program that runs for a non-trivial length of time (e.g., `spin`), you should see the kernel print trap frames for hardware interrupts. While interrupts are now enabled in the processor, JOS isn't yet handling them, so you should see it misattribute each interrupt to the currently running user environment and destroy it. Eventually it should run out of environments to destroy and drop into the monitor.
```
###### Handling Clock Interrupts
In the `user/spin` program, after the child environment was first run, it just spun in a loop, and the kernel never got control back. We need to program the hardware to generate clock interrupts periodically, which will force control back to the kernel where we can switch control to a different user environment.
The calls to `lapic_init` and `pic_init` (from `i386_init` in `init.c`), which we have written for you, set up the clock and the interrupt controller to generate interrupts. You now need to write the code to handle these interrupts.
```
Exercise 14. Modify the kernel's `trap_dispatch()` function so that it calls `sched_yield()` to find and run a different environment whenever a clock interrupt takes place.
You should now be able to get the `user/spin` test to work: the parent environment should fork off the child, `sys_yield()` to it a couple times but in each case regain control of the CPU after one time slice, and finally kill the child environment and terminate gracefully.
```
This is a great time to do some _regression testing_. Make sure that you haven't broken any earlier part of that lab that used to work (e.g. `forktree`) by enabling interrupts. Also, try running with multiple CPUs using make CPUS=2 _target_. You should also be able to pass `stresssched` now. Run make grade to see for sure. You should now get a total score of 65/80 points on this lab.
##### Inter-Process communication (IPC)
(Technically in JOS this is "inter-environment communication" or "IEC", but everyone else calls it IPC, so we'll use the standard term.)
We've been focusing on the isolation aspects of the operating system, the ways it provides the illusion that each program has a machine all to itself. Another important service of an operating system is to allow programs to communicate with each other when they want to. It can be quite powerful to let programs interact with other programs. The Unix pipe model is the canonical example.
There are many models for interprocess communication. Even today there are still debates about which models are best. We won't get into that debate. Instead, we'll implement a simple IPC mechanism and then try it out.
###### IPC in JOS
You will implement a few additional JOS kernel system calls that collectively provide a simple interprocess communication mechanism. You will implement two system calls, `sys_ipc_recv` and `sys_ipc_try_send`. Then you will implement two library wrappers `ipc_recv` and `ipc_send`.
The "messages" that user environments can send to each other using JOS's IPC mechanism consist of two components: a single 32-bit value, and optionally a single page mapping. Allowing environments to pass page mappings in messages provides an efficient way to transfer more data than will fit into a single 32-bit integer, and also allows environments to set up shared memory arrangements easily.
###### Sending and Receiving Messages
To receive a message, an environment calls `sys_ipc_recv`. This system call de-schedules the current environment and does not run it again until a message has been received. When an environment is waiting to receive a message, _any_ other environment can send it a message - not just a particular environment, and not just environments that have a parent/child arrangement with the receiving environment. In other words, the permission checking that you implemented in Part A will not apply to IPC, because the IPC system calls are carefully designed so as to be "safe": an environment cannot cause another environment to malfunction simply by sending it messages (unless the target environment is also buggy).
To try to send a value, an environment calls `sys_ipc_try_send` with both the receiver's environment id and the value to be sent. If the named environment is actually receiving (it has called `sys_ipc_recv` and not gotten a value yet), then the send delivers the message and returns 0. Otherwise the send returns `-E_IPC_NOT_RECV` to indicate that the target environment is not currently expecting to receive a value.
A library function `ipc_recv` in user space will take care of calling `sys_ipc_recv` and then looking up the information about the received values in the current environment's `struct Env`.
Similarly, a library function `ipc_send` will take care of repeatedly calling `sys_ipc_try_send` until the send succeeds.
###### Transferring Pages
When an environment calls `sys_ipc_recv` with a valid `dstva` parameter (below `UTOP`), the environment is stating that it is willing to receive a page mapping. If the sender sends a page, then that page should be mapped at `dstva` in the receiver's address space. If the receiver already had a page mapped at `dstva`, then that previous page is unmapped.
When an environment calls `sys_ipc_try_send` with a valid `srcva` (below `UTOP`), it means the sender wants to send the page currently mapped at `srcva` to the receiver, with permissions `perm`. After a successful IPC, the sender keeps its original mapping for the page at `srcva` in its address space, but the receiver also obtains a mapping for this same physical page at the `dstva` originally specified by the receiver, in the receiver's address space. As a result this page becomes shared between the sender and receiver.
If either the sender or the receiver does not indicate that a page should be transferred, then no page is transferred. After any IPC the kernel sets the new field `env_ipc_perm` in the receiver's `Env` structure to the permissions of the page received, or zero if no page was received.
###### Implementing IPC
```
Exercise 15. Implement `sys_ipc_recv` and `sys_ipc_try_send` in `kern/syscall.c`. Read the comments on both before implementing them, since they have to work together. When you call `envid2env` in these routines, you should set the `checkperm` flag to 0, meaning that any environment is allowed to send IPC messages to any other environment, and the kernel does no special permission checking other than verifying that the target envid is valid.
Then implement the `ipc_recv` and `ipc_send` functions in `lib/ipc.c`.
Use the `user/pingpong` and `user/primes` functions to test your IPC mechanism. `user/primes` will generate for each prime number a new environment until JOS runs out of environments. You might find it interesting to read `user/primes.c` to see all the forking and IPC going on behind the scenes.
```
```
Challenge! Why does `ipc_send` have to loop? Change the system call interface so it doesn't have to. Make sure you can handle multiple environments trying to send to one environment at the same time.
```
```
Challenge! The prime sieve is only one neat use of message passing between a large number of concurrent programs. Read C. A. R. Hoare, ``Communicating Sequential Processes,'' _Communications of the ACM_ 21(8) (August 1978), 666-667, and implement the matrix multiplication example.
```
```
Challenge! One of the most impressive examples of the power of message passing is Doug McIlroy's power series calculator, described in [M. Douglas McIlroy, ``Squinting at Power Series,'' _Software--Practice and Experience_ , 20(7) (July 1990), 661-683][4]. Implement his power series calculator and compute the power series for _sin_ ( _x_ + _x_ ^3).
```
```
Challenge! Make JOS's IPC mechanism more efficient by applying some of the techniques from Liedtke's paper, [Improving IPC by Kernel Design][5], or any other tricks you may think of. Feel free to modify the kernel's system call API for this purpose, as long as your code is backwards compatible with what our grading scripts expect.
```
**This ends part C.** Make sure you pass all of the make grade tests and don't forget to write up your answers to the questions and a description of your challenge exercise solution in `answers-lab4.txt`.
Before handing in, use git status and git diff to examine your changes and don't forget to git add answers-lab4.txt. When you're ready, commit your changes with git commit -am 'my solutions to lab 4', then make handin and follow the directions.
--------------------------------------------------------------------------------
via: https://pdos.csail.mit.edu/6.828/2018/labs/lab4/
作者:[csail.mit][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://pdos.csail.mit.edu
[b]: https://github.com/lujun9972
[1]: https://pdos.csail.mit.edu/6.828/2018/labs/lab4/uvpt.html
[2]: https://pdos.csail.mit.edu/6.828/2018/labs/readings/i386/toc.htm
[3]: https://pdos.csail.mit.edu/6.828/2018/labs/readings/ia32/IA32-3A.pdf
[4]: https://swtch.com/~rsc/thread/squint.pdf
[5]: http://dl.acm.org/citation.cfm?id=168633

View File

@ -0,0 +1,345 @@
Lab 5: File system, Spawn and Shell
======
**Due Thursday, November 15, 2018
**
### Introduction
In this lab, you will implement `spawn`, a library call that loads and runs on-disk executables. You will then flesh out your kernel and library operating system enough to run a shell on the console. These features need a file system, and this lab introduces a simple read/write file system.
#### Getting Started
Use Git to fetch the latest version of the course repository, and then create a local branch called `lab5` based on our lab5 branch, `origin/lab5`:
```
athena% cd ~/6.828/lab
athena% add git
athena% git pull
Already up-to-date.
athena% git checkout -b lab5 origin/lab5
Branch lab5 set up to track remote branch refs/remotes/origin/lab5.
Switched to a new branch "lab5"
athena% git merge lab4
Merge made by recursive.
.....
athena%
```
The main new component for this part of the lab is the file system environment, located in the new `fs` directory. Scan through all the files in this directory to get a feel for what all is new. Also, there are some new file system-related source files in the `user` and `lib` directories,
| fs/fs.c | Code that mainipulates the file system's on-disk structure. |
| fs/bc.c | A simple block cache built on top of our user-level page fault handling facility. |
| fs/ide.c | Minimal PIO-based (non-interrupt-driven) IDE driver code. |
| fs/serv.c | The file system server that interacts with client environments using file system IPCs. |
| lib/fd.c | Code that implements the general UNIX-like file descriptor interface. |
| lib/file.c | The driver for on-disk file type, implemented as a file system IPC client. |
| lib/console.c | The driver for console input/output file type. |
| lib/spawn.c | Code skeleton of the spawn library call. |
You should run the pingpong, primes, and forktree test cases from lab 4 again after merging in the new lab 5 code. You will need to comment out the `ENV_CREATE(fs_fs)` line in `kern/init.c` because `fs/fs.c` tries to do some I/O, which JOS does not allow yet. Similarly, temporarily comment out the call to `close_all()` in `lib/exit.c`; this function calls subroutines that you will implement later in the lab, and therefore will panic if called. If your lab 4 code doesn't contain any bugs, the test cases should run fine. Don't proceed until they work. Don't forget to un-comment these lines when you start Exercise 1.
If they don't work, use git diff lab4 to review all the changes, making sure there isn't any code you wrote for lab4 (or before) missing from lab 5. Make sure that lab 4 still works.
#### Lab Requirements
As before, you will need to do all of the regular exercises described in the lab and _at least one_ challenge problem. Additionally, you will need to write up brief answers to the questions posed in the lab and a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem. If you implement more than one challenge problem, you only need to describe one of them in the write-up, though of course you are welcome to do more. Place the write-up in a file called `answers-lab5.txt` in the top level of your `lab5` directory before handing in your work.
### File system preliminaries
The file system you will work with is much simpler than most "real" file systems including that of xv6 UNIX, but it is powerful enough to provide the basic features: creating, reading, writing, and deleting files organized in a hierarchical directory structure.
We are (for the moment anyway) developing only a single-user operating system, which provides protection sufficient to catch bugs but not to protect multiple mutually suspicious users from each other. Our file system therefore does not support the UNIX notions of file ownership or permissions. Our file system also currently does not support hard links, symbolic links, time stamps, or special device files like most UNIX file systems do.
### On-Disk File System Structure
Most UNIX file systems divide available disk space into two main types of regions: _inode_ regions and _data_ regions. UNIX file systems assign one _inode_ to each file in the file system; a file's inode holds critical meta-data about the file such as its `stat` attributes and pointers to its data blocks. The data regions are divided into much larger (typically 8KB or more) _data blocks_ , within which the file system stores file data and directory meta-data. Directory entries contain file names and pointers to inodes; a file is said to be _hard-linked_ if multiple directory entries in the file system refer to that file's inode. Since our file system will not support hard links, we do not need this level of indirection and therefore can make a convenient simplification: our file system will not use inodes at all and instead will simply store all of a file's (or sub-directory's) meta-data within the (one and only) directory entry describing that file.
Both files and directories logically consist of a series of data blocks, which may be scattered throughout the disk much like the pages of an environment's virtual address space can be scattered throughout physical memory. The file system environment hides the details of block layout, presenting interfaces for reading and writing sequences of bytes at arbitrary offsets within files. The file system environment handles all modifications to directories internally as a part of performing actions such as file creation and deletion. Our file system does allow user environments to _read_ directory meta-data directly (e.g., with `read`), which means that user environments can perform directory scanning operations themselves (e.g., to implement the `ls` program) rather than having to rely on additional special calls to the file system. The disadvantage of this approach to directory scanning, and the reason most modern UNIX variants discourage it, is that it makes application programs dependent on the format of directory meta-data, making it difficult to change the file system's internal layout without changing or at least recompiling application programs as well.
#### Sectors and Blocks
Most disks cannot perform reads and writes at byte granularity and instead perform reads and writes in units of _sectors_. In JOS, sectors are 512 bytes each. File systems actually allocate and use disk storage in units of _blocks_. Be wary of the distinction between the two terms: _sector size_ is a property of the disk hardware, whereas _block size_ is an aspect of the operating system using the disk. A file system's block size must be a multiple of the sector size of the underlying disk.
The UNIX xv6 file system uses a block size of 512 bytes, the same as the sector size of the underlying disk. Most modern file systems use a larger block size, however, because storage space has gotten much cheaper and it is more efficient to manage storage at larger granularities. Our file system will use a block size of 4096 bytes, conveniently matching the processor's page size.
#### Superblocks
![Disk layout][1]
File systems typically reserve certain disk blocks at "easy-to-find" locations on the disk (such as the very start or the very end) to hold meta-data describing properties of the file system as a whole, such as the block size, disk size, any meta-data required to find the root directory, the time the file system was last mounted, the time the file system was last checked for errors, and so on. These special blocks are called _superblocks_.
Our file system will have exactly one superblock, which will always be at block 1 on the disk. Its layout is defined by `struct Super` in `inc/fs.h`. Block 0 is typically reserved to hold boot loaders and partition tables, so file systems generally do not use the very first disk block. Many "real" file systems maintain multiple superblocks, replicated throughout several widely-spaced regions of the disk, so that if one of them is corrupted or the disk develops a media error in that region, the other superblocks can still be found and used to access the file system.
#### File Meta-data
![File structure][2]
The layout of the meta-data describing a file in our file system is described by `struct File` in `inc/fs.h`. This meta-data includes the file's name, size, type (regular file or directory), and pointers to the blocks comprising the file. As mentioned above, we do not have inodes, so this meta-data is stored in a directory entry on disk. Unlike in most "real" file systems, for simplicity we will use this one `File` structure to represent file meta-data as it appears _both on disk and in memory_.
The `f_direct` array in `struct File` contains space to store the block numbers of the first 10 (`NDIRECT`) blocks of the file, which we call the file's _direct_ blocks. For small files up to 10*4096 = 40KB in size, this means that the block numbers of all of the file's blocks will fit directly within the `File` structure itself. For larger files, however, we need a place to hold the rest of the file's block numbers. For any file greater than 40KB in size, therefore, we allocate an additional disk block, called the file's _indirect block_ , to hold up to 4096/4 = 1024 additional block numbers. Our file system therefore allows files to be up to 1034 blocks, or just over four megabytes, in size. To support larger files, "real" file systems typically support _double-_ and _triple-indirect blocks_ as well.
#### Directories versus Regular Files
A `File` structure in our file system can represent either a _regular_ file or a directory; these two types of "files" are distinguished by the `type` field in the `File` structure. The file system manages regular files and directory-files in exactly the same way, except that it does not interpret the contents of the data blocks associated with regular files at all, whereas the file system interprets the contents of a directory-file as a series of `File` structures describing the files and subdirectories within the directory.
The superblock in our file system contains a `File` structure (the `root` field in `struct Super`) that holds the meta-data for the file system's root directory. The contents of this directory-file is a sequence of `File` structures describing the files and directories located within the root directory of the file system. Any subdirectories in the root directory may in turn contain more `File` structures representing sub-subdirectories, and so on.
### The File System
The goal for this lab is not to have you implement the entire file system, but for you to implement only certain key components. In particular, you will be responsible for reading blocks into the block cache and flushing them back to disk; allocating disk blocks; mapping file offsets to disk blocks; and implementing read, write, and open in the IPC interface. Because you will not be implementing all of the file system yourself, it is very important that you familiarize yourself with the provided code and the various file system interfaces.
### Disk Access
The file system environment in our operating system needs to be able to access the disk, but we have not yet implemented any disk access functionality in our kernel. Instead of taking the conventional "monolithic" operating system strategy of adding an IDE disk driver to the kernel along with the necessary system calls to allow the file system to access it, we instead implement the IDE disk driver as part of the user-level file system environment. We will still need to modify the kernel slightly, in order to set things up so that the file system environment has the privileges it needs to implement disk access itself.
It is easy to implement disk access in user space this way as long as we rely on polling, "programmed I/O" (PIO)-based disk access and do not use disk interrupts. It is possible to implement interrupt-driven device drivers in user mode as well (the L3 and L4 kernels do this, for example), but it is more difficult since the kernel must field device interrupts and dispatch them to the correct user-mode environment.
The x86 processor uses the IOPL bits in the EFLAGS register to determine whether protected-mode code is allowed to perform special device I/O instructions such as the IN and OUT instructions. Since all of the IDE disk registers we need to access are located in the x86's I/O space rather than being memory-mapped, giving "I/O privilege" to the file system environment is the only thing we need to do in order to allow the file system to access these registers. In effect, the IOPL bits in the EFLAGS register provides the kernel with a simple "all-or-nothing" method of controlling whether user-mode code can access I/O space. In our case, we want the file system environment to be able to access I/O space, but we do not want any other environments to be able to access I/O space at all.
```
Exercise 1. `i386_init` identifies the file system environment by passing the type `ENV_TYPE_FS` to your environment creation function, `env_create`. Modify `env_create` in `env.c`, so that it gives the file system environment I/O privilege, but never gives that privilege to any other environment.
Make sure you can start the file environment without causing a General Protection fault. You should pass the "fs i/o" test in make grade.
```
```
Question
1. Do you have to do anything else to ensure that this I/O privilege setting is saved and restored properly when you subsequently switch from one environment to another? Why?
```
Note that the `GNUmakefile` file in this lab sets up QEMU to use the file `obj/kern/kernel.img` as the image for disk 0 (typically "Drive C" under DOS/Windows) as before, and to use the (new) file `obj/fs/fs.img` as the image for disk 1 ("Drive D"). In this lab our file system should only ever touch disk 1; disk 0 is used only to boot the kernel. If you manage to corrupt either disk image in some way, you can reset both of them to their original, "pristine" versions simply by typing:
```
$ rm obj/kern/kernel.img obj/fs/fs.img
$ make
```
or by doing:
```
$ make clean
$ make
```
Challenge! Implement interrupt-driven IDE disk access, with or without DMA. You can decide whether to move the device driver into the kernel, keep it in user space along with the file system, or even (if you really want to get into the micro-kernel spirit) move it into a separate environment of its own.
### The Block Cache
In our file system, we will implement a simple "buffer cache" (really just a block cache) with the help of the processor's virtual memory system. The code for the block cache is in `fs/bc.c`.
Our file system will be limited to handling disks of size 3GB or less. We reserve a large, fixed 3GB region of the file system environment's address space, from 0x10000000 (`DISKMAP`) up to 0xD0000000 (`DISKMAP+DISKMAX`), as a "memory mapped" version of the disk. For example, disk block 0 is mapped at virtual address 0x10000000, disk block 1 is mapped at virtual address 0x10001000, and so on. The `diskaddr` function in `fs/bc.c` implements this translation from disk block numbers to virtual addresses (along with some sanity checking).
Since our file system environment has its own virtual address space independent of the virtual address spaces of all other environments in the system, and the only thing the file system environment needs to do is to implement file access, it is reasonable to reserve most of the file system environment's address space in this way. It would be awkward for a real file system implementation on a 32-bit machine to do this since modern disks are larger than 3GB. Such a buffer cache management approach may still be reasonable on a machine with a 64-bit address space.
Of course, it would take a long time to read the entire disk into memory, so instead we'll implement a form of _demand paging_ , wherein we only allocate pages in the disk map region and read the corresponding block from the disk in response to a page fault in this region. This way, we can pretend that the entire disk is in memory.
```
Exercise 2. Implement the `bc_pgfault` and `flush_block` functions in `fs/bc.c`. `bc_pgfault` is a page fault handler, just like the one your wrote in the previous lab for copy-on-write fork, except that its job is to load pages in from the disk in response to a page fault. When writing this, keep in mind that (1) `addr` may not be aligned to a block boundary and (2) `ide_read` operates in sectors, not blocks.
The `flush_block` function should write a block out to disk _if necessary_. `flush_block` shouldn't do anything if the block isn't even in the block cache (that is, the page isn't mapped) or if it's not dirty. We will use the VM hardware to keep track of whether a disk block has been modified since it was last read from or written to disk. To see whether a block needs writing, we can just look to see if the `PTE_D` "dirty" bit is set in the `uvpt` entry. (The `PTE_D` bit is set by the processor in response to a write to that page; see 5.2.4.3 in [chapter 5][3] of the 386 reference manual.) After writing the block to disk, `flush_block` should clear the `PTE_D` bit using `sys_page_map`.
Use make grade to test your code. Your code should pass "check_bc", "check_super", and "check_bitmap".
```
`fs_init` function in `fs/fs.c` is a prime example of how to use the block cache. After initializing the block cache, it simply stores pointers into the disk map region in the `super` global variable. After this point, we can simply read from the `super` structure as if they were in memory and our page fault handler will read them from disk as necessary.
```
Challenge! The block cache has no eviction policy. Once a block gets faulted in to it, it never gets removed and will remain in memory forevermore. Add eviction to the buffer cache. Using the `PTE_A` "accessed" bits in the page tables, which the hardware sets on any access to a page, you can track approximate usage of disk blocks without the need to modify every place in the code that accesses the disk map region. Be careful with dirty blocks.
```
### The Block Bitmap
After `fs_init` sets the `bitmap` pointer, we can treat `bitmap` as a packed array of bits, one for each block on the disk. See, for example, `block_is_free`, which simply checks whether a given block is marked free in the bitmap.
```
Exercise 3. Use `free_block` as a model to implement `alloc_block` in `fs/fs.c`, which should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changed bitmap block to disk with `flush_block`, to help file system consistency.
Use make grade to test your code. Your code should now pass "alloc_block".
```
### File Operations
We have provided a variety of functions in `fs/fs.c` to implement the basic facilities you will need to interpret and manage `File` structures, scan and manage the entries of directory-files, and walk the file system from the root to resolve an absolute pathname. Read through _all_ of the code in `fs/fs.c` and make sure you understand what each function does before proceeding.
```
Exercise 4. Implement `file_block_walk` and `file_get_block`. `file_block_walk` maps from a block offset within a file to the pointer for that block in the `struct File` or the indirect block, very much like what `pgdir_walk` did for page tables. `file_get_block` goes one step further and maps to the actual disk block, allocating a new one if necessary.
Use make grade to test your code. Your code should pass "file_open", "file_get_block", and "file_flush/file_truncated/file rewrite", and "testfile".
```
`file_block_walk` and `file_get_block` are the workhorses of the file system. For example, `file_read` and `file_write` are little more than the bookkeeping atop `file_get_block` necessary to copy bytes between scattered blocks and a sequential buffer.
```
Challenge! The file system is likely to be corrupted if it gets interrupted in the middle of an operation (for example, by a crash or a reboot). Implement soft updates or journalling to make the file system crash-resilient and demonstrate some situation where the old file system would get corrupted, but yours doesn't.
```
### The file system interface
Now that we have the necessary functionality within the file system environment itself, we must make it accessible to other environments that wish to use the file system. Since other environments can't directly call functions in the file system environment, we'll expose access to the file system environment via a _remote procedure call_ , or RPC, abstraction, built atop JOS's IPC mechanism. Graphically, here's what a call to the file system server (say, read) looks like
```
Regular env FS env
+---------------+ +---------------+
| read | | file_read |
| (lib/fd.c) | | (fs/fs.c) |
...|.......|.......|...|.......^.......|...............
| v | | | | RPC mechanism
| devfile_read | | serve_read |
| (lib/file.c) | | (fs/serv.c) |
| | | | ^ |
| v | | | |
| fsipc | | serve |
| (lib/file.c) | | (fs/serv.c) |
| | | | ^ |
| v | | | |
| ipc_send | | ipc_recv |
| | | | ^ |
+-------|-------+ +-------|-------+
| |
+-------------------+
```
Everything below the dotted line is simply the mechanics of getting a read request from the regular environment to the file system environment. Starting at the beginning, `read` (which we provide) works on any file descriptor and simply dispatches to the appropriate device read function, in this case `devfile_read` (we can have more device types, like pipes). `devfile_read` implements `read` specifically for on-disk files. This and the other `devfile_*` functions in `lib/file.c` implement the client side of the FS operations and all work in roughly the same way, bundling up arguments in a request structure, calling `fsipc` to send the IPC request, and unpacking and returning the results. The `fsipc` function simply handles the common details of sending a request to the server and receiving the reply.
The file system server code can be found in `fs/serv.c`. It loops in the `serve` function, endlessly receiving a request over IPC, dispatching that request to the appropriate handler function, and sending the result back via IPC. In the read example, `serve` will dispatch to `serve_read`, which will take care of the IPC details specific to read requests such as unpacking the request structure and finally call `file_read` to actually perform the file read.
Recall that JOS's IPC mechanism lets an environment send a single 32-bit number and, optionally, share a page. To send a request from the client to the server, we use the 32-bit number for the request type (the file system server RPCs are numbered, just like how syscalls were numbered) and store the arguments to the request in a `union Fsipc` on the page shared via the IPC. On the client side, we always share the page at `fsipcbuf`; on the server side, we map the incoming request page at `fsreq` (`0x0ffff000`).
The server also sends the response back via IPC. We use the 32-bit number for the function's return code. For most RPCs, this is all they return. `FSREQ_READ` and `FSREQ_STAT` also return data, which they simply write to the page that the client sent its request on. There's no need to send this page in the response IPC, since the client shared it with the file system server in the first place. Also, in its response, `FSREQ_OPEN` shares with the client a new "Fd page". We'll return to the file descriptor page shortly.
```
Exercise 5. Implement `serve_read` in `fs/serv.c`.
`serve_read`'s heavy lifting will be done by the already-implemented `file_read` in `fs/fs.c` (which, in turn, is just a bunch of calls to `file_get_block`). `serve_read` just has to provide the RPC interface for file reading. Look at the comments and code in `serve_set_size` to get a general idea of how the server functions should be structured.
Use make grade to test your code. Your code should pass "serve_open/file_stat/file_close" and "file_read" for a score of 70/150.
```
```
Exercise 6. Implement `serve_write` in `fs/serv.c` and `devfile_write` in `lib/file.c`.
Use make grade to test your code. Your code should pass "file_write", "file_read after file_write", "open", and "large file" for a score of 90/150.
```
### Spawning Processes
We have given you the code for `spawn` (see `lib/spawn.c`) which creates a new environment, loads a program image from the file system into it, and then starts the child environment running this program. The parent process then continues running independently of the child. The `spawn` function effectively acts like a `fork` in UNIX followed by an immediate `exec` in the child process.
We implemented `spawn` rather than a UNIX-style `exec` because `spawn` is easier to implement from user space in "exokernel fashion", without special help from the kernel. Think about what you would have to do in order to implement `exec` in user space, and be sure you understand why it is harder.
```
Exercise 7. `spawn` relies on the new syscall `sys_env_set_trapframe` to initialize the state of the newly created environment. Implement `sys_env_set_trapframe` in `kern/syscall.c` (don't forget to dispatch the new system call in `syscall()`).
Test your code by running the `user/spawnhello` program from `kern/init.c`, which will attempt to spawn `/hello` from the file system.
Use make grade to test your code.
```
```
Challenge! Implement Unix-style `exec`.
```
```
Challenge! Implement `mmap`-style memory-mapped files and modify `spawn` to map pages directly from the ELF image when possible.
```
### Sharing library state across fork and spawn
The UNIX file descriptors are a general notion that also encompasses pipes, console I/O, etc. In JOS, each of these device types has a corresponding `struct Dev`, with pointers to the functions that implement read/write/etc. for that device type. `lib/fd.c` implements the general UNIX-like file descriptor interface on top of this. Each `struct Fd` indicates its device type, and most of the functions in `lib/fd.c` simply dispatch operations to functions in the appropriate `struct Dev`.
`lib/fd.c` also maintains the _file descriptor table_ region in each application environment's address space, starting at `FDTABLE`. This area reserves a page's worth (4KB) of address space for each of the up to `MAXFD` (currently 32) file descriptors the application can have open at once. At any given time, a particular file descriptor table page is mapped if and only if the corresponding file descriptor is in use. Each file descriptor also has an optional "data page" in the region starting at `FILEDATA`, which devices can use if they choose.
We would like to share file descriptor state across `fork` and `spawn`, but file descriptor state is kept in user-space memory. Right now, on `fork`, the memory will be marked copy-on-write, so the state will be duplicated rather than shared. (This means environments won't be able to seek in files they didn't open themselves and that pipes won't work across a fork.) On `spawn`, the memory will be left behind, not copied at all. (Effectively, the spawned environment starts with no open file descriptors.)
We will change `fork` to know that certain regions of memory are used by the "library operating system" and should always be shared. Rather than hard-code a list of regions somewhere, we will set an otherwise-unused bit in the page table entries (just like we did with the `PTE_COW` bit in `fork`).
We have defined a new `PTE_SHARE` bit in `inc/lib.h`. This bit is one of the three PTE bits that are marked "available for software use" in the Intel and AMD manuals. We will establish the convention that if a page table entry has this bit set, the PTE should be copied directly from parent to child in both `fork` and `spawn`. Note that this is different from marking it copy-on-write: as described in the first paragraph, we want to make sure to _share_ updates to the page.
```
Exercise 8. Change `duppage` in `lib/fork.c` to follow the new convention. If the page table entry has the `PTE_SHARE` bit set, just copy the mapping directly. (You should use `PTE_SYSCALL`, not `0xfff`, to mask out the relevant bits from the page table entry. `0xfff` picks up the accessed and dirty bits as well.)
Likewise, implement `copy_shared_pages` in `lib/spawn.c`. It should loop through all page table entries in the current process (just like `fork` did), copying any page mappings that have the `PTE_SHARE` bit set into the child process.
```
Use make run-testpteshare to check that your code is behaving properly. You should see lines that say "`fork handles PTE_SHARE right`" and "`spawn handles PTE_SHARE right`".
Use make run-testfdsharing to check that file descriptors are shared properly. You should see lines that say "`read in child succeeded`" and "`read in parent succeeded`".
### The keyboard interface
For the shell to work, we need a way to type at it. QEMU has been displaying output we write to the CGA display and the serial port, but so far we've only taken input while in the kernel monitor. In QEMU, input typed in the graphical window appear as input from the keyboard to JOS, while input typed to the console appear as characters on the serial port. `kern/console.c` already contains the keyboard and serial drivers that have been used by the kernel monitor since lab 1, but now you need to attach these to the rest of the system.
```
Exercise 9. In your `kern/trap.c`, call `kbd_intr` to handle trap `IRQ_OFFSET+IRQ_KBD` and `serial_intr` to handle trap `IRQ_OFFSET+IRQ_SERIAL`.
```
We implemented the console input/output file type for you, in `lib/console.c`. `kbd_intr` and `serial_intr` fill a buffer with the recently read input while the console file type drains the buffer (the console file type is used for stdin/stdout by default unless the user redirects them).
Test your code by running make run-testkbd and type a few lines. The system should echo your lines back to you as you finish them. Try typing in both the console and the graphical window, if you have both available.
### The Shell
Run make run-icode or make run-icode-nox. This will run your kernel and start `user/icode`. `icode` execs `init`, which will set up the console as file descriptors 0 and 1 (standard input and standard output). It will then spawn `sh`, the shell. You should be able to run the following commands:
```
echo hello world | cat
cat lorem |cat
cat lorem |num
cat lorem |num |num |num |num |num
lsfd
```
Note that the user library routine `cprintf` prints straight to the console, without using the file descriptor code. This is great for debugging but not great for piping into other programs. To print output to a particular file descriptor (for example, 1, standard output), use `fprintf(1, "...", ...)`. `printf("...", ...)` is a short-cut for printing to FD 1. See `user/lsfd.c` for examples.
```
Exercise 10.
The shell doesn't support I/O redirection. It would be nice to run sh <script instead of having to type in all the commands in the script by hand, as you did above. Add I/O redirection for < to `user/sh.c`.
Test your implementation by typing sh <script into your shell
Run make run-testshell to test your shell. `testshell` simply feeds the above commands (also found in `fs/testshell.sh`) into the shell and then checks that the output matches `fs/testshell.key`.
```
```
Challenge! Add more features to the shell. Possibilities include (a few require changes to the file system too):
* backgrounding commands (`ls &`)
* multiple commands per line (`ls; echo hi`)
* command grouping (`(ls; echo hi) | cat > out`)
* environment variable expansion (`echo $hello`)
* quoting (`echo "a | b"`)
* command-line history and/or editing
* tab completion
* directories, cd, and a PATH for command-lookup.
* file creation
* ctl-c to kill the running environment
but feel free to do something not on this list.
```
Your code should pass all tests at this point. As usual, you can grade your submission with make grade and hand it in with make handin.
**This completes the lab.** As usual, don't forget to run make grade and to write up your answers and a description of your challenge exercise solution. Before handing in, use git status and git diff to examine your changes and don't forget to git add answers-lab5.txt. When you're ready, commit your changes with git commit -am 'my solutions to lab 5', then make handin to submit your solution.
--------------------------------------------------------------------------------
via: https://pdos.csail.mit.edu/6.828/2018/labs/lab5/
作者:[csail.mit][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://pdos.csail.mit.edu
[b]: https://github.com/lujun9972
[1]: https://pdos.csail.mit.edu/6.828/2018/labs/lab5/disk.png
[2]: https://pdos.csail.mit.edu/6.828/2018/labs/lab5/file.png
[3]: http://pdos.csail.mit.edu/6.828/2011/readings/i386/s05_02.htm

View File

@ -0,0 +1,511 @@
Lab 6: Network Driver
======
### Lab 6: Network Driver (default final project)
**Due on Thursday, December 6, 2018
**
### Introduction
This lab is the default final project that you can do on your own.
Now that you have a file system, no self respecting OS should go without a network stack. In this the lab you are going to write a driver for a network interface card. The card will be based on the Intel 82540EM chip, also known as the E1000.
##### Getting Started
Use Git to commit your Lab 5 source (if you haven't already), fetch the latest version of the course repository, and then create a local branch called `lab6` based on our lab6 branch, `origin/lab6`:
```
athena% cd ~/6.828/lab
athena% add git
athena% git commit -am 'my solution to lab5'
nothing to commit (working directory clean)
athena% git pull
Already up-to-date.
athena% git checkout -b lab6 origin/lab6
Branch lab6 set up to track remote branch refs/remotes/origin/lab6.
Switched to a new branch "lab6"
athena% git merge lab5
Merge made by recursive.
fs/fs.c | 42 +++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)
athena%
```
The network card driver, however, will not be enough to get your OS hooked up to the Internet. In the new lab6 code, we have provided you with a network stack and a network server. As in previous labs, use git to grab the code for this lab, merge in your own code, and explore the contents of the new `net/` directory, as well as the new files in `kern/`.
In addition to writing the driver, you will need to create a system call interface to give access to your driver. You will implement missing network server code to transfer packets between the network stack and your driver. You will also tie everything together by finishing a web server. With the new web server you will be able to serve files from your file system.
Much of the kernel device driver code you will have to write yourself from scratch. This lab provides much less guidance than previous labs: there are no skeleton files, no system call interfaces written in stone, and many design decisions are left up to you. For this reason, we recommend that you read the entire assignment write up before starting any individual exercises. Many students find this lab more difficult than previous labs, so please plan your time accordingly.
##### Lab Requirements
As before, you will need to do all of the regular exercises described in the lab and _at least one_ challenge problem. Write up brief answers to the questions posed in the lab and a description of your challenge exercise in `answers-lab6.txt`.
#### QEMU's virtual network
We will be using QEMU's user mode network stack since it requires no administrative privileges to run. QEMU's documentation has more about user-net [here][1]. We've updated the makefile to enable QEMU's user-mode network stack and the virtual E1000 network card.
By default, QEMU provides a virtual router running on IP 10.0.2.2 and will assign JOS the IP address 10.0.2.15. To keep things simple, we hard-code these defaults into the network server in `net/ns.h`.
While QEMU's virtual network allows JOS to make arbitrary connections out to the Internet, JOS's 10.0.2.15 address has no meaning outside the virtual network running inside QEMU (that is, QEMU acts as a NAT), so we can't connect directly to servers running inside JOS, even from the host running QEMU. To address this, we configure QEMU to run a server on some port on the _host_ machine that simply connects through to some port in JOS and shuttles data back and forth between your real host and the virtual network.
You will run JOS servers on ports 7 (echo) and 80 (http). To avoid collisions on shared Athena machines, the makefile generates forwarding ports for these based on your user ID. To find out what ports QEMU is forwarding to on your development host, run make which-ports. For convenience, the makefile also provides make nc-7 and make nc-80, which allow you to interact directly with servers running on these ports in your terminal. (These targets only connect to a running QEMU instance; you must start QEMU itself separately.)
##### Packet Inspection
The makefile also configures QEMU's network stack to record all incoming and outgoing packets to `qemu.pcap` in your lab directory.
To get a hex/ASCII dump of captured packets use `tcpdump` like this:
```
tcpdump -XXnr qemu.pcap
```
Alternatively, you can use [Wireshark][2] to graphically inspect the pcap file. Wireshark also knows how to decode and inspect hundreds of network protocols. If you're on Athena, you'll have to use Wireshark's predecessor, ethereal, which is in the sipbnet locker.
##### Debugging the E1000
We are very lucky to be using emulated hardware. Since the E1000 is running in software, the emulated E1000 can report to us, in a user readable format, its internal state and any problems it encounters. Normally, such a luxury would not be available to a driver developer writing with bare metal.
The E1000 can produce a lot of debug output, so you have to enable specific logging channels. Some channels you might find useful are:
| Flag | Meaning |
| --------- | ---------------------------------------------------|
| tx | Log packet transmit operations |
| txerr | Log transmit ring errors |
| rx | Log changes to RCTL |
| rxfilter | Log filtering of incoming packets |
| rxerr | Log receive ring errors |
| unknown | Log reads and writes of unknown registers |
| eeprom | Log reads from the EEPROM |
| interrupt | Log interrupts and changes to interrupt registers. |
To enable "tx" and "txerr" logging, for example, use make E1000_DEBUG=tx,txerr ....
Note: `E1000_DEBUG` flags only work in the 6.828 version of QEMU.
You can take debugging using software emulated hardware one step further. If you are ever stuck and do not understand why the E1000 is not responding the way you would expect, you can look at QEMU's E1000 implementation in `hw/e1000.c`.
#### The Network Server
Writing a network stack from scratch is hard work. Instead, we will be using lwIP, an open source lightweight TCP/IP protocol suite that among many things includes a network stack. You can find more information on lwIP [here][3]. In this assignment, as far as we are concerned, lwIP is a black box that implements a BSD socket interface and has a packet input port and packet output port.
The network server is actually a combination of four environments:
* core network server environment (includes socket call dispatcher and lwIP)
* input environment
* output environment
* timer environment
The following diagram shows the different environments and their relationships. The diagram shows the entire system including the device driver, which will be covered later. In this lab, you will implement the parts highlighted in green.
![Network server architecture][4]
##### The Core Network Server Environment
The core network server environment is composed of the socket call dispatcher and lwIP itself. The socket call dispatcher works exactly like the file server. User environments use stubs (found in `lib/nsipc.c`) to send IPC messages to the core network environment. If you look at `lib/nsipc.c` you will see that we find the core network server the same way we found the file server: `i386_init` created the NS environment with NS_TYPE_NS, so we scan `envs`, looking for this special environment type. For each user environment IPC, the dispatcher in the network server calls the appropriate BSD socket interface function provided by lwIP on behalf of the user.
Regular user environments do not use the `nsipc_*` calls directly. Instead, they use the functions in `lib/sockets.c`, which provides a file descriptor-based sockets API. Thus, user environments refer to sockets via file descriptors, just like how they referred to on-disk files. A number of operations (`connect`, `accept`, etc.) are specific to sockets, but `read`, `write`, and `close` go through the normal file descriptor device-dispatch code in `lib/fd.c`. Much like how the file server maintained internal unique ID's for all open files, lwIP also generates unique ID's for all open sockets. In both the file server and the network server, we use information stored in `struct Fd` to map per-environment file descriptors to these unique ID spaces.
Even though it may seem that the IPC dispatchers of the file server and network server act the same, there is a key difference. BSD socket calls like `accept` and `recv` can block indefinitely. If the dispatcher were to let lwIP execute one of these blocking calls, the dispatcher would also block and there could only be one outstanding network call at a time for the whole system. Since this is unacceptable, the network server uses user-level threading to avoid blocking the entire server environment. For every incoming IPC message, the dispatcher creates a thread and processes the request in the newly created thread. If the thread blocks, then only that thread is put to sleep while other threads continue to run.
In addition to the core network environment there are three helper environments. Besides accepting messages from user applications, the core network environment's dispatcher also accepts messages from the input and timer environments.
##### The Output Environment
When servicing user environment socket calls, lwIP will generate packets for the network card to transmit. LwIP will send each packet to be transmitted to the output helper environment using the `NSREQ_OUTPUT` IPC message with the packet attached in the page argument of the IPC message. The output environment is responsible for accepting these messages and forwarding the packet on to the device driver via the system call interface that you will soon create.
##### The Input Environment
Packets received by the network card need to be injected into lwIP. For every packet received by the device driver, the input environment pulls the packet out of kernel space (using kernel system calls that you will implement) and sends the packet to the core server environment using the `NSREQ_INPUT` IPC message.
The packet input functionality is separated from the core network environment because JOS makes it hard to simultaneously accept IPC messages and poll or wait for a packet from the device driver. We do not have a `select` system call in JOS that would allow environments to monitor multiple input sources to identify which input is ready to be processed.
If you take a look at `net/input.c` and `net/output.c` you will see that both need to be implemented. This is mainly because the implementation depends on your system call interface. You will write the code for the two helper environments after you implement the driver and system call interface.
##### The Timer Environment
The timer environment periodically sends messages of type `NSREQ_TIMER` to the core network server notifying it that a timer has expired. The timer messages from this thread are used by lwIP to implement various network timeouts.
### Part A: Initialization and transmitting packets
Your kernel does not have a notion of time, so we need to add it. There is currently a clock interrupt that is generated by the hardware every 10ms. On every clock interrupt we can increment a variable to indicate that time has advanced by 10ms. This is implemented in `kern/time.c`, but is not yet fully integrated into your kernel.
```
Exercise 1. Add a call to `time_tick` for every clock interrupt in `kern/trap.c`. Implement `sys_time_msec` and add it to `syscall` in `kern/syscall.c` so that user space has access to the time.
```
Use make INIT_CFLAGS=-DTEST_NO_NS run-testtime to test your time code. You should see the environment count down from 5 in 1 second intervals. The "-DTEST_NO_NS" disables starting the network server environment because it will panic at this point in the lab.
#### The Network Interface Card
Writing a driver requires knowing in depth the hardware and the interface presented to the software. The lab text will provide a high-level overview of how to interface with the E1000, but you'll need to make extensive use of Intel's manual while writing your driver.
```
Exercise 2. Browse Intel's [Software Developer's Manual][5] for the E1000. This manual covers several closely related Ethernet controllers. QEMU emulates the 82540EM.
You should skim over chapter 2 now to get a feel for the device. To write your driver, you'll need to be familiar with chapters 3 and 14, as well as 4.1 (though not 4.1's subsections). You'll also need to use chapter 13 as reference. The other chapters mostly cover components of the E1000 that your driver won't have to interact with. Don't worry about the details right now; just get a feel for how the document is structured so you can find things later.
While reading the manual, keep in mind that the E1000 is a sophisticated device with many advanced features. A working E1000 driver only needs a fraction of the features and interfaces that the NIC provides. Think carefully about the easiest way to interface with the card. We strongly recommend that you get a basic driver working before taking advantage of the advanced features.
```
##### PCI Interface
The E1000 is a PCI device, which means it plugs into the PCI bus on the motherboard. The PCI bus has address, data, and interrupt lines, and allows the CPU to communicate with PCI devices and PCI devices to read and write memory. A PCI device needs to be discovered and initialized before it can be used. Discovery is the process of walking the PCI bus looking for attached devices. Initialization is the process of allocating I/O and memory space as well as negotiating the IRQ line for the device to use.
We have provided you with PCI code in `kern/pci.c`. To perform PCI initialization during boot, the PCI code walks the PCI bus looking for devices. When it finds a device, it reads its vendor ID and device ID and uses these two values as a key to search the `pci_attach_vendor` array. The array is composed of `struct pci_driver` entries like this:
```
struct pci_driver {
uint32_t key1, key2;
int (*attachfn) (struct pci_func *pcif);
};
```
If the discovered device's vendor ID and device ID match an entry in the array, the PCI code calls that entry's `attachfn` to perform device initialization. (Devices can also be identified by class, which is what the other driver table in `kern/pci.c` is for.)
The attach function is passed a _PCI function_ to initialize. A PCI card can expose multiple functions, though the E1000 exposes only one. Here is how we represent a PCI function in JOS:
```
struct pci_func {
struct pci_bus *bus;
uint32_t dev;
uint32_t func;
uint32_t dev_id;
uint32_t dev_class;
uint32_t reg_base[6];
uint32_t reg_size[6];
uint8_t irq_line;
};
```
The above structure reflects some of the entries found in Table 4-1 of Section 4.1 of the developer manual. The last three entries of `struct pci_func` are of particular interest to us, as they record the negotiated memory, I/O, and interrupt resources for the device. The `reg_base` and `reg_size` arrays contain information for up to six Base Address Registers or BARs. `reg_base` stores the base memory addresses for memory-mapped I/O regions (or base I/O ports for I/O port resources), `reg_size` contains the size in bytes or number of I/O ports for the corresponding base values from `reg_base`, and `irq_line` contains the IRQ line assigned to the device for interrupts. The specific meanings of the E1000 BARs are given in the second half of table 4-2.
When the attach function of a device is called, the device has been found but not yet _enabled_. This means that the PCI code has not yet determined the resources allocated to the device, such as address space and an IRQ line, and, thus, the last three elements of the `struct pci_func` structure are not yet filled in. The attach function should call `pci_func_enable`, which will enable the device, negotiate these resources, and fill in the `struct pci_func`.
```
Exercise 3. Implement an attach function to initialize the E1000. Add an entry to the `pci_attach_vendor` array in `kern/pci.c` to trigger your function if a matching PCI device is found (be sure to put it before the `{0, 0, 0}` entry that mark the end of the table). You can find the vendor ID and device ID of the 82540EM that QEMU emulates in section 5.2. You should also see these listed when JOS scans the PCI bus while booting.
For now, just enable the E1000 device via `pci_func_enable`. We'll add more initialization throughout the lab.
We have provided the `kern/e1000.c` and `kern/e1000.h` files for you so that you do not need to mess with the build system. They are currently blank; you need to fill them in for this exercise. You may also need to include the `e1000.h` file in other places in the kernel.
When you boot your kernel, you should see it print that the PCI function of the E1000 card was enabled. Your code should now pass the `pci attach` test of make grade.
```
##### Memory-mapped I/O
Software communicates with the E1000 via _memory-mapped I/O_ (MMIO). You've seen this twice before in JOS: both the CGA console and the LAPIC are devices that you control and query by writing to and reading from "memory". But these reads and writes don't go to DRAM; they go directly to these devices.
`pci_func_enable` negotiates an MMIO region with the E1000 and stores its base and size in BAR 0 (that is, `reg_base[0]` and `reg_size[0]`). This is a range of _physical memory addresses_ assigned to the device, which means you'll have to do something to access it via virtual addresses. Since MMIO regions are assigned very high physical addresses (typically above 3GB), you can't use `KADDR` to access it because of JOS's 256MB limit. Thus, you'll have to create a new memory mapping. We'll use the area above MMIOBASE (your `mmio_map_region` from lab 4 will make sure we don't overwrite the mapping used by the LAPIC). Since PCI device initialization happens before JOS creates user environments, you can create the mapping in `kern_pgdir` and it will always be available.
```
Exercise 4. In your attach function, create a virtual memory mapping for the E1000's BAR 0 by calling `mmio_map_region` (which you wrote in lab 4 to support memory-mapping the LAPIC).
You'll want to record the location of this mapping in a variable so you can later access the registers you just mapped. Take a look at the `lapic` variable in `kern/lapic.c` for an example of one way to do this. If you do use a pointer to the device register mapping, be sure to declare it `volatile`; otherwise, the compiler is allowed to cache values and reorder accesses to this memory.
To test your mapping, try printing out the device status register (section 13.4.2). This is a 4 byte register that starts at byte 8 of the register space. You should get `0x80080783`, which indicates a full duplex link is up at 1000 MB/s, among other things.
```
Hint: You'll need a lot of constants, like the locations of registers and values of bit masks. Trying to copy these out of the developer's manual is error-prone and mistakes can lead to painful debugging sessions. We recommend instead using QEMU's [`e1000_hw.h`][6] header as a guideline. We don't recommend copying it in verbatim, because it defines far more than you actually need and may not define things in the way you need, but it's a good starting point.
##### DMA
You could imagine transmitting and receiving packets by writing and reading from the E1000's registers, but this would be slow and would require the E1000 to buffer packet data internally. Instead, the E1000 uses _Direct Memory Access_ or DMA to read and write packet data directly from memory without involving the CPU. The driver is responsible for allocating memory for the transmit and receive queues, setting up DMA descriptors, and configuring the E1000 with the location of these queues, but everything after that is asynchronous. To transmit a packet, the driver copies it into the next DMA descriptor in the transmit queue and informs the E1000 that another packet is available; the E1000 will copy the data out of the descriptor when there is time to send the packet. Likewise, when the E1000 receives a packet, it copies it into the next DMA descriptor in the receive queue, which the driver can read from at its next opportunity.
The receive and transmit queues are very similar at a high level. Both consist of a sequence of _descriptors_. While the exact structure of these descriptors varies, each descriptor contains some flags and the physical address of a buffer containing packet data (either packet data for the card to send, or a buffer allocated by the OS for the card to write a received packet to).
The queues are implemented as circular arrays, meaning that when the card or the driver reach the end of the array, it wraps back around to the beginning. Both have a _head pointer_ and a _tail pointer_ and the contents of the queue are the descriptors between these two pointers. The hardware always consumes descriptors from the head and moves the head pointer, while the driver always add descriptors to the tail and moves the tail pointer. The descriptors in the transmit queue represent packets waiting to be sent (hence, in the steady state, the transmit queue is empty). For the receive queue, the descriptors in the queue are free descriptors that the card can receive packets into (hence, in the steady state, the receive queue consists of all available receive descriptors). Correctly updating the tail register without confusing the E1000 is tricky; be careful!
The pointers to these arrays as well as the addresses of the packet buffers in the descriptors must all be _physical addresses_ because hardware performs DMA directly to and from physical RAM without going through the MMU.
#### Transmitting Packets
The transmit and receive functions of the E1000 are basically independent of each other, so we can work on one at a time. We'll attack transmitting packets first simply because we can't test receive without transmitting an "I'm here!" packet first.
First, you'll have to initialize the card to transmit, following the steps described in section 14.5 (you don't have to worry about the subsections). The first step of transmit initialization is setting up the transmit queue. The precise structure of the queue is described in section 3.4 and the structure of the descriptors is described in section 3.3.3. We won't be using the TCP offload features of the E1000, so you can focus on the "legacy transmit descriptor format." You should read those sections now and familiarize yourself with these structures.
##### C Structures
You'll find it convenient to use C `struct`s to describe the E1000's structures. As you've seen with structures like the `struct Trapframe`, C `struct`s let you precisely layout data in memory. C can insert padding between fields, but the E1000's structures are laid out such that this shouldn't be a problem. If you do encounter field alignment problems, look into GCC's "packed" attribute.
As an example, consider the legacy transmit descriptor given in table 3-8 of the manual and reproduced here:
```
63 48 47 40 39 32 31 24 23 16 15 0
+---------------------------------------------------------------+
| Buffer address |
+---------------|-------|-------|-------|-------|---------------+
| Special | CSS | Status| Cmd | CSO | Length |
+---------------|-------|-------|-------|-------|---------------+
```
The first byte of the structure starts at the top right, so to convert this into a C struct, read from right to left, top to bottom. If you squint at it right, you'll see that all of the fields even fit nicely into a standard-size types:
```
struct tx_desc
{
uint64_t addr;
uint16_t length;
uint8_t cso;
uint8_t cmd;
uint8_t status;
uint8_t css;
uint16_t special;
};
```
Your driver will have to reserve memory for the transmit descriptor array and the packet buffers pointed to by the transmit descriptors. There are several ways to do this, ranging from dynamically allocating pages to simply declaring them in global variables. Whatever you choose, keep in mind that the E1000 accesses physical memory directly, which means any buffer it accesses must be contiguous in physical memory.
There are also multiple ways to handle the packet buffers. The simplest, which we recommend starting with, is to reserve space for a packet buffer for each descriptor during driver initialization and simply copy packet data into and out of these pre-allocated buffers. The maximum size of an Ethernet packet is 1518 bytes, which bounds how big these buffers need to be. More sophisticated drivers could dynamically allocate packet buffers (e.g., to reduce memory overhead when network usage is low) or even pass buffers directly provided by user space (a technique known as "zero copy"), but it's good to start simple.
```
Exercise 5. Perform the initialization steps described in section 14.5 (but not its subsections). Use section 13 as a reference for the registers the initialization process refers to and sections 3.3.3 and 3.4 for reference to the transmit descriptors and transmit descriptor array.
Be mindful of the alignment requirements on the transmit descriptor array and the restrictions on length of this array. Since TDLEN must be 128-byte aligned and each transmit descriptor is 16 bytes, your transmit descriptor array will need some multiple of 8 transmit descriptors. However, don't use more than 64 descriptors or our tests won't be able to test transmit ring overflow.
For the TCTL.COLD, you can assume full-duplex operation. For TIPG, refer to the default values described in table 13-77 of section 13.4.34 for the IEEE 802.3 standard IPG (don't use the values in the table in section 14.5).
```
Try running make E1000_DEBUG=TXERR,TX qemu. If you are using the course qemu, you should see an "e1000: tx disabled" message when you set the TDT register (since this happens before you set TCTL.EN) and no further "e1000" messages.
Now that transmit is initialized, you'll have to write the code to transmit a packet and make it accessible to user space via a system call. To transmit a packet, you have to add it to the tail of the transmit queue, which means copying the packet data into the next packet buffer and then updating the TDT (transmit descriptor tail) register to inform the card that there's another packet in the transmit queue. (Note that TDT is an _index_ into the transmit descriptor array, not a byte offset; the documentation isn't very clear about this.)
However, the transmit queue is only so big. What happens if the card has fallen behind transmitting packets and the transmit queue is full? In order to detect this condition, you'll need some feedback from the E1000. Unfortunately, you can't just use the TDH (transmit descriptor head) register; the documentation explicitly states that reading this register from software is unreliable. However, if you set the RS bit in the command field of a transmit descriptor, then, when the card has transmitted the packet in that descriptor, the card will set the DD bit in the status field of the descriptor. If a descriptor's DD bit is set, you know it's safe to recycle that descriptor and use it to transmit another packet.
What if the user calls your transmit system call, but the DD bit of the next descriptor isn't set, indicating that the transmit queue is full? You'll have to decide what to do in this situation. You could simply drop the packet. Network protocols are resilient to this, but if you drop a large burst of packets, the protocol may not recover. You could instead tell the user environment that it has to retry, much like you did for `sys_ipc_try_send`. This has the advantage of pushing back on the environment generating the data.
```
Exercise 6. Write a function to transmit a packet by checking that the next descriptor is free, copying the packet data into the next descriptor, and updating TDT. Make sure you handle the transmit queue being full.
```
Now would be a good time to test your packet transmit code. Try transmitting just a few packets by directly calling your transmit function from the kernel. You don't have to create packets that conform to any particular network protocol in order to test this. Run make E1000_DEBUG=TXERR,TX qemu to run your test. You should see something like
```
e1000: index 0: 0x271f00 : 9000002a 0
...
```
as you transmit packets. Each line gives the index in the transmit array, the buffer address of that transmit descriptor, the cmd/CSO/length fields, and the special/CSS/status fields. If QEMU doesn't print the values you expected from your transmit descriptor, check that you're filling in the right descriptor and that you configured TDBAL and TDBAH correctly. If you get "e1000: TDH wraparound @0, TDT x, TDLEN y" messages, that means the E1000 ran all the way through the transmit queue without stopping (if QEMU didn't check this, it would enter an infinite loop), which probably means you aren't manipulating TDT correctly. If you get lots of "e1000: tx disabled" messages, then you didn't set the transmit control register right.
Once QEMU runs, you can then run tcpdump -XXnr qemu.pcap to see the packet data that you transmitted. If you saw the expected "e1000: index" messages from QEMU, but your packet capture is empty, double check that you filled in every necessary field and bit in your transmit descriptors (the E1000 probably went through your transmit descriptors, but didn't think it had to send anything).
```
Exercise 7. Add a system call that lets you transmit packets from user space. The exact interface is up to you. Don't forget to check any pointers passed to the kernel from user space.
```
#### Transmitting Packets: Network Server
Now that you have a system call interface to the transmit side of your device driver, it's time to send packets. The output helper environment's goal is to do the following in a loop: accept `NSREQ_OUTPUT` IPC messages from the core network server and send the packets accompanying these IPC message to the network device driver using the system call you added above. The `NSREQ_OUTPUT` IPC's are sent by the `low_level_output` function in `net/lwip/jos/jif/jif.c`, which glues the lwIP stack to JOS's network system. Each IPC will include a page consisting of a `union Nsipc` with the packet in its `struct jif_pkt pkt` field (see `inc/ns.h`). `struct jif_pkt` looks like
```
struct jif_pkt {
int jp_len;
char jp_data[0];
};
```
`jp_len` represents the length of the packet. All subsequent bytes on the IPC page are dedicated to the packet contents. Using a zero-length array like `jp_data` at the end of a struct is a common C trick (some would say abomination) for representing buffers without pre-determined lengths. Since C doesn't do array bounds checking, as long as you ensure there's enough unused memory following the struct, you can use `jp_data` as if it were an array of any size.
Be aware of the interaction between the device driver, the output environment and the core network server when there is no more space in the device driver's transmit queue. The core network server sends packets to the output environment using IPC. If the output environment is suspended due to a send packet system call because the driver has no more buffer space for new packets, the core network server will block waiting for the output server to accept the IPC call.
```
Exercise 8. Implement `net/output.c`.
```
You can use `net/testoutput.c` to test your output code without involving the whole network server. Try running make E1000_DEBUG=TXERR,TX run-net_testoutput. You should see something like
```
Transmitting packet 0
e1000: index 0: 0x271f00 : 9000009 0
Transmitting packet 1
e1000: index 1: 0x2724ee : 9000009 0
...
```
and tcpdump -XXnr qemu.pcap should output
```
reading from file qemu.pcap, link-type EN10MB (Ethernet)
-5:00:00.600186 [|ether]
0x0000: 5061 636b 6574 2030 30 Packet.00
-5:00:00.610080 [|ether]
0x0000: 5061 636b 6574 2030 31 Packet.01
...
```
To test with a larger packet count, try make E1000_DEBUG=TXERR,TX NET_CFLAGS=-DTESTOUTPUT_COUNT=100 run-net_testoutput. If this overflows your transmit ring, double check that you're handling the DD status bit correctly and that you've told the hardware to set the DD status bit (using the RS command bit).
Your code should pass the `testoutput` tests of make grade.
```
Question
1. How did you structure your transmit implementation? In particular, what do you do if the transmit ring is full?
```
### Part B: Receiving packets and the web server
#### Receiving Packets
Just like you did for transmitting packets, you'll have to configure the E1000 to receive packets and provide a receive descriptor queue and receive descriptors. Section 3.2 describes how packet reception works, including the receive queue structure and receive descriptors, and the initialization process is detailed in section 14.4.
```
Exercise 9. Read section 3.2. You can ignore anything about interrupts and checksum offloading (you can return to these sections if you decide to use these features later), and you don't have to be concerned with the details of thresholds and how the card's internal caches work.
```
The receive queue is very similar to the transmit queue, except that it consists of empty packet buffers waiting to be filled with incoming packets. Hence, when the network is idle, the transmit queue is empty (because all packets have been sent), but the receive queue is full (of empty packet buffers).
When the E1000 receives a packet, it first checks if it matches the card's configured filters (for example, to see if the packet is addressed to this E1000's MAC address) and ignores the packet if it doesn't match any filters. Otherwise, the E1000 tries to retrieve the next receive descriptor from the head of the receive queue. If the head (RDH) has caught up with the tail (RDT), then the receive queue is out of free descriptors, so the card drops the packet. If there is a free receive descriptor, it copies the packet data into the buffer pointed to by the descriptor, sets the descriptor's DD (Descriptor Done) and EOP (End of Packet) status bits, and increments the RDH.
If the E1000 receives a packet that is larger than the packet buffer in one receive descriptor, it will retrieve as many descriptors as necessary from the receive queue to store the entire contents of the packet. To indicate that this has happened, it will set the DD status bit on all of these descriptors, but only set the EOP status bit on the last of these descriptors. You can either deal with this possibility in your driver, or simply configure the card to not accept "long packets" (also known as _jumbo frames_ ) and make sure your receive buffers are large enough to store the largest possible standard Ethernet packet (1518 bytes).
```
Exercise 10. Set up the receive queue and configure the E1000 by following the process in section 14.4. You don't have to support "long packets" or multicast. For now, don't configure the card to use interrupts; you can change that later if you decide to use receive interrupts. Also, configure the E1000 to strip the Ethernet CRC, since the grade script expects it to be stripped.
By default, the card will filter out _all_ packets. You have to configure the Receive Address Registers (RAL and RAH) with the card's own MAC address in order to accept packets addressed to that card. You can simply hard-code QEMU's default MAC address of 52:54:00:12:34:56 (we already hard-code this in lwIP, so doing it here too doesn't make things any worse). Be very careful with the byte order; MAC addresses are written from lowest-order byte to highest-order byte, so 52:54:00:12 are the low-order 32 bits of the MAC address and 34:56 are the high-order 16 bits.
The E1000 only supports a specific set of receive buffer sizes (given in the description of RCTL.BSIZE in 13.4.22). If you make your receive packet buffers large enough and disable long packets, you won't have to worry about packets spanning multiple receive buffers. Also, remember that, just like for transmit, the receive queue and the packet buffers must be contiguous in physical memory.
You should use at least 128 receive descriptors
```
You can do a basic test of receive functionality now, even without writing the code to receive packets. Run make E1000_DEBUG=TX,TXERR,RX,RXERR,RXFILTER run-net_testinput. `testinput` will transmit an ARP (Address Resolution Protocol) announcement packet (using your packet transmitting system call), which QEMU will automatically reply to. Even though your driver can't receive this reply yet, you should see a "e1000: unicast match[0]: 52:54:00:12:34:56" message, indicating that a packet was received by the E1000 and matched the configured receive filter. If you see a "e1000: unicast mismatch: 52:54:00:12:34:56" message instead, the E1000 filtered out the packet, which means you probably didn't configure RAL and RAH correctly. Make sure you got the byte ordering right and didn't forget to set the "Address Valid" bit in RAH. If you don't get any "e1000" messages, you probably didn't enable receive correctly.
Now you're ready to implement receiving packets. To receive a packet, your driver will have to keep track of which descriptor it expects to hold the next received packet (hint: depending on your design, there's probably already a register in the E1000 keeping track of this). Similar to transmit, the documentation states that the RDH register cannot be reliably read from software, so in order to determine if a packet has been delivered to this descriptor's packet buffer, you'll have to read the DD status bit in the descriptor. If the DD bit is set, you can copy the packet data out of that descriptor's packet buffer and then tell the card that the descriptor is free by updating the queue's tail index, RDT.
If the DD bit isn't set, then no packet has been received. This is the receive-side equivalent of when the transmit queue was full, and there are several things you can do in this situation. You can simply return a "try again" error and require the caller to retry. While this approach works well for full transmit queues because that's a transient condition, it is less justifiable for empty receive queues because the receive queue may remain empty for long stretches of time. A second approach is to suspend the calling environment until there are packets in the receive queue to process. This tactic is very similar to `sys_ipc_recv`. Just like in the IPC case, since we have only one kernel stack per CPU, as soon as we leave the kernel the state on the stack will be lost. We need to set a flag indicating that an environment has been suspended by receive queue underflow and record the system call arguments. The drawback of this approach is complexity: the E1000 must be instructed to generate receive interrupts and the driver must handle them in order to resume the environment blocked waiting for a packet.
```
Exercise 11. Write a function to receive a packet from the E1000 and expose it to user space by adding a system call. Make sure you handle the receive queue being empty.
```
```
Challenge! If the transmit queue is full or the receive queue is empty, the environment and your driver may spend a significant amount of CPU cycles polling, waiting for a descriptor. The E1000 can generate an interrupt once it is finished with a transmit or receive descriptor, avoiding the need for polling. Modify your driver so that processing the both the transmit and receive queues is interrupt driven instead of polling.
Note that, once an interrupt is asserted, it will remain asserted until the driver clears the interrupt. In your interrupt handler make sure to clear the interrupt as soon as you handle it. If you don't, after returning from your interrupt handler, the CPU will jump back into it again. In addition to clearing the interrupts on the E1000 card, interrupts also need to be cleared on the LAPIC. Use `lapic_eoi` to do so.
```
#### Receiving Packets: Network Server
In the network server input environment, you will need to use your new receive system call to receive packets and pass them to the core network server environment using the `NSREQ_INPUT` IPC message. These IPC input message should have a page attached with a `union Nsipc` with its `struct jif_pkt pkt` field filled in with the packet received from the network.
```
Exercise 12. Implement `net/input.c`.
```
Run `testinput` again with make E1000_DEBUG=TX,TXERR,RX,RXERR,RXFILTER run-net_testinput. You should see
```
Sending ARP announcement...
Waiting for packets...
e1000: index 0: 0x26dea0 : 900002a 0
e1000: unicast match[0]: 52:54:00:12:34:56
input: 0000 5254 0012 3456 5255 0a00 0202 0806 0001
input: 0010 0800 0604 0002 5255 0a00 0202 0a00 0202
input: 0020 5254 0012 3456 0a00 020f 0000 0000 0000
input: 0030 0000 0000 0000 0000 0000 0000 0000 0000
```
The lines beginning with "input:" are a hexdump of QEMU's ARP reply.
Your code should pass the `testinput` tests of make grade. Note that there's no way to test packet receiving without sending at least one ARP packet to inform QEMU of JOS' IP address, so bugs in your transmitting code can cause this test to fail.
To more thoroughly test your networking code, we have provided a daemon called `echosrv` that sets up an echo server running on port 7 that will echo back anything sent over a TCP connection. Use make E1000_DEBUG=TX,TXERR,RX,RXERR,RXFILTER run-echosrv to start the echo server in one terminal and make nc-7 in another to connect to it. Every line you type should be echoed back by the server. Every time the emulated E1000 receives a packet, QEMU should print something like the following to the console:
```
e1000: unicast match[0]: 52:54:00:12:34:56
e1000: index 2: 0x26ea7c : 9000036 0
e1000: index 3: 0x26f06a : 9000039 0
e1000: unicast match[0]: 52:54:00:12:34:56
```
At this point, you should also be able to pass the `echosrv` test.
```
Question
2. How did you structure your receive implementation? In particular, what do you do if the receive queue is empty and a user environment requests the next incoming packet?
```
```
Challenge! Read about the EEPROM in the developer's manual and write the code to load the E1000's MAC address out of the EEPROM. Currently, QEMU's default MAC address is hard-coded into both your receive initialization and lwIP. Fix your initialization to use the MAC address you read from the EEPROM, add a system call to pass the MAC address to lwIP, and modify lwIP to the MAC address read from the card. Test your change by configuring QEMU to use a different MAC address.
```
```
Challenge! Modify your E1000 driver to be "zero copy." Currently, packet data has to be copied from user-space buffers to transmit packet buffers and from receive packet buffers back to user-space buffers. A zero copy driver avoids this by having user space and the E1000 share packet buffer memory directly. There are many different approaches to this, including mapping the kernel-allocated structures into user space or passing user-provided buffers directly to the E1000. Regardless of your approach, be careful how you reuse buffers so that you don't introduce races between user-space code and the E1000.
```
```
Challenge! Take the zero copy concept all the way into lwIP.
A typical packet is composed of many headers. The user sends data to be transmitted to lwIP in one buffer. The TCP layer wants to add a TCP header, the IP layer an IP header and the MAC layer an Ethernet header. Even though there are many parts to a packet, right now the parts need to be joined together so that the device driver can send the final packet.
The E1000's transmit descriptor design is well-suited to collecting pieces of a packet scattered throughout memory, like the packet fragments created inside lwIP. If you enqueue multiple transmit descriptors, but only set the EOP command bit on the last one, then the E1000 will internally concatenate the packet buffers from these descriptors and only transmit the concatenated buffer when it reaches the EOP-marked descriptor. As a result, the individual packet pieces never need to be joined together in memory.
Change your driver to be able to send packets composed of many buffers without copying and modify lwIP to avoid merging the packet pieces as it does right now.
```
```
Challenge! Augment your system call interface to service more than one user environment. This will prove useful if there are multiple network stacks (and multiple network servers) each with their own IP address running in user mode. The receive system call will need to decide to which environment it needs to forward each incoming packet.
Note that the current interface cannot tell the difference between two packets and if multiple environments call the packet receive system call, each respective environment will get a subset of the incoming packets and that subset may include packets that are not destined to the calling environment.
Sections 2.2 and 3 in [this][7] Exokernel paper have an in-depth explanation of the problem and a method of addressing it in a kernel like JOS. Use the paper to help you get a grip on the problem, chances are you do not need a solution as complex as presented in the paper.
```
#### The Web Server
A web server in its simplest form sends the contents of a file to the requesting client. We have provided skeleton code for a very simple web server in `user/httpd.c`. The skeleton code deals with incoming connections and parses the headers.
```
Exercise 13. The web server is missing the code that deals with sending the contents of a file back to the client. Finish the web server by implementing `send_file` and `send_data`.
```
Once you've finished the web server, start the webserver (make run-httpd-nox) and point your favorite browser at http:// _host_ : _port_ /index.html, where _host_ is the name of the computer running QEMU (If you're running QEMU on athena use `hostname.mit.edu` (hostname is the output of the `hostname` command on athena, or `localhost` if you're running the web browser and QEMU on the same computer) and _port_ is the port number reported for the web server by make which-ports . You should see a web page served by the HTTP server running inside JOS.
At this point, you should score 105/105 on make grade.
```
Challenge! Add a simple chat server to JOS, where multiple people can connect to the server and anything that any user types is transmitted to the other users. To do this, you will have to find a way to communicate with multiple sockets at once _and_ to send and receive on the same socket at the same time. There are multiple ways to go about this. lwIP provides a MSG_DONTWAIT flag for recv (see `lwip_recvfrom` in `net/lwip/api/sockets.c`), so you could constantly loop through all open sockets, polling them for data. Note that, while `recv` flags are supported by the network server IPC, they aren't accessible via the regular `read` function, so you'll need a way to pass the flags. A more efficient approach is to start one or more environments for each connection and to use IPC to coordinate them. Conveniently, the lwIP socket ID found in the struct Fd for a socket is global (not per-environment), so, for example, the child of a `fork` inherits its parents sockets. Or, an environment can even send on another environment's socket simply by constructing an Fd containing the right socket ID.
```
```
Question
3. What does the web page served by JOS's web server say?
4. How long approximately did it take you to do this lab?
```
**This completes the lab.** As usual, don't forget to run make grade and to write up your answers and a description of your challenge exercise solution. Before handing in, use git status and git diff to examine your changes and don't forget to git add answers-lab6.txt. When you're ready, commit your changes with git commit -am 'my solutions to lab 6', then make handin and follow the directions.
--------------------------------------------------------------------------------
via: https://pdos.csail.mit.edu/6.828/2018/labs/lab6/
作者:[csail.mit][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://pdos.csail.mit.edu
[b]: https://github.com/lujun9972
[1]: http://wiki.qemu.org/download/qemu-doc.html#Using-the-user-mode-network-stack
[2]: http://www.wireshark.org/
[3]: http://www.sics.se/~adam/lwip/
[4]: https://pdos.csail.mit.edu/6.828/2018/labs/lab6/ns.png
[5]: https://pdos.csail.mit.edu/6.828/2018/readings/hardware/8254x_GBe_SDM.pdf
[6]: https://pdos.csail.mit.edu/6.828/2018/labs/lab6/e1000_hw.h
[7]: http://pdos.csail.mit.edu/papers/exo:tocs.pdf

View File

@ -0,0 +1,201 @@
为什么 Python 这么慢?
============================================================
Python 现在越来越火,已经迅速扩张到包括 DevOps、数据科学、web 开发、信息安全等各个领域当中。
然而,相比起 Python 扩张的速度Python 代码的运行速度就显得有点逊色了。
![](https://cdn-images-1.medium.com/max/1200/0*M2qZQsVnDS-4i5zc.jpg)
> 在代码运行速度方面Java、C、C++、C和 Python 要如何进行比较呢?并没有一个放之四海而皆准的标准,因为具体结果很大程度上取决于运行的程序类型,而<ruby>语言基准测试<rt>Computer Language Benchmarks Games</rt></ruby>可以作为[衡量的一个方面][5]。
根据我这些年来进行语言基准测试的经验来看Python 比很多语言运行起来都要慢。无论是使用 [JIT][7] 编译器的 C、Java还是使用 [AOT][8] 编译器的 C、C ++,又或者是 JavaScript 这些解释型语言Python 都[比它们运行得慢][6]。
注意:对于文中的 Python ,一般指 CPython 这个官方的实现。当然我也会在本文中提到其它语言的 Python 实现。
> 我要回答的是这个问题对于一个类似的程序Python 要比其它语言慢 2 到 10 倍不等,这其中的原因是什么?又有没有改善的方法呢?
主流的说法有这些:
* “是<ruby>全局解释器锁<rt>Global Interpreter Lock</rt></ruby>GIL的原因”
* “是因为 Python 是解释型语言而不是编译型语言”
* “是因为 Python 是一种动态类型的语言”
哪一个才是是影响 Python 运行效率的主要原因呢?
### 是全局解释器锁的原因吗?
现在很多计算机都配备了具有多个核的 CPU ,有时甚至还会有多个处理器。为了更充分利用它们的处理能力,操作系统定义了一个称为线程的低级结构。某一个进程(例如 Chrome 浏览器可以建立多个线程在系统内执行不同的操作。在这种情况下CPU 密集型进程就可以跨核心共享负载了,这样的做法可以大大提高应用程序的运行效率。
例如在我写这篇文章时,我的 Chrome 浏览器打开了 44 个线程。要知道的是,基于 POSIX 的操作系统(例如 Mac OS、Linux和 Windows 操作系统的线程结构、API 都是不同的,因此操作系统还负责对各个线程的调度。
如果你还没有写过多线程执行的代码,你就需要了解一下线程锁的概念了。多线程进程比单线程进程更为复杂,是因为需要使用线程锁来确保同一个内存地址中的数据不会被多个线程同时访问或更改。
CPython 解释器在创建变量时,首先会分配内存,然后对该变量的引用进行计数,这称为<ruby>引用计数<rt>reference counting</rt></ruby>。如果变量的引用数变为 0这个变量就会从内存中释放掉。这就是在 for 循环代码块内创建临时变量不会增加内存消耗的原因。
而当多个线程内共享一个变量时CPython 锁定引用计数的关键就在于使用了 GIL它会谨慎地控制线程的执行情况无论同时存在多少个线程每次只允许一个线程进行操作。
#### 这会对 Python 程序的性能有什么影响?
如果你的程序只有单线程、单进程,代码的速度和性能不会受到全局解释器锁的影响。
但如果你通过在单进程中使用多线程实现并发,并且是 IO 密集型(例如网络 IO 或磁盘 IO的线程GIL 竞争的效果就很明显了。
![](https://cdn-images-1.medium.com/max/1600/0*S_iSksY5oM5H1Qf_.png)
由 David Beazley 提供的 GIL 竞争情况图[http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html][1]
对于一个 web 应用(例如 Django同时还使用了 WSGI那么对这个 web 应用的每一个请求都是一个单独的 Python 进程,而且每个请求只有一个锁。同时 Python 解释器的启动也比较慢,某些 WSGI 实现还具有“守护进程模式”,[就会导致 Python 进程非常繁忙][9]。
#### 其它的 Python 解释器表现如何?
[PyPy 也是一种带有 GIL 的解释器][10],但通常比 CPython 要快 3 倍以上。
[Jython 则是一种没有 GIL 的解释器][11],这是因为 Jython 中的 Python 线程使用 Java 线程来实现,并且由 JVM 内存管理系统来进行管理。
#### JavaScript 在这方面又是怎样做的呢?
所有的 Javascript 引擎使用的都是 [mark-and-sweep 垃圾收集算法][12],而 GIL 使用的则是 CPython 的内存管理算法。因此 JavaScript 没有 GIL而且它是单线程的也不需要用到 GIL JavaScript 的事件循环和 Promise/Callback 模式实现了以异步编程的方式代替并发。在 Python 当中也有一个类似的 asyncio 事件循环。
### 是因为 Python 是解释型语言吗?
我经常会听到这个说法,但其实当终端上执行 `python myscript.py` 之后CPython 会对代码进行一系列的读取、语法分析、解析、编译、解释和执行的操作。
如果你对这一系列过程感兴趣,也可以阅读一下我之前的文章:
[在 6 分钟内修改 Python 语言][13]
创建 `.pyc` 文件是这个过程的重点。在代码编译阶段Python 3 会将字节码序列写入 `__pycache__/` 下的文件中,而 Python 2 则会将字节码序列写入当前目录的 `.pyc` 文件中。对于你编写的脚本、导入的所有代码以及第三方模块都是如此。
因此绝大多数情况下除非你的代码是一次性的……Python 都会解释字节码并执行。与 Java、C#.NET 相比:
> Java 代码会被编译为“中间语言”,由 Java 虚拟机读取字节码,并将其即时编译为机器码。.NET CIL 也是如此,.NET CLRCommon-Language-Runtime将字节码即时编译为机器码。
既然 Python 不像 Java 和 C# 那样使用虚拟机或某种字节码,为什么 Python 在基准测试中仍然比 Java 和 C# 慢得多呢?首要原因是,.NET 和 Java 都是 JIT 编译的。
<ruby>即时编译<rt>Just-in-time compilation</rt></ruby>JIT需要一种中间语言以便将代码拆分为多个块或多个帧。而<ruby>提前编译器<rt>ahead of time compiler</rt></ruby>AOT则需要确保 CPU 在任何交互发生之前理解每一行代码。
JIT 本身是不会让执行速度加快的,因为它执行的仍然是同样的字节码序列。但是 JIT 会允许运行时的优化。一个优秀的 JIT 优化器会分析出程序的哪些部分会被多次执行,这就是程序中的“热点”,然后,优化器会将这些热点编译得更为高效以实现优化。
这就意味着如果你的程序是多次地重复相同的操作时有可能会被优化器优化得更快。而且Java 和 C# 是强类型语言,因此优化器对代码的判断可以更为准确。
PyPy 使用了明显快于 CPython 的 JIT。更详细的结果可以在这篇性能基准测试文章中看到
[哪一个 Python 版本最快?][15]
#### 那为什么 CPython 不使用 JIT 呢?
JIT 也不是完美的,它的一个显著缺点就在于启动时间。 CPython 的启动时间已经相对比较慢,而 PyPy 比 CPython 启动还要慢 2 到 3 倍,所以 Java 虚拟机启动速度已经是出了名的慢了。.NET CLR则通过在系统启动时自启动来优化体验 甚至还有专门运行 CLR 的操作系统。
因此如果你的 Python 进程在一次启动后就长时间运行JIT 就比较有意义了,因为代码里有“热点”可以优化。
尽管如此CPython 仍然是通用的代码实现。设想如果使用 Python 开发命令行程序,但每次调用 CLI 时都必须等待 JIT 缓慢启动,这种体验就相当不好了。
CPython 必须通过大量用例的测试,才有可能实现[将 JIT 插入到 CPython 中][17],但这个改进工作的进度基本处于停滞不前的状态。
> 如果你想充分发挥 JIT 的优势请使用PyPy。
### 是因为 Python 是一种动态类型的语言吗?
在 C、C++、Java、C#、Go 这些静态类型语言中,必须在声明变量时指定变量的类型。而在动态类型语言中,虽然也有类型的概念,但变量的类型是可改变的。
```
a = 1
a = "foo"
```
在上面这个示例里Python 将变量 `a` 一开始存储整数类型变量的内存空间释放了,并创建了一个新的存储字符串类型的内存空间,并且和原来的变量同名。
静态类型语言这样的设计并不是为了为难你,而是为了方便 CPU 运行而这样设计的。因为最终都需要将所有操作都对应为简单的二进制操作,因此必须将对象、类型这些高级的数据结构转换为低级数据结构。
Python 也实现了这样的转换,但用户看不到这些转换,也不需要关心这些转换。
变量类型不固定并不是 Python 运行慢的原因Python 通过巧妙的设计让用户可以让各种结构变得动态:可以在运行时更改对象上的方法,也可以在运行时让模块调用新声明的值,几乎可以做到任何事。
但也正是这种设计使得 Python 的优化难度变得很大。
为了证明我的观点,我使用了一个 `dtrace` 这个 Mac OS 上的系统调用跟踪工具。CPython 中没有内置 dTrace因此必须重新对 CPython 进行编译。以下使用 Python 3.6.6 进行为例:
```
wget https://github.com/python/cpython/archive/v3.6.6.zip
unzip v3.6.6.zip
cd v3.6.6
./configure --with-dtrace
make
```
这样 `python.exe` 将使用 dtrace 追踪所有代码。[Paul Ross 也作过关于 dtrace 的闪电演讲][19]。你可以下载 Python 的 dtrace 启动文件来查看函数调用、系统调用、CPU 时间、执行时间,以及各种其它的内容。
`sudo dtrace -s toolkit/<tracer>.d -c ../cpython/python.exe script.py`
`py_callflow` 追踪器显示了程序里调用的所有函数。
![](https://cdn-images-1.medium.com/max/1600/1*Lz4UdUi4EwknJ0IcpSJ52g.gif)
那么Python 的动态类型会让它变慢吗?
* 类型比较和类型转换消耗的资源是比较多的,每次读取、写入或引用变量时都会检查变量的类型
* Python 的动态程度让它难以被优化,因此很多 Python 的替代品都为了提升速度而在灵活性方面作出了妥协
* 而 [Cython][2] 结合了 C 的静态类型和 Python 来优化已知类型的代码,它可以将[性能提升][3] 84 倍。
### 总结
> 由于 Python 是一种动态、多功能的语言,因此运行起来会相对缓慢。对于不同的实际需求,可以使用各种不同的优化或替代方案。
例如可以使用异步,引入分析工具或使用多种解释器来优化 Python 程序。
对于不要求启动时间且代码可以充分利用 JIT 的程序,可以考虑使用 PyPy。
而对于看重性能并且静态类型变量较多的程序,不妨使用 [Cython][4]。
#### 延伸阅读
Jake VDP 的优秀文章(略微过时) [https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/][21]
Dave Beazleys 关于 GIL 的演讲 [http://www.dabeaz.com/python/GIL.pdf][22]
JIT 编译器的那些事 [https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/][23]
--------------------------------------------------------------------------------
via: https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b
作者:[Anthony Shaw][a]
选题:[oska874][b]
译者:[HankChow](https://github.com/HankChow)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://hackernoon.com/@anthonypjshaw?source=post_header_lockup
[b]:https://github.com/oska874
[1]:http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html
[2]:http://cython.org/
[3]:http://notes-on-cython.readthedocs.io/en/latest/std_dev.html
[4]:http://cython.org/
[5]:http://algs4.cs.princeton.edu/faq/
[6]:https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/python.html
[7]:https://en.wikipedia.org/wiki/Just-in-time_compilation
[8]:https://en.wikipedia.org/wiki/Ahead-of-time_compilation
[9]:https://www.slideshare.net/GrahamDumpleton/secrets-of-a-wsgi-master
[10]:http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
[11]:http://www.jython.org/jythonbook/en/1.0/Concurrency.html#no-global-interpreter-lock
[12]:https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_Management
[13]:https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
[14]:https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
[15]:https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b
[16]:https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b
[17]:https://www.slideshare.net/AnthonyShaw5/pyjion-a-jit-extension-system-for-cpython
[18]:https://github.com/python/cpython/archive/v3.6.6.zip
[19]:https://github.com/paulross/dtrace-py#the-lightning-talk
[20]:https://github.com/paulross/dtrace-py/tree/master/toolkit
[21]:https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
[22]:http://www.dabeaz.com/python/GIL.pdf
[23]:https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/

View File

@ -0,0 +1,73 @@
在 Linux 命令行中使用 ls 列出文件的提示
======
学习一些 Linux "ls" 命令最有用的变化。
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/button_push_open_keyboard_file_organize.png?itok=KlAsk1gx)
我在 Linux 中最先学到的命令之一就是 `ls`。了解系统中文件所在目录中的内容非常重要。能够查看和修改不仅仅是一些文件还要所有文件也很重要。
我的第一个 Linux 备忘录是[单页 Linux 手册][1],它于 1999 年发布,它成为我的首选参考资料。当我开始探索 Linux 时,我把它贴在桌子上并经常参考它。它的第一页第一列的底部有使用 `ls -l` 列出文件的命令。
之后,我将学习这个最基本命令的其他迭代。通过 `ls` 命令,我开始了解 Linux 文件权限的复杂性以及哪些是我的文件,哪些需要 root 或者 root 权限来修改。随着时间的推移,我习惯使用命令行,虽然我仍然使用 `ls -l` 来查找目录中的文件,但我经常使用 `ls -al`,这样我就可以看到可能需要更改的隐藏文件,比如那些配置文件。
根据 Eric Fischer 在[Linux 文档项目][2]中关于 `ls` 命令的文章,该命令的根源可以追溯到 1961年 MIT 的相容分时系统 CTSS
上的 `listf` 命令。当 CTSS 被 [Multics][3] 代替时,命令变为 `list`,并有像 `list -all` 的开关。根据[维基百科][4]“ls” 出现在 AT&T Unix 的原始版本中。我们今天在 Linux 系统上使用的 `ls` 命令来自 [GNU Core Utilities][5]。
大多数时候,我只使用几个迭代的命令。使用 `ls``ls -al` 查看目录内部是我通常使用该命令的方法,但是你还应该熟悉许多其他选项。
`$ ls -l` 提供了一个简单的目录列表:
![](https://opensource.com/sites/default/files/uploads/linux_ls_1_0.png)
使用我的 Fedora 28 系统中的手册页,我发现 `ls` 还有许多其他选项,所有这些选项都提供了有关 Linux 文件系统的有趣且有用的信息。通过在命令提示符下输入 `man ls`,我们可以开始探索其他一些选项:
![](https://opensource.com/sites/default/files/uploads/linux_ls_2_0.png)
要按文件大小对目录进行排序,请使用 `ls -lS`
![](https://opensource.com/sites/default/files/uploads/linux_ls_3_0.png)
要以相反的顺序列出内容,请使用 `ls -lr`
![](https://opensource.com/sites/default/files/uploads/linux_ls_4.png)
要按列列出内容,请使用 `ls -c`
![](https://opensource.com/sites/default/files/uploads/linux_ls_5.png)
`ls -al` 提供了同一目录中所有文件的列表:
![](https://opensource.com/sites/default/files/uploads/linux_ls_6.png)
以下是我认为有用且有趣的一些其他选项:
* 仅列出目录中的 .txt 文件:`ls * .txt`
  * 按文件大小列出:`ls -s`
  * 按时间和日期排序:`ls -d`
  * 按扩展名排序:`ls -X`
  * 按文件大小排序:`ls -S`
  * 带有文件大小的长格式:`ls -ls`
要生成指定格式的目录列表并将其定向到文件供以后查看,请输入 `ls -al> mydirectorylist`。最后,我找到的一个更奇特的命令是 `ls -R`,它提供了计算机上所有目录及其内容的递归列表。
有关 `ls` 命令的所有迭代的完整列表,请参阅 [GNU Core Utilities][6]。
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/10/ls-command
作者:[Don Watkins][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/don-watkins
[1]: http://hackerspace.cs.rutgers.edu/library/General/One_Page_Linux_Manual.pdf
[2]: http://www.tldp.org/LDP/LG/issue48/fischer.html
[3]: https://en.wikipedia.org/wiki/Multics
[4]: https://en.wikipedia.org/wiki/Ls
[5]: http://www.gnu.org/s/coreutils/
[6]: https://www.gnu.org/software/coreutils/manual/html_node/ls-invocation.html#ls-invocation

View File

@ -0,0 +1,174 @@
PyTorch 1.0 预览版发布: Facebook 最新 AI 开源框架
======
Facebook 在人工智能项目中广泛使用自己的开源 AI 框架 PyTorch最近他们已经发布了 PyTorch 1.0 的预览版本。
对于那些不熟悉的人, [PyTorch][1] 是一个基于 Python 的科学计算库。
PyTorch 利用 [GPUs 超强的运算能力 ][2] 来实现复杂的 [张量][3] 计算 和 [深度神经网络][4]。 因此, 它被世界各地的研究人员和开发人员广泛使用。
这一新的能够使用的 [预览版][5] 已在2018年10月2日周二旧金山举办的 [PyTorch 开发人员大会][6] 的[中途][7]宣布。
### PyTorch 1.0 候选版本的亮点
![PyTorhc is Python based open source AI framework from Facebook][8]
候选版本中的一些主要新功能包括:
#### 1\. JIT
JIT 是一个编译工具集,使研究和生产更加接近。 它包含一个基于 Python 语言的叫做 Torch Script 的脚本语言,也有能使现有代码与它自己兼容的方法。
#### 2\. 全新的 torch.distributed 库: “C10D”
“C10D” 能够在不同的后端上启用异步操作, 并在较慢的网络上提高性能。
#### 3\. C++ 前端 (实验性功能)
虽然它被特别提到是一个不稳定的 API (预计在预发行版中) 这是一个 PyTorch 后端的纯 c++ 接口, 遵循 API 和建立的 Python 前端的体系结构,以实现高性能、 低延迟的研究和开发直接安装在硬件上的 c++ 应用程序。
想要了解更多,可以在 GitHub 上查看完整的 [更新说明][9]。
第一个PyTorch 1.0 的稳定版本将在夏季发布。
### 在 Linux 上安装 PyTorch
为了安装 PyTorch v1.0rc0 开发人员建议使用 [conda][10] 同时也可以按照[本地安装][11]所示,使用其他方法可以安装,所有必要的细节详见文档。
#### 前提
* Linux
* Pip
* Python
* [CUDA][12] (对于使用 Nvidia GPU 的用户)
我们已经知道[如何安装和使用 Pip][13],那就让我们来了解如何使用 Pip 安装 PyTorch。
请注意PyTorch 具有 GPU 和仅限 CPU 的不同安装包。你应该安装一个适合你硬件的安装包。
#### 安装 PyTorch 的旧版本和稳定版
如果你想在 GPU 机器上安装稳定版0.4 版本),使用:
```
pip install torch torchvision
```
使用以下两个命令,来安装仅用于 CPU 的稳定版:
```
pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl
pip install torchvision
```
#### 安装 PyTorch 1.0 候选版本
使用如下命令安装 PyTorch 1.0 RC GPU 版本:
```
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
```
如果没有GPU并且更喜欢使用 仅限CPU 版本,使用如下命令:
```
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```
#### 验证 PyTorch 安装
使用如下简单的命令,启动终端上的 python 控制台:
```
python
```
现在,按行输入下面的示例代码以验证您的安装:
```
from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)
```
你应该得到如下输出:
```
tensor([[0.3380, 0.3845, 0.3217],
[0.8337, 0.9050, 0.2650],
[0.2979, 0.7141, 0.9069],
[0.1449, 0.1132, 0.1375],
[0.4675, 0.3947, 0.1426]])
```
若要检查是否可以使用 PyTorch 的 GPU 功能, 可以使用以下示例代码:
```
import torch
torch.cuda.is_available()
```
输出结果应该是:
```
True
```
支持 PyTorch 的 AMD GPU 仍在开发中, 因此, 尚未按[报告][14]提供完整的测试覆盖,如果您有 AMD GPU ,请在[这里][15]提出建议。
现在让我们来看看一些广泛使用 PyTorch 的研究项目:
### 基于 PyTorch 的持续研究项目
* [Detectron][16]: Facebook AI 研究院的软件系统, 可以智能地进行对象检测和分类。它之前是基于 Caffe2 的。今年早些时候Caffe2 和 PyTorch [合力][17]创建了一个研究 + 生产的 PyTorch 1.0
* [Unsupervised Sentiment Discovery][18]: 广泛应用于社交媒体的一些算法
* [vid2vid][19]: 逼真的视频到视频的转换
* [DeepRecommender][20] 我们在过去的[网飞的 AI 文章][21]中介绍了这些系统是如何工作的
领先的 GPU 制造商英伟达在[更新][22]这方面最近的发展,你也可以阅读正在进行的合作的研究。
### 我们应该如何应对这种 PyTorch 的能力?
想到 Facebook 在社交媒体算法中应用如此令人惊叹的创新项目, 我们是否应该感激这一切或是感到惊恐?这几乎是[天网][23]! 这一新改进的发布的 PyTorch 肯定会推动事情进一步向前! 在下方评论,随时与我们分享您的想法!
--------------------------------------------------------------------------------
via: https://itsfoss.com/pytorch-open-source-ai-framework/
作者:[Avimanyu Bandyopadhyay][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[distant1219](https://github.com/distant1219)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://itsfoss.com/author/avimanyu/
[1]: https://pytorch.org/
[2]: https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
[3]: https://en.wikipedia.org/wiki/Tensor
[4]: https://www.techopedia.com/definition/32902/deep-neural-network
[5]: https://code.fb.com/ai-research/facebook-accelerates-ai-development-with-new-partners-and-production-capabilities-for-pytorch-1-0
[6]: https://pytorch.fbreg.com/
[7]: https://www.themidwaysf.com/
[8]: https://4bds6hergc-flywheel.netdna-ssl.com/wp-content/uploads/2018/10/pytorch.jpeg
[9]: https://github.com/pytorch/pytorch/releases/tag/v1.0rc0
[10]: https://conda.io/
[11]: https://pytorch.org/get-started/locally/
[12]: https://www.pugetsystems.com/labs/hpc/How-to-install-CUDA-9-2-on-Ubuntu-18-04-1184/
[13]: https://itsfoss.com/install-pip-ubuntu/
[14]: https://github.com/pytorch/pytorch/issues/10657#issuecomment-415067478
[15]: https://rocm.github.io/install.html#installing-from-amd-rocm-repositories
[16]: https://github.com/facebookresearch/Detectron
[17]: https://caffe2.ai/blog/2018/05/02/Caffe2_PyTorch_1_0.html
[18]: https://github.com/NVIDIA/sentiment-discovery
[19]: https://github.com/NVIDIA/vid2vid
[20]: https://github.com/NVIDIA/DeepRecommender/
[21]: https://itsfoss.com/netflix-open-source-ai/
[22]: https://news.developer.nvidia.com/pytorch-1-0-accelerated-on-nvidia-gpus/
[23]: https://en.wikipedia.org/wiki/Skynet_(Terminator)