Merge pull request #1 from LCTT/master

refork
This commit is contained in:
魑魅魍魉 2017-12-03 01:44:38 -06:00 committed by GitHub
commit cd0dbfd007
112 changed files with 6640 additions and 3564 deletions


@ -0,0 +1,331 @@
ARMv8 上的 kprobes 事件跟踪
==============
![core-dump](http://www.linaro.org/wp-content/uploads/2016/02/core-dump.png)
### 介绍
kprobes 是一种内核功能,它允许通过在执行(或模拟)断点指令之前和之后,设置调用开发者提供例程的任意断点来检测内核。可参见 kprobes 文档^注1 获取更多信息。基本的 kprobes 功能可使用 `CONFIG_KPROBES` 来选择。在 v4.8 版本的主线内核中kprobes 支持被添加到了 arm64 架构上。
在这篇文章中,我们将介绍 kprobes 在 arm64 上的使用:通过在命令行中使用 debugfs 事件追踪接口来收集动态追踪事件。这个功能在一些架构(包括 arm32上可用已经有段时间了现在在 arm64 上也能使用了。这个功能让你无需编写任何代码就能使用 kprobes。
### 探针类型
kprobes 子系统提供了三种不同类型的动态探针,如下所述。
#### kprobes
基本探针是 kprobes 插入的一个软件断点,用以替代你正在探测的指令,当探测点被命中时,它为最终的单步执行(或模拟)保存下原始指令。
#### kretprobes
kretprobes 是 kprobes 的一部分,它允许拦截函数的返回,而不必在(可能有多个的)返回点上逐一设置探针。对于支持的架构(包括 ARMv8只要选择了 kprobes就可以选择此功能。
#### jprobes
jprobes 允许通过提供一个具有相同<ruby>调用签名<rt>call signature</rt></ruby>的中间函数来拦截对一个函数的调用这里中间函数将被首先调用。jprobes 只是一个编程接口,它不能通过 debugfs 事件追踪子系统来使用。因此,我们将不会在这里进一步讨论 jprobes。如果你想使用 jprobes请参考 kprobes 文档。
### 调用 kprobes
kprobes 提供了一系列能从内核代码中调用的 API用来设置探测点以及在探测点被命中时调用的注册函数。在不往内核中添加代码的情况下kprobes 也是可用的,这是通过写入特定事件追踪的 debugfs 文件来实现的,需要在文件中设置探针地址和信息,以便在探针被命中时记录到追踪日志中。后者是本文将要讨论的重点。最后kprobes 还可以通过 perf 命令来使用。
#### kprobes API
内核开发人员可以在内核中编写函数(通常在专用的调试模块中完成)来设置探测点,并且在探测指令执行前和执行后立即执行任何所需操作。这在 kprobes.txt 中有很好的解释。
#### 事件追踪
事件追踪子系统有自己的文档^注2 ,对于了解一般追踪事件的背景可能值得一读。事件追踪子系统是<ruby>追踪点<rt>tracepoints</rt></ruby>和 kprobes 事件追踪的基础。事件追踪文档重点关注追踪点所以请在查阅文档时记住这一点。kprobes 与追踪点不同的是没有预定义的追踪点列表,而是采用动态创建的、用于触发追踪事件信息收集的任意探测点。事件追踪子系统通过一系列 debugfs 文件来控制和监视。事件追踪(`CONFIG_EVENT_TRACING`)将在被如 kprobe 事件追踪子系统等需要时自动选择。
##### kprobes 事件
使用 kprobes 事件追踪子系统用户可以在内核任意断点处指定要报告的信息只需要指定任意现有可探测指令的地址以及格式化信息即可确定。在执行过程中遇到断点时kprobes 将所请求的信息传递给事件追踪子系统的公共部分这些部分将数据格式化并追加到追踪日志中就像追踪点的工作方式一样。kprobes 使用一个类似的但是大部分是独立的 debugfs 文件来控制和显示追踪事件信息。该功能可使用 `CONFIG_KPROBE_EVENT` 来选择。kprobetrace 文档^注3 提供了如何使用 kprobes 事件追踪的基本信息,并且应当被参考用以了解以下介绍示例的详细信息。
#### kprobes 和 perf
perf 工具为 kprobes 提供了另一个命令行接口。特别地,`perf probe` 允许探测点除了由函数名加偏移量和地址指定外还可由源文件和行号指定。perf 接口实际上是使用 kprobes 的 debugfs 接口的封装器。
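下面给出一个 `perf probe` 的简单示意(探测的函数与参数取自 perf 手册中的常见例子,实际可用的探测点取决于你的内核及其调试信息是否齐全):

```
$ sudo perf probe --add 'do_sys_open filename:string'   # 在函数入口添加探测点并记录 filename 参数
$ sudo perf record -e probe:do_sys_open -aR sleep 5     # 在全系统范围内采集 5 秒事件
$ sudo perf script                                      # 查看记录到的事件
$ sudo perf probe --del probe:do_sys_open               # 删除探测点
```

在内部,`perf probe` 所做的正是向上文提到的 debugfs 接口写入相应的探针定义。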
### Arm64 kprobes
上述所有 kprobes 的方面现在都在 arm64 上得到实现,然而实际上与其它架构上的有一些不同:
* 寄存器名称参数当然是依架构而定的,可以在 ARM 架构参考手册ARM ARM中找到。
* 目前不是所有的指令类型都可被探测。当前不可探测的指令包括 mrs/msr除了 DAIF 读取、异常生成指令、eret 和 hint除了 nop 变体。在这些情况下只探测一个附近的指令来代替是最简单的。这些指令在探测的黑名单里是因为在 kprobes 单步执行或者指令模拟时它们对处理器状态造成的改变是不安全的,这是由于 kprobes 构造的单步执行上下文和指令所需要的不一致或者是由于指令不能容忍在 kprobes 中额外的处理时间和异常处理ldx/stx
* kprobes 会试图识别处于 ldx/stx 序列中的指令并拒绝探测,但是理论上这种检查可能会失败,导致被探测的原子序列永远无法成功完成。当探测原子代码序列附近时应该小心。
* 注意:由于 Linux ARM64 调用约定的具体细节,可靠地为被探测函数复制栈帧是不可能的,因此不要试图用 jprobes 这样做,这一点与大多数其它支持 jprobes 的架构不同。原因是被调用者没有足够的信息来确定所需的栈空间大小。
* 注意当探针被命中时,一个探针记录的栈指针信息将反映出使用中的特定栈指针,它是内核栈指针或者中断栈指针。
* 有一组内核函数是不能被探测的,通常因为它们作为 kprobes 处理的一部分被调用。这组函数的一部分是依架构特定的,并且也包含如异常入口代码等。
### 使用 kprobes 事件追踪
kprobes 的一个常用例子是检测函数入口和/或出口。因为可以简单地使用函数名作为探针地址所以安装这种探针特别容易kprobes 事件追踪会查找符号名称并确定其地址。ARMv8 调用标准定义了函数参数和返回值的位置,这些值可以作为 kprobes 事件处理的一部分打印出来。
#### 例子: 函数入口探测
检测 USB 以太网驱动程序复位功能:
```
$ pwd
/sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p ax88772_reset %x0
EOF
$ echo 1 > events/kprobes/enable
```
此时,每次该驱动的 `ax88772_reset()` 函数被调用,追踪事件都会被记录。这个事件将显示指向 `usbnet` 结构的指针,它通过 `X0`(按照 ARMv8 调用标准,这是此函数的唯一参数)传入。插入需要该以太网驱动程序的 USB 加密狗后,我们看见以下追踪信息:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/0:0-4 [000] d... 10972.102939: p_ax88772_reset_0:
(ax88772_reset+0x0/0x230) arg1=0xffff800064824c80
```
这里我们可以看到传入我们探测函数的指针参数的值。由于我们没有使用 kprobes 事件追踪的可选标签功能,这个信息被自动标注为 `arg1`。注意,它指的是我们要求 kprobes 为这个探针记录的这组值中的第一个,而不是指函数参数的实际位置,只是在这个例子里它恰好也是被探测函数的第一个参数。
#### 例子: 函数入口和返回探测
kretprobe 功能专门用于探测函数返回。在函数入口处kprobes 子系统会被调用,并建立一个钩子,以便在函数返回时被调用,钩子会记录所需的事件信息。对最常见的情况来说,返回值会放在 `X0` 寄存器中,这非常有用。返回值除了写作 `%x0` 之外,也可以用 `$retval` 来表示。以下例子还演示了如何提供一个可读性更好的标签来展示有用的信息。
使用 kprobes 和 kretprobe 检测内核 `do_fork()` 函数来记录参数和结果的例子:
```
$ cd /sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p _do_fork %x0 %x1 %x2 %x3 %x4 %x5
r _do_fork pid=%x0
EOF
$ echo 1 > events/kprobes/enable
```
此时每次对 `_do_fork()` 的调用都会产生两个记录到 trace 文件的 kprobe 事件,一个报告调用参数值,另一个报告返回值。返回值在 trace 文件中将被标记为 `pid`。这里是三次 fork 系统调用执行后的 trace 文件的内容:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 6/6 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
bash-1671 [001] d... 204.946007: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
bash-1671 [001] d..1 204.946391: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x724
bash-1671 [001] d... 208.845749: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
bash-1671 [001] d..1 208.846127: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x725
bash-1671 [001] d... 214.401604: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
bash-1671 [001] d..1 214.401975: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x726
```
#### 例子: 解引用指针参数
对于指针值kprobes 事件处理子系统也允许解引用和打印所需的内存内容,适用于各种基本数据类型。为了展示所需字段,手动计算结构的偏移量是必要的。
检测 `do_wait()` 函数:
```
$ cat > kprobe_events <<EOF
p:wait_p do_wait wo_type=+0(%x0):u32 wo_flags=+4(%x0):u32
r:wait_r do_wait $retval
EOF
$ echo 1 > events/kprobes/enable
```
注意,在第一个探针中使用的参数标签是可选的,可以用来更清晰地标识记录在追踪日志中的信息。带符号的偏移量和括号表明该寄存器参数是一个指针,指向要记录到追踪日志中的内存内容。`:u32` 表明该内存位置包含一个无符号的 4 字节宽的数据(在这个例子中,指的是局部定义的结构体中的一个 enum 和一个 int
探针标签(冒号后)是可选的,并且将用来识别日志中的探针。对每个探针来说标签必须是独一无二的。如果没有指定,将从附近的符号名称自动生成一个有用的标签,如前面的例子所示。
也要注意 `$retval` 参数可以只是指定为 `%x0`
这里是两次 fork 系统调用执行后的 trace 文件的内容:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
bash-1702 [001] d... 175.342074: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xe
bash-1702 [002] d..1 175.347236: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0x757
bash-1702 [002] d... 175.347337: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xf
bash-1702 [002] d..1 175.347349: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0xfffffffffffffff6
```
#### 例子: 探测任意指令地址
在前面的例子中,我们已经为函数的入口和出口插入了探针,然而实际上探测任意一条指令(除少数例外)都是可能的。如果我们要在一个 C 函数的中间放置探针,第一步是查看该代码的汇编版本,以确定我们想放置探针的位置。一种方法是在 vmlinux 文件上使用 gdb显示想要放置探针的函数中的指令。下面是一个对 `arch/arm64/kernel/module.c` 中的 `module_alloc` 函数执行此操作的示例。在这个例子中,因为 gdb 似乎更倾向于使用弱符号定义(而它对应的是与这个函数关联的存根代码),所以我们改从 System.map 中获取符号值:
```
$ grep module_alloc System.map
ffff2000080951c4 T module_alloc
ffff200008297770 T kasan_module_alloc
```
在这个例子中我们使用了交叉开发工具,并在我们的主机系统上调用 gdb 来检查构成我们所感兴趣函数的那些指令:
```
$ ${CROSS_COMPILE}gdb vmlinux
(gdb) x/30i 0xffff2000080951c4
0xffff2000080951c4 <module_alloc>: sub sp, sp, #0x30
0xffff2000080951c8 <module_alloc+4>: adrp x3, 0xffff200008d70000
0xffff2000080951cc <module_alloc+8>: add x3, x3, #0x0
0xffff2000080951d0 <module_alloc+12>: mov x5, #0x713 // #1811
0xffff2000080951d4 <module_alloc+16>: mov w4, #0xc0 // #192
0xffff2000080951d8 <module_alloc+20>: mov x2, #0xfffffffff8000000 // #-134217728
0xffff2000080951dc <module_alloc+24>: stp x29, x30, [sp,#16]
0xffff2000080951e0 <module_alloc+28>: add x29, sp, #0x10
0xffff2000080951e4 <module_alloc+32>: movk x5, #0xc8, lsl #48
0xffff2000080951e8 <module_alloc+36>: movk w4, #0x240, lsl #16
0xffff2000080951ec <module_alloc+40>: str x30, [sp]
0xffff2000080951f0 <module_alloc+44>: mov w7, #0xffffffff // #-1
0xffff2000080951f4 <module_alloc+48>: mov x6, #0x0 // #0
0xffff2000080951f8 <module_alloc+52>: add x2, x3, x2
0xffff2000080951fc <module_alloc+56>: mov x1, #0x8000 // #32768
0xffff200008095200 <module_alloc+60>: stp x19, x20, [sp,#32]
0xffff200008095204 <module_alloc+64>: mov x20, x0
0xffff200008095208 <module_alloc+68>: bl 0xffff2000082737a8 <__vmalloc_node_range>
0xffff20000809520c <module_alloc+72>: mov x19, x0
0xffff200008095210 <module_alloc+76>: cbz x0, 0xffff200008095234 <module_alloc+112>
0xffff200008095214 <module_alloc+80>: mov x1, x20
0xffff200008095218 <module_alloc+84>: bl 0xffff200008297770 <kasan_module_alloc>
0xffff20000809521c <module_alloc+88>: tbnz w0, #31, 0xffff20000809524c <module_alloc+136>
0xffff200008095220 <module_alloc+92>: mov sp, x29
0xffff200008095224 <module_alloc+96>: mov x0, x19
0xffff200008095228 <module_alloc+100>: ldp x19, x20, [sp,#16]
0xffff20000809522c <module_alloc+104>: ldp x29, x30, [sp],#32
0xffff200008095230 <module_alloc+108>: ret
0xffff200008095234 <module_alloc+112>: mov sp, x29
0xffff200008095238 <module_alloc+116>: mov x19, #0x0 // #0
```
在这种情况下,我们将在此函数中显示以下源代码行的结果:
```
p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
NUMA_NO_NODE, __builtin_return_address(0));
```
……以及在此代码行的函数调用的返回值:
```
if (p && (kasan_module_alloc(p, size) < 0)) {
```
我们可以从调用外部函数的汇编代码中识别出这些位置。为了显示这些值,我们将在目标系统上的 `0xffff20000809520c` 和 `0xffff20000809521c` 处放置探针。
```
$ cat > kprobe_events <<EOF
p 0xffff20000809520c %x0
p 0xffff20000809521c %x0
EOF
$ echo 1 > events/kprobes/enable
```
现在将一个以太网适配器加密狗插入到 USB 端口后,我们看到以下写入追踪日志的内容:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 12/12 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
systemd-udevd-2082 [000] d... 77.200991: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001188000
systemd-udevd-2082 [000] d... 77.201059: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
systemd-udevd-2082 [000] d... 77.201115: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001198000
systemd-udevd-2082 [000] d... 77.201157: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
systemd-udevd-2082 [000] d... 77.227456: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011a0000
systemd-udevd-2082 [000] d... 77.227522: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
systemd-udevd-2082 [000] d... 77.227579: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b0000
systemd-udevd-2082 [000] d... 77.227635: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
modprobe-2097 [002] d... 78.030643: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b8000
modprobe-2097 [002] d... 78.030761: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
modprobe-2097 [002] d... 78.031132: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001270000
modprobe-2097 [002] d... 78.031187: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
```
kprobes 事件系统的另一个功能是记录统计信息,这可以在 `kprobe_profile` 文件中找到。在以上追踪完成后,该文件的内容为:
```
$ cat kprobe_profile
p_0xffff20000809520c 6 0
p_0xffff20000809521c 6 0
```
这表明我们设置的两处断点每个都发生了 6 次命中这当然与追踪日志数据是一致的。kprobetrace 文档中描述了更多 kprobe_profile 的功能。
也可以进一步过滤 kprobes 事件。用来控制这一点的 debugfs 文件在 kprobetrace 文档中列出,而这些文件内容的详细信息大多在 trace events 文档中描述。
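举一个简单的示意(以前文定义的 `wait_p` 探针为例,可用的字段名取决于你在 kprobe_events 中定义的参数标签):

```
$ echo 'wo_type == 3' > events/kprobes/wait_p/filter   # 只记录 wo_type 为 3 的事件
$ echo 0 > events/kprobes/wait_p/filter                # 清除该过滤器
```

过滤表达式的完整语法请以 trace events 文档为准。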
### 总结
现在Linux ARMv8 对 kprobes 功能的支持已经与其它架构相当。目前还有人在进行添加 uprobes 和 systemtap 支持的工作。这些功能/工具,加上已经完成的其它功能(如 perf、coresight让 Linux ARMv8 用户可以像在其它更老的架构上一样进行调试和性能测试。
* * *
参考文献
- 注1 Jim Keniston, Prasanna S. Panchamukhi, Masami Hiramatsu. “Kernel Probes (kprobes).” _GitHub_. GitHub, Inc., 15 Aug. 2016. Web. 13 Dec. 2016.
- 注2 Tso, Theodore, Li Zefan, and Tom Zanussi. “Event Tracing.” _GitHub_. GitHub, Inc., 3 Mar. 2016. Web. 13 Dec. 2016.
- 注3 Hiramatsu, Masami. “Kprobe-based Event Tracing.” _GitHub_. GitHub, Inc., 18 Aug. 2016. Web. 13 Dec. 2016.
----------------
作者简介 [David Long][8] 在 Linaro Kernel - Core Development 团队中担任工程师。 在加入 Linaro 之前他在商业和国防行业工作了数年既做嵌入式实时工作又为Unix提供软件开发工具。之后在 Digital又名 Compaq公司工作了十几年负责 Unix 标准C 编译器和运行时库的工作。之后 David 又去了一系列初创公司做嵌入式 Linux 和安卓系统,嵌入式定制操作系统和 Xen 虚拟化。他拥有 MIPSAlpha 和 ARM 平台的经验(等等)。他使用过从 1979 年贝尔实验室 V6 开始的大部分Unix操作系统并且长期以来一直是 Linux 用户和倡导者。他偶尔也因使用烙铁和数字示波器调试设备驱动而知名。
--------------------------------------------------------------------------------
via: http://www.linaro.org/blog/kprobes-event-tracing-armv8/
作者:[David Long][a]
译者:[kimii](https://github.com/kimii)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://www.linaro.org/author/david-long/
[1]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#
[2]:https://github.com/torvalds/linux/blob/master/Documentation/kprobes.txt
[3]:https://github.com/torvalds/linux/blob/master/Documentation/trace/events.txt
[4]:https://github.com/torvalds/linux/blob/master/Documentation/trace/kprobetrace.txt
[5]:https://github.com/torvalds/linux/blob/master/Documentation/kprobes.txt
[6]:https://github.com/torvalds/linux/blob/master/Documentation/trace/events.txt
[7]:https://github.com/torvalds/linux/blob/master/Documentation/trace/kprobetrace.txt
[8]:http://www.linaro.org/author/david-long/
[9]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#comments
[10]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#
[11]:http://www.linaro.org/tag/arm64/
[12]:http://www.linaro.org/tag/armv8/
[13]:http://www.linaro.org/tag/jprobes/
[14]:http://www.linaro.org/tag/kernel/
[15]:http://www.linaro.org/tag/kprobes/
[16]:http://www.linaro.org/tag/kretprobes/
[17]:http://www.linaro.org/tag/perf/
[18]:http://www.linaro.org/tag/tracing/


@ -1,23 +1,20 @@
translating by firmianay
Examining network connections on Linux systems
检查 Linux 系统上的网络连接
============================================================
### Linux systems provide a lot of useful commands for reviewing network configuration and connections. Here's a look at a few, including ifquery, ifup, ifdown and ifconfig.
> Linux 系统提供了许多有用的命令来检查网络配置和连接。下面来看几个,包括 `ifquery`、`ifup`、`ifdown` 和 `ifconfig`
Linux 上有许多可用于查看网络设置和连接的命令。在今天的文章中,我们将会通过一些非常方便的命令来看看它们是如何工作的。
There are a lot of commands available on Linux for looking at network settings and connections. In today's post, we're going to run through some very handy commands and see how they work.
### ifquery 命令
### ifquery command
One very useful command is the **ifquery** command. This command should give you a quick list of network interfaces. However, you might only see something like this —showing only the loopback interface:
一个非常有用的命令是 `ifquery`。这个命令应该会显示一个网络接口列表。但是,你可能只会看到类似这样的内容 - 仅显示回环接口:
```
$ ifquery --list
lo
```
If this is the case, your **/etc/network/interfaces** file doesn't include information on network interfaces except for the loopback interface. You can add lines like the last two in the example below — assuming DHCP is used to assign addresses — if you'd like it to be more useful.
如果是这种情况,那说明你的 `/etc/network/interfaces` 文件中除了回环接口之外没有其它网络接口的信息。如果你希望它更有用,可以参照下面例子中的最后两行添加相应配置(这里假设使用 DHCP 来分配地址)。
```
# interfaces(5) file used by ifup(8) and ifdown(8)
@ -27,15 +24,13 @@ auto eth0
iface eth0 inet dhcp
```
### ifup and ifdown commands
### ifup 和 ifdown 命令
The related **ifup** and **ifdown** commands can be used to bring network connections up and shut them down as needed provided this file has the required descriptive data. Just keep in mind that "if" means "interface" in these commands just as it does in the **ifconfig** command, not "if" as in "if I only had a brain".
可以使用相关的 `ifup``ifdown` 命令来打开网络连接并根据需要将其关闭只要该文件具有所需的描述性数据即可。请记住“if” 在这里意思是<ruby>接口<rt>interface</rt></ruby>,这与 `ifconfig` 命令中的一样,而不是<ruby>如果我只有一个大脑<rt>if I only had a brain</rt></ruby> 中的 “if”。
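例如(这里假设接口名为 `eth0`,且 `/etc/network/interfaces` 中已有它的配置):

```
$ sudo ifdown eth0    # 关闭 eth0
$ sudo ifup eth0      # 重新启用 eth0
```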
### ifconfig 命令
### ifconfig command
The **ifconfig** command, on the other hand, doesn't read the /etc/network/interfaces file at all and still provides quite a bit of useful information on network interfaces -- configuration data along with packet counts that tell you how busy each interface has been. The ifconfig command can also be used to shut down and restart network interfaces (e.g., ifconfig eth0 down).
另外,`ifconfig` 命令完全不读取 `/etc/network/interfaces`,但是仍然提供了网络接口相当多的有用信息 —— 配置数据以及可以告诉你每个接口有多忙的数据包计数。`ifconfig` 命令也可用于关闭和重新启动网络接口(例如:`ifconfig eth0 down`)。
```
$ ifconfig eth0
@ -50,15 +45,13 @@ eth0 Link encap:Ethernet HWaddr 00:1e:4f:c8:43:fc
Interrupt:21 Memory:fe9e0000-fea00000
```
The RX and TX packet counts in this output are extremely low. In addition, no errors or packet collisions have been reported. The **uptime** command will likely confirm that this system has only recently been rebooted.
输出中的 RX 和 TX 数据包计数很低。此外,没有报告错误或数据包冲突。或许可以用 `uptime` 命令确认此系统最近才重新启动。
The broadcast (Bcast) and network mask (Mask) addresses shown above indicate that the system is operating on a Class C equivalent network (the default) so local addresses will range from 192.168.0.1 to 192.168.0.254.
上面显示的广播 Bcast 和网络掩码 Mask 地址表明系统运行在 C 类等效网络(默认)上,所以本地地址范围从 `192.168.0.1``192.168.0.254`
### netstat command
### netstat 命令
The **netstat** command provides information on routing and network connections. The **netstat -rn** command displays the system's routing table.
`netstat` 命令提供有关路由和网络连接的信息。`netstat -rn` 命令显示系统的路由表。192.168.0.1 是本地网关Flags=UG
```
$ netstat -rn
@ -69,7 +62,7 @@ Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
```
That **169.254.0.0** entry in the above output is only necessary if you are using or planning to use link-local communications. You can comment out the related lines in the **/etc/network/if-up.d/avahi-autoipd** file like this if this is not the case:
上面输出中的 `169.254.0.0` 条目仅在你正在使用或计划使用本地链路通信时才有必要。如果不是这样的话,你可以在 `/etc/network/if-up.d/avahi-autoipd` 中注释掉相关的行:
```
$ tail -12 /etc/network/if-up.d/avahi-autoipd
@ -86,9 +79,9 @@ $ tail -12 /etc/network/if-up.d/avahi-autoipd
#fi
```
### netstat -a command
### netstat -a 命令
The **netstat -a** command will display  **_all_**  network connections. To limit this to listening and established connections (generally much more useful), use the **netstat -at** command instead.
`netstat -a` 命令将显示“所有”网络连接。为了将其限制为显示正在监听和已建立的连接(通常更有用),请改用 `netstat -at` 命令。
```
$ netstat -at
@ -104,21 +97,9 @@ tcp6 0 0 ip6-localhost:ipp [::]:* LISTEN
tcp6 0 0 ip6-localhost:smtp [::]:* LISTEN
```
### netstat -rn command
### host 命令
The **netstat -rn** command displays the system's routing table. The 192.168.0.1 address is the local gateway (Flags=UG).
```
$ netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 eth0
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
```
### host command
The **host** command works a lot like **nslookup** by looking up the remote system's IP address, but also provides the system's mail handler.
`host` 命令的工作方式很像 `nslookup`,用来查询远程系统的 IP 地址,但是它还会给出该系统的邮件处理服务器。
```
$ host world.std.com
@ -126,9 +107,9 @@ world.std.com has address 192.74.137.5
world.std.com mail is handled by 10 smtp.theworld.com.
```
### nslookup command
### nslookup 命令
The **nslookup** also provides information on the system (in this case, the local system) that is providing DNS lookup services.
`nslookup` 还会显示为本次查询提供 DNS 服务的系统(本例中是本地系统)的信息。
```
$ nslookup world.std.com
@ -140,9 +121,9 @@ Name: world.std.com
Address: 192.74.137.5
```
### dig command
### dig 命令
The **dig** command provides quitea lot of information on connecting to a remote system -- including the name server we are communicating with and how long the query takes to respond and is often used for troubleshooting.
`dig` 命令提供了很多有关连接到远程系统的信息 - 包括与我们通信的名称服务器以及查询需要多长时间进行响应,并经常用于故障排除。
```
$ dig world.std.com
@ -167,9 +148,9 @@ world.std.com. 78146 IN A 192.74.137.5
;; MSG SIZE rcvd: 58
```
### nmap command
### nmap 命令
The **nmap** command is most frequently used to probe remote systems, but can also be used to report on the services being offered by the local system. In the output below, we can see that ssh is available for logins, that smtp is servicing email, that a web site is active, and that an ipp print service is running.
`nmap` 经常用于探查远程系统,但也同样可以用来报告本地系统所提供的服务。在下面的输出中,我们可以看到 ssh 可用于登录、smtp 在提供电子邮件服务、web 站点处于活动状态,并且 ipp 打印服务正在运行。
```
$ nmap localhost
@ -187,15 +168,15 @@ PORT STATE SERVICE
Nmap done: 1 IP address (1 host up) scanned in 0.09 seconds
```
Linux systems provide a lot of useful commands for reviewing their network configuration and connections. If you run out of commands to explore, keep in mind that **apropos network** might point you toward even more.
Linux 系统提供了很多有用的命令用于查看网络配置和连接。如果你都探索完了,请记住 `apropos network` 或许会让你了解更多。
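例如:

```
$ apropos network | less          # 浏览所有描述中提到 network 的命令
$ apropos -a network interface    # 要求描述中同时包含 network 和 interface
```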
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3230519/linux/examining-network-connections-on-linux-systems.html
作者:[Sandra Henry-Stocker][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
译者:[geekpi](https://github.com/geekpi)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出


@ -0,0 +1,43 @@
回复:块层介绍第一部分 - 块 I/O 层
============================================================
### 块层介绍第一部分:块 I/O 层
回复amarao 在[块层介绍第一部分:块 I/O 层][1] 中提的问题
先前的文章:[块层介绍第一部分:块 I/O 层][2]
![](https://static.lwn.net/images/2017/neil-blocklayer.png)
嗨,
你在这里描述的问题与块层不直接相关。这可能是一个驱动错误、可能是一个 SCSI 层错误,但绝对不是一个块层的问题。
不幸的是,报告针对 Linux 的错误是一件难事。有些开发者拒绝去看 bugzilla有些开发者喜欢它有些像我这样只能勉强地使用它。
另一种方法是发送电子邮件。为此,你需要选择正确的邮件列表,还有也许是正确的开发人员,当他们心情愉快,或者不是太忙或者不是假期时找到它们。有些人会努力回复所有,有些是完全不可预知的 - 这对我来说通常会发送一个补丁,包含一些错误报告。如果你只是有一个你自己几乎都不了解的 bug那么你的预期响应率可能会更低。很遗憾但这是是真的。
许多 bug 都会得到回应和处理,但很多 bug 都没有。
我不认为说没有人关心是公平的,但是没有人认为它如你想的那样重要是有可能的。如果你想要一个解决方案,那么你需要驱动它。一个驱动它的方法是花钱请顾问或者与经销商签订支持合同。我怀疑你的情况没有上面的可能。另一种方法是了解代码如何工作,并自己找到解决方案。很多人都这么做,但是这对你来说可能不是一种选择。另一种方法是在不同的相关论坛上不断提出问题,直到得到回复。坚持可以见效。你需要做好准备去执行任何你所要求的测试,可能包括建立一个新的内核来测试。
如果你能在最近的内核(4.12 或者更新)上复现这个 bug我建议你邮件报告给 linux-kernel@vger.kernel.org、linux-scsi@vger.kernel.org 和我neilb@suse.com注意你不必订阅这些列表来发送邮件只需要发送就行。描述你的硬件以及如何触发问题的。
请附上所有处于 “D” 状态的进程的栈追踪。你可以用 `cat /proc/$PID/stack` 来得到它,这里的 `$PID` 是进程的 pid。
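下面是一个示意脚本,可以把所有处于 “D” 状态的进程的内核栈一次性收集下来(进程状态列的具体内容可能随 ps 版本略有差异):

```
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "== PID $pid =="
    sudo cat /proc/$pid/stack
done
```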
确保避免抱怨,也不要说这个问题已经存在好几年了、它造成了多么严重的影响之类的话。没有人关心这个。我们关心的是 bug 本身以及如何修复它。因此只要报告相关的事实就行。
尽量把所有事实直接写在邮件里,而不是放在指向其他地方的链接中。有时链接是必要的,但就你的脚本而言,它只有 8 行,直接包含在邮件中就行(并避免使用像 “fuckup” 之类的描述只需称它为“坏的”broken或者类似的说法。同样确保你的邮件不要以 HTML 格式发送,我们喜欢纯文本。HTML 会被所有 @vger.kernel.org 邮件列表拒收。你或许需要配置你的邮件程序不发送 HTML。
--------------------------------------------------------------------------------
via: https://lwn.net/Articles/737655/
作者:[neilbrown][a]
译者:[geekpi](https://github.com/geekpi)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://lwn.net/Articles/737655/
[1]:https://lwn.net/Articles/737588/
[2]:https://lwn.net/Articles/736534/


@ -0,0 +1,57 @@
操作系统何时运行?
============================================================
请各位思考以下问题:在你阅读本文的这段时间内,计算机中的操作系统在**运行**吗?又或者仅仅是 Web 浏览器在运行?又或者它们也许均处于空闲状态,等待着你的指示?
这些问题并不复杂,但它们深入涉及到系统软件工作的本质。为了准确回答这些问题,我们需要透彻理解操作系统的行为模型,包括性能、安全和除错等方面。在该系列文章中,我们将以 Linux 为主举例来帮助你建立操作系统的行为模型OS X 和 Windows 在必要的时候也会有所涉及。对那些深度探索者,我会在适当的时候给出 Linux 内核源码的链接。
这里有一个基本认知,就是,在任意给定时刻,某个 CPU 上仅有一个任务处于活动状态。大多数情形下这个任务是某个用户程序,例如你的 Web 浏览器或音乐播放器,但它也可能是一个操作系统线程。可以确信的是,它是**一个任务**,不是两个或更多,也不是零个,对,**永远**是一个。
这听上去可能会有些问题。比如,你的音乐播放器是否会独占 CPU 而阻止其它任务运行?从而使你不能打开任务管理工具去杀死音乐播放器,甚至让鼠标点击也失效,因为操作系统没有机会去处理这些事件。你可能会愤而喊出,“它究竟在搞什么鬼?”,并引发骚乱。
此时便轮到**中断**大显身手了。中断就好比,一声巨响或一次拍肩后,神经系统通知大脑去感知外部刺激一般。计算机主板上的[芯片组][1]同样会中断 CPU 运行以传递新的外部事件,例如键盘上的某个键被按下、网络数据包的到达、一次硬盘读取的完成,等等。硬件外设、主板上的中断控制器和 CPU 本身,它们共同协作实现了中断机制。
中断对于记录我们最珍视的资源——时间——也至关重要。计算机[启动过程][2]中,操作系统内核会设置一个硬件计时器以让其产生周期性**计时中断**,例如每隔 10 毫秒触发一次。每当计时中断到来,内核便会收到通知以更新系统统计信息和盘点如下事项:当前用户程序是否已运行了足够长时间?是否有某个 TCP 定时器超时了?中断给予了内核一个处理这些问题并采取合适措施的机会。这就好像你给自己设置了整天的周期闹铃并把它们用作检查点:我是否应该去做我正在进行的工作?是否存在更紧急的事项?直到你发现 10 年时间已逝去……
这些内核对 CPU 周期性的劫持被称为<ruby>滴答<rt>tick</rt></ruby>,也就是说,是中断让你的操作系统滴答了一下。不止如此,中断也被用作处理一些软件事件,如整数溢出和页错误,其中未涉及外部硬件。**中断是进入操作系统内核最频繁也是最重要的入口**。对于学习电子工程的人而言,这些并无古怪,它们是操作系统赖以运行的机制。
说到这里,让我们再来看一些实际情形。下图示意了 Intel Core i5 系统中的一个网卡中断。图片中的部分元素设置了超链,你可以点击它们以获取更为详细的信息,例如每个设备均被链接到了对应的 Linux 驱动源码。
![](http://duartes.org/gustavo/blog/img/os/hardware-interrupt.png)
链接如下:
- network card https://github.com/torvalds/linux/blob/v3.17/drivers/net/ethernet/intel/e1000e/netdev.c
- USB keyboard https://github.com/torvalds/linux/blob/v3.16/drivers/hid/usbhid/usbkbd.c
- I/O APIC https://github.com/torvalds/linux/blob/v3.16/arch/x86/kernel/apic/io_apic.c
- HPET https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/hpet.c
让我们来仔细研究下。首先,由于系统中存在众多中断源,如果硬件只是通知 CPU “嘿,这里发生了一些事情”然后什么也不做,则不太行得通。这会带来难以忍受的冗长等待。因此,计算机上电时,每个设备都被授予了一根**中断线**,或者称为 IRQ。这些 IRQ 然后被系统中的中断控制器映射成值介于 0 到 255 之间的**中断向量**。等到中断到达 CPU它便具备了一个完好定义的数值异于硬件的某些其它诡异行为。
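在 Linux 下,你可以直接看到这些中断线以及它们各自被触发的次数,例如:

```
$ head -n 8 /proc/interrupts    # 每行对应一个中断号,各列是各个 CPU 上的触发计数及对应设备
```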
相应地CPU 中还存有一个由内核维护的指针,指向一个包含 255 个函数指针的数组,其中每个函数被用来处理某个特定的中断向量。后文中,我们将继续深入探讨这个数组,它也被称作**中断描述符表**IDT
每当中断到来CPU 会用中断向量的值去索引中断描述符表并执行相应处理函数。这相当于在当前正在执行任务的上下文中发生了一个特殊函数调用从而允许操作系统以较小开销快速对外部事件作出反应。考虑下述场景Web 服务器在发送数据时CPU 却间接调用了操作系统函数,这听上去要么很炫酷要么令人惊恐。下图展示了 Vim 编辑器运行过程中一个中断到来的情形。
![](http://duartes.org/gustavo/blog/img/os/vim-interrupted.png)
此处请留意,中断的到来是如何触发 CPU 到 [Ring 0][3] 内核模式的切换而未有改变当前活跃的任务。这看上去就像Vim 编辑器直接面向操作系统内核产生了一次神奇的函数调用,但 Vim 还在那里,它的[地址空间][4]原封未动,等待着执行流返回。
这很令人振奋,不是么?不过让我们暂且告一段落吧,我需要合理控制篇幅。我知道还没有回答完这个开放式问题,甚至还实质上翻开了新的问题,但你至少知道了在你读这个句子的同时**滴答**正在发生。我们将在充实了对操作系统动态行为模型的理解之后再回来寻求问题的答案,对 Web 浏览器情形的理解也会变得清晰。如果你仍有问题,尤其是在这篇文章公诸于众后,请尽管提出。我将会在文章或后续评论中回答它们。下篇文章将于明天在 RSS 和 Twitter 上发布。
--------------------------------------------------------------------------------
via: http://duartes.org/gustavo/blog/post/when-does-your-os-run/
作者:[gustavo][a]
译者:[Cwndmiao](https://github.com/Cwndmiao)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://duartes.org/gustavo/blog/about/
[1]:http://duartes.org/gustavo/blog/post/motherboard-chipsets-memory-map
[2]:http://duartes.org/gustavo/blog/post/kernel-boot-process
[3]:http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
[4]:http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
[5]:http://feeds.feedburner.com/GustavoDuarte
[6]:http://twitter.com/food4hackers


@ -1,27 +1,27 @@
介绍 MOBY 项目:推进软件容器化运动的一个新的开源项目
介绍 Moby 项目:推进软件容器化运动的一个新的开源项目
============================================================
![Moby Project](https://i0.wp.com/blog.docker.com/wp-content/uploads/1-2.png?resize=763%2C275&ssl=1)
自从 Docker 四年前将软件容器推向民主化以来,整个生态系统都围绕着容器化而发展,在这段压缩的时期,它经历了两个不同的增长阶段。在这每一个阶段,生产容器系统的模式已经演变适应用户群体以及项目的规模和需求和不断增长的贡献者生态系统
自从 Docker 四年前将软件容器推向大众化以来,整个生态系统都围绕着容器化而发展,在这段这么短的时期内,它经历了两个不同的增长阶段。在这每一个阶段,生产容器系统的模式已经随着项目和不断增长的容器生态系统而演变适应用户群体的规模和需求。
Moby 是一个新的开源项目,旨在推进软件容器化运动,帮助生态系统将容器作为主流。它提供了一个组件库,一个将它们组装到定制的基于容器的系统的框架,以及所有容器爱好者进行实验和交换想法的地方。
Moby 是一个新的开源项目,旨在推进软件容器化运动,帮助生态系统将容器作为主流。它提供了一个组件库,一个将它们组装到定制的基于容器的系统的框架,也是所有容器爱好者进行实验和交换想法的地方。
让我们来回顾一下我们如何走到今天。在 2013-2014 年开拓者开始使用容器并在一个单一的开源代码库Docker 和其他一些项目中进行协作,以帮助工具成熟。
![Docker Open Source](https://i0.wp.com/blog.docker.com/wp-content/uploads/2-2.png?resize=975%2C548&ssl=1)
然后在 2015-2016 年云原生应用中大量采用容器用于生产环境。在这个阶段用户社区已经发展到支持成千上万个部署由数百个生态系统项目和成千上万的贡献者支持。正是在这个阶段Docker 将其产模式演变为基于开放式组件的方法。这样,它使我们能够增加创新和合作的方面。
然后在 2015-2016 年云原生应用中大量采用容器用于生产环境。在这个阶段用户社区已经发展到支持成千上万个部署由数百个生态系统项目和成千上万的贡献者支持。正是在这个阶段Docker 将其产模式演变为基于开放式组件的方法。这样,它使我们能够增加创新和合作的方面。
涌现出来的新独立的 Docker 组件项目帮助刺激了合作伙伴生态系统和用户社区的发展。在此期间,我们从 Docker 代码库中提取并快速创新组件,以便系统制造商可以在构建自己的容器系统时独立重用它们:[runc][7]、[HyperKit][8]、[VPNKit][9]、[SwarmKit][10]、[InfraKit][11]、[containerd][12] 等。
涌现出来的新独立的 Docker 组件项目帮助促进了合作伙伴生态系统和用户社区的发展。在此期间,我们从 Docker 代码库中提取并快速创新组件,以便系统制造商可以在构建自己的容器系统时独立重用它们:[runc][7]、[HyperKit][8]、[VPNKit][9]、[SwarmKit][10]、[InfraKit][11]、[containerd][12] 等。
![Docker Open Components](https://i1.wp.com/blog.docker.com/wp-content/uploads/3-2.png?resize=975%2C548&ssl=1)
站在容器浪潮的最前沿,我们看到 2017 年出现的一个趋势是容器将成为主流,传播到计算、服务器、数据中心、云、桌面、物联网和移动的各个领域。每个行业和垂直市场金融、医疗、政府、旅游、制造。以及每一个使用案例,现代网络应用、传统服务器应用、机器学习、工业控制系统、机器人技术。容器生态系统中许多新进入者的共同点是,它们建立专门的系统,针对特定的基础设施、行业或使用案例。
站在容器浪潮的最前沿,我们看到 2017 年出现的一个趋势是容器将成为主流,传播到计算、服务器、数据中心、云、桌面、物联网和移动的各个领域。每个行业和垂直市场金融、医疗、政府、旅游、制造。以及每一个使用案例,现代网络应用、传统服务器应用、机器学习、工业控制系统、机器人技术。容器生态系统中许多新进入者的共同点是,它们建立专门的系统,针对特定的基础设施、行业或使用案例。
作为一家公司Docker 使用开源作为我们的创新实验室而与整个生态系统合作。Docker 的成功取决于容器生态系统的成功:如果生态系统成功,我们就成功了。因此,我们一直在计划下一阶段的容器生态系统增长:什么样的产模式将帮助我们扩大容器生态系统,实现容器成为主流的承诺?
作为一家公司Docker 使用开源作为我们的创新实验室而与整个生态系统合作。Docker 的成功取决于容器生态系统的成功:如果生态系统成功,我们就成功了。因此,我们一直在计划下一阶段的容器生态系统增长:什么样的产模式将帮助我们扩大容器生态系统,实现容器成为主流的承诺?
去年,我们的客户开始在 Linux 以外的许多平台上要求有 DockerMac 和 Windows 桌面、Windows Server、云平台如亚马逊网络服务AWS、Microsoft Azure 或 Google 云平台),并且我们专门为这些平台创建了[许多 Docker 版本][13]。为了在一个相对较短的时间与更小的团队,以可扩展的方式构建和发布这些专业版本,而不必重新发明轮子,很明显,我们需要一个新的方。我们需要我们的团队不仅在组件上进行协作,而且还在组件组合上进行协作,这借用[来自汽车行业的想法][14],其中组件被重用于构建完全不同的汽车。
去年,我们的客户开始在 Linux 以外的许多平台上要求有 DockerMac 和 Windows 桌面、Windows Server、云平台如亚马逊网络服务AWS、Microsoft Azure 或 Google 云平台),并且我们专门为这些平台创建了[许多 Docker 版本][13]。为了在一个相对较短的时间和更小的团队中,以可扩展的方式构建和发布这些专业版本,而不必重新发明轮子,很明显,我们需要一个新的方。我们需要我们的团队不仅在组件上进行协作,而且还在组件组合上进行协作,这借用[来自汽车行业的想法][14],其中组件被重用于构建完全不同的汽车。
![Docker production model](https://i1.wp.com/blog.docker.com/wp-content/uploads/4-2.png?resize=975%2C548&ssl=1)
@ -29,15 +29,13 @@ Moby 是一个新的开源项目,旨在推进软件容器化运动,帮助生
![Moby Project](https://i0.wp.com/blog.docker.com/wp-content/uploads/5-2.png?resize=975%2C548&ssl=1)
为了实现这种新的合作高度,今天我们宣布推出软件容器化运动的新开源项目 Moby。它是提供了数十个组件的“乐高”,一个将它们组合成定制容器系统的框架,以及所有容器爱好者进行试验和交换意见的场所。可以把 Moby 认为是容器系统的“乐高俱乐部”。
为了实现这种新的合作高度,今天2017 年 4 月 18 日)我们宣布推出软件容器化运动的新开源项目 Moby。它是提供了数十个组件的“乐高组件”,一个将它们组合成定制容器系统的框架,以及所有容器爱好者进行试验和交换意见的场所。可以把 Moby 认为是容器系统的“乐高俱乐部”。
Moby包括
1. 容器化后端组件**库**例如底层构建器、日志记录设备、卷管理、网络、镜像管理、containerd、SwarmKit 等)
Moby 包括:
1. 容器化后端组件**库**例如低层构建器、日志记录设备、卷管理、网络、镜像管理、containerd、SwarmKit 等)
2. 将组件组合到独立容器平台中的**框架**,以及为这些组件构建、测试和部署构件的工具。
3. 一个名为 **Moby Origin** 的引用组件,它是 Docker 容器平台的开放基础,以及使用 Moby 库或其他项目的各种组件的容器系统示例。
3. 一个名为 “Moby Origin” 的引用组件,它是 Docker 容器平台的开放基础,以及使用 Moby 库或其他项目的各种组件的容器系统示例。
Moby 专为系统构建者而设计,他们想要构建自己的基于容器的系统,而不是可以使用 Docker 或其他容器平台的应用程序开发人员。Moby 的参与者可以从源自 Docker 的组件库中进行选择或者可以选择将“自己的组件”BYOC打包为容器以便在所有组件之间进行混合和匹配以创建定制的容器系统。
@ -49,9 +47,9 @@ Docker 将 Moby 作为一个开放的研发实验室来试验、开发新的组
via: https://blog.docker.com/2017/04/introducing-the-moby-project/
作者:[Solomon Hykes ][a]
作者:[Solomon Hykes][a]
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出


@ -0,0 +1,67 @@
了解用于 Linux 和 Windows 容器的 Docker “容器主机”与“容器操作系统”
=================
让我们来探讨一下“容器主机”和“容器操作系统”之间的关系,以及它们在 Linux 和 Windows 容器之间的区别。
### 一些定义
* <ruby>容器主机<rt>Container Host</rt></ruby>:也称为<ruby>主机操作系统<rt>Host OS</rt></ruby>。主机操作系统是 Docker 客户端和 Docker 守护程序在其上运行的操作系统。在 Linux 和非 Hyper-V 容器的情况下,主机操作系统与运行中的 Docker 容器共享内核。对于 Hyper-V每个容器都有自己的 Hyper-V 内核。
* <ruby>容器操作系统<rt>Container OS</rt></ruby>:也被称为<ruby>基础操作系统<rt>Base OS</rt></ruby>。基础操作系统是指包含操作系统如 Ubuntu、CentOS 或 windowsservercore 的镜像。通常情况下你将在基础操作系统镜像之上构建自己的镜像以便可以利用该操作系统的部分功能。请注意Windows 容器需要一个基础操作系统,而 Linux 容器不需要。
* <ruby>操作系统内核<rt>Operating System Kernel</rt></ruby>:内核管理诸如内存、文件系统、网络和进程调度等底层功能。
### 一些图示
![Linux Containers](http://floydhilton.com/images/2017/03/2017-03-31_14_50_13-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
在上面的例子中:
* 主机操作系统是 Ubuntu。
* Docker 客户端和 Docker 守护进程(一起被称为 Docker 引擎)正在主机操作系统上运行。
* 每个容器共享主机操作系统内核(可参考本列表之后的命令示例)。
* CentOS 和 BusyBox 是 Linux 基础操作系统镜像。
* “No OS” 容器表明你不需要基础操作系统以在 Linux 中运行一个容器。你可以创建一个含有 [scratch][1] 基础镜像的 Docker 文件,然后运行直接使用内核的二进制文件。
* 查看[这篇][2]文章来比较基础 OS 的大小。
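要直观地验证“容器共享主机内核”这一点,可以做个小实验(仅为示意,假设主机上已安装并运行了 Docker且能获取 centos:7 镜像):

```
$ uname -r                           # 主机的内核版本
$ docker run --rm centos:7 uname -r  # 在 CentOS 容器里看到的是同一个内核版本
```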
![Windows Containers - Non Hyper-V](http://floydhilton.com/images/2017/03/2017-03-31_15_04_03-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
在上面的例子中:
* 主机操作系统是 Windows 10 或 Windows Server。
* 每个容器共享主机操作系统内核。
* 所有 Windows 容器都需要 [nanoserver][3] 或 [windowsservercore][4] 的基础操作系统。
![Windows Containers - Hyper-V](http://floydhilton.com/images/2017/03/2017-03-31_15_41_31-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
在上面的例子中:
* 主机操作系统是 Windows 10 或 Windows Server。
* 每个容器都托管在自己的轻量级 Hyper-V 虚拟机中。
* 每个容器使用 Hyper-V 虚拟机内的内核,它在容器之间提供额外的分离层。
* 所有 Windows 容器都需要 [nanoserver][5] 或 [windowsservercore][6] 的基础操作系统。
### 几个好的链接
* [关于 Windows 容器][7]
* [深入实现 Windows 容器,包括多用户模式和“写时复制”来节省资源][8]
* [Linux 容器如何通过使用“写时复制”来节省资源][9]
--------------------------------------------------------------------------------
via: http://floydhilton.com/docker/2017/03/31/Docker-ContainerHost-vs-ContainerOS-Linux-Windows.html
作者:[Floyd Hilton][a]
译者:[geekpi](https://github.com/geekpi)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://floydhilton.com/about/
[1]:https://hub.docker.com/_/scratch/
[2]:https://www.brianchristner.io/docker-image-base-os-size-comparison/
[3]:https://hub.docker.com/r/microsoft/nanoserver/
[4]:https://hub.docker.com/r/microsoft/windowsservercore/
[5]:https://hub.docker.com/r/microsoft/nanoserver/
[6]:https://hub.docker.com/r/microsoft/windowsservercore/
[7]:https://docs.microsoft.com/en-us/virtualization/windowscontainers/about/
[8]:http://blog.xebia.com/deep-dive-into-windows-server-containers-and-docker-part-2-underlying-implementation-of-windows-server-containers/
[9]:https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/#the-copy-on-write-strategy


@ -0,0 +1,96 @@
Linux 上的科学图像处理
============================================================
在展示数据和工作成果方面,我之前介绍过几款科学软件,但这里不会过多展开。在这篇文章中,我将谈到一款叫 ImageJ 的热门图像处理软件。特别的,我会介绍 [Fiji][4],这是一款捆绑了一系列科学图像处理插件的 ImageJ 软件。
Fiji 这个名字是一个递归缩写词,很像 GNU代表着 “Fiji Is Just ImageJ”。ImageJ 是科学研究领域进行图像分析的实用工具例如你可以用它来辨认航拍风景图中树的种类。ImageJ 还能对图中的物体进行划分。它以插件架构构建,有海量插件可供选择以提升使用灵活度。
首先是安装 ImageJ或 Fiji。大多数 Linux 发行版都有 ImageJ 的软件包。你愿意的话,可以用这种方式安装它,然后根据你的研究需要安装所需的独立插件。另一种选择是安装 Fiji同时获取最常用的插件。不幸的是大多数 Linux 发行版的软件中心里没有现成的 Fiji 安装包。幸而,官网上提供的简单安装文件是可以使用的,这是一个 zip 文件,包含了运行 Fiji 需要的所有文件目录。第一次启动时,你只会看到一个列出了菜单项的工具栏(图 1
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif1.png)
*图 1. 第一次打开 Fiji 有一个最小化的界面。*
如果你没有备好图片来练习使用 ImageJ Fiji 安装包包含了一些示例图片。点击“File”->“Open Samples”的下拉菜单选项图 2。这些示例包含了许多你可能有兴趣做的任务。
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif2.jpg)
*图 2. 案例图片可供学习使用 ImageJ。*
如果你安装了 Fiji而不是单纯的 ImageJ ,那么大量插件也会被安装。首先要注意的是自动更新器插件。每次打开 ImageJ ,该插件将联网检验 ImageJ 和已安装插件的更新。
所有已安装的插件都在“插件”菜单项中可选。一旦你安装了很多插件列表会变得冗杂所以需要精简你选择的插件。你想手动更新的话点击“Help”->“Update Fiji” 菜单项强制检测并获取可用更新的列表(图 3
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif3.png)
*图 3. 强制手动检测可用更新。*
那么,现在,用 Fiji/ImageJ 可以做什么呢举一例统计图片中的物品数。你可以通过点击“File”->“Open Samples”->“Embryos”来载入示例。
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif4.jpg)
*图 4. 用 ImageJ 算出图中的物品数。*
第一步是给图片设定比例,这样你可以告诉 ImageJ 如何判别物品。首先在工具栏上选择线条按钮然后选择“Analyze”->“Set Scale”就可以设置比例尺所对应的像素点个数图 5。你可以把 “known distance” 设置为 100单位设置为 “um”。
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif5.png)
*图 5. 很多图片分析任务需要对图片设定一个范围。*
接下来的步骤是简化图片内的信息。点击“Image”->“Type”->“8-bit”来减少信息量到 8 比特灰度图片。要分隔独立物体点击“Process”->“Binary”->“Make Binary”以自动设置图片门限。(图 6)。
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif6.png)
*图 6. 有些工具可以自动完成像门限一样的任务。*
在对图片内的物品计数之前,你需要移除比例尺之类的人为标记。可以用矩形选择工具选中它并点击“Edit”->“Clear”来完成这项操作。现在你可以分析图片看看图中有哪些物体。
确保图中没有区域被选中点击“Analyze”->“Analyze Particles”来弹出窗口来选择最小尺寸这决定了最后的图片会展示什么图 7
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif7.png)
*图 7. 你可以通过确定最小尺寸生成一个缩减过的图片。*
图 8 在总结窗口展示了一个概览。每个最小点也有独立的细节窗口。
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif8.png)
*图 8. 包含了已知最小点总览清单的输出结果。*
当你有一套可用于给定图片类型的分析流程时,通常需要将相同的步骤应用到一系列图片上。这可能数以千计,你当然不会想对每张图片手动重复操作。这时候你可以把必要的步骤集中到宏里以便重复应用。点击“Plugins”->“Macros”->“Record”会弹出一个新的窗口记录你随后的所有操作。所有步骤完成后你可以将其保存为一个宏文件然后通过点击“Plugins”->“Macros”->“Run”在其它图片上重复运行。
如果你有非常特定的工作步骤,你可以简单地打开宏文件并手动编辑它,因为它是一个简单的文本文件。事实上有一套完整的宏语言可供你更加充分地控制图片处理过程。
然而如果你有真的有非常多的系列图片需要处理这也将是冗长乏味的工作。这种情况下前往“Process”->“Batch”->“Macro”会弹出一个你可以设置批量处理工作的新窗口图 9
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif9.png)
*图 9. 对批量输入的图片用单一命令运行宏。*
在这个窗口中,你能选择应用哪个宏文件、输入图片所在的源目录,以及想写入输出图片的输出目录,也可以设置输出文件格式,并通过文件名来筛选需要处理的输入图片。万事俱备之后点击窗口下方的“Process”按钮开始批量操作。
若这是会重复多次的工作你可以点击窗口底部的“Save”按钮把这个批量任务保存为一个文本文件之后点击同样位于窗口底部的“Open”按钮即可重新加载相同的任务。这个功能可以让研究中最冗长乏味的部分自动化这样你就可以把精力放在实际的科学研究上。
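如果你更习惯命令行也可以在无界面headless模式下运行宏来做批量处理。下面只是一个示意假设 Fiji 被解压在 `~/Fiji.app`,宏文件路径也仅为举例,具体的启动器名称和参数请以你所用 Fiji 版本的文档为准:

```
$ ~/Fiji.app/ImageJ-linux64 --headless -macro ~/macros/count-embryos.ijm
```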
考虑到单单是 ImageJ 主页就有超过 500 个插件和超过 300 种宏可供使用,简短起见,我只能在这篇短文中提出最基本的话题。幸运的是,还有很多专业领域的教程可供使用,项目主页上还有关于 ImageJ 核心的非常棒的文档。如果你觉得这个工具对研究有用,你研究的专业领域也会有很多信息指引你。
--------------------------------------------------------------------------------
作者简介:
Joey Bernard 有物理学和计算机科学的相关背景。这对他在新不伦瑞克大学当计算研究顾问的日常工作大有裨益。他也教计算物理和并行程序规划。
--------------------------------
via: https://www.linuxjournal.com/content/image-processing-linux
作者:[Joey Bernard][a]
译者:[XYenChi](https://github.com/XYenChi)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linuxjournal.com/users/joey-bernard
[1]:https://www.linuxjournal.com/tag/science
[2]:https://www.linuxjournal.com/tag/statistics
[3]:https://www.linuxjournal.com/users/joey-bernard
[4]:https://imagej.net/Fiji


@ -0,0 +1,98 @@
用 coredumpctl 更好地记录 bug
===========
![](https://fedoramagazine.org/wp-content/uploads/2017/11/coredump.png-945x400.jpg)
一个不幸的事实是,所有的软件都有 bug一些 bug 会导致系统崩溃。当它出现的时候,它经常会在磁盘上留下一个被称为“<ruby>核心转储<rt>core dump</rt></ruby>”的数据文件。该文件包含有关系统崩溃时的相关数据,可能有助于确定发生崩溃的原因。通常开发者要求提供 “<ruby>回溯<rt>backtrace</rt></ruby>” 形式的数据,以显示导致崩溃的指令流。开发人员可以使用它来修复 bug 以改进系统。如果系统发生了崩溃,以下是如何轻松生成 <ruby>回溯<rt>backtrace</rt></ruby> 的方法。
### 从使用 coredumpctl 开始
大多数 Fedora 系统使用[自动错误报告工具ABRT][2]来自动捕获崩溃文件并记录 bug。但是如果你禁用了此服务或删除了该软件包则此方法可能会有所帮助。
如果你遇到系统崩溃,请首先确保你运行的是最新的软件。更新中通常包含对已发现的、会导致严重错误和崩溃的问题的修复。当你更新后,请尝试重现导致错误的情况。
如果崩溃仍然发生,或者你已经在运行最新的软件,那么可以使用有用的 `coredumpctl` 工具。此程序可帮助查找和处理崩溃。要查看系统上所有核心转储列表,请运行以下命令:
```
coredumpctl list
```
如果你看到比预期长的列表,请不要感到惊讶。有时系统组件会在后台默默地崩溃,并自行恢复。要快速查找今天产生的转储,简单的方法是使用 `--since` 选项:
```
coredumpctl list --since=today
```
“PID” 列包含用于标识转储的进程 ID。请注意这个数字因为你会之后再用到它。或者如果你不想记住它使用下面的命令将它赋值给一个变量
```
MYPID=<PID>
```
要查看关于核心转储的信息,请使用此命令(使用 `$MYPID` 变量或替换 PID 编号):
```
coredumpctl info $MYPID
```
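顺便一提,如果你想把核心转储本身导出成一个文件、再用其他工具分析,可以使用下面的命令(输出文件名仅为示例):

```
coredumpctl dump --output=core.$MYPID $MYPID
```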
### 安装 debuginfo 包
调试符号在核心转储中的数据与原始代码中的指令之间起到转译作用。这些符号数据可能相当大。与大多数用户运行在 Fedora 系统上的软件包不同,符号以 “debuginfo” 软件包的形式安装。要确定你必须安装哪些 debuginfo 包,请先运行以下命令:
```
coredumpctl gdb $MYPID
```
这可能会在屏幕上显示大量信息。最后一行可能会告诉你使用 `dnf` 安装更多的 debuginfo 软件包。[用 sudo ][3]运行该命令以安装:
```
sudo dnf debuginfo-install <packages...>
```
然后再次尝试 `coredumpctl gdb $MYPID` 命令。**你可能需要重复执行此操作**,因为其他符号会在回溯中展开。
### 捕获回溯
在调试器中运行以下命令以记录信息:
```
set logging file mybacktrace.txt
set logging on
```
你可能会发现关闭分页有帮助。对于长的回溯,这可以节省时间。
```
set pagination off
```
现在运行回溯:
```
thread apply all bt full
```
现在你可以输入 `quit` 来退出调试器。`mybacktrace.txt` 包含可附加到 bug 或问题的追踪信息。或者,如果你正在与某人实时合作,则可以将文本上传到 pastebin。无论哪种方式你现在可以向开发人员提供更多的帮助来解决问题。
---------------------------------
作者简介:
Paul W. Frields
Paul W. Frields 自 1997 年以来一直是 Linux 用户和爱好者,并于 2003 年在 Fedora 发布不久后加入 Fedora。他是 Fedora 项目委员会的创始成员之一,从事文档、网站发布、宣传、工具链开发和维护软件。他于 2008 年 2 月至 2010 年 7 月加入 Red Hat担任 Fedora 项目负责人,现任红帽公司工程部经理。他目前和妻子和两个孩子住在弗吉尼亚州。
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/file-better-bugs-coredumpctl/
作者:[Paul W. Frields][a]
译者:[geekpi](https://github.com/geekpi)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://fedoramagazine.org/author/pfrields/
[1]:https://fedoramagazine.org/file-better-bugs-coredumpctl/
[2]:https://github.com/abrt/abrt
[3]:https://fedoramagazine.org/howto-use-sudo/


@ -0,0 +1,102 @@
5 款最酷的 Linux 终端模拟器
============================================================
![Cool retro term](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/banner2.png)
> Carla Schroder 正在看着那些她喜欢的终端模拟器, 包括展示在这儿的 Cool Retro Term。
虽然,我们可以继续使用老旧的 GNOME 终端、Konsole以及好笑而孱弱的旧式 xterm。 不过,让我们带着尝试某种新东西的心境,回过头来看看 5 款酷炫并且实用的 Linux 终端。
### Xiki
首先我要推荐的第一个终端是 [Xiki][10]。 Xiki 是 Craig Muth 的智慧结晶,他是一个天才程序员,也是一个有趣的人(有趣在此处的意思是幽默,可能还有其它的意思)。 很久以前我在 [遇见 XikiLinux 和 Mac OS X 下革命性命令行 Shell][11] 一文中介绍过 Xiki。 Xiki 不仅仅是又一款终端模拟器;它也是一个扩展命令行用途、加快命令行速度的交互式环境。
视频: https://youtu.be/bUR_eUVcABg
Xiki 支持鼠标,并且在绝大多数命令行 Shell 上都支持。 它有大量的屏显帮助,而且可以使用鼠标和键盘快速导航。 它体现在速度上的一个简单例子就是增强了 `ls` 命令。 Xiki 可以快速穿过文件系统上的多层目录,而不用持续的重复输入 `ls` 或者 `cd` 或者利用那些巧妙的正则表达式。
Xiki 可以与许多文本编辑器相集成, 提供了一个永久的便签, 有一个快速搜索引擎, 同时像他们所说的,还有许许多多的功能。 Xiki 是如此的有特色、如此的不同, 所以学习和了解它的最快的方式可以看 [Craig 的有趣和实用的视频][12]。
### Cool Retro Term
我推荐 [Cool Retro Term][13] (如题图显示) 主要因为它的外观,以及它的实用性。 它将我们带回了阴极射线管显示器的时代,这不算很久以前,而我也没有怀旧的意思,我死也不会放弃我的 LCD 屏幕。它基于 [Konsole][14] 因此有着 Konsole 的优秀功能。可以通过 Cool Retro Term 的配置文件菜单来改变它的外观。配置文件包括 Amber、Green、Pixelated、Apple 和 Transparent Green 等等,而且全都包括一个像真的一样的扫描线。并不是全都是有用的,例如 Vintage 配置文件看起来就像一个闪烁着的老旧的球面屏。
Cool Retro Term 的 GitHub 仓库有着详细的安装指南,且 Ubuntu 用户有 [PPA][15]。
### Sakura
你要是想要一个优秀的、轻量级且易于配置的终端,可以尝试下 [Sakura][16](图 1。它的依赖很少不像 GNOME 终端和 Konsole 那样在 GNOME 和 KDE 中牵扯了很多组件。其大多数选项可以通过右键菜单配置,例如选项卡的标签、颜色、大小、选项卡的默认数量、字体、铃声,以及光标类型。你可以在你个人的配置文件 `~/.config/sakura/sakura.conf` 里面设置更多的选项,例如绑定快捷键。
![sakura](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/fig-1_9.png)
*图 1 Sakura 是一个优秀的、轻量级的、可配置的终端。*
命令行选项详见 `man sakura`。可以使用这些来从命令行启动 sakura或者在你的图形启动器上使用它们。 例如,打开 4 个选项卡并设置窗口标题为 “MyWindowTitle”
```
$ sakura -t MyWindowTitle -n 4
```
### Terminology
[Terminology][17] 来自 Enlightenment 图形环境的郁葱可爱的世界,它能够被美化成任何你所想要的样子 (图 2)。 它有许多有用的功能:独立的拆分窗口、打开文件和 URL、文件图标、选项卡林林总总。 它甚至能运行在没有图形界面的 Linux 控制台上。
![Terminology](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/fig-2_6.png)
*图 2: Terminology 也能够运行在没有图形界面的 Linux 控制台上。*
当你打开多个拆分窗口时,每个窗口都能设置不同的背景,并且背景文件可以是任意媒体文件:图像文件、视频或者音乐文件。它带有一堆便于清晰阅读的暗色主题和透明主题,甚至还有一个 Nyan 猫主题。它没有滚动条,因此需要使用组合键 `Shift+PageUp` 和 `Shift+PageDown` 进行上下导航。
它有多个控件:一个右键单击菜单,上下文对话框,以及命令行选项。右键单击菜单里包含世界上最小的字体,且 Miniview 可显示一个微观的文件树,但我没有找到可以使它们易于辨读的选项。当你打开多个标签时,可以点击小标签浏览器来打开一个可以上下滚动的选择器。任何东西都是可配置的;通过 `man terminology` 可以查看一系列的命令和选项,包括一批不错的快捷键快捷方式。奇怪的是,帮助里面没有包括以下命令,这是我偶然发现的:
* tyalpha
* tybg
* tycat
* tyls
* typop
* tyq
使用 `tybg [filename]` 命令来设置背景,不带参数的 `tybg` 命令来移除背景。 运行 `typop [filename]` 来打开文件。 `tyls` 命令以图标视图列出文件。 加上 `-h` 选项运行这些命令可以了解它们是干什么的。 即使有可读性的怪癖Terminology 依然是快速、漂亮和实用的。
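例如,可以在 Terminology 窗口内的 shell 里这样试试(文件路径仅为示例):

```
$ tybg ~/Pictures/space.jpg   # 把当前窗口背景设为这张图片
$ tybg                        # 移除背景
$ typop ~/Videos/demo.mp4     # 在弹出窗口中打开文件
$ tyls ~/Pictures             # 以图标视图列出目录内容
```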
### Tilda
已经有几个优秀的下拉式终端模拟器,包括 Guake 和 Yakuake。 [Tilda][18] (图 3) 是其中最简单和轻量级的一个。 打开 Tilda 后它会保持打开状态, 你可以通过快捷键来显示和隐藏它。 Tilda 快捷键是默认设置的 你可以设置自己喜欢的快捷键。 它一直打开着的,随时准备工作,但是直到你需要它的时候才会出现。
![](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/fig-3_3.png)
*图 3: Tilda 是最简单和轻量级的一个终端模拟器。*
Tilda 选项方面有很好的补充包括默认的大小、位置、外观、绑定键、搜索条、鼠标动作以及标签条。 这些都被右键单击菜单控制。
_学习更多关于 Linux 的知识可以通过 Linux 基金会 和 edX 的免费课程 ["Linux 介绍" ][9]。_
--------------------------------------------------------------------------------
via: https://www.linux.com/learn/intro-to-linux/2017/11/5-coolest-linux-terminal-emulators
作者:[CARLA SCHRODER][a]
译者:[cnobelw](https://github.com/cnobelw)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linux.com/users/cschroder
[1]:https://www.linux.com/licenses/category/used-permission
[2]:https://www.linux.com/licenses/category/used-permission
[3]:https://www.linux.com/licenses/category/used-permission
[4]:https://www.linux.com/licenses/category/used-permission
[5]:https://www.linux.com/files/images/fig-1png-9
[6]:https://www.linux.com/files/images/fig-2png-6
[7]:https://www.linux.com/files/images/fig-3png-3
[8]:https://www.linux.com/files/images/banner2png
[9]:https://training.linuxfoundation.org/linux-courses/system-administration-training/introduction-to-linux
[10]:http://xiki.org/
[11]:https://www.linux.com/learn/meet-xiki-revolutionary-command-shell-linux-and-mac-os-x
[12]:http://xiki.org/screencasts/
[13]:https://github.com/Swordfish90/cool-retro-term
[14]:https://www.linux.com/learn/expert-tips-and-tricks-kate-and-konsole
[15]:https://launchpad.net/~bugs-launchpad-net-falkensweb/+archive/ubuntu/cool-retro-term
[16]:https://bugs.launchpad.net/sakura
[17]:https://www.enlightenment.org/about-terminology
[18]:https://github.com/lanoxx/tilda


@ -0,0 +1,92 @@
如何轻松记住 Linux 命令
=================
![](https://www.maketecheasier.com/assets/uploads/2017/10/rc-feat.jpg)
Linux 新手往往对命令行心存畏惧,部分原因在于需要记忆大量的命令,毕竟掌握命令是高效使用命令行的前提。
不幸的是,学习这些命令并无捷径,然而在你开始学习命令之初,有些工具还是可以帮到你的。
### history
![Linux Bash History 命令](https://www.maketecheasier.com/assets/uploads/2017/10/rc-bash-history.jpg)
首先要介绍的是命令行工具 `history`,它能帮你记住那些你曾经用过的命令。包括应用最广泛的 Bash 在内的大多数 [Linux shell][1],都会创建一个历史文件来包含那些你输入过的命令。如果你用的是 Bash这个历史文件就是 `/home/<username>/.bash_history`
这个历史文件是纯文本格式的,你可以用任意的文本编辑器打开来浏览和搜索。
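除了直接翻看历史文件,还可以用 `history` 命令本身来检索,例如:

```shell
history | tail -n 5   # 查看最近执行的 5 条命令
history | grep ssh    # 在历史记录中搜索包含 ssh 的命令
!42                   # 重新执行历史记录中编号为 42 的那条命令
```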
### apropos
确实存在一个可以帮你找到其他命令的命令。这个命令就是 `apropos`,它能帮你找出合适的命令来完成你的搜索。比如,假设你需要知道哪个命令可以列出目录的内容,你可以运行下面命令:
```shell
apropos "list directory"
```
![Linux Apropos](https://www.maketecheasier.com/assets/uploads/2017/10/rc-apropos.jpg)
这就搜索出结果了,非常直接。给 “directory” 加上复数后再试一下。
```shell
apropos "list directories"
```
这次没用了。`apropos` 所作的其实就是搜索一系列命令的描述。描述不匹配的命令不会纳入结果中。
还有其他的用法。通过 `-a` 标志,你可以以更灵活的方式来增加搜索关键字。试试这条命令:
```shell
apropos "match pattern"
```
![Linux Apropos -a Flag](https://www.maketecheasier.com/assets/uploads/2017/10/rc-apropos-a.jpg)
你会觉得应该会有一些匹配的内容出现,比如 [grep][2] 对吗? 然而实际上并没有匹配出任何结果。再说一次apropos 只会根据字面内容进行搜索。
现在让我们试着用 `-a` 标志来把单词分割开来。LCTT 译注该选项的意思是“and”即多个关键字都存在但是不需要正好是连在一起的字符串。
```shell
apropos "match" -a "pattern"
```
这一下,你可以看到很多期望的结果了。
`apropos` 是一个很棒的工具,不过你需要留意它的缺陷。
### ZSH
![Linux ZSH Autocomplete](https://www.maketecheasier.com/assets/uploads/2017/10/rc-zsh.jpg)
ZSH 其实并不是用于记忆命令的工具,它是一种 shell。你可以用 [ZSH][3] 来替代 Bash 作为你的命令行 shell。ZSH 包含了自动纠错机制,能在你输错命令的时候给你提示。开启该功能后,它会提示与你输入相近的命令。在 ZSH 中你可以像往常一样使用命令行,相当于多了一张安全网,另外还有其他一些非常好用的特性。充分利用 ZSH 的最简单方法就是使用 [Oh-My-ZSH][4]。
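如果想试试 ZSH可以先安装它并将其设为默认 shell以 Debian/Ubuntu 为例):

```shell
sudo apt install zsh      # 安装 ZSH
chsh -s "$(which zsh)"    # 设为当前用户的默认 shell重新登录后生效
```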
### 速记表
最后,也可能是最简单的方法,就是使用[速记表][5]。
有很多在线的速记表,比如[这个][6] 可以帮助你快速查询命令。
![linux-commandline-cheatsheet](https://www.maketecheasier.com/assets/uploads/2013/10/linux-commandline-cheatsheet.gif)
为了快速查询,你可以寻找图片格式的速记表,然后将它设置为你的桌面墙纸。
这并不是记忆命令的最好方法,但是这么做可以帮你节省在线搜索遗忘命令的时间。
在学习时依赖这些方法,最终你会发现你会越来越少地使用这些工具。没有人能够记住所有的事情,因此偶尔遗忘掉某些东西或者遇到某些没有见过的东西也很正常。这也是这些工具以及因特网存在的意义。
--------------------------------------------------------------------------------
via: https://www.maketecheasier.com/remember-linux-commands/
作者:[Nick Congleton][a]
译者:[lujun9972](https://github.com/lujun9972)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://www.maketecheasier.com/author/nickcongleton/
[1]: https://www.maketecheasier.com/alternative-linux-shells/
[2]: https://www.maketecheasier.com/what-is-grep-and-uses/
[3]: https://www.maketecheasier.com/understanding-the-different-shell-in-linux-zsh-shell/
[4]: https://github.com/robbyrussell/oh-my-zsh
[5]: https://www.maketecheasier.com/premium/cheatsheet/linux-command-line/
[6]: https://www.cheatography.com/davechild/cheat-sheets/linux-command-line/


@ -0,0 +1,157 @@
tmate秒级分享你的终端会话
=================
不久前,我们写过一篇关于 [teleconsole](https://www.2daygeek.com/teleconsole-share-terminal-session-instantly-to-anyone-in-seconds/) 的介绍,该工具可用于快速分享终端给任何人(任何你信任的人)。今天我们要聊一聊另一款类似的应用,名叫 `tmate`
`tmate` 有什么用?它可以让你在需要帮助时向你的朋友们求助。
### 什么是 tmate
[tmate](https://tmate.io/) 的意思是 `teammates`,它是 tmux 的一个分支,并且使用相同的配置信息(例如快捷键配置,配色方案等)。它是一个终端多路复用器,同时具有即时分享终端的能力。它允许在单个屏幕中创建并操控多个终端,同时这些终端还能与其他同事分享。
你可以分离会话,让作业在后台运行,然后在想要查看状态时重新连接会话。`tmate` 提供了一个即时配对的方案,让你可以与一个或多个队友共享一个终端。
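由于 `tmate` 是 tmux 的分支,会话的创建、分离与重连沿用的也是 tmux 的子命令(下面只是一个示意,会话名可以任取,具体请以 tmate 手册为准):

```shell
tmate new-session -d -s work   # 在后台新建一个名为 work 的会话
tmate attach -t work           # 重新连接到该会话
```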
在屏幕的底部有一个状态栏,显示了当前会话的一些共享信息,例如 ssh 命令等。
### tmate 是怎么工作的?
- 运行 `tmate` 时,会通过 `libssh` 在后台创建一个连接到 tmate.io (由 tmate 开发者维护的后台服务器)的 ssh 连接。
- tmate.io 服务器的 ssh 密钥通过 DH 交换进行校验。
- 客户端通过本地 ssh 密钥进行认证。
- 连接创建后,本地 tmux 服务器会生成一个 150 位(不可猜测的随机字符)会话令牌。
- 队友能通过用户提供的 SSH 会话 ID 连接到 tmate.io。
### 使用 tmate 的必备条件
由于 `tmate.io` 服务器需要通过本地 ssh 密钥来认证客户机,因此必备条件之一就是生成 SSH 密钥。
记住,每个系统都要有自己的 SSH 密钥。
```shell
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/magi/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/magi/.ssh/id_rsa.
Your public key has been saved in /home/magi/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:3ima5FuwKbWyyyNrlR/DeBucoyRfdOtlUmb5D214NC8 magi@magi-VirtualBox
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| . |
| . . = o |
| *ooS= . + o |
| . =.@*o.o.+ E .|
| =o==B++o = . |
| o.+*o+.. . |
| ..o+o=. |
+----[SHA256]-----+
```
### 如何安装 tmate
`tmate` 已经包含在某些发行版的官方仓库中,可以通过包管理器来安装。
对于 Debian/Ubuntu可以使用 [APT-GET 命令](https://www.2daygeek.com/apt-get-apt-cache-command-examples-manage-packages-debian-ubuntu-systems/)或者 [APT 命令](https://www.2daygeek.com/apt-command-examples-manage-packages-debian-ubuntu-systems/)来安装。
```shell
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:tmate.io/archive
$ sudo apt-get update
$ sudo apt-get install tmate
```
你也可以从官方仓库中安装 tmate。
```shell
$ sudo apt-get install tmate
```
对于 Fedora使用 [DNF 命令](https://www.2daygeek.com/dnf-command-examples-manage-packages-fedora-system/) 来安装。
```shell
$ sudo dnf install tmate
```
对于基于 Arch Linux 的系统,使用 [Yaourt 命令](https://www.2daygeek.com/install-yaourt-aur-helper-on-arch-linux/)或 [Packer 命令](https://www.2daygeek.com/install-packer-aur-helper-on-arch-linux/)来从 AUR 仓库中安装。
```shell
$ yaourt -S tmate
```
```shell
$ packer -S tmate
```
对于 openSUSE使用 [Zypper 命令](https://www.2daygeek.com/zypper-command-examples-manage-packages-opensuse-system/) 来安装。
```shell
$ sudo zypper in tmate
```
### 如何使用 tmate
成功安装后,打开终端然后输入下面命令,就会打开一个新的会话,在屏幕底部,你能看到 SSH 会话的 ID。
```shell
$ tmate
```
![](https://www.2daygeek.com/wp-content/uploads/2017/11/tmate-instantly-share-your-terminal-session-to-anyone-in-seconds-1.png)
要注意的是SSH 会话 ID 会在几秒后消失,不过不要紧,你可以通过下面命令获取到这些详细信息。
```shell
$ tmate show-messages
```
`tmate` 的 `show-messages` 命令会显示 tmate 的日志信息,其中包含了该 ssh 连接的内容。
![](https://www.2daygeek.com/wp-content/uploads/2017/11/tmate-instantly-share-your-terminal-session-to-anyone-in-seconds-2.png)
现在,分享你的 SSH 会话 ID 给你的朋友或同事从而允许他们观看终端会话。除了 SSH 会话 ID 以外,你也可以分享 web URL。
另外你还可以选择分享的是只读会话还是可读写会话。
### 如何通过 SSH 连接会话
只需要在终端上运行你从朋友那得到的 SSH 终端 ID 就行了。类似下面这样。
```shell
$ ssh session: ssh 3KuRj95sEZRHkpPtc2y6jcokP@sg2.tmate.io
```
![](https://www.2daygeek.com/wp-content/uploads/2017/11/tmate-instantly-share-your-terminal-session-to-anyone-in-seconds-4.png)
### 如何通过 Web URL 连接会话
打开浏览器然后访问朋友给你的 URL 就行了。像下面这样。
![](https://www.2daygeek.com/wp-content/uploads/2017/11/tmate-instantly-share-your-terminal-session-to-anyone-in-seconds-3.png)
只需要输入 `exit` 就能退出会话了。
```
[Source System Output]
[exited]
[Remote System Output]
[server exited]
Connection to sg2.tmate.io closed by remote host.
Connection to sg2.tmate.io closed.
```
--------------------------------------------------------------------------------
via: https://www.2daygeek.com/tmate-instantly-share-your-terminal-session-to-anyone-in-seconds/
作者:[Magesh Maruthamuthu][a]
译者:[lujun9972](https://github.com/lujun9972)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出


@ -0,0 +1,75 @@
如何在 Linux 下安装安卓文件传输助手
===============
如果你尝试在 Ubuntu 下连接你的安卓手机,你也许可以试试 Linux 下的安卓文件传输助手。
本质上来说,这个应用是谷歌 macOS 版本的一个克隆。它是用 Qt 编写的,用户界面非常简洁,使得你能轻松在 Ubuntu 和安卓手机之间传输文件和文件夹。
现在,可能有人想知道:有什么是这个应用能做、而 NautilusUbuntu 默认的文件资源管理器)做不到的?答案是:没有。
当我将我的 Nexus 5X记得选择 [媒体传输协议 MTP][7] 选项)连接在 Ubuntu 上时,在 [GVfs][8]LCTT 译注: GNOME 桌面下的虚拟文件系统)的帮助下,我可以打开、浏览和管理我的手机,就像它是一个普通的 U 盘一样。
[![Nautilus MTP integration with a Nexus 5X](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/browsing-android-mtp-nautilus.jpg)][9]
但是*一些*用户在使用默认的文件管理器时,在 MTP 的某些功能上会出现问题:比如文件夹没有正确加载,创建新文件夹后此文件夹不存在,或者无法在媒体播放器中使用自己的手机。
这就是要为 Linux 系统用户设计一个安卓文件传输助手应用的原因,将这个应用当做将 MTP 设备安装在 Linux 下的另一种选择。如果你使用 Linux 下的默认应用时一切正常,你也许并不需要尝试使用它 (除非你真的很想尝试新鲜事物)。
![Android File Transfer Linux App](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/android-file-transfer-for-linux-750x662.jpg)
该 app 特点:
*   简洁直观的用户界面
*   支持文件拖放功能(从 Linux 系统到手机)
*   支持批量下载 (从手机到 Linux系统)
*   显示传输进程对话框
*   FUSE 模块支持
*   没有文件大小限制
*   可选命令行工具
### Ubuntu 下安装安卓手机文件助手的步骤
以上就是对这个应用的介绍,下面是如何安装它的具体步骤。
这有一个 [PPA][10]个人软件包集源为 Ubuntu 14.04 LTS、16.04 LTS 和 Ubuntu 17.10 提供可用应用。
为了将这一 PPA 加入你的软件资源列表中,执行这条命令:
```
sudo add-apt-repository ppa:samoilov-lex/aftl-stable
```
接着,为了在 Ubuntu 下安装 Linux版本的安卓文件传输助手执行
```
sudo apt-get update && sudo apt install android-file-transfer
```
这样就行了。
你会在你的应用列表中发现这一应用的启动图标。
在你启动这一应用之前,要确保没有其他应用(比如 Nautilus已经挂载了你的手机。如果其它应用正在使用你的手机就会显示“无法找到 MTP 设备”。要解决这一问题,将你的手机从 Nautilus或者任何正在使用你的手机的应用上移除然后再重新启动安卓文件传输助手。
--------------------------------------------------------------------------------
via: http://www.omgubuntu.co.uk/2017/11/android-file-transfer-app-linux
作者:[JOEY SNEDDON][a]
译者:[wenwensnow](https://github.com/wenwensnow)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://plus.google.com/117485690627814051450/?rel=author
[1]:https://plus.google.com/117485690627814051450/?rel=author
[2]:http://www.omgubuntu.co.uk/category/app
[3]:http://www.omgubuntu.co.uk/category/download
[4]:https://github.com/whoozle/android-file-transfer-linux
[5]:http://www.omgubuntu.co.uk/2017/11/android-file-transfer-app-linux
[6]:http://android.com/filetransfer?linkid=14270770
[7]:https://en.wikipedia.org/wiki/Media_Transfer_Protocol
[8]:https://en.wikipedia.org/wiki/GVfs
[9]:http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/browsing-android-mtp-nautilus.jpg
[10]:https://launchpad.net/~samoilov-lex/+archive/ubuntu/aftl-stable


@ -0,0 +1,72 @@
开源云技能认证:系统管理员的核心竞争力
=========
![os jobs](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/open-house-sysadmin.jpg?itok=i5FHc3lu "os jobs")
> [2017年开源工作报告][1](以下简称“报告”)显示,具有开源云技术认证的系统管理员往往能获得更高的薪酬。
报告调查的受访者中53% 认为系统管理员是雇主们最期望被填补的职位空缺之一,因此,技术娴熟的系统管理员更受青睐而收获高薪职位,但这一职位,并没想象中那么容易填补。
系统管理员主要负责服务器和其他电脑操作系统的安装、服务支持和维护,及时处理服务中断和预防其他问题的出现。
总的来说今年的报告指出开源领域人才需求最大的有开源云47%应用开发44%大数据43%开发运营和安全42%)。
此外报告对人事经理的调查显示58% 期望招揽更多的开源人才67% 认为开源人才的需求增长会比业内其他领域更甚。有些单位视开源人才为招聘最优选则,它们招聘的开源人才较上年增长了 2 个百分点。
同时89% 的人事经理认为很难找到颇具天赋的开源人才。
### 为什么要获取认证
报告显示,对系统管理员的需求刺激着人事经理为 53% 的组织/机构提供正规的培训和专业技术认证,而这一比例去年为 47%。
对系统管理方面感兴趣的 IT 人才考虑获取 Linux 认证已成为行业规律。随便查看几个知名的招聘网站,你就能发现:[CompTIA Linux+][3] 认证是入门级 Linux 系统管理员的最高认证;如果想胜任高级别的系统管理员职位,获取[红帽认证工程师RHCE][4]和[红帽认证系统管理员RHCSA][5]则是不可或缺的。
戴士Dice[2017 技术行业薪资调查][6]显示2016 年系统管理员的薪水为 79,538 美元,较上年下降了 0.8%;系统架构师的薪水为 125,946 美元,同比下降 4.7%。尽管如此,该调查发现“高水平专业人才仍最受欢迎,特别是那些精通支持产业转型发展所需技术的人才”。
在开源技术方面HBase一个开源的分布式数据库技术人才的薪水在戴士 2017 技术行业薪资调查中排第一。在计算机网络和数据库领域,掌握 OpenVMS 操作系统技术也能获得高薪。
### 成为出色的系统管理员
出色的系统管理员须在问题出现时马上处理,这意味着你必须时刻准备应对可能出现的状况。这个职位追求“无责备的、精益的、在流程或技术上持续迭代改进的”思维方式和善于自我完善的人格成为一个系统管理员意味着“你必将与开源软件如 Linux、BSD 甚至开源 Solaris 等结下不解之缘”Paul English ^译注1 在 [opensource.com][7] 上发文指出。
Paul English 认为,现在的系统管理员较以前而言,要更多地与软件打交道,而且要能够编写脚本来协助系统管理。
>译注1Paul English计算机科学学士UNIX/Linux 系统管理员PreOS Security Inc. 公司 CEO2015-2017 年于为推动系统管理员发展实践的非盈利组织——<ruby>专业系统管理员联盟<rt>League of Professional System Administrator</rt></ruby>担任董事会成员。
### 展望 2018
[Robert Half 2018 年技术人才薪资导览][8]预测 2018 年北美地区许多单位将聘用大量系统管理方面的专业人才,同时个人软实力和领导力水平作为优秀人才的考量因素,越来越受到重视。
该报告指出:“良好的聆听能力和批判性思维能力对于理解和解决用户的问题和担忧至关重要,也是 IT 从业者必须具备的重要技能,特别是从事服务台和桌面支持工作相关的技术人员。”
这与[Linux基金会][9]^译注2 提出的不同阶段的系统管理员必备技能相一致,都强调了强大的分析能力和快速处理问题的能力。
>译注2<ruby>Linux 基金会<rt>The Linux Foundation</rt></ruby>,成立于 2000 年,致力于围绕开源项目构建可持续发展的生态系统,以加速开源项目的技术开发和商业应用;它是世界上最大的开源非盈利组织,在推广、保护和推进 Linux 发展,协同开发,维护“历史上最大的共享资源”上功勋卓越。
如果想逐渐爬上系统管理员职位的金字塔上层,还应该对系统配置的结构化方法充满兴趣;且拥有解决系统安全问题的经验;用户身份验证管理的经验;与非技术人员进行非技术交流的能力;以及优化系统以满足最新的安全需求的能力。
- [下载][10]2017年开源工作报告全文以获取更多信息。
-----------------------
via: https://www.linux.com/blog/open-source-cloud-skills-and-certification-are-key-sysadmins
作者:[linux.com][a]
译者:[wangy325](https://github.com/wangy325)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linux.com/blog/open-source-cloud-skills-and-certification-are-key-sysadmins
[1]:https://www.linuxfoundation.org/blog/2017-jobs-report-highlights-demand-open-source-skills/
[2]:https://www.linux.com/licenses/category/creative-commons-zero
[3]:https://certification.comptia.org/certifications/linux?tracking=getCertified/certifications/linux.aspx
[4]:https://www.redhat.com/en/services/certification/rhce
[5]:https://www.redhat.com/en/services/certification/rhcsa
[6]:http://marketing.dice.com/pdf/Dice_TechSalarySurvey_2017.pdf?aliId=105832232
[7]:https://opensource.com/article/17/7/truth-about-sysadmins
[8]:https://www.roberthalf.com/salary-guide/technology
[9]:https://www.linux.com/learn/10-essential-skills-novice-junior-and-senior-sysadmins%20%20
[10]:http://bit.ly/2017OSSjobsreport

View File

@ -0,0 +1,97 @@
在命令行中使用 DuckDuckGo 搜索
=============
![](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/duckduckgo.png)
此前我们介绍了[如何在命令行中使用 Google 搜索][3]。许多读者反馈说他们平时使用 [Duck Duck Go][4],这是一个功能强大而且保密性很强的搜索引擎。
正巧,最近出现了一款能够从命令行搜索 DuckDuckGo 的工具。它叫做 ddgr我把它读作 “dodger”非常好用。
像 [Googler][7] 一样ddgr 是一个完全开源而且非官方的工具。没错,它并不属于 DuckDuckGo。所以如果你发现它返回的结果有些奇怪请先询问这个工具的开发者而不是搜索引擎的开发者。
### DuckDuckGo 命令行应用
![](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/ddgr-gif.gif)
[DuckDuckGo BangsDuckDuckGo 快捷搜索)][8] 可以帮助你轻易地在 DuckDuckGo 上找到想要的信息(甚至 _本网站 omgubuntu_ 都有快捷搜索。ddgr 非常忠实地呈现了这个功能。
和网页版不同的是,你可以更改每页返回多少结果。这比起每次查询都要看三十多条结果要方便一些。默认界面经过了精心设计,在不影响可读性的情况下尽量减少了占用空间。
`ddgr` 有许多功能和亮点,包括:
* 更改搜索结果数
* 支持 Bash 自动补全
* 使用 DuckDuckGo Bangs
* 在浏览器中打开链接
* “手气不错”选项
* 基于时间、地区、文件类型等的筛选功能
* 极少的依赖项
你可以从 Github 的项目页面上下载支持各种系统的 `ddgr`
- [从 Github 下载 “ddgr”][9]
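下载之后也可以不经过 PPA、直接从源码安装。下面是一个大致的安装示例(这里假设仓库提供了 `make install` 目标,实际步骤请以项目 README 为准):
```
# 克隆仓库并安装(假设提供了 make install 目标;ddgr 本身是一个 Python 3 脚本)
git clone https://github.com/jarun/ddgr.git
cd ddgr
sudo make install
```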
另外,在 Ubuntu 16.04 LTS 或更新版本中,你可以使用 PPA 安装 ddgr。这个仓库由 ddgr 的开发者维护。如果你想要保持在最新版本的话,推荐使用这种方式安装。
需要提醒的是,在本文创作时,这个 PPA 中的 ddgr _并不是_ 最新版本,而是一个稍旧的版本(缺少 `--num` 选项)。
使用以下命令添加 PPA
```
sudo add-apt-repository ppa:twodopeshaggy/jarun
sudo apt-get update
```
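添加 PPA 并刷新软件源之后,还需要安装软件包本身。下面是一个最简示例(这里假设包名就是 `ddgr`,请以 PPA 页面上的实际包名为准):
```
# 从刚添加的 PPA 安装 ddgr(假设包名为 ddgr)
sudo apt-get install ddgr
```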
### 如何使用 ddgr 在命令行中搜索 DuckDuckGo
安装完毕后,你只需打开你的终端模拟器,并运行:
```
ddgr
```
然后输入查询内容:
```
search-term
```
你可以限制搜索结果数:
```
ddgr --num 5 search-term
```
或者自动在浏览器中打开第一条搜索结果:
```
ddgr -j search-term
```
你可以使用参数和选项来提高搜索精确度。使用以下命令来查看所有的参数:
```
ddgr -h
```
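再举一个例子,演示文中提到的 DuckDuckGo 快捷搜索(bang)功能(这里用 `!w` 表示维基百科;具体行为可能随版本不同,请以 `ddgr -h` 的输出为准,此处仅作示意):
```
# 通过 DuckDuckGo 快捷搜索(bang)直接搜索维基百科
# 用单引号包住查询,避免 ! 被 bash 当作历史扩展处理
ddgr '!w Linux kernel'
```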
--------------------------------------------------------------------------------
via: http://www.omgubuntu.co.uk/2017/11/duck-duck-go-terminal-app
作者:[JOEY SNEDDON][a]
译者:[yixunx](https://github.com/yixunx)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://plus.google.com/117485690627814051450/?rel=author
[1]:https://plus.google.com/117485690627814051450/?rel=author
[2]:http://www.omgubuntu.co.uk/category/download
[3]:http://www.omgubuntu.co.uk/2017/08/search-google-from-the-command-line
[4]:http://duckduckgo.com/
[5]:http://www.omgubuntu.co.uk/2017/11/duck-duck-go-terminal-app
[6]:https://github.com/jarun/ddgr
[7]:https://github.com/jarun/googler
[8]:https://duckduckgo.com/bang
[9]:https://github.com/jarun/ddgr/releases/tag/v1.1

View File

@ -1,130 +0,0 @@
Translating by chao-zhi
Be a force for good in your community
============================================================
>Find out how to give the gift of an out, learn about the power of positive intent, and more.
![Be a force for good in your community](https://opensource.com/sites/default/files/styles/image-full-size/public/images/life/people_remote_teams_world.png?itok=wI-GW8zX "Be a force for good in your community")
>Image by : opensource.com
Passionate debate is among the hallmark traits of open source communities and open organizations. On our best days, these debates are energetic and constructive. They are heated, yet moderated with humor and goodwill. All parties remain focused on facts, on the shared purpose of collaborative problem-solving, and driving continuous improvement. And for many of us, they're just plain fun.
On our worst days, these debates devolve into rehashing the same old arguments on the same old topics. Or we turn on one another, delivering insults—passive-aggressive or outright nasty, depending on our style—and eroding the passion, trust, and productivity of our communities.
We've all been there, watching and feeling helpless, as a community conversation begins to turn toxic. Yet, as [DeLisa Alexander recently shared][1], there are so many ways that each and every one of us can be a force for good in our communities.
In the first article of this "open culture" series, I will share a few strategies for how you can intervene, in that crucial moment, and steer everyone to a more positive and productive place.
### Don't call people out. Call them up.
Recently, I had lunch with my friend and colleague, [Mark Rumbles][2]. Over the years, we've collaborated on a number of projects that support open culture and leadership at Red Hat. On this day, Mark asked me how I was holding up, as he saw I'd recently intervened in a mailing list conversation when I saw the debate was getting ugly.
Fortunately, the dust had long since settled, and in fact I'd almost forgotten about the conversation. Nevertheless, it led us to talk about the challenges of open and frank debate in a community that has thousands of members.
>One of the biggest ways we can be a force for good in our communities is to respond to conflict in a way that compels everyone to elevate their behavior, rather than escalate it.
Mark said something that struck me as rather insightful. He said, "You know, as a community, we are really good at calling each other out. But what I'd like to see us do more of is calling each other _up_."
Mark is absolutely right. One of the biggest ways we can be a force for good in our communities is to respond to conflict in a way that compels everyone to elevate their behavior, rather than escalate it.
### Assume positive intent
We can start by making a simple assumption when we observe poor behavior in a heated conversation: It's entirely possible that there are positive intentions somewhere in the mix.
This is admittedly not an easy thing to do. When I see signs that a debate is turning nasty, I pause and ask myself what Steven Covey calls The Humanizing Question:
"Why would a reasonable, rational, and decent person do something like this?"
Now, if this is one of your "usual suspects"—a community member with a propensity toward negative behavior--perhaps your first thought is, "Um, what if this person _isn't_ reasonable, rational, or decent?"
Stay with me, now. I'm not suggesting that you engage in some touchy-feely form of self-delusion. It's called The Humanizing Question not only because asking it humanizes the other person, but also because it humanizes _you_.
And that, in turn, helps you respond or intervene from the most productive possible place.
### Seek to understand the reasons for community dissent
When I ask myself why a reasonable, rational, and decent person might do something like this, time and again, it comes down to the same few reasons:
* They don't feel heard.
* They don't feel respected.
* They don't feel understood.
One easy positive intention we can apply to almost any poor behavior, then, is that the person wants to be heard, respected, or understood. That's pretty reasonable, I suppose.
By standing in this more objective and compassionate place, we can see that their behavior is _almost certainly_ **_not_** _going to help them get what they want_, and that the community will suffer as a result . . . without our help.
For me, that inspires a desire to help everyone get "unstuck" from this ugly place we're in.
Before I intervene, though, I ask myself a follow-up question: _What other positive intentions might be driving this behavior?_
Examples that readily jump to mind include:
* They are worried that we're missing something important, or we're making a mistake, and no one else seems to see it.
* They want to feel valued for their contributions.
* They are burned out, because of overworking in the community or things happening in their personal life.
* They are tired of something being broken and frustrated that no one else seems to see the damage or inconvenience that creates.
* ...and so on and so forth.
With that, I have a rich supply of positive intent that I can ascribe to their behavior. I'm ready to reach out and offer them some help, in the form of an out.
### Give the gift of an out
What is an out? Think of it as an escape hatch. It's a way to exit the conversation, or abandon the poor behavior and resume behaving like a decent person, without losing face. It's calling someone up, rather than calling them out.
You've probably experienced this, at some point in your life, when _you_ were behaving poorly in a conversation, ranting and hollering and generally raising a fuss about something or another, and someone graciously offered _you_ a way out. Perhaps they chose not to “take the bait” by responding to your unkind choice of words, and instead, said something that demonstrated they believed you were a reasonable, rational, and decent human being with positive intentions, such as:
> _So, uh, what I'm hearing is that you're really worried about this, and you're frustrated because it seems like no one is listening. Or maybe you're concerned that we're missing the significance of it. Is that about right?_
And here's the thing: Even if that wasn't entirely true (perhaps you had less-than-noble intentions), in that moment, you probably grabbed ahold of that life preserver they handed you, and gladly accepted the opportunity to reframe your poor behavior. You almost certainly pivoted and moved to a more productive place, likely without even recognizing it.
Perhaps you said something like, "Well, it's not that exactly, but I just worry that we're headed down the wrong path here, and I get what you're saying that as community, we can't solve every problem at the same time, but if we don't solve this one soon, bad things are going to happen…"
In the end, the conversation almost certainly began to move to a more productive place, or you all agreed to disagree.
We all have the opportunity to offer an upset person a safe way out of that destructive place they're operating from. Here's how.
### Bad behavior or bad actor?
If the person is particularly agitated, they may not hear or accept the first out you hand them. That's okay. Most likely, their lizard brain--that prehistoric amygdala that was once critical for human survival—has taken over, and they need a few more moments to recognize you're not a threat. Just keep gently but firmly treating them as if they _were_ a rational, reasonable, decent human being, and watch what happens.
In my experience, these community interventions end in one of three ways:
Most often, the person actually _is_ a reasonable person, and soon enough, they gratefully and graciously accept the out. In the process, everyone breaks out of the black vs. white, "win or lose" mindset. People begin to think up creative alternatives and "win-win" outcomes that benefit everyone.
>Why would a reasonable, rational, and decent person do something like this?
Occasionally, the person is not particularly reasonable, rational, or decent by nature, but when treated with such consistent, tireless, patient generosity and kindness (by you), they are shamed into retreating from the conversation. This sounds like, "Well, I think I've said all I have to say. Thanks for hearing me out." Or, for less enlightened types, "Well, I'm tired of this conversation. Let's drop it." (Yes, please. Thank you.)
Less often, the person is what's known as a _bad actor_, or in community management circles, a pot-stirrer. These folks do exist, and they thrive on drama. Guess what? By consistently engaging in a kind, generous, community-calming way, and entirely ignoring all attempts to escalate the situation, you effectively shift the conversation into an area that holds little interest for them. They have no choice but to abandon it. Winners all around.
That's the power of assuming positive intent. By responding to angry and hostile words with grace and dignity, you can diffuse a flamewar, untangle and solve tricky problems, and quite possibly make a new friend or two in the process.
Am I successful every time I apply this principle? Heck, no. But I never regret the choice to assume positive intent. And I can vividly recall a few unfortunate occasions when I assumed negative intent and responded in a way that further contributed to the problem.
Now it's your turn. I'd love to hear about some strategies and principles you apply, to be a force for good when conversations get heated in your community. Share your thoughts in the comments below.
Next time, we'll explore more ways to be a force for good in your community, and I'll share some tips for handling "Mr. Grumpy."
--------------------------------------------------------------------------------
作者简介:
![](https://opensource.com/sites/default/files/styles/profile_pictures/public/pictures/headshot-square_0.jpg?itok=FS97b9YD)
Rebecca Fernandez is a Principal Employment Branding + Communications Specialist at Red Hat, a contributor to The Open Organization book, and the maintainer of the Open Decision Framework. She is interested in open source and the intersection of the open source way with business management models. Twitter: @ruhbehka
--------------------------------------------------------------------------------
via: https://opensource.com/open-organization/17/1/force-for-good-community
作者:[Rebecca Fernandez][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://opensource.com/users/rebecca
[1]:https://opensource.com/business/15/5/5-ways-promote-inclusive-environment
[2]:https://twitter.com/leadership_365

View File

@ -1,290 +0,0 @@
GOOGLE CHROME—ONE YEAR IN
========================================
Four weeks ago, emailed notice of a free massage credit revealed that Ive been at Google for a year. Time flies when youre [drinking from a firehose][3].
When I mentioned my anniversary, friends and colleagues from other companies asked what Ive learned while working on Chrome over the last year. This rambling post is an attempt to answer that question.
### NON-MASKABLE INTERRUPTS
While I _started_ at Google just over a year ago, I havent actually _worked_ there for a full year yet. My second son (Nate) was born a few weeks early, arriving ten workdays after my first day of work.
I took full advantage of Googles very generous twelve weeks of paternity leave, taking a few weeks after we brought Nate home, and the balance as spring turned to summer. In a year, we went from having an enormous infant to an enormous toddler whos taking his first steps and trying to emulate everything his 3 year-old brother (Noah) does.
![Baby at the hospital](https://textplain.files.wordpress.com/2017/01/image55.png?w=318&h=468 "New Release")
![First birthday cake](https://textplain.files.wordpress.com/2017/01/image56.png?w=484&h=466)
I mention this because its had a huge impact on my work over the last year—_much_ more than Id naively expected.
When Noah was born, Id been at Telerik for [almost a year][4], and Id been hacking on Fiddler alone for nearly a decade. I took a short paternity leave, and my coding hours shifted somewhat (I started writing code late at night between bottle feeds), but otherwise my work wasnt significantly impacted.
As I pondered joining Google Chromes security team, I expected pretty much the same—a bit less sleep, a bit of scheduling awkwardness, but I figured things would fall into a good routine in a few months.
Things turned out somewhat differently.
Perhaps sensing that my life had become too easy, fate decided that 2016 was the year Id get sick. _Constantly_. (Our theory is that Noah was bringing home germs from pre-school; he got sick a bunch too, but recovered quickly each time.) I was sick more days in 2016 than I was in the prior decade, including a month-long illness in the spring. _That_ ended with a bout of pneumonia that concluded with a doctor-mandated seven days away from the office. As I coughed my brains out on the sofa at home, I derived some consolation in thinking about Googles generous life insurance package. But for the most part, my illnesses were minor—enough to keep me awake at night and coughing all day, but otherwise able to work.
Mathematically, you might expect two kids to be twice as much work as one, but in our experience, it hasnt worked out that way. Instead, it varies between 80% (when the kids happily play together) to 400% (when theyre colliding like atoms in a runaway nuclear reactor). Thanks to my wifes heroic efforts, we found a workable _daytime _routine. The nights, however, have been unexpectedly difficult. Big brother Noah is at an age where he usually sleeps through the night, but hes sure to wake me up every morning at 6:30am sharp. Fortunately, Nate has been a pretty good sleeper, but even now, at just over a year old, he usually still wakes up and requires attention twice a night or so.
I can't _remember_ the last time I had eight hours of sleep in a row. And that's been _extremely_ challenging… because I can't remember _much else_ either. Learning new things when you don't remember them the next day is a brutal, frustrating process.
When Noah was a baby, I could simply sleep in after a long night. Even if I didnt get enough sleep, it wouldnt really matter—Id been coding in C# on Fiddler for a decade, and deadlines were few and far between. If all else failed, Id just avoid working on any especially gnarly code and spend the day handling support requests, updating graphics, or doing other simple and straightforward grunt work from my backlog.
Things are much different on Chrome.
### ROLES
When I first started talking to the Chrome Security team about coming aboard, it was for a role on the Developer Advocacy team. Id be driving HTTPS adoption across the web and working with big sites to unblock their migrations in any way I could. Id already been doing the first half of that for fun (delivering [talks][5] at conferences like Codemash and [Velocity][6]), and Id previously spent eight years as a Security Program Manager for the Internet Explorer team. I had _tons _of relevant experience. Easy peasy.
I interviewed for the Developer Advocate role. The hiring committee kicked back my packet and said I should interview as a Technical Program Manager instead.
I interviewed as a Technical Program Manager. The hiring committee kicked back my packet and said I should interview as a Developer Advocate instead.
The Chrome team resolved the deadlock by hiring me as a Senior Software Engineer (SWE).
I was initially _very _nervous about this, having not written any significant C++ code in over a decade—except for one [in-place replacement][7] of IE9s caching logic which Id coded as a PM because I couldnt find a developer to do the work. But eventually I started believing in my own pep talk: _“I mean, how hard could it be, right? Ive been troubleshooting code in web browsers for almost two decades now. Im not a complete dummy. Ill ramp up. Itll be rough, but itll work out. Hell, I started writing Fiddler not knowing either C# nor HTTP, and _that _turned out pretty good. Ill buy some books and get caught up. Theres no way that Google would have just hired me as a C++ developer without asking me any C++ coding questions if it wasnt going to all be okay. Right? Right?!?”_
### THE FIREHOSE
I knew I had a lot to learn, and fast, but it took me a while to realize just how much else I didnt know.
Googles primary development platform is Linux, an OS that I would install every few years, play with for a day, then forget about. My new laptop was a Mac, a platform Id used a bit more, but still one for which I was about a twentieth as proficient as I was on Windows. The Chrome Windows team made a half-hearted attempt to get me to join their merry band, but warned me honestly that some of the tooling wasnt quite as good as it was on Linux and itd probably be harder for me to get help. So I tried to avoid Windows for the first few months, ordering a puny Windows machine that took around four times longer to build Chrome than my obscenely powerful Linux box (with its 48 logical cores). After a few months, I gave up on trying to avoid Windows and started using it as my primary platform. I was more productive, but incredibly slow builds remained a problem for a few months. Everyone told me to just order _another_ obscenely powerful box to put next to my Linux one, but it felt wrong to have hardware at my desk that collectively cost more than my first car—especially when, at Microsoft, I bought all my own hardware. I eventually mentioned my cost/productivity dilemma to a manager, who noted I was getting paid a Google engineers salary and then politely asked me if I was just really terrible at math. I ordered a beastly Windows machine and now my builds scream. (To the extent that _any_ C++ builds can scream, of course. At Telerik, I was horrified when a full build of Fiddler slowed to a full 5 seconds on my puny Windows machine; my typical Chrome build today still takes about 15 minutes.)
Beyond learning different operating systems, Id never used Googles apps before (Docs/Sheets/Slides); luckily, I found these easy to pick up, although I still havent fully figured out how Google Drive file organization works. Google Docs, in particular, is so good that Ive pretty much given up on Microsoft Word (which headed downhill after the 2010 version). Google Keep is a low-powered alternative to OneNote (which is, as far as I can tell, banned because it syncs to Microsoft servers) and I havent managed to get it to work well for my needs. Google Plus still hasnt figured out how to support pasting of images via CTRL+V, a baffling limitation for something meant to compete in the space… hell, even _Microsoft Yammer _supports that, for gods sake. The only real downside to the web apps is that tab/window management on modern browsers is still a very much unsolved problem (but more on that in a bit).
But these speedbumps all pale in comparison to Gmail. Oh, Gmail. As a program manager at Microsoft, pretty much your _entire life _is in your inbox. After twelve years with Outlook and Exchange, switching to Gmail was a train wreck. “_What do you mean, there arent folders? How do I mark this message as low priority? Wheres the button to format text with strikethrough? What do you mean, I cant drag an email to my calendar? What the hell does this Archive thing do? Wheres that message I was just looking at? Hell, where did my Gmail tab even go—it got lost in a pile of sixty other tabs across four top-level Chrome windows. WTH??? How does anyone get anything done?”_
### COMMUNICATION AND REMOTE WORK
While Telerik had an office in Austin, I didnt interact with other employees very often, and when I did they were usually in other offices. I thought I had a handle on remote work, but I really didnt. Working with a remote team on a daily basis is just _different_.
With communication happening over mail, IRC, Hangouts, bugs, document markup comments, GVC (video conferencing), G+, and discussion lists, it was often hard to [figure out which mechanisms to use][8], let alone which recipients to target. Undocumented pitfalls abounded (many discussion groups were essentially abandoned while others were unexpectedly broad; turning on chat history was deemed a “no-no” for document retention reasons).
It often it took a bit of research to even understand who various communication participants were and how they related to the projects at hand.
After years of email culture at Microsoft, I grew accustomed to a particular style of email, and Googles is just _different._ Mail threads were long, with frequent additions of new recipients and many terse remarks. Many times, Id reply privately to someone on a side thread, with a clarifying question, or suggesting a counterpoint to something they said. The response was often “_Hey, this just went to me. Mind adding on the main thread?_”
Im working remotely, with peers around the world, so real-time communication with my team is essential. Some Chrome subteams use Hangouts, but the Security team largely uses IRC.
[
![XKCD comic on IRC](https://textplain.files.wordpress.com/2017/01/image30.png?w=1320&h=560 "https://xkcd.com/1782/")
][9]
Now, Ive been chatting with people online since BBSes were a thing (Ive got a five digit ICQ number somewhere), but my knowledge of IRC was limited to the fact that it was a common way of taking over suckers machines with buffer overflows in the 90s. My new teammates tried to explain how to IRC repeatedly: “_Oh, its easy, you just get this console IRC client. No, no, you dont run it on your own workstation, thatd be crazy. You wouldnt have history! You provision a persistent remote VM on a machine in Googles cloud, then SSH to that, then you run screens and then you run your IRC client in that. Easy peasy._”
Getting onto IRC remained on my “TODO” list for five months before I finally said “F- it”, installed [HexChat][10] on my Windows box, disabled automatic sleep, and called it done. Its worked fairly well.
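For what it's worth, the "persistent session" setup my teammates described boils down to something like the sketch below. The VM hostname and the choice of irssi as the console IRC client are placeholders picked for illustration, not what the team actually uses:
```
# SSH to a long-lived VM, then run a console IRC client inside a detachable screen session
ssh my-irc-vm                # hypothetical VM hostname
screen -S irc irssi          # start irssi inside a screen session named "irc"
# detach with Ctrl-A d; later, reattach from any SSH session with:
screen -rd irc
```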
### GOOGLE DEVELOPER TOOLING
When an engineer first joins Google, they start with a week or two of technical training on the Google infrastructure. Ive worked in software development for nearly two decades, and Ive never even dreamed of the development environment Google engineers get to use. I felt like Charlie Bucket on his tour of Willa Wonkas Chocolate Factory—astonished by the amazing and unbelievable goodies available at any turn. The computing infrastructure was something out of Star Trek, the development tools were slick and amazing, the _process_ was jaw-dropping.
While I was doing a “hello world” coding exercise in Googles environment, a former colleague from the IE team pinged me on Hangouts chat, probably because hed seen my tweets about feeling like an imposter as a SWE.  He sent me a link to click, which I did. Code from Googles core advertising engine appeared in my browser. Googles engineers have access to nearly all of the code across the whole company. This alone was astonishing—in contrast, Id initially joined the IE team so I could get access to the networking code to figure out why the Office Online teams website wasnt working. “Neat, I can see everything!” I typed back. “Push the Analyze button” he instructed. I did, and some sort of automated analyzer emitted a report identifying a few dozen performance bugs in the code. “Wow, thats amazing!” I gushed. “Now, push the Fix button” he instructed. “Uh, this isnt some sort of security red team exercise, right?” I asked. He assured me that it wasnt. I pushed the button. The code changed to fix some unnecessary object copies. “Amazing!” I effused. “Click Submit” he instructed. I did, and watched as the system compiled the code in the cloud, determined which tests to run, and ran them. Later that afternoon, an owner of the code in the affected folder typed LGTM (Googlers approve changes by typing the acronym for Looks Good To Me) on the change list I had submitted, and my change was live in production later that day. I was, in a word, gobsmacked. That night, I searched the entire codebase for [misuse][11] of an IE cache control token and proposed fixes for the instances I found. I also narcissistically searched for my own name and found a bunch of references to blog posts Id written about assorted web development topics.
Unfortunately for Chrome Engineers, the introduction to Googles infrastructure is followed by a major letdown—because Chromium is open-source, the Chrome team itself doesnt get to take advantage of most of Googles internal goodies. Development of Chrome instead resembles C++ development at most major companies, albeit with an automatically deployed toolchain and enhancements like a web-based code review tool and some super-useful scripts. The most amazing of these is called [bisect-builds][12], and it allows a developer to very quickly discover what build of Chrome introduced a particular bug. You just give it a “known good” build number and a “known bad” build number and it  automatically downloads and runs the minimal number of builds to perform a binary search for the build that introduced a given bug:
![Console showing bisect builds running](https://textplain.files.wordpress.com/2017/01/image31.png?w=1320&h=514 "Binary searching for regressions")
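For context, a typical invocation looks roughly like the following. The flags are reconstructed from memory of the Chromium docs, so treat them as an approximation and check `bisect-builds.py --help` for the real interface:
```
# Binary-search the archived Chrome builds between a known-good and a known-bad build number
# -a picks the build archive/platform; -g and -b are the good and bad build numbers
python tools/bisect-builds.py -a linux64 -g 416000 -b 416100
```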
Firefox has [a similar system][13], but Idve killed for something like this back when I was reproducing and reducing bugs in IE. While its easy to understand how the system functions, it works so well that it feels like magic. Other useful scripts include the presubmit checks that run on each change list before you submit them for code review—they find and flag various style violations and other problems.
Compilation itself typically uses a local compiler; on Windows, we use the MSVC command line compiler from Visual Studio 2015 Update 3, although work is underway to switch over to [Clang][14]. Compilation and linking all of Chrome takes quite some time, although on my new beastly dev boxes its not _too_ bad. Googlers do have one special perk—we can use Goma (a distributed compiler system that runs on Googles amazing internal cloud) but I havent taken advantage of that so far.
For bug tracking, Chrome recently moved to [Monorail][15], a straightforward web-based bug tracking system. It works fairly well, although it is somewhat more cumbersome than it needs to be and would be much improved with [a few tweaks][16]. Monorail is open-source, but I havent committed to it myself yet.
For code review, Chrome presently uses [Rietveld][17], a web-based system, but this is slated to change in the near(ish) future. Like Monorail, its pretty straightforward although it would benefit from some minor usability tweaks; I committed one trivial change myself, but the pending migration to a different system means that it isnt likely to see further improvements.
As an open-source project, Chromium has quite a bit of public [documentation for developers][18], including [Design Documents][19]. Unfortunately, Chrome moves so fast that many of the design documents are out-of-date, and its not always obvious whats current and what was replaced long ago. The team does _value_ engineers investment in the documents, however, and various efforts are underway to update the documents and reduce Chromes overall architectural complexity. I expect these will be ongoing battles forever, just like in any significant active project.
### WHAT IVE DONE
“Thats all well and good,” my reader asks, “but _what have you done_ in the last year?”
### I WROTE SOME CODE
My first check in to Chrome [landed][20] in February; it was a simple adjustment to limit Public-Key-Pins to 60 days. Assorted other checkins trickled in through the spring before I went on paternity leave. The most _fun_ fix I did cleaned up a tiny [UX glitch][21] that sat unnoticed in Chrome for almost a decade; it was mostly interesting because it was a minor thing that Id tripped over for years, including back in IE. (The root cause was arguably that MSDN documentation about DWM lied; I fixed the bug in Chrome, sent the fix to IE, and asked MSDN to fix their docs).
I fixed a number of [minor][22] [security][23] [bugs][24], and lately Ive been working on [UX issues][25] related to Chromes HTTPS user-experience. Back in 2005, I wrote [a blog post][26] complaining about websites using HTTPS incorrectly, and now, just over a decade later, Chrome and Firefox are launching UI changes to warn users when a site is collecting sensitive information on pages which are Not Secure; Im delighted to have a small part in those changes.
Having written a handful of Internet Explorer Extensions in the past, I was excited to discover the joy of writing Chrome extensions. Chrome extensions are fun, simple, and powerful, and theres none of the complexity and crashes of COM.
[
![My 3 Chrome Extensions](https://textplain.files.wordpress.com/2017/01/image201.png?w=1288&h=650 "My 3 Chrome Extensions")
][27]
My first and most significant extension is the moarTLS Analyzer; it's related to my HTTPS work at Google and it's proven very useful in discovering sites that could improve their security. I [blogged about it][28] and the process of [developing it][29] last year.
Because I run several different Chrome instances on my PC (and they update daily or weekly), I found myself constantly needing to look up the Chrome version number for bug reports and the like. I wrote a tiny extension that shows the version number in a button on the toolbar (so its captured in screenshots too!):
![Show Chrome Version screenshot](https://textplain.files.wordpress.com/2017/02/image.png?w=886&h=326 "Show Chrome Version")
More than once, I spent an hour or so trying to reproduce and reduce a bug that had been filed against Chrome. When I found out the cause, Id jubilently add my notes to the issue in the Monorail bug tracker, click “Save changes” and discover that someone more familiar with the space had beaten me to the punch and figured it out while Id had the bug open on my screen. Adding an “Issue has been updated” alert to the bug tracker itself seemed like the right way to go, but it would require some changes that I wasnt able to commit on my own. So, instead I built an extension that provides such alerts within the page until the [feature][30] can be added to the tracker itself.
Each of these extensions was a joy to write.
### I FILED SOME BUGS
Im a diligent self-hoster, and I run Chrome Canary builds on all of my devices. I submit crash reports and [file bugs][31] with as much information as I can. My proudest moment was in helping narrow down a bizarre and intermittent problem users had with Chrome on Windows 10, where Chrome tabs would crash on every startup until you rebooted the OS. My [blog post][32] explains the full story, and encourages others to file bugs as they encounter them.
### I TRIAGED MORE BUGS
Ive been developing software for Windows for just over two decades, and inevitably Ive learned quite a bit about it, including the undocumented bits. Thats given me a leg up in understanding bugs in the Windows code. Some of the most fun include issues in Drag and Drop, like this [gem][33] of a bug that means that you cant drop files from Chrome to most applications in Windows. More meaningful [bugs][34] [relate][35] [to][36] [problems][37] with Windows Mark-of-the-Web security feature (about which Ive [blogged][38] [about][39] [several][40] times).
### I TOOK SHERIFF ROTATIONS
Google teams have the notion of sheriffs—a rotating assignment that ensures that important tasks (like triaging incoming security bugs) always has a defined owner, without overwhelming any single person. Each Sheriff has a term of ~1 week where they take on additional duties beyond their day-to-day coding, designing, testing, etc.
The Sheriff system has some real benefits—perhaps the most important of which is creating a broad swath of people experienced and qualified in making triage decisions around security vulnerabilities. The alternative is to leave such tasks to a single owner, rapidly increasing their [bus factor][41] and thus the risk to the project. (I know this from first-hand experience. After IE8 shipped, I was on my way out the door to join another team. Then IEs Security PM left, leaving a gaping hole that I felt obliged to stay around to fill. It worked out okay for me and the team, but it was tense all around.)
Im on two sheriff rotations: [Enamel][42] (my subteam) and the broader Chrome Security Sheriff.
The Enamel rotations tasks are akin to what I used to do as a Program Manager at Microsoft—triage incoming bugs, respond to questions in the [Help Forums][43], and generally act as a point of contact for my immediate team.
In contrast, the Security Sheriff rotation is more work, and somewhat more exciting. The Security Sheriffs [duties][44] include triaging all bugs of type “Security”, assigning priority, severity, and finding an owner for each. Most security bugs are automatically reported by [our fuzzers][45] (a tireless robot army!), but we also get reports from the public and from Chrome team members and [Project Zero][46] too.
At Microsoft, incoming security bug reports were first received and evaluated by the Microsoft Security Response Center (MSRC); valid reports were passed along to the IE team after some level of analysis and reproduction was undertaken. In general, all communication was done through MSRC, and the turnaround cycle on bugs was _typically _on the order of weeks to months.
In contrast, anyone can [file a security bug][47] against Chrome, and every week lots of people do. One reason for that is that Chrome has a [Vulnerability Rewards program][48] which pays out up to $100K for reports of vulnerabilities in Chrome and Chrome OS. Chrome paid out just under $1M USD in bounties [last year][49]. This is an _awesome _incentive for researchers to responsibly disclose bugs directly to us, and the bounties are _much _higher than those of nearly any other project.
In his “[Hacker Quantified Security][50]” talk at the OReilly Security conference, HackerOne CTO and Cofounder Alex Rice showed the following chart of bounty payout size for vulnerabilities when explaining why he was using a Chromebook. Apologies for the blurry photo, but the line at the top shows Chrome OS, with the 90th percentile line miles below as severity rises to Critical:
[
![Vulnerability rewards by percentile. Chrome is WAY off the chart.](https://textplain.files.wordpress.com/2017/01/image_thumb6.png?w=962&h=622 "Chrome Vulnerability Rewards are Yuuuuge")
][51]
With a top bounty of $100000 for an exploit or exploit chain that fully compromises a Chromebook, researchers are much more likely to send their bugs to us than to try to find a buyer on the black market.
Bug bounties are great, except when theyre not. Unfortunately, many filers dont bother to read the [Chrome Security FAQ][52] which explains what constitutes a security vulnerability and the great many things that do not. Nearly every week, we have at least one person (and often more) file a bug noting “_I can use the Developer Tools to read my own password out of a webpage. Can I have a bounty?_” or “_If I install malware on my PC, I can see what happens inside Chrome” _or variations of these.
Because we take security bug reports very seriously, we often spend a lot of time on what seem like garbage filings to verify that theres not just some sort of communication problem. This exposes one downside of the sheriff process—the lack of continuity from week to week.
In the fall, we had one bug reporter file a new issue every week that was just a collection of security related terms (XSS! CSRF! UAF! EoP! Dangling Pointer! Script Injection!) lightly wrapped in prose, including screenshots, snippets from websites, console output from developer tools, and the like. Each week, the sheriff would investigate, ask for more information, and engage in a fruitless back and forth with the filer trying to figure out what claim was being made. Eventually I caught on to what was happening and started monitoring the sheriffs queue, triaging the new findings directly and sparing the sheriff of the week. But even today we still catch folks who lookup old bug reports (usually Wont Fixed issues), copy/paste the content into new bugs, and file them into the queue. Its frustrating, but coming from a closed bug database, Id choose the openness of the Chrome bug database every time.
Getting ready for my first Sheriff rotation, I started watching the incoming queue a few months earlier and felt ready for my first rotation in September. Day One was quiet, with a few small issues found by fuzzers and one or two junk reports from the public which I triaged away with pointers to the “_Why isn't a vulnerability_” entries in the Security FAQ. I spent the rest of the day writing a fix for a lower-priority security [bug][53] that had been filed a month before. A pretty successful day, I thought.
Day Two was more interesting. Scanning the queue, I saw a few more fuzzer issues and [one external report][54] whose text started with “Here is a Chrome OS exploit chain.” The report was about two pages long, and had a forty-two page PDF attachment explaining the four exploits the finder had used to take over a fully-patched Chromebook.
![Star Wars trench run photo](https://textplain.files.wordpress.com/2017/02/image1.png?w=478&h=244 "Defenses can't keep up!")
Watching Lukes X-wing take out the Death Star in Star Wars was no more exciting than reading the PDFs tale of how a single byte memory overwrite in the DNS resolver code could weave its way through the many-layered security features of the Chromebook and achieve a full compromise. It was like the most amazing magic trick youve ever seen.
I hopped over to IRC. “So, do we see full compromises of Chrome OS every week?” I asked innocently.
“No. Why?” came the reply from several corners. I pasted in the bug link and a few moments later the replies started flowing in “OMG. Amazing!” Even guys from Project Zero were impressed, and theyre magicians who build exploits like this (usually for other products) all the time. The researcher had found one small bug and a variety of neglected components that were thought to be unreachable and put together a deadly chain.
The first patches were out for code review that evening, and by the next day, wed reached out to the open-source owner of the DNS component with the 1-byte overwrite bug so he could release patches for the other projects using his code. Within a few days, fixes to other components landed and had been ported to all of the supported versions of Chrome OS. Two weeks later, the Chrome Vulnerability rewards team added the [reward-100000][55] tag, the only bug so far to be so marked. Four weeks after that, I had to hold my tongue when Alex mentioned that “no ones ever claimed that $100000 bounty” during his “Hacker Quantified Security” talk. Just under 90 days from filing, the bug was unrestricted and made available for public viewing.
The remainder of my first Sheriff rotation was considerably less exciting, although still interesting. I spent some time looking through the components the researcher had abused in his exploit chain and filed a few bugs. Ultimately, the most risky component he used was removed entirely.
### OUTREACH AND BLOGGING
Beyond working on the Enamel team (focused on Chromes security UI surface), I also work on the “MoarTLS” project, designed to help encourage and assist the web as a whole in moving to HTTPS. This takes a number of forms—I help maintain the [HTTPS on Top Sites Report Card][56], I do consultations and HTTPS Audits with major sites as they enable HTTPS on their sites. I discover, reduce, and file bugs on Chromes and other browsers support of features like Upgrade-Insecure-Requests. I publish a [running list of articles][57] on why and how sites should enable TLS. I hassle teams all over Google (and the web in general) to enable HTTPS on every single hyperlink they emit. I responsibly disclosed security bugs in a number of products and sites, including [a vulnerability][58] in Hillary Clintons fundraising emails. I worked to send a notification to many many many thousands of sites collecting user information non-securely, warning them of the [UI changes in Chrome 56][59].
When I applied to Google for the Developer Advocate role, I expected Id be delivering public talks _constantly_, but as a SWE Ive only given a few talks, including my  [Migrating to HTTPS talk][60] at the first OReilly Security Conference. I had a lot of fun at that conference, catching up with old friends from the security community (mostly ex-Microsofties). I also went to my first [Chrome Dev Summit][61], where I didnt have a public talk (my colleagues did) but I did get to talk to some major companies about deploying HTTPS.
I also blogged [quite a bit][62]. At Microsoft, I started blogging because I got tired of repeating myself, and because our Exchange server and document retention policies had started making it hard or impossible to find old responses—I figured “Well, if I publish everything on the web, Google will find it, and Internet Archive will back it up.”
Ive kept blogging since leaving Microsoft, and Im happy that I have even though my reader count numbers are much lower than they were at Microsoft. Ive managed to mostly avoid trouble, although my posts are not entirely uncontroversial. At Microsoft, they wouldnt let me publish [this post][63] (because it was too frank); in my first month at Google, I got a phone call at home (during the first portion of my paternity leave) from a Google Director complaining that Id written [something][64] that was too harsh about a change Microsoft had made. But for the most part, my blogging seems not to ruffle too many feathers.
### TIDBITS
* Food at Google is generally _really _good; Im at a satellite office in Austin, so the selection is much smaller than on the main campuses, but the rotating menu is fairly broad and always has at least three major options. And the breakfasts! I gained about 15 pounds in my first few months, but my pneumonia took it off and Ive restrained my intake since I came back.
* At Microsoft, I always sneered at companies offering free food (“Im an adult professional. I can pay for my lunch.”), but its definitely convenient to not have to hassle with payments. And until the government closes the loophole, its a way to increase employees compensation without getting taxed.
* For the first three months, I was impressed and slightly annoyed that all of the snack options in Googles micro-kitchens are healthy (e.g. fruit)—probably a good thing, since I sit about twenty feet from one. Then I saw someone open a drawer and pull out some M&Ms, and I learned the secret—all of the junk food is in drawers. The selection is impressive and ranges from the popular to the high end.
* Google makes heavy use of the “open-office concept.” I think this makes sense for some teams, but its not at all awesome for me. Id gladly take a 10% salary cut for a private office. I doubt Im alone.
* Coworkers at Google range from very smart to insanely off-the-scales-smart. Yet, almost all of them are humble, approachable, and kind.
* Google, like Microsoft, offers gift matching for charities. This is an awesome perk, and one I aim to max out every year. Im awed by people who go [far][1] beyond that.
* **Window Management  **I mentioned earlier that one downside of web-based tools is that its hard to even _find _the right tab when Ive got dozens of open tabs that Im flipping between. The [Quick Tabs extension][2] is one great mitigation; it shows your tabs in a searchable, most-recently-used list in a convenient dropdown:
[
![QuickTabs Extension](https://textplain.files.wordpress.com/2017/01/image59.png?w=526&h=376 "A Searchable MRU of open tabs. Yes please!")
][65]
Another trick that I learned just this month is that you can instruct Chrome to open a site in “App” mode, where it runs in its own top-level window (with no other tabs), showing the sites icon as the icon in the Windows taskbar. Its easy:
On Windows, run chrome.exe --app=https://mail.google.com
While on OS X, run open -n -b com.google.Chrome --args --app=[https://news.google.com][66]
_Tip: The easy way to create a shortcut to the current page in app mode is to click the Chrome Menu > More Tools > Add to {shelf/desktop} and tick the Open as Window checkbox._
I now have [SlickRun][67] MagicWords set up for **mail**, **calendar**, and my other critical applications.
--------------------------------------------------------------------------------
via: https://textslashplain.com/2017/02/01/google-chrome-one-year-in/
作者:[ericlaw][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://textslashplain.com/author/ericlaw1979/
[1]:https://www.jefftk.com/p/leaving-google-joining-wave
[2]:https://chrome.google.com/webstore/detail/quick-tabs/jnjfeinjfmenlddahdjdmgpbokiacbbb
[3]:https://textslashplain.com/2015/12/23/my-next-adventure/
[4]:http://sdtimes.com/telerik-acquires-fiddler-debugger-along-with-its-creator/
[5]:https://bayden.com/dl/Codemash2015-ericlaw-https-in-2015.pptx
[6]:https://conferences.oreilly.com/velocity/devops-web-performance-2015/public/content/2015/04/16-https-stands-for-user-experience
[7]:https://textslashplain.com/2015/04/09/on-appreciation/
[8]:https://xkcd.com/1254/
[9]:http://m.xkcd.com/1782/
[10]:https://hexchat.github.io/
[11]:https://blogs.msdn.microsoft.com/ieinternals/2009/07/20/internet-explorers-cache-control-extensions/
[12]:https://www.chromium.org/developers/bisect-builds-py
[13]:https://mozilla.github.io/mozregression/
[14]:https://chromium.googlesource.com/chromium/src/+/lkgr/docs/clang.md
[15]:https://bugs.chromium.org/p/monorail/adminIntro
[16]:https://bugs.chromium.org/p/monorail/issues/list?can=2&q=reporter%3Aelawrence
[17]:https://en.wikipedia.org/wiki/Rietveld_(software)
[18]:https://www.chromium.org/developers
[19]:https://www.chromium.org/developers/design-documents
[20]:https://codereview.chromium.org/1733973004/
[21]:https://codereview.chromium.org/2244263002/
[22]:https://codereview.chromium.org/2323273003/
[23]:https://codereview.chromium.org/2368593002/
[24]:https://codereview.chromium.org/2347923002/
[25]:https://codereview.chromium.org/search?closed=1&owner=elawrence&reviewer=&cc=&repo_guid=&base=&project=&private=1&commit=1&created_before=&created_after=&modified_before=&modified_after=&order=&format=html&keys_only=False&with_messages=False&cursor=&limit=30
[26]:https://blogs.msdn.microsoft.com/ie/2005/04/20/tls-and-ssl-in-the-real-world/
[27]:https://chrome.google.com/webstore/search/bayden?hl=en-US&_category=extensions
[28]:https://textslashplain.com/2016/03/17/seek-and-destroy-non-secure-references-using-the-moartls-analyzer/
[29]:https://textslashplain.com/2016/03/18/building-the-moartls-analyzer/
[30]:https://bugs.chromium.org/p/monorail/issues/detail?id=1739
[31]:https://bugs.chromium.org/p/chromium/issues/list?can=1&q=reporter%3Ame&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids
[32]:https://textslashplain.com/2016/08/18/file-the-bug/
[33]:https://bugs.chromium.org/p/chromium/issues/detail?id=540547
[34]:https://bugs.chromium.org/p/chromium/issues/detail?id=601538
[35]:https://bugs.chromium.org/p/chromium/issues/detail?id=595844#c6
[36]:https://bugs.chromium.org/p/chromium/issues/detail?id=629637
[37]:https://bugs.chromium.org/p/chromium/issues/detail?id=591343
[38]:https://textslashplain.com/2016/04/04/downloads-and-the-mark-of-the-web/
[39]:https://blogs.msdn.microsoft.com/ieinternals/2011/03/23/understanding-local-machine-zone-lockdown/
[40]:https://blogs.msdn.microsoft.com/ieinternals/2012/06/19/enhanced-protected-mode-and-local-files/
[41]:https://en.wikipedia.org/wiki/Bus_factor
[42]:https://www.chromium.org/Home/chromium-security/enamel
[43]:https://productforums.google.com/forum/#!forum/chrome
[44]:https://www.chromium.org/Home/chromium-security/security-sheriff
[45]:https://blog.chromium.org/2012/04/fuzzing-for-security.html
[46]:https://en.wikipedia.org/wiki/Project_Zero_(Google)
[47]:https://bugs.chromium.org/p/chromium/issues/entry?template=Security%20Bug
[48]:https://www.google.com/about/appsecurity/chrome-rewards/
[49]:https://security.googleblog.com/2017/01/vulnerability-rewards-program-2016-year.html
[50]:https://conferences.oreilly.com/security/network-data-security-ny/public/schedule/detail/53296
[51]:https://textplain.files.wordpress.com/2017/01/image58.png
[52]:https://dev.chromium.org/Home/chromium-security/security-faq
[53]:https://bugs.chromium.org/p/chromium/issues/detail?id=639126#c11
[54]:https://bugs.chromium.org/p/chromium/issues/detail?id=648971
[55]:https://bugs.chromium.org/p/chromium/issues/list?can=1&q=label%3Areward-100000&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids
[56]:https://www.google.com/transparencyreport/https/grid/?hl=en
[57]:https://whytls.com/
[58]:https://textslashplain.com/2016/09/22/use-https-for-all-inbound-links/
[59]:https://security.googleblog.com/2016/09/moving-towards-more-secure-web.html
[60]:https://www.safaribooksonline.com/library/view/the-oreilly-security/9781491960035/video287622.html
[61]:https://developer.chrome.com/devsummit/
[62]:https://textslashplain.com/2016/
[63]:https://blogs.msdn.microsoft.com/ieinternals/2013/10/16/strict-p3p-validation/
[64]:https://textslashplain.com/2016/01/20/putting-users-first/
[65]:https://chrome.google.com/webstore/detail/quick-tabs/jnjfeinjfmenlddahdjdmgpbokiacbbb
[66]:https://news.google.com/
[67]:https://bayden.com/slickrun/

View File

@ -1,155 +0,0 @@
Education of a Programmer
============================================================
_When I left Microsoft in October 2016 after almost 21 years there and almost 35 years in the industry, I took some time to reflect on what I had learned over all those years. This is a lightly edited version of that post. Pardon the length!_
There are an amazing number of things you need to know to be a proficient programmer—details of languages, APIs, algorithms, data structures, systems and tools. These things change all the time—new languages and programming environments spring up and there always seems to be some hot new tool or language that “everyone” is using. It is important to stay current and proficient. A carpenter needs to know how to pick the right hammer and nail for the job and needs to be competent at driving the nail straight and true.
At the same time, I've found that there are some concepts and strategies that are applicable over a wide range of scenarios and across decades. We have seen multiple orders of magnitude change in the performance and capability of our underlying devices and yet certain ways of thinking about the design of systems still stay relevant. These are more fundamental than any specific implementation. Understanding these recurring themes is hugely helpful in both the analysis and design of the complex systems we build.
### Humility and Ego
This is not limited to programming, but in an area like computing which exhibits so much constant change, one needs a healthy balance of humility and ego. There is always more to learn and there is always someone who can help you learn itif you are willing and open to that learning. One needs both the humility to recognize and acknowledge what you dont know and the ego that gives you confidence to master a new area and apply what you already know. The biggest challenges I have seen are when someone works in a single deep area for a long time and “forgets” how good they are at learning new things. The best learning comes from actually getting hands dirty and building something, even if it is just a prototype or hack. The best programmers I know have had both a broad understanding of technology while at the same time have taken the time to go deep into some technology and become the expert. The deepest learning happens when you struggle with truly hard problems.
### End to End Argument
Back in 1981, Jerry Saltzer, Dave Reed and Dave Clark were doing early work on the Internet and distributed systems and wrote up their [classic description][4] of the end to end argument. There is much misinformation out there on the Internet so it can be useful to go back and read the original paper. They were humble in not claiming invention—from their perspective this was a common engineering strategy that applies in many areas, not just in communications. They were simply writing it down and gathering examples. A minor paraphrasing is:
When implementing some function in a system, it can be implemented correctly and completely only with the knowledge and participation of the endpoints of the system. In some cases, a partial implementation in some internal component of the system may be important for performance reasons.
The SRC paper calls this an “argument”, although it has been elevated to a “principle” on Wikipedia and in other places. In fact, it is better to think of it as an argument—as they detail, one of the hardest problems for a system designer is to determine how to divide responsibilities between components of a system. This ends up being a discussion that involves weighing the pros and cons as you divide up functionality, isolate complexity and try to design a reliable, performant system that will be flexible to evolving requirements. There is no simple set of rules to follow.
Much of the discussion on the Internet focuses on communications systems, but the end-to-end argument applies in a much wider set of circumstances. One example in distributed systems is the idea of “eventual consistency”. An eventually consistent system can optimize and simplify by letting elements of the system get into a temporarily inconsistent state, knowing that there is a larger end-to-end process that can resolve these inconsistencies. I like the example of a scaled-out ordering system (e.g. as used by Amazon) that doesnt require every request go through a central inventory control choke point. This lack of a central control point might allow two endpoints to sell the same last book copy, but the overall system needs some type of resolution system in any case, e.g. by notifying the customer that the book has been backordered. That last book might end up getting run over by a forklift in the warehouse before the order is fulfilled anyway. Once you realize an end-to-end resolution system is required and is in place, the internal design of the system can be optimized to take advantage of it.
In fact, it is this design flexibility in the service of either ongoing performance optimization or delivering other system features that makes this end-to-end approach so powerful. End-to-end thinking often allows internal performance flexibility which makes the overall system more robust and adaptable to changes in the characteristics of each of the components. This makes an end-to-end approach “anti-fragile” and resilient to change over time.
An implication of the end-to-end approach is that you want to be extremely careful about adding layers and functionality that eliminates overall performance flexibility. (Or other flexibility, but performance, especially latency, tends to be special.) If you expose the raw performance of the layers you are built on, end-to-end approaches can take advantage of that performance to optimize for their specific requirements. If you chew up that performance, even in the service of providing significant value-add functionality, you eliminate design flexibility.
The end-to-end argument intersects with organizational design when you have a system that is large and complex enough to assign whole teams to internal components. The natural tendency of those teams is to extend the functionality of those components, often in ways that start to eliminate design flexibility for applications trying to deliver end-to-end functionality built on top of them.
One of the challenges in applying the end-to-end approach is determining where the end is. “Little fleas have lesser fleas… and so on ad infinitum.”
### Concentrating Complexity
Coding is an incredibly precise art, with each line of execution required for correct operation of the program. But this is misleading. Programs are not uniform in the overall complexity of their components or the complexity of how those components interact. The most robust programs isolate complexity in a way that lets significant parts of the system appear simple and straightforward and interact in simple ways with other components in the system. Complexity hiding can be isomorphic with other design approaches like information hiding and data abstraction but I find there is a different design sensibility if you really focus on identifying where the complexity lies and how you are isolating it.
The example I've returned to over and over again in my [writing][5] is the screen repaint algorithm that was used by early character video terminal editors like VI and EMACS. The early video terminals implemented control sequences for the core action of painting characters as well as additional display functions to optimize redisplay like scrolling the current lines up or down or inserting new lines or moving characters within a line. Each of those commands had different costs and those costs varied across different manufacturers' devices. (See [TERMCAP][6] for links to code and a fuller history.) A full-screen application like a text editor wanted to update the screen as quickly as possible and therefore needed to optimize its use of these control sequences to transition the screen from one state to another.
These applications were designed so this underlying complexity was hidden. The parts of the system that modify the text buffer (where most innovation in functionality happens) completely ignore how these changes are converted into screen update commands. This is possible because the performance cost of computing the optimal set of updates for  _any_  change in the content is swamped by the performance cost of actually executing the update commands on the terminal itself. It is a common pattern in systems design that performance analysis plays a key part in determining how and where to hide complexity. The screen update process can be asynchronous to the changes in the underlying text buffer and can be independent of the actual historical sequence of changes to the buffer. It is not important  _how_  the buffer changed, but only  _what_  changed. This combination of asynchronous coupling, elimination of the combinatorics of historical path dependence in the interaction between components and having a natural way for interactions to efficiently batch together are common characteristics used to hide coupling complexity.
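As a toy illustration of that separation, here is a Python sketch (terminal command costs and the real editor data structures are omitted): the redisplay step compares only the previous and desired screen contents, so it never needs to know which sequence of edits produced the change.

```
def screen_updates(old_lines, new_lines):
    """Return (row, text) pairs for rows whose contents changed.

    The caller never says *how* the buffer changed, only what the screen
    should now look like; the history and order of edits are irrelevant.
    Assumes a fixed number of screen rows (both lists the same length).
    """
    updates = []
    for row, (old, new) in enumerate(zip(old_lines, new_lines)):
        if old != new:
            updates.append((row, new))
    return updates

before = ["def f(x):", "    return x", ""]
after  = ["def f(x):", "    return x + 1", ""]
print(screen_updates(before, after))   # [(1, '    return x + 1')]
```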
Success in hiding complexity is determined not by the component doing the hiding but by the consumers of that component. This is one reason why it is often so critical for a component provider to actually be responsible for at least some piece of the end-to-end use of that component. They need to have clear optics into how the rest of the system interacts with their component and how (and whether) complexity leaks out. This often shows up as feedback like “this component is hard to use” - which typically means that it is not effectively hiding the internal complexity or did not pick a functional boundary that was amenable to hiding that complexity.
### Layering and Componentization
It is the fundamental role of a system designer to determine how to break down a system into components and layers; to make decisions about what to build and what to pick up from elsewhere. Open Source may keep money from changing hands in this “build vs. buy” decision but the dynamics are the same. An important element in large scale engineering is understanding how these decisions will play out over time. Change fundamentally underlies everything we do as programmers, so these design choices are not only evaluated in the moment, but are evaluated in the years to come as the product continues to evolve.
Here are a few things about system decomposition that end up having a large element of time in them and therefore tend to take longer to learn and appreciate.
* Layers are leaky. Layers (or abstractions) are [fundamentally leaky][1]. These leaks have consequences immediately but also have consequences over time, in two ways. One consequence is that the characteristics of the layer leak through and permeate more of the system than you realize. These might be assumptions about specific performance characteristics or behavior ordering that is not an explicit part of the layer contract. This means that you generally are more  _vulnerable_  to changes in the internal behavior of the component than you understood. A second consequence is it also means you are more  _dependent_  on that internal behavior than is obvious, so if you consider changing that layer the consequences and challenges are probably larger than you thought.
* Layers are too functional. It is almost a truism that a component you adopt will have more functionality than you actually require. In some cases, the decision to use it is based on leveraging that functionality for future uses. You adopt specifically because you want to “get on the train” and leverage the ongoing work that will go into that component. There are a few consequences of building on this highly functional layer. 1) The component will often make trade-offs that are biased by functionality that you do not actually require. 2) The component will embed complexity and constraints because of functionality you do not require and those constraints will impede future evolution of that component. 3) There will be more surface area to leak into your application. Some of that leakage will be due to true “leaky abstractions” and some will be explicit (but generally poorly controlled) increased dependence on the full capabilities of the component. Office is big enough that we found that for any layer we built on, we eventually fully explored its functionality in some part of the system. While that might appear to be positive (we are more completely leveraging the component), all uses are not equally valuable. So we end up having a massive cost to move from one layer to another based on this long-tail of often lower value and poorly recognized use cases. 4) The additional functionality creates complexity and opportunities for misuse. An XML validation API we used would optionally dynamically download the schema definition if it was specified as part of the XML tree. This was mistakenly turned on in our basic file parsing code which resulted in both a massive performance degradation as well as an (unintentional) distributed denial of service attack on a w3c.org web server. (These are colloquially known as “land mine” APIs.)
* Layers get replaced. Requirements evolve, systems evolve, components are abandoned. You eventually need to replace that layer or component. This is true for external component dependencies as well as internal ones. This means that the issues above will end up becoming important.
* Your build vs. buy decision will change. This is partly a corollary of above. This does not mean the decision to build or buy was wrong at the time. Often there was no appropriate component when you started and it only becomes available later. Or alternatively, you use a component but eventually find that it does not match your evolving requirements and your requirements are narrow enough, well-understood or so core to your value proposition that it makes sense to own it yourself. It does mean that you need to be just as concerned about leaky layers permeating more of the system for layers you build as well as for layers you adopt.
* Layers get thick. As soon as you have defined a layer, it starts to accrete functionality. The layer is the natural throttle point to optimize for your usage patterns. The difficulty with a thick layer is that it tends to reduce your ability to leverage ongoing innovation in underlying layers. In some sense this is why OS companies hate thick layers built on top of their core evolving functionality - the pace at which innovation can be adopted is inherently slowed. One disciplined approach to avoid this is to disallow any additional state storage in an adaptor layer. Microsoft Foundation Classes took this general approach in building on top of Win32. It is inevitably cheaper in the short term to just accrete functionality on to an existing layer (leading to all the eventual problems above) rather than refactoring and recomponentizing. A system designer who understands this looks for opportunities to break apart and simplify components rather than accrete more and more functionality within them.
### Einsteinian Universe
I had been designing asynchronous distributed systems for decades but was struck by this quote from Pat Helland, a SQL architect, at an internal Microsoft talk. “We live in an Einsteinian universe - there is no such thing as simultaneity.” When building distributed systems - and virtually everything we build is a distributed system - you cannot hide the distributed nature of the system. It's just physics. This is one of the reasons I've always felt Remote Procedure Call, and especially “transparent” RPC that explicitly tries to hide the distributed nature of the interaction, is fundamentally wrong-headed. You need to embrace the distributed nature of the system since the implications almost always need to be plumbed completely through the system design and into the user experience.
Embracing the distributed nature of the system leads to a number of things:
* You think through the implications to the user experience from the start rather than trying to patch on error handling, cancellation and status reporting as an afterthought.
* You use asynchronous techniques to couple components. Synchronous coupling is  _impossible._  If something appears synchronous, it's because some internal layer has tried to hide the asynchrony and in doing so has obscured (but definitely not hidden) a fundamental characteristic of the runtime behavior of the system.
* You recognize and explicitly design for interacting state machines and that these states represent robust long-lived internal system states (rather than ad-hoc, ephemeral and undiscoverable state encoded by the value of variables in a deep call stack).
* You recognize that failure is expected. The only guaranteed way to detect failure in a distributed system is to simply decide you have waited “too long”. This naturally means that [cancellation is first-class][2]. Some layer of the system (perhaps plumbed through to the user) will need to decide it has waited too long and cancel the interaction. Cancelling is only about reestablishing local state and reclaiming local resources - there is no way to reliably propagate that cancellation through the system. It can sometimes be useful to have a low-cost, unreliable way to attempt to propagate cancellation as a performance optimization.
* You recognize that cancellation is not rollback since it is just reclaiming local resources and state. If rollback is necessary, it needs to be an end-to-end feature.
* You accept that you can never really know the state of a distributed component. As soon as you discover the state, it may have changed. When you send an operation, it may be lost in transit, it might be processed but the response is lost, or it may take some significant amount of time to process so the remote state ultimately transitions at some arbitrary time in the future. This leads to approaches like idempotent operations and the ability to robustly and efficiently rediscover remote state rather than expecting that distributed components can reliably track state in parallel. The concept of “[eventual consistency][3]” succinctly captures many of these ideas.
I like to say you should “revel in the asynchrony”. Rather than trying to hide it, you accept it and design for it. When you see a technique like idempotency or immutability, you recognize them as ways of embracing the fundamental nature of the universe, not just one more design tool in your toolbox.
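To make one of those techniques concrete, here is a minimal Python sketch of an idempotent operation keyed by a caller-chosen request id (the `Account` class and names are invented for illustration, not taken from any particular system): resending after a timeout is harmless because a duplicate request makes no further change.

```
class Account:
    """A remote-ish service that applies each request id at most once."""
    def __init__(self):
        self.balance = 0
        self.applied = set()           # request ids we have already processed

    def credit(self, request_id, amount):
        if request_id in self.applied:
            return self.balance        # duplicate: no additional effect
        self.applied.add(request_id)
        self.balance += amount
        return self.balance

acct = Account()
acct.credit("req-42", 100)
# The response was "lost", so the caller times out, cancels locally, and retries:
acct.credit("req-42", 100)
print(acct.balance)                    # 100, not 200
```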
### Performance
I am sure Don Knuth is horrified by how misunderstood his partial quote “Premature optimization is the root of all evil” has been. In fact, performance, and the incredible exponential improvements in performance that have continued for over 6 decades (or more than 10 decades depending on how willing you are to project these trends through discrete transistors, vacuum tubes and electromechanical relays), underlie all of the amazing innovation we have seen in our industry and all the change rippling through the economy as “software eats the world”.
A key thing to recognize about this exponential change is that while all components of the system are experiencing exponential change, these exponentials are divergent. So the rate of increase in capacity of a hard disk changes at a different rate from the capacity of memory or the speed of the CPU or the latency between memory and CPU. Even when trends are driven by the same underlying technology, exponentials diverge. [Latency improvements fundamentally trail bandwidth improvements][7]. Exponential change tends to look linear when you are close to it or over short periods but the effects over time can be overwhelming. This overwhelming change in the relationship between the performance of components of the system forces reevaluation of design decisions on a regular basis.
A consequence of this is that design decisions that made sense at one point no longer make sense after a few years. Or in some cases an approach that made sense two decades ago starts to look like a good trade-off again. Modern memory mapping has characteristics that look more like process swapping of the early time-sharing days than it does like demand paging. (This does sometimes result in old codgers like myself claiming that “that's just the same approach we used back in '75” - ignoring the fact that it didn't make sense for 40 years and now does again because some balance between two components - maybe flash and NAND rather than disk and core memory - has come to resemble a previous relationship).
Important transitions happen when these exponentials cross human constraints. So you move from a limit of two to the sixteenth characters (which a single user can type in a few hours) to two to the thirty-second (which is beyond what a single person can type). So you can capture a digital image with higher resolution than the human eye can perceive. Or you can store an entire music collection on a hard disk small enough to fit in your pocket. Or you can store a digitized video recording on a hard disk. And then later the ability to stream that recording in real time makes it possible to “record” it by storing it once centrally rather than repeatedly on thousands of local hard disks.
The things that stay as a fundamental constraint are three dimensions and the speed of light. We're back to that Einsteinian universe. We will always have memory hierarchies - they are fundamental to the laws of physics. You will always have stable storage and IO, memory, computation and communications. The relative capacity, latency and bandwidth of these elements will change, but the system is always about how these elements fit together and the balance and tradeoffs between them. Jim Gray was the master of this analysis.
Another consequence of the fundamentals of 3D and the speed of light is that much of performance analysis is about three things: locality, locality, locality. Whether it is packing data on disk, managing processor cache hierarchies, or coalescing data into a communications packet, how data is packed together, the patterns for how you touch that data with locality over time and the patterns of how you transfer that data between components is fundamental to performance. Focusing on less code operating on less data with more locality over space and time is a good way to cut through the noise.
Jon Devaan used to say “design the data, not the code”. This also generally means when looking at the structure of a system, I'm less interested in seeing how the code interacts - I want to see how the data interacts and flows. If someone tries to explain a system by describing the code structure and does not understand the rate and volume of data flow, they do not understand the system.
A memory hierarchy also implies we will always have caches - even if some system layer is trying to hide it. Caches are fundamental but also dangerous. Caches are trying to leverage the runtime behavior of the code to change the pattern of interaction between different components in the system. They inherently need to model that behavior, even if that model is implicit in how they fill and invalidate the cache and test for a cache hit. If the model is poor  _or becomes_  poor as the behavior changes, the cache will not operate as expected. A simple guideline is that caches  _must_  be instrumented - their behavior will degrade over time because of changing behavior of the application and the changing nature and balance of the performance characteristics of the components you are modeling. Every long-time programmer has cache horror stories.
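As a minimal sketch of that guideline, here is a tiny instrumented LRU cache in Python (the counters and eviction policy are illustrative only): hit, miss and eviction counts are first-class outputs, so a drift in workload or component behavior shows up in the numbers rather than as a mystery.

```
from collections import OrderedDict

class InstrumentedCache:
    """A tiny LRU cache that counts hits, misses and evictions."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, key, load):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)          # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = load(key)                       # fall through to the slow path
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)       # evict least recently used
            self.evictions += 1
        return value

    def stats(self):
        total = self.hits + self.misses
        return {"hit_rate": self.hits / total if total else 0.0,
                "evictions": self.evictions}

cache = InstrumentedCache(capacity=2)
for key in ["a", "b", "a", "c", "a"]:
    cache.get(key, load=lambda k: k.upper())
print(cache.stats())                            # {'hit_rate': 0.4, 'evictions': 1}
```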
I was lucky that my early career was spent at BBN, one of the birthplaces of the Internet. It was very natural to think about communications between asynchronous components as the natural way systems connect. Flow control and queueing theory are fundamental to communications systems and more generally the way that any asynchronous system operates. Flow control is inherently resource management (managing the capacity of a channel) but resource management is the more fundamental concern. Flow control also is inherently an end-to-end responsibility, so thinking about asynchronous systems in an end-to-end way comes very naturally. The story of [buffer bloat][8] is well worth understanding in this context because it demonstrates how lack of understanding the dynamics of end-to-end behavior coupled with technology “improvements” (larger buffers in routers) resulted in very long-running problems in the overall network infrastructure.
The concept of “light speed” is one that I've found useful in analyzing any system. A light speed analysis doesn't start with the current performance, it asks “what is the best theoretical performance I could achieve with this design?” What is the real information content being transferred and at what rate of change? What is the underlying latency and bandwidth between components? A light speed analysis forces a designer to have a deeper appreciation for whether their approach could ever achieve the performance goals or whether they need to rethink their basic approach. It also forces a deeper understanding of where performance is being consumed and whether this is inherent or potentially due to some misbehavior. From a constructive point of view, it forces a system designer to understand what are the true performance characteristics of their building blocks rather than focusing on the other functional characteristics.
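A back-of-the-envelope version of such an analysis can be a few lines of Python (the numbers below are made-up placeholders, not measurements): given the data that truly has to move, the number of round trips, and the underlying latency and bandwidth, what is the floor on response time regardless of implementation cleverness?

```
def best_case_seconds(payload_bytes, round_trips, rtt_s, bandwidth_bytes_per_s):
    """Lower bound for an exchange: propagation cost plus transmission cost.

    If the measured time is far above this floor, the overhead is in the
    design, not in physics; if it is near the floor, only a different design
    (fewer round trips, less data) can make it faster.
    """
    return round_trips * rtt_s + payload_bytes / bandwidth_bytes_per_s

# Hypothetical example: 2 MB of truly required data, 3 round trips,
# 40 ms RTT, 100 MB/s of usable bandwidth.
floor = best_case_seconds(2e6, 3, 0.040, 100e6)
print(f"light-speed floor: {floor * 1000:.0f} ms")   # ~140 ms
```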
I spent much of my career building graphical applications. A user sitting at one end of the system defines a key constant and constraint in any such system. The human visual and nervous system is not experiencing exponential change. The system is inherently constrained, which means a system designer can leverage ( _must_  leverage) those constraints, e.g. by virtualization (limiting how much of the underlying data model needs to be mapped into view data structures) or by limiting the rate of screen update to the perception limits of the human visual system.
### The Nature of Complexity
I have struggled with complexity my entire career. Why do systems and apps get complex? Why doesn't development within an application domain get easier over time as the infrastructure gets more powerful rather than getting harder and more constrained? In fact, one of our key approaches for managing complexity is to “walk away” and start fresh. Often new tools or languages force us to start from scratch which means that developers end up conflating the benefits of the tool with the benefits of the clean start. The clean start is what is fundamental. This is not to say that some new tool, platform or language might not be a great thing, but I can guarantee it will not solve the problem of complexity growth. The simplest way of controlling complexity growth is to build a smaller system with fewer developers.
Of course, in many cases “walking away” is not an alternative - the Office business is built on hugely valuable and complex assets. With OneNote, Office “walked away” from the complexity of Word in order to innovate along a different dimension. Sway is another example where Office decided that we needed to free ourselves from constraints in order to really leverage key environmental changes and the opportunity to take fundamentally different design approaches. With the Word, Excel and PowerPoint web apps, we decided that the linkage with our immensely valuable data formats was too fundamental to walk away from and that has served as a significant and ongoing constraint on development.
I was influenced by Fred Brooks' “[No Silver Bullet][9]” essay about accident and essence in software development. There is much irreducible complexity embedded in the essence of what the software is trying to model. I just recently re-read that essay and found it surprising on re-reading that two of the trends he imbued with the most power to impact future developer productivity were increasing emphasis on “buy” in the “build vs. buy” decision - foreshadowing the change that open-source and cloud infrastructure has had. The other trend was the move to more “organic” or “biological” incremental approaches over more purely constructivist approaches. A modern reader sees that as the shift to agile and continuous development processes. This in 1986!
I have been much taken with the work of Stuart Kauffman on the fundamental nature of complexity. Kauffman builds up from a simple model of Boolean networks (“[NK models][10]”) and then explores the application of this fundamentally mathematical construct to things like systems of interacting molecules, genetic networks, ecosystems, economic systems and (in a limited way) computer systems to understand the mathematical underpinning to emergent ordered behavior and its relationship to chaotic behavior. In a highly connected system, you inherently have a system of conflicting constraints that makes it (mathematically) hard to evolve that system forward (viewed as an optimization problem over a rugged landscape). A fundamental way of controlling this complexity is to batch the system into independent elements and limit the interconnections between elements (essentially reducing both “N” and “K” in the NK model). Of course this feels natural to a system designer applying techniques of complexity hiding, information hiding and data abstraction and using loose asynchronous coupling to limit interactions between components.
A challenge we always face is that many of the ways we want to evolve our systems cut across all dimensions. Real-time co-authoring has been a very concrete (and complex) recent example for the Office apps.
Complexity in our data models often equates with “power”. An inherent challenge in designing user experiences is that we need to map a limited set of gestures into a transition in the underlying data model state space. Increasing the dimensions of the state space inevitably creates ambiguity in the user gesture. This is “[just math][11]” which means that often times the most fundamental way to ensure that a system stays “easy to use” is to constrain the underlying data model.
### Management
I started taking leadership roles in high school (student council president!) and always found it natural to take on larger responsibilities. At the same time, I was always proud that I continued to be a full-time programmer through every management stage. VP of development for Office finally pushed me over the edge and away from day-to-day programming. I've enjoyed returning to programming as I stepped away from that job over the last year - it is an incredibly creative and fulfilling activity (and maybe a little frustrating at times as you chase down that “last” bug).
Despite having been a “manager” for over a decade by the time I arrived at Microsoft, I really learned about management after my arrival in 1996. Microsoft reinforced that “engineering leadership is technical leadership”. This aligned with my perspective and helped me both accept and grow into larger management responsibilities.
The thing that most resonated with me on my arrival was the fundamental culture of transparency in Office. The manager's job was to design and use transparent processes to drive the project. Transparency is not simple, automatic, or a matter of good intentions - it needs to be designed into the system. The best transparency comes by being able to track progress as the granular output of individual engineers in their day-to-day activity (work items completed, bugs opened and fixed, scenarios complete). Beware subjective red/green/yellow, thumbs-up/thumbs-down dashboards!
I used to say my job was to design feedback loops. Transparent processes provide a way for every participant in the process - from individual engineer to manager to exec - to use the data being tracked to drive the process and result and understand the role they are playing in the overall project goals. Ultimately transparency ends up being a great tool for empowerment - the manager can invest more and more local control in those closest to the problem because of confidence they have visibility to the progress being made. Coordination emerges naturally.
Key to this is that the goal has actually been properly framed (including key resource constraints like ship schedule). Decision-making that needs to constantly flow up and down the management chain usually reflects poor framing of goals and constraints by management.
I was at Beyond Software when I really internalized the importance of having a singular leader over a project. The engineering manager departed (later to hire me away for FrontPage) and all four of the leads were hesitant to step into the role - not least because we did not know how long we were going to stick around. We were all very technically sharp and got along well so we decided to work as peers to lead the project. It was a mess. The one obvious problem is that we had no strategy for allocating resources between the pre-existing groups - one of the top responsibilities of management! The deep accountability one feels when you know you are personally in charge was missing. We had no leader really accountable for unifying goals and defining constraints.
I have a visceral memory of the first time I fully appreciated the importance of  _listening_  for a leader. I had just taken on the role of Group Development Manager for Word, OneNote, Publisher and Text Services. There was a significant controversy about how we were organizing the text services team and I went around to each of the key participants, heard what they had to say and then integrated and wrote up all I had heard. When I showed the write-up to one of the key participants, his reaction was “wow, you really heard what I had to say”! All of the largest issues I drove as a manager (e.g. cross-platform and the shift to continuous engineering) involved carefully listening to all the players. Listening is an active process that involves trying to understand the perspectives and then writing up what I learned and testing it to validate my understanding. When a key hard decision needed to happen, by the time the call was made everyone knew they had been heard and understood (whether they agreed with the decision or not).
It was the previous job, as FrontPage development manager, where I internalized the “operational dilemma” inherent in decision making with partial information. The longer you wait, the more information you will have to make a decision. But the longer you wait, the less flexibility you will have to actually implement it. At some point you just need to make a call.
Designing an organization involves a similar tension. You want to increase the resource domain so that a consistent prioritization framework can be applied across a larger set of resources. But the larger the resource domain, the harder it is to actually have all the information you need to make good decisions. An organizational design is about balancing these two factors. Software complicates this because characteristics of the software can cut across the design in an arbitrary dimensionality. Office has used [shared teams][12] to address both these issues (prioritization and resources) by having cross-cutting teams that can share work (add resources) with the teams they are building for.
One dirty little secret you learn as you move up the management ladder is that you and your new peers aren't suddenly smarter because you now have more responsibility. This reinforces that the organization as a whole better be smarter than the leader at the top. Empowering every level to own their decisions within a consistent framing is the key approach to making this true. Listening and making yourself accountable to the organization for articulating and explaining the reasoning behind your decisions is another key strategy. Surprisingly, fear of making a dumb decision can be a useful motivator for ensuring you articulate your reasoning clearly and make sure you listen to all inputs.
### Conclusion
At the end of my interview round for my first job out of college, the recruiter asked if I was more interested in working on “systems” or “apps”. I didn't really understand the question. Hard, interesting problems arise at every level of the software stack and I've had fun plumbing all of them. Keep learning.
--------------------------------------------------------------------------------
via: https://hackernoon.com/education-of-a-programmer-aaecf2d35312
作者:[ Terry Crowley][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://hackernoon.com/@terrycrowley
[1]:https://medium.com/@terrycrowley/leaky-by-design-7b423142ece0#.x67udeg0a
[2]:https://medium.com/@terrycrowley/how-to-think-about-cancellation-3516fc342ae#.3pfjc5b54
[3]:http://queue.acm.org/detail.cfm?id=2462076
[4]:http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
[5]:https://medium.com/@terrycrowley/model-view-controller-and-loose-coupling-6370f76e9cde#.o4gnupqzq
[6]:https://en.wikipedia.org/wiki/Termcap
[7]:http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf
[8]:https://en.wikipedia.org/wiki/Bufferbloat
[9]:http://worrydream.com/refs/Brooks-NoSilverBullet.pdf
[10]:https://en.wikipedia.org/wiki/NK_model
[11]:https://medium.com/@terrycrowley/the-math-of-easy-to-use-14645f819201#.untmk9eq7
[12]:https://medium.com/@terrycrowley/breaking-conways-law-a0fdf8500413#.gqaqf1c5k

View File

@@ -1,3 +1,5 @@
translating by hopefully2333
# [The One in Which I Call Out Hacker News][14]

View File

@@ -1,53 +0,0 @@
When Does Your OS Run?
============================================================
Here's a question: in the time it takes you to read this sentence, has your OS been  _running_ ? Or was it only your browser? Or were they perhaps both idle, just waiting for you to  _do something already_ ?
These questions are simple but they cut through the essence of how software works. To answer them accurately we need a good mental model of OS behavior, which in turn informs performance, security, and troubleshooting decisions. We'll build such a model in this post series using Linux as the primary OS, with guest appearances by OS X and Windows. I'll link to the Linux kernel sources for those who want to delve deeper.
The fundamental axiom here is that  _at any given moment, exactly one task is active on a CPU_ . The task is normally a program, like your browser or music player, or it could be an operating system thread, but it is one task. Not two or more. Never zero, either. One. Always.
This sounds like trouble. For what if, say, your music player hogs the CPU and doesn't let any other tasks run? You would not be able to open a tool to kill it, and even mouse clicks would be futile as the OS wouldn't process them. You could be stuck blaring “What does the fox say?” and incite a workplace riot.
That's where interrupts come in. Much as the nervous system interrupts the brain to bring in external stimuli - a loud noise, a touch on the shoulder - the [chipset][1] in a computer's motherboard interrupts the CPU to deliver news of outside events: key presses, the arrival of network packets, the completion of a hard drive read, and so on. Hardware peripherals, the interrupt controller on the motherboard, and the CPU itself all work together to implement these interruptions, called interrupts for short.
Interrupts are also essential in tracking that which we hold dearest: time. During the [boot process][2] the kernel programs a hardware timer to issue timer interrupts at a periodic interval, for example every 10 milliseconds. When the timer goes off, the kernel gets a shot at the CPU to update system statistics and take stock of things: has the current program been running for too long? Has a TCP timeout expired? Interrupts give the kernel a chance to both ponder these questions and take appropriate actions. It's as if you set periodic alarms throughout the day and used them as checkpoints: should I be doing what I'm doing right now? Is there anything more pressing? One day you find ten years have got behind you.
These periodic hijackings of the CPU by the kernel are called ticks, so interrupts quite literally make your OS tick. But there's more: interrupts are also used to handle some software events like integer overflows and page faults, which involve no external hardware. Interrupts are the most frequent and crucial entry point into the OS kernel. They're not some oddity for the EE people to worry about, they're  _the_  mechanism whereby your OS runs.
Enough talk, let's see some action. Below is a network card interrupt in an Intel Core i5 system. The diagrams now have image maps, so you can click on juicy bits for more information. For example, each device links to its Linux driver.
![](http://duartes.org/gustavo/blog/img/os/hardware-interrupt.png)
<map id="mapHwInterrupt" name="mapHwInterrupt"><area shape="poly" coords="490,294,490,354,270,354,270,294" href="https://github.com/torvalds/linux/blob/v3.17/drivers/net/ethernet/intel/e1000e/netdev.c"><area shape="poly" coords="754,294,754,354,534,354,534,294" href="https://github.com/torvalds/linux/blob/v3.16/drivers/hid/usbhid/usbkbd.c"><area shape="poly" coords="488,490,488,598,273,598,273,490" href="https://github.com/torvalds/linux/blob/v3.16/arch/x86/kernel/apic/io_apic.c"><area shape="poly" coords="720,490,720,598,506,598,506,490" href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/hpet.c"></map>
Let's take a look at this. First off, since there are many sources of interrupts, it wouldn't be very helpful if the hardware simply told the CPU “hey, something happened!” and left it at that. The suspense would be unbearable. So each device is assigned an interrupt request line, or IRQ, during power up. These IRQs are in turn mapped into interrupt vectors, a number between 0 and 255, by the interrupt controller. By the time an interrupt reaches the CPU it has a nice, well-defined number insulated from the vagaries of hardware.
The CPU in turn has a pointer to what's essentially an array of 256 functions, supplied by the kernel, where each function is the handler for that particular interrupt vector. We'll look at this array, the Interrupt Descriptor Table (IDT), in more detail later on.
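As a conceptual model only (this is not kernel code, and the handler names and vector assignments below are illustrative), you can picture that dispatch as a 256-entry table of handlers indexed by vector, with a default for anything unassigned:

```
# Toy Python model of vector -> handler dispatch; the real IDT lives in the CPU/kernel.

def spurious(vector):
    print(f"unhandled vector {vector}")

idt = [spurious] * 256                  # one handler slot per vector 0..255

def handle_timer(vector):
    print("update system statistics, check if the current task ran too long")

def handle_keyboard(vector):
    print("read scancode, queue key event")

idt[32] = handle_timer                  # illustrative: timer tick on vector 32
idt[33] = handle_keyboard               # illustrative: keyboard IRQ on vector 33

def interrupt(vector):
    """What the CPU conceptually does: index the table and call the handler."""
    idt[vector](vector)

interrupt(32)
interrupt(33)
interrupt(99)
```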
Whenever an interrupt arrives, the CPU uses its vector as an index into the IDT and runs the appropriate handler. This happens as a special function call that takes place in the context of the currently running task, allowing the OS to respond to external events quickly and with minimal overhead. So web servers out there indirectly  _call a function in your CPU_  when they send you data, which is either pretty cool or terrifying. Below we show a situation where a CPU is busy running a Vim command when an interrupt arrives:
![](http://duartes.org/gustavo/blog/img/os/vim-interrupted.png)
Notice how the interrupt's arrival causes a switch to kernel mode and [ring zero][3] but it  _does not change the active task_ . It's as if Vim made a magic function call straight into the kernel, but Vim is  _still there_ , its [address space][4] intact, waiting for that call to return.
Exciting stuff! Alas, I need to keep this post-sized, so let's finish up for now. I understand we have not answered the opening question and have in fact opened up new questions, but you now suspect ticks were taking place while you read that sentence. We'll find the answers as we flesh out our model of dynamic OS behavior, and the browser scenario will become clear. If you have questions, especially as the posts come out, fire away and I'll try to answer them in the posts themselves or as comments. Next installment is tomorrow on [RSS][5] and [Twitter][6].
--------------------------------------------------------------------------------
via: http://duartes.org/gustavo/blog/post/when-does-your-os-run/
作者:[gustavo ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://duartes.org/gustavo/blog/about/
[1]:http://duartes.org/gustavo/blog/post/motherboard-chipsets-memory-map
[2]:http://duartes.org/gustavo/blog/post/kernel-boot-process
[3]:http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
[4]:http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
[5]:http://feeds.feedburner.com/GustavoDuarte
[6]:http://twitter.com/food4hackers

View File

@@ -1,993 +0,0 @@
Translating by qhwdw

Network automation with Ansible
================
### Network Automation
As the IT industry transforms with technologies from server virtualization to public and private clouds with self-service capabilities, containerized applications, and Platform as a Service (PaaS) offerings, one of the areas that continues to lag behind is the network.
Over the past 5+ years, the network industry has seen many new trends emerge, many of which are categorized as software-defined networking (SDN).
###### Note
SDN is a new approach to building, managing, operating, and deploying networks. The original definition for SDN was that there needed to be a physical separation of the control plane from the data (packet forwarding) plane, and the decoupled control plane must control several devices.
Nowadays, many more technologies get put under the _SDN umbrella_, including controller-based networks, APIs on network devices, network automation, whitebox switches, policy networking, Network Functions Virtualization (NFV), and the list goes on.
For purposes of this report, we refer to SDN solutions as solutions that include a network controller as part of the solution, and improve manageability of the network but don't necessarily decouple the control plane from the data plane.
One of these trends is the emergence of application programming interfaces (APIs) on network devices as a way to manage and operate these devices and truly offer machine to machine communication. APIs simplify the development process when it comes to automation and building network applications, providing more structure on how data is modeled. For example, when API-enabled devices return data in JSON/XML, it is structured and easier to work with as compared to CLI-only devices that return raw text that then needs to be manually parsed.
Prior to APIs, the two primary mechanisms used to configure and manage network devices were the command-line interface (CLI) and Simple Network Management Protocol (SNMP). If we look at each of those, the CLI was meant as a human interface to the device, and SNMP wasn't built to be a real-time programmatic interface for network devices.
Luckily, as many vendors scramble to add APIs to devices, sometimes _just because_ it's a check in the box on an RFP, there is actually a great byproduct—enabling network automation. Once a true API is exposed, the process for accessing data within the device, as well as managing the configuration, is greatly simplified, but as we'll review in this report, automation is also possible using more traditional methods, such as CLI/SNMP.
###### Note
As network refreshes happen in the months and years to come, vendor APIs should no doubt be tested and used as key decision-making criteria for purchasing network equipment (virtual and physical). Users should want to know how data is modeled by the equipment, what type of transport is used by the API, if the vendor offers any libraries or integrations to automation tools, and if open standards/protocols are being used.
Generally speaking, network automation, like most types of automation, equates to doing things faster. While doing more faster is nice, reducing the time for deployments and configuration changes isn't always a problem that needs solving for many IT organizations.
Including speed, we'll now take a look at a few of the reasons that IT organizations of all shapes and sizes should look at gradually adopting network automation. You should note that the same principles apply to other types of automation as well.
### Simplified Architectures
Today, every network is a unique snowflake, and network engineers take pride in solving transport and application issues with one-off network changes that ultimately make the network not only harder to maintain and manage, but also harder to automate.
Instead of thinking about network automation and management as a secondary or tertiary project, it needs to be included from the beginning as new architectures and designs are deployed. Which features work across vendors? Which extensions work across platforms? What type of API or automation tooling works when using particular network device platforms? When these questions get answered earlier on in the design process, the resulting architecture becomes simpler, repeatable, and easier to maintain _and_ automate, all with fewer vendor proprietary extensions enabled throughout the network.
### Deterministic Outcomes
In an enterprise organization, change review meetings take place to review upcoming changes on the network, the impact they have on external systems, and rollback plans. In a world where a human is touching the CLI to make those _upcoming changes_, the impact of typing the wrong command is catastrophic. Imagine a team with three, four, five, or 50 engineers. Every engineer may have his own way of making that particular _upcoming change_. And the ability to use a CLI or a GUI does not eliminate or reduce the chance of error during the control window for the change.
Using proven and tested network automation helps achieve more predictable behavior and gives the executive team a better chance at achieving deterministic outcomes, moving one step closer to having the assurance that the task is going to get done right the first time without human error.
### Business Agility
It goes without saying that network automation offers speed and agility not only for deploying changes, but also for retrieving data from network devices as fast as the business demands. Since the advent of server virtualization, server and virtualization admins have had the ability to deploy new applications almost instantaneously. And the faster applications are deployed, the more questions are raised as to why it takes so long to configure a VLAN, route, FW ACL, or load-balancing policy.
By understanding the most common workflows within an organization and _why_ network changes are really required, the process to deploy modern automation tooling such as Ansible becomes much simpler.
This chapter introduced some of the high-level points on why you should consider network automation. In the next section, we take a look at what Ansible is and continue to dive into different types of network automation that are relevant to IT organizations of all sizes.
### What Is Ansible?
Ansible is one of the newer IT automation and configuration management platforms that exists in the open source world. It's often compared to other tools such as Puppet, Chef, and SaltStack. Ansible emerged on the scene in 2012 as an open source project created by Michael DeHaan, who also created Cobbler and cocreated Func, both of which are very popular in the open source community. Less than 18 months after the Ansible open source project started, Ansible Inc. was formed and received $6 million in Series A funding. It became and is still the number one contributor to and supporter of the Ansible open source project. In October 2015, Red Hat acquired Ansible Inc.
But, what exactly is Ansible?
_Ansible is a super-simple automation platform that is agentless and extensible._
Let's dive into this statement in a bit more detail and look at the attributes of Ansible that have helped it gain a significant amount of traction within the industry.
### Simple
One of the most attractive attributes of Ansible is that you _DO NOT_ need any special coding skills in order to get started. All instructions, or tasks to be automated, are documented in a standard, human-readable data format that anyone can understand. It is not uncommon to have Ansible installed and automating tasks in under 30 minutes!
For example, the following task from an Ansible playbook is used to ensure a VLAN exists on a Cisco Nexus switch:
```
- nxos_vlan: vlan_id=100 name=web_vlan
```
You can tell by looking at this almost exactly what it's going to do without understanding or writing any code!
###### Note
The second half of this report covers the Ansible terminology (playbooks, plays, tasks, modules, etc.) in great detail. However, we have included a few brief examples in the meantime to convey key concepts when using Ansible for network automation.
### Agentless
If you look at other tools on the market, such as Puppet and Chef, you'll learn that, by default, they require that each device you are automating have specialized software installed. This is _NOT_ the case with Ansible, and this is the major reason why Ansible is a great choice for networking automation.
It's well understood that IT automation tools, including Puppet, Chef, CFEngine, SaltStack, and Ansible, were initially built to manage and automate the configuration of Linux hosts to increase the pace at which applications are deployed. Because Linux systems were being automated, getting agents installed was never a technical hurdle to overcome. If anything, it just delayed the setup, since now _N_ number of hosts (the hosts you want to automate) needed to have software deployed on them.
On top of that, when agents are used, there is additional complexity required for DNS and NTP configuration. These are services that most environments do have already, but when you need to get something up fairly quick or simply want to see what it can do from a test perspective, it could significantly delay the overall setup and installation process.
Since this report is meant to cover Ansible for network automation, it's worth pointing out that having Ansible as an agentless platform is even more compelling to network admins than to sysadmins. Why is this?
It's more compelling for network admins because as mentioned, Linux operating systems are open, and anything can be installed on them. For networking, this is definitely not the case, although it is gradually changing. If we take the most widely deployed network operating system, Cisco IOS, as just one example and ask the question, _"Can third-party software be installed on IOS based platforms?"_ it shouldn't come as a surprise that the answer is _NO_.
For the last 20+ years, nearly all network operating systems have been closed and vertically integrated with the underlying network hardware. Because it's not so easy to load an agent on a network device (router, switch, load balancer, firewall, etc.) without vendor support, having an automation platform like Ansible that was built from the ground up to be agentless and extensible is just what the doctor ordered for the network industry. We can finally start eliminating manual interactions with the network with ease!
### Extensible
Ansible is also extremely extensible. As open source and code start to play a larger role in the network industry, having platforms that are extensible is a must. This means that if the vendor or community doesn't provide a particular feature or function, the open source community, end user, customer, consultant, or anyone else can _extend_ Ansible to enable a given set of functionality. In the past, the network vendor or tool vendor was on the hook to provide the new plug-ins and integrations. Imagine using an automation platform like Ansible, and your network vendor of choice releases a new feature that you _really_ need automated. While the network vendor or Ansible could in theory release the new plug-in to automate that particular feature, the great thing is, anyone from your internal engineers to your value-added resellers (VARs) or consultants could now provide these integrations.
It is a fact that Ansible is extremely extensible because as stated, Ansible was initially built to automate applications and systems. It is because of Ansible's extensibility that Ansible integrations have been written for network vendors, including but not limited to Cisco, Arista, Juniper, F5, HP, A10, Cumulus, and Palo Alto Networks.
### Why Ansible for Network Automation?
We've taken a brief look at what Ansible is and also some of the benefits of network automation, but why should Ansible be used for network automation?
In full transparency, many of the reasons already stated are what make Ansible such a great platform for automating application deployments. However, we'll take this a step further now, getting even more focused on networking, and continue to outline a few other key points to be aware of.
### Agentless
The importance of an agentless architecture cannot be stressed enough when it comes to network automation, especially as it pertains to automating existing devices. If we take a look at all devices currently installed at various parts of the network, from the DMZ and campus, to the branch and data center, the lions share of devices do _NOT_ have a modern device API. While having an API makes things so much simpler from an automation perspective, an agentless platform like Ansible makes it possible to automate and manage those _legacy_ _(traditional)_ devices, for example, _CLI-based devices_, making it a tool that can be used in any network environment.
###### Note
If CLI-only devices are integrated with Ansible, the mechanisms as to how the devices are accessed for read-only and read-write operations occur through protocols such as telnet, SSH, and SNMP.
As standalone network devices like routers, switches, and firewalls continue to add support for APIs, SDN solutions are also emerging. The one common theme with SDN solutions is that they all offer a single point of integration and policy management, usually in the form of an SDN controller. This is true for solutions such as Cisco ACI, VMware NSX, Big Switch Big Cloud Fabric, and Juniper Contrail, as well as many of the other SDN offerings from companies such as Nuage, Plexxi, Plumgrid, Midokura, and Viptela. This even includes open source controllers such as OpenDaylight.
These solutions all simplify the management of networks, as they allow an administrator to start to migrate from box-by-box management to network-wide, single-system management. While this is a great step in the right direction, these solutions still don't eliminate the risks for human error during change windows. For example, rather than configure _N_ switches, you may need to configure a single GUI that could take just as long in order to make the required configuration change—it may even be more complex, because after all, who prefers a GUI _over_ a CLI! Additionally, you may possibly have different types of SDN solutions deployed per application, network, region, or data center.
The need to automate networks, for configuration management, monitoring, and data collection, does not go away as the industry begins migrating to controller-based network architectures.
As most software-defined networks are deployed with a controller, nearly all controllers expose a modern REST API. And because Ansible has an agentless architecture, it makes it extremely simple to automate not only legacy devices that may not have an API, but also software-defined networking solutions via REST APIs, all without requiring any additional software (agents) on the endpoints. The net result is being able to automate any type of device using Ansible with or without an API.
### Free and Open Source Software (FOSS)
Being that Ansible is open source with all code publicly accessible on GitHub, it is absolutely free to get started using Ansible. It can literally be installed and providing value to network engineers in minutes. Neither Ansible, the open source project, nor Ansible Inc. requires any meetings with sales reps before they hand over software. That is stating the obvious, since it's true for all open source projects, but being that the use of open source, community-driven software within the network industry is fairly new and gradually increasing, we wanted to explicitly make this point.
It is also worth stating that Ansible, Inc. is indeed a company and needs to make money somehow, right? While Ansible is open source, it also has an enterprise product called Ansible Tower that adds features such as role-based access control (RBAC), reporting, web UI, REST APIs, multi-tenancy, and much more, which is usually a nice fit for enterprises looking to deploy Ansible. And the best part is that even Ansible Tower is _FREE_ for up to 10 devices—so, at least you can get a taste of Tower to see if it can benefit your organization without spending a dime and sitting in countless sales meetings.
### Extensible
We stated earlier that Ansible was primarily built as an automation platform for deploying Linux applications, although it has expanded to Windows since the early days. The point is that the Ansible open source project did not have the goal of automating network infrastructure. The truth is that the more the Ansible community understood how flexible and extensible the underlying Ansible architecture was, the easier it became to _extend_ Ansible for their automation needs, which included networking. Over the past two years, there have been a number of Ansible integrations developed, many by industry independents such as Matt Oswalt, Jason Edelman, Kirk Byers, Elisa Jasinska, David Barroso, Michael Ben-Ami, Patrick Ogenstad, and Gabriele Gerbino, as well as by leading networking vendors such as Arista, Juniper, Cumulus, Cisco, F5, and Palo Alto Networks.
### Integrating into Existing DevOps Workflows
Ansible is used for application deployments within IT organizations. It's used by operations teams that need to manage the deployment, monitoring, and management of various types of applications. By integrating Ansible with the network infrastructure, it expands what is possible when new applications are turned up or migrated. Rather than have to wait for a new top of rack (TOR) switch to be turned up, a VLAN to be added, or interface speed/duplex to be checked, all of these network-centric tasks can be automated and integrated into workflows that already exist within the IT organization.
### Idempotency
The term _idempotency_ (pronounced item-potency) is used often in the world of software development, especially when working with REST APIs, as well as in the world of _DevOps_ automation and configuration management frameworks, including Ansible. One of Ansible's beliefs is that all Ansible modules (integrations) should be idempotent. Okay, so what does it mean for a module to be idempotent? After all, this is a new term for most network engineers.
The answer is simple. Being idempotent allows the defined task to run one time or a thousand times without having an adverse effect on the target system, only ever making the change once. In other words, if a change is required to get the system into its desired state, the change is made; and if the device is already in its desired state, no change is made. This is unlike most traditional custom scripts and the copy and pasting of CLI commands into a terminal window. When the same command or script is executed repeatedly on the same system, errors are (sometimes) raised. Ever paste a command set into a router and get some type of error that invalidates the rest of your configuration? Was that fun?
Another example is if you have a text file or a script that configures 10 VLANs, the same commands are then entered 10 times _EVERY_ time the script is run. If an idempotent Ansible module is used, the existing configuration is gathered first from the network device, and each new VLAN being configured is checked against the current configuration. Only if the new VLAN needs to be added (or changed—VLAN name, as an example) is a change or command actually pushed to the device.
As the technologies become more complex, the value of idempotency only increases because with idempotency, you shouldnt care about the _existing_ state of the network device being modified, only the _desired_ state that you are trying to achieve from a network configuration and policy perspective.
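To make this concrete, here is a minimal sketch of what an idempotent VLAN task can look like (the `nxos_vlan` module and the playbook syntax used here are covered later in this report; the credential variables are illustrative):

```
# Declares desired state; the module only pushes commands if VLAN 100 is missing or different
- name: ENSURE VLAN 100 EXISTS
  nxos_vlan:
    vlan_id=100
    name=web_vlan
    host={{ inventory_hostname }}
    username={{ un }}
    password={{ pwd }}
```

Run this task once and VLAN 100 is created; run it a thousand times and the module simply reports "ok" because the device is already in the desired state.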
### Network-Wide and Ad Hoc Changes
One of the problems solved with configuration management tools is configuration drift (when a device's desired configuration gradually drifts, or changes, over time due to manual change and/or having multiple disparate tools being used in an environment)—in fact, this is where tools like Puppet and Chef got started. Agents _phone home_ to the head-end server, validate the device's configuration, and if a change is required, the change is made. The approach is simple enough. What if an outage occurs and you need to troubleshoot, though? You usually bypass the management system, go directly to a device, find the fix, and quickly leave for the day, right? Sure enough, at the next time interval when the agent phones back home, the change made to fix the problem is overwritten (based on how the _master/head-end server_ is configured). One-off changes should always be limited in highly automated environments, but tools that still allow for them are greatly valuable. As you guessed, one of these tools is Ansible.
Because Ansible is agentless, there is not a default push or pull to prevent configuration drift. The tasks to automate are defined in what is called an Ansible playbook. When using Ansible, it is up to the user to run the playbook. If the playbook is to be executed at a given time interval and you're not using Ansible Tower, you will definitely know how often the tasks are run; if you are just using the native Ansible command line from a terminal prompt, the playbook is run once and only once.
Running a playbook once by default is attractive for network engineers. It is added peace of mind that changes made manually on the device are not going to be automatically overwritten. Additionally, the scope of devices that a playbook is executed against is easily changed when needed, so even when only a single device needs a change, Ansible can still be used. The _scope_ of devices is determined by what is called an Ansible inventory file; the inventory could have one device or a thousand devices.
The following shows a sample inventory file with two groups defined and a total of six network devices:
```
[core-switches]
dc-core-1
dc-core-2
[leaf-switches]
leaf1
leaf2
leaf3
leaf4
```
To automate all hosts, a snippet from your play definition in a playbook looks like this:
```
hosts: all
```
And to automate just one leaf switch, it looks like this:
```
hosts: leaf1
```
And just the core switches:
```
hosts: core-switches
```
###### Note
As stated previously, playbooks, plays, and inventories are covered in more detail later on in this report.
Being able to easily automate one device or _N_ devices makes Ansible a great choice for making those one-off changes when they are required. It's also great for those changes that are network-wide: possibly for shutting down all interfaces of a given type, configuring interface descriptions, or adding VLANs to wiring closets across an enterprise campus network.
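As a rough sketch of what such a network-wide change could look like with the inventory above (the `nxos_vlan` module shown here is covered later in this report, and the credential variables are illustrative), a play that adds a VLAN to every leaf switch might be written as:

```
---
# Push one new VLAN to every device in the leaf-switches group
- name: ADD VLAN 100 TO ALL LEAF SWITCHES
  hosts: leaf-switches
  connection: local
  tasks:
    - name: ENSURE VLAN 100 EXISTS
      nxos_vlan: vlan_id=100 name=web_vlan host={{ inventory_hostname }} username={{ un }} password={{ pwd }}
```

Changing `hosts: leaf-switches` to `hosts: leaf1` turns the very same play into a one-off change for a single device.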
### Network Task Automation with Ansible
This report is gradually getting more technical in two areas. The first area is around the details and architecture of Ansible, and the second area is about exactly what types of tasks can be automated from a network perspective with Ansible. The latter is what we'll take a look at in this chapter.
Automation is commonly equated with speed, and considering that some network tasks don't require speed, it's easy to see why some IT teams don't see the value in automation. VLAN configuration is a great example because you may be thinking, "How _fast_ does a VLAN really need to get created? Just how many VLANs are being added on a daily basis? Do _I_ really need automation?"
In this section, we are going to focus on several other tasks where automation makes sense, such as device provisioning, data collection, reporting, and compliance. But remember, as we stated earlier, automation is much more than speed and agility, as it offers you, your team, and your business more predictable and more deterministic outcomes.
### Device Provisioning
One of the easiest and fastest ways to get started using Ansible for network automation is creating device configuration files that are used for initial device provisioning and pushing them to network devices.
If we take this process and break it down into two steps, the first step is creating the configuration file, and the second is pushing the configuration onto the device.
First, we need to decouple the _inputs_ from the underlying vendor-proprietary syntax (CLI) of the config file. This means we'll have separate files with values for the configuration parameters such as VLANs, domain information, interfaces, routing, and everything else, and then, of course, one or more configuration template files. For this example, the template is our standard golden configuration that's used for all devices getting deployed. Ansible helps bridge the gap by rendering the inputs and values into the configuration template. In a matter of seconds, Ansible can generate hundreds of configuration files predictably and reliably.
Let's take a quick look at an example of taking a current configuration and decomposing it into a template and separate variables (inputs) file.
Here is an example of a configuration file snippet:
```
hostname leaf1
ip domain-name ntc.com
!
vlan 10
name web
!
vlan 20
name app
!
vlan 30
name db
!
vlan 40
name test
!
vlan 50
name misc
```
If we extract the input values, this file is transformed into a template.
###### Note
Ansible uses the Python-based Jinja2 templating language, thus the template called _leaf.j2_ is a Jinja2 template.
Note that in the following example the _double curly braces_ denote a variable.
The resulting template looks like this and is given the filename _leaf.j2_:
```
!
hostname {{ inventory_hostname }}
ip domain-name {{ domain_name }}
!
!
{% for vlan in vlans %}
vlan {{ vlan.id }}
name {{ vlan.name }}
{% endfor %}
!
```
Since the double curly braces denote variables, and we see those values are not in the template, they need to be stored somewhere. They get stored in a variables file. A matching variables file for the previously shown template looks like this:
```
---
hostname: leaf1
domain_name: ntc.com
vlans:
- { id: 10, name: web }
- { id: 20, name: app }
- { id: 30, name: db }
- { id: 40, name: test }
- { id: 50, name: misc }
```
This means if the team that controls VLANs wants to add a VLAN to the network devices, no problem. Have them change it in the variables file and regenerate a new config file using the Ansible module called `template`. This whole process is idempotent too; only if there is a change to the template or values being entered will a new configuration file be generated.
Once the configuration is generated, it needs to be _pushed_ to the network device. One such method to push configuration files to network devices is using the open source Ansible module called `napalm_install_config`.
The next example is a sample playbook to _build and push_ a configuration to network devices. Again, this playbook uses the `template` module to build the configuration files and the `napalm_install_config` to push them and activate them as the new running configurations on the devices.
Even though every line isn't reviewed in the example, you can still make out what is actually happening.
###### Note
The following playbook introduces new concepts such as the built-in variable `inventory_hostname`. These concepts are covered in [Ansible Terminology and Getting Started][1].
```
---
- name: BUILD AND PUSH NETWORK CONFIGURATION FILES
hosts: leaves
connection: local
gather_facts: no
tasks:
- name: BUILD CONFIGS
template:
src=templates/leaf.j2
dest=configs/{{inventory_hostname }}.conf
- name: PUSH CONFIGS
napalm_install_config:
hostname={{ inventory_hostname }}
username={{ un }}
password={{ pwd }}
dev_os={{ os }}
config_file=configs/{{ inventory_hostname }}.conf
commit_changes=1
replace_config=0
```
This two-step process is the simplest way to get started with network automation using Ansible. You simply template your configs, build config files, and push them to the network device—otherwise known as the _BUILD and PUSH_ method.
###### Note
Another example like this is reviewed in much more detail in [Ansible Network Integrations][2].
### Data Collection and Monitoring
Monitoring tools typically use SNMP—these tools poll certain management information bases (MIBs) and return data to the monitoring tool. Based on the data being returned, it may be more or less than you actually need. What if interface stats are being polled? You are likely getting back every counter that is displayed in a _show interface_ command. What if you only need _interface resets_ and wish to see these resets correlated to the interfaces that have CDP/LLDP neighbors on them? Of course, this is possible with current technology; it could be you are running multiple show commands and parsing the output manually, or youre using an SNMP-based tool but going between tabs in the GUI trying to find the data you actually need. How does Ansible help with this?
Being that Ansible is totally open and extensible, it's possible to collect and monitor the exact counters or values needed. This may require some up-front custom work but is totally worth it in the end, because the data being gathered is what you need, not what the vendor is providing you. Ansible also provides intuitive ways to perform certain tasks conditionally, which means that, based on the data being returned, you can perform subsequent tasks, such as collecting more data or making a configuration change.
Network devices have _A LOT_ of static and ephemeral data buried inside, and Ansible helps extract the bits you need.
You can even use Ansible modules that use SNMP behind the scenes, such as a module called `snmp_device_version`. This is another open source module that exists within the community:
```
- name: GET SNMP DATA
snmp_device_version:
host=spine
community=public
version=2c
```
Running the preceding task returns great information about a device and adds some level of discovery capabilities to Ansible. For example, that task returns the following data:
```
{"ansible_facts": {"ansible_device_os": "nxos", "ansible_device_vendor": "cisco", "ansible_device_version": "7.0(3)I2(1)"}, "changed": false}
```
You can now determine what type of device something is without knowing up front. All you need to know is the read-only community string of the device.
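Because those values are returned as Ansible facts, they can also drive conditional logic in subsequent tasks. A hedged sketch (the follow-on module, credentials, and hostname are illustrative) might look like this:

```
- name: GET SNMP DATA
  snmp_device_version:
    host=spine
    community=public
    version=2c

# Only runs when the discovered operating system is NX-OS
- name: ENSURE VLAN 100 EXISTS ON DISCOVERED NXOS DEVICES
  nxos_vlan: vlan_id=100 name=web_vlan host=spine username={{ un }} password={{ pwd }}
  when: ansible_device_os == "nxos"
```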
### Migrations
Migrating from one platform to the next is never an easy task. This may be from the same vendor or from different vendors. Vendors may offer a script or a tool to help with migrations. Ansible can be used to build out configuration templates for all types of network devices and operating systems in such a way that you could generate a configuration file for all vendors given a defined and common set of inputs (common data model). Of course, if there are vendor-proprietary extensions, they'll need to be accounted for, too. Having this type of flexibility helps with not only migrations, but also disaster recovery (DR), as it's very common to have different switch models in the production and DR data centers, maybe even different vendors.
### Configuration Management
As stated, configuration management is the most common type of automation. What Ansible allows you to do fairly easily is create _roles_ to streamline the consumption of task-based automation. From a high level, a role is a logical grouping of reusable tasks that are automated against a particular group of devices. Another way to think about roles is to think about workflows. First and foremost, workflows and processes need to be understood before automation starts adding value. It's always important to start small and expand from there.
For example, a set of tasks that automate the configuration of routers and switches is very common and is a great place to start. But where do the IP addresses come from that are configured on network devices? Maybe an IP address management solution? Once the IP addresses are allocated for a given function and deployed, does DNS need to be updated too? Do DHCP scopes need to be created?
Can you see how the workflow can start small and gradually expand across different IT systems? As the workflow continues to expand, so would the role.
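While roles are not covered in depth in this report, a minimal sketch of the idea is a play that simply lists the roles to apply to a group of devices (the role names here are hypothetical):

```
---
# Each role is a directory of reusable tasks, e.g., roles/vlans/tasks/main.yml
- name: CONFIGURE DATA CENTER FABRIC
  hosts: leaf-switches
  connection: local
  roles:
    - base_config
    - vlans
    - interfaces
```

As the workflow grows to touch IPAM, DNS, or DHCP, new roles can be added to the list without rewriting the existing ones.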
### Compliance
As with many forms of automation, making configuration changes with any type of automation tool is seen as a risk. While making manual changes could arguably be riskier, as you've read and may have experienced firsthand, Ansible has capabilities to automate data collection, monitoring, and configuration building, which are all "read-only" and "low risk" actions. One _low risk_ use case that can use the data being gathered is configuration compliance checks and configuration validation. Does the deployed configuration meet security requirements? Are the required networks configured? Is protocol XYZ disabled? Since each module, or integration, with Ansible returns data, it is quite simple to _assert_ that something is _TRUE_ or _FALSE_. And again, based on _it_ being _TRUE_ or _FALSE_, it's up to you to determine what happens next—maybe it just gets logged, or maybe a complex operation is performed.
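As a minimal sketch of this idea, the data returned by the `snmp_device_version` module shown earlier can be checked with the core `assert` module (the approved version string here is illustrative):

```
- name: GET SNMP DATA
  snmp_device_version: host=spine community=public version=2c

# Fail the play if the device is not running the approved OS version
- name: VERIFY THE DEVICE IS RUNNING THE APPROVED OS VERSION
  assert:
    that:
      - ansible_device_version == "7.0(3)I2(1)"
```

If the assertion is _FALSE_, the task fails and it is up to you what happens next, exactly as described above.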
### Reporting
We now understand that Ansible can also be used to collect data and perform compliance checks. The data being returned and collected from the device by way of Ansible is up for grabs in terms of what you want to do with it. Maybe the data being returned becomes inputs to other tasks, or maybe you just want to create reports. Being that reports are generated from templates combined with the actual important data to be inserted into the template, the process to create and use reporting templates is the same process used to create configuration templates.
From a reporting perspective, these templates may be flat text files, markdown files that are viewed on GitHub, HTML files that get dynamically placed on a web server, and the list goes on. The user has the power to create the exact type of report she wishes, inserting the exact data she needs to be part of that report.
It is powerful to create reports not only for executive management, but also for the ops engineers, since there are usually different metrics both teams need.
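Because the mechanics are identical to building configuration files, a report-building task is just a small variation on the `template` task shown earlier (the template and output filenames here are hypothetical):

```
# Render a per-device Markdown report from gathered/registered data
- name: BUILD PER-DEVICE REPORT
  template:
    src=templates/device_report.j2
    dest=reports/{{ inventory_hostname }}.md
```

The _device_report.j2_ template references whatever facts or registered data should appear in the report, just as _leaf.j2_ referenced the VLAN variables.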
### How Ansible Works
After looking at what Ansible can offer from a network automation perspective, we'll now take a look at how Ansible works. You will learn about the overall communication flow from an Ansible control host to the nodes that are being automated. First, we review how Ansible works _out of the box_, and we then take a look at how Ansible, and more specifically Ansible _modules_, work when network devices are being automated.
### Out of the Box
By now, you should understand that Ansible is an automation platform. In fact, it is a lightweight automation platform that is installed on a single server or on every administrator's laptop within an organization. You decide. Ansible is easily installed using utilities such as pip, apt, and yum on Linux-based machines.
###### Note
The machine that Ansible is installed on is referred to as the _control host_ through the remainder of this report.
The control host will perform all automation tasks that are defined in an Ansible playbook (don't worry; we'll cover playbooks and other Ansible terms soon enough). The important piece for now is to understand that a playbook is simply a set of automation tasks and instructions that gets executed on a given number of hosts.
When a playbook is created, you also need to define which hosts you want to automate. The mapping between the playbook and the hosts to automate happens by using what is known as an Ansible inventory file. This was already shown in an earlier example, but here is another sample inventory file showing two groups: `cisco` and `arista`:
```
[cisco]
nyc1.acme.com
nyc2.acme.com
[arista]
sfo1.acme.com
sfo2.acme.com
```
###### Note
You can also use IP addresses within the inventory file, instead of hostnames. For these examples, the hostnames were resolvable via DNS.
As you can see, the Ansible inventory file is a text file that lists hosts and groups of hosts. You then reference a specific host or a group from within the playbook, thus dictating which hosts get automated for a given play and playbook. This is shown in the following two examples.
The first example shows what it looks like if you wanted to automate all hosts within the `cisco` group, and the second example shows how to automate just the _nyc1.acme.com_ host:
```
---
- name: TEST PLAYBOOK
hosts: cisco
tasks:
- TASKS YOU WANT TO AUTOMATE
```
```
---
- name: TEST PLAYBOOK
hosts: nyc1.acme.com
tasks:
- TASKS YOU WANT TO AUTOMATE
```
Now that the basics of inventory files are understood, we can take a look at how Ansible (the control host) communicates with devices _out of the box_ and how tasks are automated on Linux endpoints. This is an important concept to understand, as this is usually different when network devices are being automated.
There are two main requirements for Ansible to work out of the box to automate Linux-based systems. These requirements are SSH and Python.
First, the endpoints must support SSH for transport, since Ansible uses SSH to connect to each target node. Because Ansible supports a pluggable connection architecture, there are also various plug-ins available for different types of SSH implementations.
The second requirement is how Ansible gets around the need to require an _agent_ to preexist on the target node. While Ansible does not require a software agent, it does require an onboard Python execution engine. This execution engine is used to execute Python code that is transmitted from the Ansible control host to the target node being automated.
If we elaborate on this out of the box workflow, it is broken down as follows:
1. When an Ansible play is executed, the control host connects to the Linux-based target node using SSH.
2. For each task, that is, Ansible module being executed within the play, Python code is transmitted over SSH and executed directly on the remote system.
3. Each Ansible module upon execution on the remote system returns JSON data to the control host. This data includes information such as if the configuration changed, if the task passed/failed, and other module-specific data.
4. The JSON data returned back to Ansible can then be used to generate reports using templates or as inputs to subsequent modules.
5. Repeat step 3 for each task that exists within the play.
6. Repeat step 1 for each play within the playbook.
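As a point of reference, a minimal sketch of this default workflow against Linux hosts might look like the following (the group name and package are illustrative). Note that there is no `connection: local` here, so Ansible uses SSH and the remote Python interpreter exactly as described in the steps above:

```
---
# Runs over SSH; the yum module's Python code executes on each remote host
- name: MANAGE LINUX WEB SERVERS
  hosts: linux-servers
  tasks:
    - name: ENSURE NTP IS INSTALLED
      yum: name=ntp state=present
```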
Shouldn't this mean that network devices should work out of the box with Ansible because they also support SSH? It is true that network devices do support SSH, but it is the first requirement combined with the second one that limits the functionality possible for network devices.
To start, most network devices do not support Python, so using the default Ansible connection mechanism is a non-starter. That said, over the past few years, vendors have added Python support on several different device platforms. However, most of these platforms still lack the integration needed to allow Ansible to get direct access to a Linux shell over SSH with the proper permissions to copy over the required code, create temp directories and files, and execute the code on box. While all the parts are there for Ansible to work natively with SSH/Python _and_ Linux-based network devices, it still requires network vendors to open their systems more than they already have.
###### Note
It is worth noting that Arista does offer native integration because it is able to drop SSH users directly into a Linux shell with access to a Python execution engine, which in turn does allow Ansible to use its default connection mechanism. Because we called out Arista, we need to also highlight Cumulus as working with Ansible's default connection mechanism, too. This is because Cumulus Linux is native Linux, and there isn't a need to use a vendor API for the automation of the Cumulus Linux OS.
### Ansible Network Integrations
The previous section covered the way Ansible works by default. We looked at how Ansible sets up a connection to a device at the beginning of a _play_, executes tasks by copying Python code to the devices, executes the code, and then returns results back to the Ansible control host.
In this section, we'll take a look at what this process is when automating network devices with Ansible. As already covered, Ansible has a pluggable connection architecture. For _most_ network integrations, the `connection` parameter is set to `local`. The most common place to make the connection type local is within the playbook, as shown in the following example:
```
---
- name: TEST PLAYBOOK
hosts: cisco
connection: local
tasks:
- TASKS YOU WANT TO AUTOMATE
```
Notice how within the play definition, this example added the `connection` parameter as compared to the examples in the previous section.
This tells Ansible not to connect to the target device via SSH and to just connect to the local machine running the playbook. Basically, this delegates the connection responsibility to the actual Ansible modules being used within the _tasks_ section of the playbook. Delegating power for each type of module allows the modules to connect to the device in whatever fashion necessary; this could be NETCONF for Juniper and HP Comware7, eAPI for Arista, NX-API for Cisco Nexus, or even SNMP for traditional/legacy-based systems that dont have a programmatic API.
###### Note
Network integrations in Ansible come in the form of Ansible modules. While we continue to whet your appetite using terminology such as playbooks, plays, tasks, and modules to convey key concepts, each of these terms is finally covered in greater detail in [Ansible Terminology and Getting Started][3] and [Hands-on Look at Using Ansible for Network Automation][4].
Let's take a look at another sample playbook:
```
---
- name: TEST PLAYBOOK
hosts: cisco
connection: local
tasks:
- nxos_vlan: vlan_id=10 name=WEB_VLAN
```
If you notice, this playbook now includes a task, and this task uses the `nxos_vlan` module. The `nxos_vlan` module is just a Python file, and it is in this file where the connection to the Cisco NX-OS device is made using NX-API. However, the connection could have been set up using any other device API, and this is how vendors and users like us are able to build our own integrations. Integrations (modules) are typically done on a per-feature basis, although as youve already seen with modules like `napalm_install_config`, they can be used to _push_ a full configuration file, too.
One of the major differences is that with the default connection mechanism, Ansible launches a persistent SSH connection to the device, and this connection persists for a given play. When the connection setup and teardown occurs within the module, as with many network modules that use `connection=local`, Ansible is logging in/out of the device on _every_ task versus this happening on the play level.
And in traditional Ansible fashion, each network module returns JSON data. The only difference is the massaging of this data is happening locally on the Ansible control host versus on the target node. The data returned back to the playbook varies per vendor and type of module, but as an example, many of the Cisco NX-OS modules return back existing state, proposed state, and end state, as well as the commands (if any) that are being sent to the device.
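If you want to see or reuse that JSON data, it can be captured with `register` and printed with the core `debug` module. Here is a hedged sketch using the `nxos_vlan` module from earlier examples (the credential variables are illustrative):

```
- name: ENSURE VLAN 10 EXISTS
  nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username={{ un }} password={{ pwd }}
  register: vlan_result

# Prints the JSON structure returned by the module
- name: SHOW DATA RETURNED BY THE MODULE
  debug: var=vlan_result
```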
As you get started using Ansible for network automation, it is important to remember that setting the connection parameter to local is taking Ansible out of the connection setup/teardown process and leaving that up to the module. This is why modules supported for different types of vendor platforms will have different ways of communicating with the devices.
### Ansible Terminology and Getting Started
This chapter walks through many of the terms and key concepts that have been gradually introduced already in this report. These are terms such as _inventory file_, _playbook_, _play_, _tasks_, and _modules_. We also review a few other concepts that are helpful to be aware of when getting started with Ansible for network automation.
Please reference the following sample inventory file and playbook throughout this section, as they are continuously used in the examples that follow to convey what each Ansible term means.
_Sample inventory_:
```
# sample inventory file
# filename inventory
[all:vars]
user=admin
pwd=admin
[tor]
rack1-tor1 vendor=nxos
rack1-tor2 vendor=nxos
rack2-tor1 vendor=arista
rack2-tor2 vendor=arista
[core]
core1
core2
```
_Sample playbook_:
```
---
# sample playbook
# filename site.yml
- name: PLAY 1 - Top of Rack (TOR) Switches
hosts: tor
connection: local
tasks:
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
nxos_vlan:
vlan_id=10
name=WEB_VLAN
host={{ inventory_hostname }}
username=admin
password=admin
when: vendor == "nxos"
- name: ENSURE VLAN 10 EXISTS ON ARISTA TOR SWITCHES
eos_vlan:
vlanid=10
name=WEB_VLAN
host={{ inventory_hostname }}
username={{ user }}
password={{ pwd }}
when: vendor == "arista"
- name: PLAY 2 - Core (TOR) Switches
hosts: core
connection: local
tasks:
- name: ENSURE VLANS EXIST IN CORE
nxos_vlan:
vlan_id={{ item }}
host={{ inventory_hostname }}
username={{ user }}
password={{ pwd }}
with_items:
- 10
- 20
- 30
- 40
- 50
```
### Inventory File
Using an inventory file, such as the preceding one, enables us to automate tasks for specific hosts and groups of hosts by referencing the proper host/group using the `hosts` parameter that exists at the top section of each play.
It is also possible to store variables within an inventory file. This is shown in the example. If the variable is on the same line as a host, it is a host-specific variable. If the variables are defined within brackets such as `[all:vars]`, it means that the variables are in scope for the group `all`, which is a default group that includes _all_ hosts in the inventory file.
###### Note
Inventory files are the quickest way to get started with Ansible, but should you already have a source of truth for network devices such as a network management tool or CMDB, it is possible to create and use a dynamic inventory script rather than a static inventory file.
### Playbook
The playbook is the top-level object that is executed to automate network devices. In our example, this is the file _site.yml_, as depicted in the preceding example. A playbook uses YAML to define the set of tasks to automate, and each playbook is comprised of one or more plays. This is analogous to a football playbook. Like in football, teams have playbooks made up of plays, and Ansible playbooks are made up of plays, too.
###### Note
YAML is a data format that is supported by all programming languages. YAML is itself a superset of JSON, and its quite easy to recognize YAML files, as they always start with three dashes (hyphens), `---`.
### Play
One or more plays can exist within an Ansible playbook. In the preceding example, there are two plays within the playbook. Each starts with a _header_ section where play-specific parameters are defined.
The two plays from that example have the following parameters defined:
`name`
The text `PLAY 1 - Top of Rack (TOR) Switches` is arbitrary and is displayed when the playbook runs to improve readability during playbook execution and reporting. This is an optional parameter.
`hosts`
As covered previously, this is the host or group of hosts that are automated in this particular play. This is a required parameter.
`connection`
As covered previously, this is the type of connection mechanism used for the play. This is an optional parameter, but is commonly set to `local` for network automation plays.
Each play is comprised of one or more tasks.
### Tasks
Tasks represent what is automated in a declarative manner without worrying about the underlying syntax or "how" the operation is performed.
In our example, the first play has two tasks. Each task ensures VLAN 10 exists. The first task does this for Cisco Nexus devices, and the second task does this for Arista devices:
```
tasks:
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
nxos_vlan:
vlan_id=10
name=WEB_VLAN
host={{ inventory_hostname }}
username=admin
password=admin
when: vendor == "nxos"
```
Tasks can also use the `name` parameter just like plays can. As with plays, the text is arbitrary and is displayed when the playbook runs to improve readability during playbook execution and reporting. It is an optional parameter for each task.
The next line in the example task starts with `nxos_vlan`. This tells us that this task will execute the Ansible module called `nxos_vlan`.
Well now dig deeper into modules.
### Modules
It is critical to understand modules within Ansible. While any programming language can be used to write Ansible modules as long as they return JSON key-value pairs, they are almost always written in Python. In our example, we see two modules being executed: `nxos_vlan` and `eos_vlan`. The modules are both Python files; and in fact, while you cant tell from looking at the playbook, the real filenames are _eos_vlan.py_ and _nxos_vlan.py_, respectively.
Let's look at the first task in the first play from the preceding example:
```
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
nxos_vlan:
vlan_id=10
name=WEB_VLAN
host={{ inventory_hostname }}
username=admin
password=admin
when: vendor == "nxos"
```
This task executes `nxos_vlan`, which is a module that automates VLAN configuration. In order to use modules, including this one, you need to specify the desired state or configuration policy you want the device to have. This example states: VLAN 10 should be configured with the name `WEB_VLAN`, and it should exist on each switch being automated. We can see this easily with the `vlan_id` and `name` parameters. There are three other parameters being passed into the module as well. They are `host`, `username`, and `password`:
`host`
This is the hostname (or IP address) of the device being automated. Since the hosts we want to automate are already defined in the inventory file, we can use the built-in Ansible variable `inventory_hostname`. This variable is equal to what is in the inventory file. For example, on the first iteration, the host in the inventory file is `rack1-tor1`, and on the second iteration, it is `rack1-tor2`. These names are passed into the module and then within the module, a DNS lookup occurs on each name to resolve it to an IP address. Then the communication begins with the device.
`username`
Username used to log in to the switch.
`password`
Password used to log in to the switch.
The last piece to cover here is the use of the `when` statement. This is how Ansible performs conditional tasks within a play. As we know, there are multiple devices and types of devices that exist within the `tor` group for this play. Using `when` offers an option to be more selective based on any criteria. Here we are only automating Cisco devices because we are using the `nxos_vlan` module in this task, while in the next task, we are automating only the Arista devices because the `eos_vlan` module is used.
###### Note
This isn't the only way to differentiate between devices. This is being shown to illustrate the use of `when` and that variables can be defined within the inventory file.
Defining variables in an inventory file is great for getting started, but as you continue to use Ansible, youll want to use YAML-based variables files to help with scale, versioning, and minimizing change to a given file. This will also simplify and improve readability for the inventory file and each variables file used. An example of a variables file was given earlier when the build/push method of device provisioning was covered.
Here are a few other points to understand about the tasks in the last example:
* Play 1 task 1 shows the `username` and `password` hardcoded as parameters being passed into the specific module (`nxos_vlan`).
* Play 1 task 2 and play 2 passed variables into the module instead of hardcoding them. This masks the `username` and `password` parameters, but it's worth noting that these variables are being pulled from the inventory file (for this example).
* Play 1 and play 2 both pass parameters into the modules using the key=value shorthand syntax, which can be written _horizontally_ on a single line (as in the earlier `nxos_vlan` example) or _vertically_ with one parameter per line. Both work just fine. You can also use proper vertical YAML syntax with "key: value" pairs, as shown in the sketch after this list.
* The last task also introduces how to use a _loop_ within Ansible. This is by using `with_items` and is analogous to a for loop. That particular task is looping through five VLANs to ensure they all exist on the switch. Note: it's also possible to store these VLANs in an external YAML variables file as well. Also note that the alternative to using `with_items` would be to have one task per VLAN—and that just wouldn't scale!
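For reference, here is a sketch of the first task rewritten using the vertical "key: value" YAML syntax mentioned in the list above (note that values containing `{{ }}` must be quoted in this form):

```
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
  nxos_vlan:
    vlan_id: 10
    name: WEB_VLAN
    host: "{{ inventory_hostname }}"
    username: admin
    password: admin
  when: vendor == "nxos"
```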
### Hands-on Look at Using Ansible for Network Automation
In the previous chapter, a general overview of Ansible terminology was provided. This covered many of the specific Ansible terms, such as playbooks, plays, tasks, modules, and inventory files. This section will continue to provide working examples of using Ansible for network automation, but will provide more detail on working with modules to automate a few different types of devices. Examples will include automating devices from multiple vendors, including Cisco, Arista, Cumulus, and Juniper.
The examples in this section assume the following:
* Ansible is installed.
* The proper APIs are enabled on the devices (NX-API, eAPI, NETCONF).
* Users exist with the proper permissions on the system to make changes via the API.
* All Ansible modules exist on the system and are in the library path.
###### Note
Setting the module and library path can be done within the _ansible.cfg_ file. You can also use the `-M` flag from the command line to change it when executing a playbook.
The inventory used for the examples in this section is shown in the following section (with passwords removed and IP addresses changed). In this example, some hostnames are not FQDNs as they were in the previous examples.
### Inventory File
```
[cumulus]
cvx ansible_ssh_host=1.2.3.4 ansible_ssh_pass=PASSWORD
[arista]
veos1
[cisco]
nx1 hostip=5.6.7.8 un=USERNAME pwd=PASSWORD
[juniper]
vsrx hostip=9.10.11.12 un=USERNAME pwd=PASSWORD
```
###### Note
Just in case you're wondering at this point, Ansible does support functionality that allows you to store passwords in encrypted files. If you want to learn more about this feature, check out [Ansible Vault][5] in the docs on the Ansible website.
This inventory file has four groups defined with a single host in each group. Let's review each section in a little more detail:
Cumulus
The host `cvx` is a Cumulus Linux (CL) switch, and it is the only device in the `cumulus` group. Remember that CL is native Linux, so this means the default connection mechanism (SSH) is used to connect to and automate the CL switch. Because `cvx` is not defined in DNS or _/etc/hosts_, well let Ansible know not to use the hostname defined in the inventory file, but rather the name/IP defined for `ansible_ssh_host`. The username to log in to the CL switch is defined in the playbook, but you can see that the password is being defined in the inventory file using the `ansible_ssh_pass` variable.
Arista
The host called `veos1` is an Arista switch running EOS. It is the only host that exists within the `arista` group. As you can see for Arista, there are no other parameters defined within the inventory file. This is because Arista uses a special configuration file for their devices. This file is called _.eapi.conf_ and for our example, it is stored in the home directory. Here is the conf file being used for this example to function properly:
```
[connection:veos1]
host: 2.4.3.4
username: unadmin
password: pwadmin
```
This file contains all required information for Ansible (and the Arista Python library called _pyeapi_) to connect to the device using just the information as defined in the conf file.
Cisco
Just like with Cumulus and Arista, there is only one host (`nx1`) that exists within the `cisco` group. This is an NX-OS-based Cisco Nexus switch. Notice how there are three variables defined for `nx1`. They include `un` and `pwd`, which are accessed in the playbook and passed into the Cisco modules in order to connect to the device. In addition, there is a parameter called `hostip`. This is required because `nx1` is also not defined in DNS or configured in the _/etc/hosts_ file.
###### Note
We could have named this parameter anything. If automating a native Linux device, `ansible_ssh_host` is used just like we saw with the Cumulus example (if the name as defined in the inventory is not resolvable). In this example, we could have still used `ansible_ssh_host`, but it is not a requirement, since well be passing this variable as a parameter into Cisco modules, whereas `ansible_ssh_host` is automatically checked when using the default SSH connection mechanism.
Juniper
As with the previous three groups and hosts, there is a single host `vsrx` that is located within the `juniper` group. The setup within the inventory file is identical to Cisco's, as both are used in exactly the same way within the playbook.
### Playbook
The next playbook has four different plays. Each play is built to automate a specific group of devices based on vendor type. Note that this is only one way to perform these tasks within a single playbook. There are other ways in which we could have used conditionals (`when` statement) or created Ansible roles (which is not covered in this report).
Here is the example playbook:
```
---
- name: PLAY 1 - CISCO NXOS
hosts: cisco
connection: local
tasks:
- name: ENSURE VLAN 100 exists on Cisco Nexus switches
nxos_vlan:
vlan_id=100
name=web_vlan
host={{ hostip }}
username={{ un }}
password={{ pwd }}
- name: PLAY 2 - ARISTA EOS
hosts: arista
connection: local
tasks:
- name: ENSURE VLAN 100 exists on Arista switches
eos_vlan:
vlanid=100
name=web_vlan
connection={{ inventory_hostname }}
- name: PLAY 3 - CUMULUS
remote_user: cumulus
sudo: true
hosts: cumulus
tasks:
- name: ENSURE 100.10.10.1 is configured on swp1
cl_interface: name=swp1 ipv4=100.10.10.1/24
- name: restart networking without disruption
shell: ifreload -a
- name: PLAY 4 - JUNIPER SRX changes
hosts: juniper
connection: local
tasks:
- name: INSTALL JUNOS CONFIG
junos_install_config:
host={{ hostip }}
file=srx_demo.conf
user={{ un }}
passwd={{ pwd }}
logfile=deploysite.log
overwrite=yes
diffs_file=junpr.diff
```
You will notice the first two plays are very similar to what we already covered in the original Cisco and Arista example. The only difference is that each group being automated (`cisco` and `arista`) is defined in its own play, and this is in contrast to using the `when` conditional that was used earlier.
There is no right way or wrong way to do this. It all depends on what information is known up front and what fits your environment and use cases best, but our intent is to show a few ways to do the same thing.
The third play automates the configuration of interface `swp1` that exists on the Cumulus Linux switch. The first task within this play ensures that `swp1` is a Layer 3 interface and is configured with the IP address 100.10.10.1. Because Cumulus Linux is native Linux, the networking service needs to be restarted for the changes to take effect. This could have also been done using Ansible handlers (out of the scope of this report). There is also an Ansible core module called `service` that could have been used, but that would disrupt networking on the switch; using `ifreload` restarts networking non-disruptively.
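Although handlers are outside the scope of this report, a brief sketch of that alternative is shown here for completeness: the `notify` keyword triggers the handler only when the interface task actually reports a change, which keeps the reload both conditional and non-disruptive:

```
# Alternative version of play 3 using a handler instead of an unconditional shell task
- name: PLAY 3 (ALTERNATIVE) - CUMULUS WITH A HANDLER
  remote_user: cumulus
  sudo: true
  hosts: cumulus
  tasks:
    - name: ENSURE 100.10.10.1 is configured on swp1
      cl_interface: name=swp1 ipv4=100.10.10.1/24
      notify: reload networking
  handlers:
    - name: reload networking
      shell: ifreload -a
```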
Up until now in this section, we looked at Ansible modules focused on specific tasks such as configuring interfaces and VLANs. The fourth play uses another option. We'll look at a module that _pushes_ a full configuration file and immediately activates it as the new running configuration. This is what we showed previously using `napalm_install_config`, but this example uses a Juniper-specific module called `junos_install_config`.
This module `junos_install_config` accepts several parameters, as seen in the example. By now, you should understand what `user`, `passwd`, and `host` are used for. The other parameters are defined as follows:
`file`
This is the config file that is copied from the Ansible control host to the Juniper device.
`logfile`
This is optional, but if specified, it is used to store messages generated while executing the module.
`overwrite`
When set to yes/true, the complete configuration is replaced with the file being sent (default is false).
`diffs_file`
This is optional, but if specified, will store the diffs generated when applying the configuration. An example of the diff generated when just changing the hostname but still sending a complete config file is shown next:
```
# filename: junpr.diff
[edit system]
- host-name vsrx;
+ host-name vsrx-demo;
```
That covers the detailed overview of the playbook. Let's take a look at what happens when the playbook is executed:
###### Note
Note: the `-i` flag is used to specify the inventory file to use. The `ANSIBLE_HOSTS` environment variable can also be set rather than using the flag each time a playbook is executed.
```
ntc@ntc:~/ansible/multivendor$ ansible-playbook -i inventory demo.yml
PLAY [PLAY 1 - CISCO NXOS] *************************************************
TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] *********************
changed: [nx1]
PLAY [PLAY 2 - ARISTA EOS] *************************************************
TASK: [ENSURE VLAN 100 exists on Arista switches] **************************
changed: [veos1]
PLAY [PLAY 3 - CUMULUS] ****************************************************
GATHERING FACTS ************************************************************
ok: [cvx]
TASK: [ENSURE 100.10.10.1 is configured on swp1] ***************************
changed: [cvx]
TASK: [restart networking without disruption] ******************************
changed: [cvx]
PLAY [PLAY 4 - JUNIPER SRX changes] ****************************************
TASK: [INSTALL JUNOS CONFIG] ***********************************************
changed: [vsrx]
PLAY RECAP ***************************************************************
to retry, use: --limit @/home/ansible/demo.retry
cvx : ok=3 changed=2 unreachable=0 failed=0
nx1 : ok=1 changed=1 unreachable=0 failed=0
veos1 : ok=1 changed=1 unreachable=0 failed=0
vsrx : ok=1 changed=1 unreachable=0 failed=0
```
You can see that each task completes successfully; and if you are on the terminal, you'll see that each changed task was displayed with an amber color.
Let's run this playbook again. By running it again, we can verify that all of the modules are _idempotent_; and when doing this, we see that NO changes are made to the devices and everything is green:
```
PLAY [PLAY 1 - CISCO NXOS] ***************************************************
TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] ***********************
ok: [nx1]
PLAY [PLAY 2 - ARISTA EOS] ***************************************************
TASK: [ENSURE VLAN 100 exists on Arista switches] ****************************
ok: [veos1]
PLAY [PLAY 3 - CUMULUS] ******************************************************
GATHERING FACTS **************************************************************
ok: [cvx]
TASK: [ENSURE 100.10.10.1 is configured on swp1] *****************************
ok: [cvx]
TASK: [restart networking without disruption] ********************************
skipping: [cvx]
PLAY [PLAY 4 - JUNIPER SRX changes] ******************************************
TASK: [INSTALL JUNOS CONFIG] *************************************************
ok: [vsrx]
PLAY RECAP ***************************************************************
cvx : ok=2 changed=0 unreachable=0 failed=0
nx1 : ok=1 changed=0 unreachable=0 failed=0
veos1 : ok=1 changed=0 unreachable=0 failed=0
vsrx : ok=1 changed=0 unreachable=0 failed=0
```
Notice how there were 0 changes, but they still returned "ok" for each task. This verifies, as expected, that each of the modules in this playbook is idempotent.
### Summary
Ansible is a super-simple automation platform that is agentless and extensible. The network community continues to rally around Ansible as a platform that can be used for network automation tasks that range from configuration management to data collection and reporting. You can push full configuration files with Ansible, configure specific network resources with idempotent modules such as interfaces or VLANs, or simply just automate the collection of information such as neighbors, serial numbers, uptime, and interface stats, and customize reports as you need them.
Because of its architecture, Ansible proves to be a great tool available here and now that helps bridge the gap from _legacy CLI/SNMP_ network device automation to modern _API-driven_ automation.
Ansible's ease of use and agentless architecture account for the platform's increasing following within the networking community. Again, this makes it possible to automate devices without APIs (CLI/SNMP); devices that have modern APIs, including standalone switches, routers, and Layer 4-7 service appliances; and even those software-defined networking (SDN) controllers that offer RESTful APIs.
There is no device left behind when using Ansible for network automation.
-----------
作者简介:
![](https://d3tdunqjn7n0wj.cloudfront.net/360x360/jason-edelman-crop-5b2672f569f553a3de3a121d0179efcb.jpg)
Jason Edelman, CCIE 15394 & VCDX-NV 167, is a born and bred network engineer from the great state of New Jersey. He was the typical “lover of the CLI” or “router jockey.” At some point several years ago, he made the decision to focus more on software, development practices, and how they are converging with network engineering. Jason currently runs a boutique consulting firm, Network to Code, helping vendors and end users take advantage of new tools and technologies to reduce their operational inefficiencies. Jason has a Bachelors...
--------------------------------------------------------------------------------
via: https://www.oreilly.com/learning/network-automation-with-ansible
作者:[Jason Edelman][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.oreilly.com/people/ee4fd-jason-edelman
[1]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
[2]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_network_integrations
[3]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
[4]:https://www.oreilly.com/learning/network-automation-with-ansible#handson_look_at_using_ansible_for_network_automation
[5]:http://docs.ansible.com/ansible/playbooks_vault.html
[6]:https://www.oreilly.com/people/ee4fd-jason-edelman
[7]:https://www.oreilly.com/people/ee4fd-jason-edelman

View File

@ -1,333 +0,0 @@
# Kprobes Event Tracing on ARMv8
![core-dump](http://www.linaro.org/wp-content/uploads/2016/02/core-dump.png)
### Introduction
Kprobes is a kernel feature that allows instrumenting the kernel by setting arbitrary breakpoints that call out to developer-supplied routines before and after the breakpointed instruction is executed (or simulated). See the kprobes documentation[[1]][2] for more information. Basic kprobes functionality is selected with CONFIG_KPROBES. Kprobes support was added to mainline for arm64 in the v4.8 release.
In this article we describe the use of kprobes on arm64 using the debugfs event tracing interfaces from the command line to collect dynamic trace events. This feature has been available for some time on several architectures (including arm32), and is now available on arm64. The feature allows use of kprobes without having to write any code.
### Types of Probes
The kprobes subsystem provides three different types of dynamic probes described below.
### Kprobes
The basic probe is a software breakpoint kprobes inserts in place of the instruction you are probing, saving the original instruction for eventual single-stepping (or simulation) when the probe point is hit.
### Kretprobes
Kretprobes is a part of kprobes that allows intercepting a returning function instead of having to set a probe (or possibly several probes) at the return points. This feature is selected whenever kprobes is selected, for supported architectures (including ARMv8).
### Jprobes
Jprobes allows intercepting a call into a function by supplying an intermediary function with the same calling signature, which will be called first. Jprobes is a programming interface only and cannot be used through the debugfs event tracing subsystem. As such we will not be discussing jprobes further here. Consult the kprobes documentation if you wish to use jprobes.
### Invoking Kprobes
Kprobes provides a set of APIs which can be called from kernel code to set up probe points and register functions to be called when probe points are hit. Kprobes is also accessible without adding code to the kernel, by writing to specific event tracing debugfs files to set the probe address and information to be recorded in the trace log when the probe is hit. The latter is the focus of what this document will be talking about. Lastly kprobes can be accessed through the perf command.
### Kprobes API
The kernel developer can write functions in the kernel (often done in a dedicated debug module) to set probe points and take whatever action is desired right before and right after the probed instruction is executed. This is well documented in kprobes.txt.
### Event Tracing
The event tracing subsystem has its own documentation[[2]][3] which might be worth a read to understand the background of event tracing in general. The event tracing subsystem serves as a foundation for both tracepoints and kprobes event tracing. The event tracing documentation focuses on tracepoints, so bear that in mind when consulting that documentation. Kprobes differs from tracepoints in that there is no predefined list of tracepoints but instead arbitrary dynamically created probe points that trigger the collection of trace event information. The event tracing subsystem is controlled and monitored through a set of debugfs files. Event tracing (CONFIG_EVENT_TRACING) will be selected automatically when needed by something like the kprobe event tracing subsystem.
#### Kprobes Events
With the kprobes event tracing subsystem the user can specify information to be reported at arbitrary breakpoints in the kernel, determined simply by specifying the address of any existing probeable instruction along with formatting information. When that breakpoint is encountered during execution kprobes passes the requested information to the common parts of the event tracing subsystem which formats and appends the data to the trace log, much like how tracepoints work. Kprobes uses a similar but mostly separate collection of debugfs files to control and display trace event information. This feature is selected with CONFIG_KPROBE_EVENT. The kprobetrace documentation[[3]][4] provides the essential information on how to use kprobes event tracing and should be consulted to understand details about the examples presented below.
### Kprobes and Perf
The perf tools provide another command line interface to kprobes. In particular “perf probe” allows probe points to be specified by source file and line number, in addition to function name plus offset, and address. The perf interface is really a wrapper for using the debugfs interface for kprobes.
### Arm64 Kprobes
All of the above aspects of kprobes are now implemented for arm64, in practice there are some differences from other architectures though:
* Register name arguments are, of course, architecture specific and can be found in the ARM ARM.
* Not all instruction types can currently be probed. Currently unprobeable instructions include mrs/msr (except DAIF read), exception generation instructions, eret, and hint (except for the nop variant). In these cases it is simplest to just probe a nearby instruction instead. These instructions are blacklisted from probing because the changes they cause to processor state are unsafe to do during kprobe single-stepping or instruction simulation, because the single-stepping context kprobes constructs is inconsistent with what the instruction needs, or because the instruction can't tolerate the additional processing time and exception handling in kprobes (ldx/stx).
* An attempt is made to identify instructions within a ldx/stx sequence and prevent probing, however it is theoretically possible for this check to fail resulting in allowing a probed atomic sequence which can never succeed. Be careful when probing around atomic code sequences.
* Note that because of the details of Linux ARM64 calling conventions it is not possible to reliably duplicate the stack frame for the probed function and for that reason no attempt is made to do so with jprobes, unlike the majority of other architectures supporting jprobes. The reason for this is that there is insufficient information for the callee to know for certain the amount of the stack that is needed.
* Note that the stack pointer information recorded from a probe will reflect the particular stack pointer in use at the time the probe was hit, be it the kernel stack pointer or the interrupt stack pointer.
* There is a list of kernel functions which cannot be probed, usually because they are called as part of kprobes processing. Part of this list is architecture-specific and also includes things like exception entry code.
### Using Kprobes Event Tracing
One common use case for kprobes is instrumenting function entry and/or exit. It is particularly easy to install probes for this since one can just use the function name for the probe address. Kprobes event tracing will look up the symbol name and determine the address. The ARMv8 calling standard defines where the function arguments and return values can be found, and these can be printed out as part of the kprobe event processing.
### Example: Function entry probing
Instrumenting a USB ethernet driver reset function:
```
$ pwd
/sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p ax88772_reset %x0
EOF
$ echo 1 > events/kprobes/enable
```
At this point a trace event will be recorded every time the driver's _ax88772_reset()_ function is called. The event will display the pointer to the _usbnet_ structure passed in via X0 (as per the ARMv8 calling standard) as this function's only argument. After plugging in a USB dongle requiring this ethernet driver we see the following trace information:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:8
#
#                           _-----=> irqs-off
#                          / _----=> need-resched
#                         | / _---=> hardirq/softirq
#                         || / _--=> preempt-depth
#                         ||| /     delay
#        TASK-PID   CPU#  |||| TIMESTAMP  FUNCTION
#           | |    |   ||||    |      |
    kworker/0:0-4     [000] d... 10972.102939: p_ax88772_reset_0: (ax88772_reset+0x0/0x230) arg1=0xffff800064824c80
```
Here we can see the value of the pointer argument passed in to our probed function. Since we did not use the optional labelling features of kprobes event tracing the information we requested is automatically labeled _arg1_. Note that this refers to the first value in the list of values we requested that kprobes log for this probe, not the actual position of the argument to the function. In this case it also just happens to be the first argument to the function we've probed.
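For instance (a sketch reusing the same driver function; the event and argument names are arbitrary), the probe could instead have been defined with an explicit event name and argument label:

```
$ cat > kprobe_events <<EOF
p:ax_reset ax88772_reset dev=%x0
EOF
```

With such a definition the trace entries would be reported under the chosen event name (ax_reset) and the value would be labeled dev= rather than arg1=.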
### Example: Function entry and return probing
The kretprobe feature is used specifically to probe a function return. At function entry the kprobes subsystem will be called and will set up a hook to be called at function return, where it will record the requested event information. For the most common case the return information, typically in the X0 register, is quite useful. The return value in %x0 can also be referred to as _$retval_. The following example also demonstrates how to provide a human-readable label to be displayed with the information of interest.
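As a minimal illustration (write one or the other, not both; the label name is arbitrary), the following two kretprobe definitions request the same datum on arm64:

```
r _do_fork pid=%x0
r _do_fork pid=$retval
```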
Example of instrumenting the kernel _do_fork() function to record arguments and results using a kprobe and a kretprobe:
```
$ cd /sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p _do_fork %x0 %x1 %x2 %x3 %x4 %x5
r _do_fork pid=%x0
EOF
$ echo 1 > events/kprobes/enable
```
At this point every call to _do_fork() will produce two kprobe events recorded into the "trace" file, one reporting the calling argument values and one reporting the return value. The return value will be labeled "pid" in the trace file. Here are the contents of the trace file after three fork syscalls have been made:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 6/6   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
              bash-1671  [001] d...   204.946007: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   204.946391: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x724
              bash-1671  [001] d...   208.845749: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   208.846127: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x725
              bash-1671  [001] d...   214.401604: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   214.401975: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x726
```
### Example: Dereferencing pointer arguments
For pointer values the kprobe event processing subsystem also allows dereferencing and printing of desired memory contents, for various base data types. It is necessary to manually calculate the offset into structures in order to display a desired field.
Instrumenting the `do_wait()` function:
```
$ cat > kprobe_events <<EOF
p:wait_p do_wait wo_type=+0(%x0):u32 wo_flags=+4(%x0):u32
r:wait_r do_wait $retval
EOF
$ echo 1 > events/kprobes/enable
```
Note that the argument labels used in the first probe are optional and can be used to more clearly identify the information recorded in the trace log. The signed offset and parentheses indicate that the register argument is a pointer to memory contents to be recorded in the trace log. The “_:u32_” indicates that the memory location contains an unsigned four-byte wide datum (an enum and an int in a locally defined structure in this case).
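As an aside, here is one way such offsets might be obtained, sketched under the assumption that the structure in question is do_wait()'s struct wait_opts argument (defined in kernel/exit.c) and that a vmlinux with debug information is available:

```
$ ${CROSS_COMPILE}gdb -batch \
      -ex 'print /d (long)&((struct wait_opts *)0)->wo_type'  \
      -ex 'print /d (long)&((struct wait_opts *)0)->wo_flags' \
      vmlinux
$1 = 0
$2 = 4
```

which matches the +0 and +4 offsets used in the probe definitions above.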
The probe labels (after the colon) are optional and will be used to identify the probe in the log. The label must be unique for each probe. If unspecified a useful label will be automatically generated from a nearby symbol name, as has been shown in earlier examples.
Also note the "$retval" argument could just be specified as "%x0".
Here are the contents of the "trace" file after two fork syscalls have been made:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
             bash-1702  [001] d...   175.342074: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xe
             bash-1702  [002] d..1   175.347236: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0x757
             bash-1702  [002] d...   175.347337: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xf
             bash-1702  [002] d..1   175.347349: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0xfffffffffffffff6
```
### Example: Probing arbitrary instruction addresses
In previous examples we have inserted probes for function entry and exit; however, it is possible to probe an arbitrary instruction (with a few exceptions). If we are placing a probe inside a C function the first step is to look at the assembler version of the code to identify where we want to place the probe. One way to do this is to use gdb on the vmlinux file and display the instructions in the function where you wish to place the probe. An example of doing this for the module_alloc() function in arch/arm64/kernel/module.c follows. In this case, because gdb seems to prefer using the weak symbol definition and its associated stub code for this function, we get the symbol value from System.map instead:
```
$ grep module_alloc System.map
ffff2000080951c4 T module_alloc
ffff200008297770 T kasan_module_alloc
```
In this example we're using cross-development tools, and we invoke gdb on our host system to examine the instructions comprising our function of interest:
```
$ ${CROSS_COMPILE}gdb vmlinux
(gdb) x/30i 0xffff2000080951c4
        0xffff2000080951c4 <module_alloc>:      sub     sp, sp, #0x30
        0xffff2000080951c8 <module_alloc+4>:    adrp    x3, 0xffff200008d70000
        0xffff2000080951cc <module_alloc+8>:    add     x3, x3, #0x0
        0xffff2000080951d0 <module_alloc+12>:   mov     x5, #0x713              // #1811
        0xffff2000080951d4 <module_alloc+16>:   mov     w4, #0xc0               // #192
        0xffff2000080951d8 <module_alloc+20>:   mov     x2, #0xfffffffff8000000 // #-134217728
        0xffff2000080951dc <module_alloc+24>:   stp     x29, x30, [sp,#16]
        0xffff2000080951e0 <module_alloc+28>:   add     x29, sp, #0x10
        0xffff2000080951e4 <module_alloc+32>:   movk    x5, #0xc8, lsl #48
        0xffff2000080951e8 <module_alloc+36>:   movk    w4, #0x240, lsl #16
        0xffff2000080951ec <module_alloc+40>:   str     x30, [sp]
        0xffff2000080951f0 <module_alloc+44>:   mov     w7, #0xffffffff         // #-1
        0xffff2000080951f4 <module_alloc+48>:   mov     x6, #0x0                // #0
        0xffff2000080951f8 <module_alloc+52>:   add     x2, x3, x2
        0xffff2000080951fc <module_alloc+56>:   mov     x1, #0x8000             // #32768
        0xffff200008095200 <module_alloc+60>:   stp     x19, x20, [sp,#32]
        0xffff200008095204 <module_alloc+64>:   mov     x20, x0
        0xffff200008095208 <module_alloc+68>:   bl      0xffff2000082737a8 <__vmalloc_node_range>
        0xffff20000809520c <module_alloc+72>:   mov     x19, x0
        0xffff200008095210 <module_alloc+76>:   cbz     x0, 0xffff200008095234 <module_alloc+112>
        0xffff200008095214 <module_alloc+80>:   mov     x1, x20
        0xffff200008095218 <module_alloc+84>:   bl      0xffff200008297770 <kasan_module_alloc>
        0xffff20000809521c <module_alloc+88>:   tbnz    w0, #31, 0xffff20000809524c <module_alloc+136>
        0xffff200008095220 <module_alloc+92>:   mov     sp, x29
        0xffff200008095224 <module_alloc+96>:   mov     x0, x19
        0xffff200008095228 <module_alloc+100>:  ldp     x19, x20, [sp,#16]
        0xffff20000809522c <module_alloc+104>:  ldp     x29, x30, [sp],#32
        0xffff200008095230 <module_alloc+108>:  ret
        0xffff200008095234 <module_alloc+112>:  mov     sp, x29
        0xffff200008095238 <module_alloc+116>:  mov     x19, #0x0               // #0
```
In this case we are going to display the result from the following source line in this function:
```
p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
                         VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
                         NUMA_NO_NODE, __builtin_return_address(0));
```
…and also the return value from the function call in this line:
```
if (p && (kasan_module_alloc(p, size) < 0)) {
```
We can identify these in the assembler code from the calls to the external functions. To display these values we will place probes at 0xffff20000809520c and 0xffff20000809521c on our target system:
```
$ cat > kprobe_events <<EOF
p 0xffff20000809520c %x0
p 0xffff20000809521c %x0
EOF
$ echo 1 > events/kprobes/enable
```
Now after plugging an ethernet adapter dongle into the USB port we see the following written into the trace log:
```
$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 12/12   #P:8
#
#                           _-----=> irqs-off
#                          / _----=> need-resched
#                         | / _---=> hardirq/softirq
#                         || / _--=> preempt-depth
#                         ||| /     delay
#        TASK-PID   CPU#  |||| TIMESTAMP  FUNCTION
#           | |    |   ||||    |      |
      systemd-udevd-2082  [000] d...    77.200991: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001188000
      systemd-udevd-2082  [000] d...    77.201059: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d...    77.201115: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001198000
      systemd-udevd-2082  [000] d...    77.201157: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d...    77.227456: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011a0000
      systemd-udevd-2082  [000] d...    77.227522: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d...    77.227579: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b0000
      systemd-udevd-2082  [000] d...    77.227635: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      modprobe-2097  [002] d...    78.030643: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b8000
      modprobe-2097  [002] d...    78.030761: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      modprobe-2097  [002] d...    78.031132: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001270000
      modprobe-2097  [002] d...    78.031187: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
```
One more feature of the kprobes event system is the recording of statistics information, which can be found in kprobe_profile. After the above trace the contents of that file are:
```
$ cat kprobe_profile
  p_0xffff20000809520c                                    6            0
  p_0xffff20000809521c                                    6            0
```
This indicates that each of the two breakpoints we set has been hit 6 times, which of course is consistent with the trace log data. More kprobe_profile features are described in the kprobetrace documentation.
There is also the ability to further filter kprobes events.  The debugfs files used to control this are listed in the kprobetrace documentation while the details of their contents are (mostly) described in the trace events documentation.
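As a small sketch of both of these points (reusing the do_wait probes defined earlier; the filter expression itself is arbitrary), a standard trace-event filter can be attached to an individual kprobe event, and the probes are disabled and removed through the same debugfs files that created them:

```
$ echo 'wo_flags == 0xe' > events/kprobes/wait_p/filter   # only log hits where wo_flags is 0xe
$ echo 0 > events/kprobes/enable                          # stop collecting kprobe events
$ echo > kprobe_events                                    # remove all dynamically created probes
```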
### Conclusion
Linux on ARMv8 is now at parity with other architectures supporting the kprobes feature. Work is being done by others to also add uprobes and systemtap support. These features/tools and other already completed features (e.g. perf, coresight) allow the Linux ARMv8 user to debug and test performance as they would on other, older architectures.
* * *
Bibliography
[[1]][5] Jim Keniston, Prasanna S. Panchamukhi, Masami Hiramatsu. "Kernel Probes (Kprobes)." _GitHub_. GitHub, Inc., 15 Aug. 2016. Web. 13 Dec. 2016.
[[2]][6] Ts'o, Theodore, Li Zefan, and Tom Zanussi. "Event Tracing." _GitHub_. GitHub, Inc., 3 Mar. 2016. Web. 13 Dec. 2016.
[[3]][7] Hiramatsu, Masami. "Kprobe-based Event Tracing." _GitHub_. GitHub, Inc., 18 Aug. 2016. Web. 13 Dec. 2016.
----------------
作者简介 [David Long][8]David works as an engineer in the Linaro Kernel - Core Development team. Before coming to Linaro he spent several years in the commercial and defense industries doing both embedded realtime work, and software development tools for Unix. That was followed by a dozen years at Digital (aka Compaq) doing Unix standards, C compiler, and runtime library work. After that David went to a series of startups doing embedded Linux and Android, embedded custom OS's, and Xen virtualization. He has experience with MIPS, Alpha, and ARM platforms (amongst others). He has used most flavors of Unix starting in 1979 with Bell Labs V6, and has been a long-time Linux user and advocate. He has also occasionally been known to debug a device driver with a soldering iron and digital oscilloscope.
--------------------------------------------------------------------------------
via: http://www.linaro.org/blog/kprobes-event-tracing-armv8/
作者:[ David Long][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://www.linaro.org/author/david-long/
[1]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#
[2]:https://github.com/torvalds/linux/blob/master/Documentation/kprobes.txt
[3]:https://github.com/torvalds/linux/blob/master/Documentation/trace/events.txt
[4]:https://github.com/torvalds/linux/blob/master/Documentation/trace/kprobetrace.txt
[5]:https://github.com/torvalds/linux/blob/master/Documentation/kprobes.txt
[6]:https://github.com/torvalds/linux/blob/master/Documentation/trace/events.txt
[7]:https://github.com/torvalds/linux/blob/master/Documentation/trace/kprobetrace.txt
[8]:http://www.linaro.org/author/david-long/
[9]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#comments
[10]:http://www.linaro.org/blog/kprobes-event-tracing-armv8/#
[11]:http://www.linaro.org/tag/arm64/
[12]:http://www.linaro.org/tag/armv8/
[13]:http://www.linaro.org/tag/jprobes/
[14]:http://www.linaro.org/tag/kernel/
[15]:http://www.linaro.org/tag/kprobes/
[16]:http://www.linaro.org/tag/kretprobes/
[17]:http://www.linaro.org/tag/perf/
[18]:http://www.linaro.org/tag/tracing/

View File

@ -1,331 +0,0 @@
zpl1025
How to take screenshots on Linux using Scrot
============================================================
### On this page
1. [About Scrot][12]
2. [Scrot Installation][13]
3. [Scrot Usage/Features][14]
1. [Get the application version][1]
2. [Capturing current window][2]
3. [Selecting a window][3]
4. [Include window border in screenshots][4]
5. [Delay in taking screenshots][5]
6. [Countdown before screenshot][6]
7. [Image quality][7]
8. [Generating thumbnails][8]
9. [Join multiple displays shots][9]
10. [Executing operations on saved images][10]
11. [Special strings][11]
4. [Conclusion][15]
Recently, we discussed the [gnome-screenshot][17] utility, which is a good screen grabbing tool. But if you are looking for an even better command line utility for taking screenshots, then you must give Scrot a try. This tool has some extra features that are currently not available in gnome-screenshot. In this tutorial, we will explain Scrot using easy-to-understand examples.
Please note that all the examples mentioned in this tutorial have been tested on Ubuntu 16.04 LTS, and the scrot version we have used is 0.8.
### About Scrot
[Scrot][18] (**SCR**eensh**OT**) is a screenshot capturing utility that uses the imlib2 library to acquire and save images. Developed by Tom Gilbert, it's written in C programming language and is licensed under the BSD License.
### Scrot Installation
The scrot tool may be pre-installed on your Ubuntu system, but if that's not the case, then you can install it using the following command:
sudo apt-get install scrot
Once the tool is installed, you can launch it by using the following command:
scrot [options] [filename]
**Note**: The parameters in [] are optional.
### Scrot Usage/Features
In this section, we will discuss how the Scrot tool can be used and what all features it provides.
When the tool is run without any command line options, it captures the whole screen.
[
![Using Scrot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/scrot.png)
][19]
By default, the captured file is saved with a date-stamped filename in the current directory, although you can also explicitly specify the name of the captured image when the command is run. For example:
scrot [image-name].png
### Get the application version
If you want, you can check the version of scrot using the -v command line option.
scrot -v
Here is an example:
[
![Get scrot version](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/version.png)
][20]
### Capturing current window
Using the utility, you can limit the screenshot to the currently focused window. This feature can be accessed using the -u command line option.
scrot -u
For example, here's my desktop when I executed the above command on the command line:
[
![capture window in scrot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/desktop.png)
][21]
And here's the screenshot captured by scrot: 
[
![Screenshot captured by scrot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/active.png)
][22]
### Selecting a window
The utility allows you to capture any window by clicking on it using the mouse. This feature can be accessed using the -s option.
scrot -s
For example, as you can see in the screenshot below, I have a screen with two terminal windows overlapping each other. On the top window, I run the aforementioned command.
[
![select window](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/select1.png)
][23]
Now suppose, I want to capture the bottom terminal window. For that, I will just click on that window once the command is executed - the command execution won't complete until you click somewhere on the screen.
Here's the screenshot captured after clicking on that terminal:
[
![window screenshot captured](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/select2.png)
][24]
**Note**: As you can see in the above snapshot, whatever area the bottom window is covering has been captured, even if that includes an overlapping portion of the top window.
### Include window border in screenshots
The -u command line option we discussed earlier doesn't include the window border in screenshots. However, you can include the border of the window if you want. This feature can be accessed using the -b option (in conjunction with the -u option of course).
scrot -ub
Here is an example screenshot:
[
![include window border in screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/border-new.png)
][25]
**Note**: Including window border also adds some of the background area to the screenshot.
### Delay in taking screenshots
You can introduce a time delay while taking screenshots. For this, you have to assign a numeric value to the --delay or -d command line option.
scrot --delay [NUM]
scrot --delay 5
Here is an example:
[
![delay taking screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/delay.png)
][26]
In this case, scrot will wait for 5 seconds and then take the screenshot.
### Countdown before screenshot
The tool also allows you to display a countdown while using the delay option. This feature can be accessed using the -c command line option.
scrot --delay [NUM] -c
scrot -d 5 -c
Here is an example screenshot:
[
![example delayed screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/countdown.png)
][27]
### Image quality
Using the tool, you can adjust the quality of the screenshot image on a scale of 1-100. A higher value means a larger file size and lower compression. The default value is 75, although the effect differs depending on the file format chosen.
This feature can be accessed using the --quality or -q option, but you have to assign a numeric value to this option ranging from 1-100.
scrot --quality [NUM]
scrot --quality 10
Here is an example snapshot:
[
![snapshot quality](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/img-quality.jpg)
][28]
So you can see that the quality of the image degrades a lot as the -q option is assigned value closer to 1.
### Generating thumbnails
The scrot utility also allows you to generate thumbnail of the screenshot. This feature can be accessed using the --thumb option. This option requires a NUM value, which is basically the percentage of the original screenshot size.
scrot --thumb NUM
scrot --thumb 50
**Note**: The --thumb option makes sure that the screenshot is captured and saved in original size as well.
For example, here is the original screenshot captured in my case:
[
![Original screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/orig.png)
][29]
And following is the thumbnail saved:
[
![thumbnail of the screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/thmb.png)
][30]
### Join multiple displays shots
In case your machine has multiple displays attached to it, scrot allows you to grab and join screenshots of these displays. This feature can be accessed using the -m command line option. 
scrot -m
Here is an example snapshot:
[
![Join screenshots](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/multiple.png)
][31]
### Executing operations on saved images
Using the tool, we can execute various operations on saved images - for example, open the screenshot in an image editor like gThumb. This feature can be accessed using the -e command line option. Here's an example:
scrot abc.png -e gthumb abc.png
Here, gthumb is an image editor which will automatically launch after we run the command.
Following is the snapshot of the command:
[
![Execute commands on screenshots](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/exec1.png)
][32]
And here is the output of the above command:
[
![esample screenshot](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/exec2.png)
][33]
So you can see that the scrot command grabbed the screenshot and then launched the gThumb image editor with the captured image as argument.
If you don't specify a filename for your screenshot, then the snapshot will be saved with a date-stamped filename in your current directory - this, as we've already mentioned in the beginning, is the default behaviour of scrot.
Here's an -e command line option example where scrot uses the default name for the screenshot: 
scrot -e gthumb $n
[
![scrot running gthumb](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/exec3.png)
][34]
It's worth mentioning that $n is a special string, which provides access to the screenshot name. For more details on special strings, head to the next section.
### Special strings
The -e (or --exec) and filename parameters can take format specifiers when used with scrot. There are two types of format specifiers. The first type is characters preceded by %, which are used for date and time formats, while the second type is internal to scrot and is prefixed by $.
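For example (a minimal sketch; the filename pattern is only an illustration), the date and time specifiers can be used directly in the filename to get time-stamped screenshots:
scrot '%Y-%m-%d_%H-%M-%S_shot.png'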
Several specifiers which are recognised by the --exec and filename parameters are discussed below.
**$f**  provides access to screenshot path (including filename).
For example,
scrot ashu.jpg -e mv $f ~/Pictures/Scrot/ashish/
Here is an example snapshot:
[
![example](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/f.png)
][35]
If you do not specify a filename, then scrot will by default save the snapshot with a date-stamped filename. The default date-stamped filename format used by scrot is: %Y-%m-%d-%H%M%S_$wx$h_scrot.png.
**$n**  provides snapshot name. Here is an example snapshot:
[
![scrot $n variable](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/n.png)
][36]
**$s**  gives access to the size of screenshot. This feature, for example, can be accessed in the following way.
scrot abc.jpg -e echo $s
Here is an example snapshot
[
![scrot $s variable](https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/s.png)
][37]
Similarly, you can use the other special strings **$p**, **$w**, **$h**, **$t**, **$$** and **\n**, which provide access to the image pixel size, image width, image height, image format, a literal $ symbol, and a newline, respectively. You can, for example, use these strings in a way similar to the **$s** example we discussed above.
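As a rough illustration (the filename and the echo command are arbitrary), several of these strings can be combined in a single -e argument:
scrot shot.png -e 'echo name=$n size=$s width=$w height=$h format=$t'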
### Conclusion
The utility is easy to install on Ubuntu systems, which is good for beginners. Scrot also provides some advanced features such as special strings that can be used in scripting by professionals. Needless to say, there is a slight learning curve associated with these if you want to use them.
--------------------------------------------------------------------------------
via: https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/
作者:[Himanshu Arora][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/
[1]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#get-the-applicationnbspversion
[2]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#capturing-current-window
[3]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#selecting-a-window
[4]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#includenbspwindow-border-in-screenshots
[5]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#delay-in-taking-screenshots
[6]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#countdown-before-screenshot
[7]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#image-quality
[8]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#generating-thumbnails
[9]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#join-multiple-displays-shots
[10]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#executing-operations-on-saved-images
[11]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#special-strings
[12]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#about-scrot
[13]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#scrot-installation
[14]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#scrot-usagefeatures
[15]:https://www.howtoforge.com/tutorial/how-to-take-screenshots-in-linux-with-scrot/#conclusion
[16]:https://www.howtoforge.com/subscription/
[17]:https://www.howtoforge.com/tutorial/taking-screenshots-in-linux-using-gnome-screenshot/
[18]:https://en.wikipedia.org/wiki/Scrot
[19]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/scrot.png
[20]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/version.png
[21]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/desktop.png
[22]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/active.png
[23]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/select1.png
[24]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/select2.png
[25]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/border-new.png
[26]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/delay.png
[27]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/countdown.png
[28]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/img-quality.jpg
[29]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/orig.png
[30]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/thmb.png
[31]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/multiple.png
[32]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/exec1.png
[33]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/exec2.png
[34]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/exec3.png
[35]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/f.png
[36]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/n.png
[37]:https://www.howtoforge.com/images/how-to-take-screenshots-in-linux-with-scrot/big/s.png

View File

@ -1,3 +1,6 @@
cielong translating
----
Writing a Time Series Database from Scratch
============================================================

View File

@ -1,84 +0,0 @@
translating---geekpi
# Understanding Docker "Container Host" vs. "Container OS" for Linux and Windows Containers
Let's explore the relationship between the “Container Host” and the “Container OS” and how they differ between Linux and Windows containers.
#### Some Definitions:
* Container Host: Also called the Host OS. The Host OS is the operating system on which the Docker client and Docker daemon run. In the case of Linux and non-Hyper-V containers, the Host OS shares its kernel with running Docker containers. For Hyper-V each container has its own Hyper-V kernel.
* Container OS: Also called the Base OS. The base OS refers to an image that contains an operating system such as Ubuntu, CentOS, or windowsservercore. Typically, you would build your own image on top of a Base OS image so that you can utilize parts of the OS. Note that Windows containers require a Base OS, while Linux containers do not.
* Operating System Kernel: The Kernel manages lower level functions such as memory management, file system, network and process scheduling.
#### Now for some pictures:
![Linux Containers](http://floydhilton.com/images/2017/03/2017-03-31_14_50_13-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
In the above example
* The Host OS is Ubuntu.
* The Docker Client and the Docker Daemon (together called the Docker Engine) are running on the Host OS.
* Each container shares the Host OS kernel.
* CentOS and BusyBox are Linux Base OS images.
* The “No OS” container demonstrates that you do not NEED a base OS to run a container in Linux. You can create a Dockerfile that has a base image of [scratch][1] and then runs a binary that uses the kernel directly (a minimal sketch follows this list).
* Check out [this][2] article for a comparison of Base OS sizes.
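Here is a minimal sketch of such a scratch-based image; it assumes you already have a statically linked binary named hello in the current directory (the names and tag are only illustrative):

```
$ cat > Dockerfile <<'EOF'
FROM scratch
COPY hello /hello
ENTRYPOINT ["/hello"]
EOF
$ docker build -t hello-scratch .
$ docker run --rm hello-scratch
```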
![Windows Containers - Non Hyper-V](http://floydhilton.com/images/2017/03/2017-03-31_15_04_03-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
In the above example
* The Host OS is Windows 10 or Windows Server.
* Each container shares the Host OS kernel.
* All windows containers require a Base OS of either [nanoserver][3] or [windowsservercore][4].
![Windows Containers - Hyper-V](http://floydhilton.com/images/2017/03/2017-03-31_15_41_31-Radom%20Learnings,%20Online%20Whiteboard%20for%20Visual%20Collaboration.png)
In the above example
* The Host OS is Windows 10 or Windows Server.
* Each container is hosted in its own light weight Hyper-V VM.
* Each container uses the kernel inside the Hyper-V VM which provides an extra layer of separation between containers.
* All windows containers require a Base OS of either [nanoserver][5] or [windowsservercore][6].
A couple of good links:
* [About windows containers][7]
* [Deep dive into the implementation Windows Containers including multiple User Modes and “copy-on-write” to save resources][8]
* [How Linux containers save resources by using “copy-on-write”][9]
--------------------------------------------------------------------------------
via: http://floydhilton.com/docker/2017/03/31/Docker-ContainerHost-vs-ContainerOS-Linux-Windows.html
作者:[Floyd Hilton ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://floydhilton.com/about/
[1]:https://hub.docker.com/_/scratch/
[2]:https://www.brianchristner.io/docker-image-base-os-size-comparison/
[3]:https://hub.docker.com/r/microsoft/nanoserver/
[4]:https://hub.docker.com/r/microsoft/windowsservercore/
[5]:https://hub.docker.com/r/microsoft/nanoserver/
[6]:https://hub.docker.com/r/microsoft/windowsservercore/
[7]:https://docs.microsoft.com/en-us/virtualization/windowscontainers/about/
[8]:http://blog.xebia.com/deep-dive-into-windows-server-containers-and-docker-part-2-underlying-implementation-of-windows-server-containers/
[9]:https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/#the-copy-on-write-strategy

View File

@ -1,4 +1,6 @@
Translating by Snapcrafter
Translating by yongshouzhang
A user's guide to links in the Linux filesystem
============================================================

View File

@ -1,3 +1,5 @@
Translating by lonaparte
Lessons from my first year of live coding on Twitch
============================================================

View File

@ -1,4 +1,5 @@
> translating by rieonke
translating by liuxinyu123
Containing System Services in Red Hat Enterprise Linux Part 1
============================================================

View File

@ -1,215 +0,0 @@
translating by penghuster
Designing a Microservices Architecture for Failure
============================================================ 
A Microservices architecture makes it possible to **isolate failures** through well-defined service boundaries. But like in every distributed system, there is a **higher chance** for network, hardware or application level issues. As a consequence of service dependencies, any component can be temporarily unavailable for their consumers. To minimize the impact of partial outages we need to build fault tolerant services that can **gracefully** respond to certain types of outages.
This article introduces the most common techniques and architecture patterns to build and operate a **highly available microservices** system based on [RisingStacks Node.js Consulting & Development experience][3].
_If you are not familiar with the patterns in this article, it doesn't necessarily mean that you are doing something wrong. Building a reliable system always comes with an extra cost._
### The Risk of the Microservices Architecture
The microservices architecture moves application logic to services and uses a network layer to communicate between them. Communicating over a network instead of in-memory calls brings extra latency and complexity to the system which requires cooperation between multiple physical and logical components. The increased complexity of the distributed system leads to a higher chance of particular **network failures**.
One of the biggest advantages of a microservices architecture over a monolithic one is that teams can independently design, develop and deploy their services. They have full ownership over their service's lifecycle. It also means that teams have no control over their service dependencies, as these are more likely managed by a different team. With a microservices architecture, we need to keep in mind that provider **services can be temporarily unavailable** due to broken releases, configurations, and other changes, as they are controlled by someone else and components move independently from each other.
### Graceful Service Degradation
One of the best advantages of a microservices architecture is that you can isolate failures and achieve graceful service degradation as components fail separately. For example, during an outage customers in a photo sharing application may not be able to upload a new picture, but they can still browse, edit and share their existing photos.
![Microservices fail separately in theory](https://blog-assets.risingstack.com/2017/08/microservices-fail-separately-in-theory.png)
_Microservices fail separately (in theory)_
In most of the cases, it's hard to implement this kind of graceful service degradation as applications in a distributed system depend on each other, and you need to apply several failover logics  _(some of them will be covered by this article later)_  to prepare for temporary glitches and outages.
![Microservices Depend on Each Other](https://blog-assets.risingstack.com/2017/08/Microservices-depend-on-each-other.png)
_Services depend on each other and fail together without failover logics._
### Change management
Google's site reliability team has found that roughly **70% of the outages are caused by changes** in a live system. When you change something in your service - you deploy a new version of your code or change some configuration - there is always a chance of failure or the introduction of a new bug.
In a microservices architecture, services depend on each other. This is why you should minimize failures and limit their negative effect. To deal with issues from changes, you can implement change management strategies and **automatic rollouts**.
For example, when you deploy new code, or you change some configuration, you should apply these changes to a subset of your instances gradually, monitor them and even automatically revert the deployment if you see that it has a negative effect on your key metrics.
![Microservices Change Management](https://blog-assets.risingstack.com/2017/08/microservices-change-management.png)
_Change Management - Rolling Deployment_
Another solution could be that you run two production environments. You always deploy to only one of them, and you only point your load balancer to the new one after you verified that the new version works as it is expected. This is called blue-green, or red-black deployment.
**Reverting code is not a bad thing.** You shouldn't leave broken code in production and then think about what went wrong. Always revert your changes when it's necessary. The sooner the better.
### Health-check and Load Balancing
Instances continuously start, restart and stop because of failures, deployments or autoscaling. This makes them temporarily or permanently unavailable. To avoid issues, your load balancer should **skip unhealthy instances** in its routing, as they cannot serve your customers' or sub-systems' needs.
Application instance health can be determined via external observation. You can do it by repeatedly calling a `GET /health` endpoint or via self-reporting. Modern **service discovery** solutions continuously collect health information from instances and configure the load-balancer to route traffic only to healthy components.
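A trivial external check could be as simple as the following sketch (the endpoint, port and interval are illustrative):

```
while true; do
  if ! curl -fsS --max-time 2 http://localhost:3000/health > /dev/null; then
    echo "$(date -u +%FT%TZ) instance unhealthy"   # in practice: deregister or restart the instance
  fi
  sleep 10
done
```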
### Self-healing
Self-healing can help to recover an application. We can talk about self-healing when an application can **do the necessary steps** to recover from a broken state. In most cases, it is implemented by an external system that watches the instances' health and restarts them when they are in a broken state for a longer period. Self-healing can be very useful in most cases; however, in certain situations it **can cause trouble** by continuously restarting the application. This might happen when your application cannot report a positive health status because it is overloaded or its database connection times out.
Implementing an advanced self-healing solution which is prepared for a delicate situation - like a lost database connection - can be tricky. In this case, you need to add extra logic to your application to handle edge cases and let the external system know that the instance does not need to be restarted immediately.
### Failover Caching
Services usually fail because of network issues and changes in our system. However, most of these outages are temporary thanks to self-healing and advanced load-balancing; we should find a solution to make our service work during these glitches. This is where **failover caching** can help and provide the necessary data to our application.
Failover caches usually use **two different expiration dates**: a shorter one that tells how long you can use the cache in a normal situation, and a longer one that says how long you can use the cached data during failure.
![Microservices Failover Caching](https://blog-assets.risingstack.com/2017/08/microservices-failover-caching.png)
_Failover Caching_
It's important to mention that you should only use failover caching when **serving the outdated data is better than serving nothing**.
To set cache and failover cache, you can use standard response headers in HTTP.
For example, with the `max-age` directive you can specify the maximum amount of time a resource will be considered fresh. With the `stale-if-error` directive, you can determine how long the resource should be served from a cache in the case of a failure.
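For instance (the URL is illustrative, and the header shown is only an example of what a failover-friendly response might carry), you can inspect what a service returns with curl:

```
$ curl -sI https://api.example.com/products/42 | grep -i '^cache-control'
# e.g.: Cache-Control: max-age=60, stale-if-error=86400
#       fresh for 60s; may still be served from cache for a day if the origin fails
```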
Modern CDNs and load balancers provide various caching and failover behaviors, but you can also create a shared library for your company that contains standard reliability solutions.
### Retry Logic
There are certain situations when we cannot cache our data or we want to make changes to it, but our operations eventually fail. In these cases, we can **retry our action** as we can expect that the resource will recover after some time or our load-balancer sends our request to a healthy instance.
You should be careful with adding retry logic to your applications and clients, as a larger number of **retries can make things even worse** or even prevent the application from recovering.
In a distributed system, a microservices system retry can trigger multiple other requests or retries and start a **cascading effect**. To minimize the impact of retries, you should limit their number and use an exponential backoff algorithm to continually increase the delay between retries until you reach the maximum limit.
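A minimal sketch of bounded retries with exponential backoff (the URL and the limits are illustrative):

```
delay=1
for attempt in 1 2 3 4 5; do
  if curl -fsS https://api.example.com/orders; then
    break                                   # success, stop retrying
  fi
  echo "attempt $attempt failed, retrying in ${delay}s" >&2
  sleep "$delay"
  delay=$((delay * 2))                      # exponential backoff
done
```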
As a retry is initiated by the client _(browser, other microservices, etc.)_ and the client doesn't know whether the operation failed before or after the request was handled, you should prepare your application to handle **idempotency**. For example, when you retry a purchase operation, you shouldn't double charge the customer. Using a unique **idempotency-key** for each of your transactions can help to handle retries.
### Rate Limiters and Load Shedders
Rate limiting is the technique of defining how many requests can be received or processed by a particular customer or application during a timeframe. With rate limiting, for example, you can filter out customers and microservices who are responsible for **traffic peaks**, or you can ensure that your application doesnt overload until autoscaling cant come to rescue.
You can also hold back lower-priority traffic to give enough resources to critical transactions.
![Microservices Rate Limiter](https://blog-assets.risingstack.com/2017/08/microservices-rate-limiter.png)
_A rate limiter can hold back traffic peaks_
A different type of rate limiter is called the _concurrent request limiter_. It can be useful when you have expensive endpoints that shouldn't be called more than a specified number of times, while you still want to serve traffic.
A  _fleet usage load shedder_  can ensure that there are always enough resources available to **serve critical transactions**. It keeps some resources for high priority requests and doesnt allow for low priority transactions to use all of them. A load shedder makes its decisions based on the whole state of the system, rather than based on a single users request bucket size. Load shedders **help your system to recover**, since they keep the core functionalities working while you have an ongoing incident.
To read more about rate limiters and load shedders, I recommend checking out [Stripe's article][5].
### Fail Fast and Independently
In a microservices architecture we want to prepare our services **to fail fast and separately**. To isolate issues on service level, we can use the  _bulkhead pattern_ . You can read more about bulkheads later in this blog post.
We also want our components to **fail fast** as we don't want to wait for broken instances until they time out. Nothing is more disappointing than a hanging request and an unresponsive UI. It's not just wasting resources but also screwing up the user experience. Our services call each other in a chain, so we should pay extra attention to preventing hanging operations before these delays add up.
The first idea that comes to mind is applying fine-grained timeouts for each service call. The problem with this approach is that you cannot really know what a good timeout value is, as there are certain situations when network glitches and other issues happen that only affect one or two operations. In this case, you probably don't want to reject those requests if only a few of them time out.
We can say that achieving the fail fast paradigm in microservices by **using timeouts is an anti-pattern** and you should avoid it. Instead of timeouts, you can apply the  _circuit-breaker_  pattern that depends on the success / fail statistics of operations.
### Bulkheads
Bulkhead is used in the industry to **partition** a ship **into sections**, so that sections can be sealed off if there is a hull breach.
The concept of bulkheads can be applied in software development to **segregate resources**.
By applying the bulkheads pattern, we can **protect limited resources** from being exhausted. For example, we can use two connection pools instead of a shared one if we have two kinds of operations that communicate with the same database instance, where we have a limited number of connections. As a result of this client-resource separation, the operation that times out or overuses the pool won't bring all of the other operations down.
One of the main reasons why the Titanic sank was that its bulkheads had a design failure: the water could pour over the top of the bulkheads via the deck above and flood the entire hull.
![Titanic Microservices Bulkheads](https://blog-assets.risingstack.com/2017/08/titanic-bulkhead-microservices.png)
_Bulkheads in Titanic (they didn't work)_
### Circuit Breakers
To limit the duration of operations, we can use timeouts. Timeouts can prevent hanging operations and keep the system responsive. However, using static, fine-tuned timeouts in microservices communication is an **anti-pattern** as we're in a highly dynamic environment where it's almost impossible to come up with the right timing limitations that work well in every case.
Instead of using small and transaction-specific static timeouts, we can use circuit breakers to deal with errors. Circuit breakers are named after the real world electronic component because their behavior is identical. You can **protect resources** and **help them to recover** with circuit breakers. They can be very useful in a distributed system where a repetitive failure can lead to a snowball effect and bring the whole system down.
A circuit breaker opens when a particular type of **error occurs multiple times** in a short period. An open circuit breaker prevents further requests from being made - just as the real one prevents electrons from flowing. Circuit breakers usually close after a certain amount of time, giving enough space for underlying services to recover.
Keep in mind that not all errors should trigger a circuit breaker. For example, you probably want to skip client side issues like requests with `4xx` response codes, but include `5xx` server-side failures. Some circuit breakers can have a half-open state as well. In this state, the service sends the first request to check system availability, while letting the other requests fail. If this first request succeeds, it restores the circuit breaker to a closed state and lets the traffic flow. Otherwise, it keeps it open.
![Microservices Circuit Breakers](https://blog-assets.risingstack.com/2017/08/microservices-circuit-breakers.png)
_Circuit Breaker_
### Testing for Failures
You should continually **test your system against common issues** to make sure that your services can **survive various failures**. You should test for failures frequently to keep your team prepared for incidents.
For testing, you can use an external service that identifies groups of instances and randomly terminates one of the instances in this group. With this, you can prepare for a single instance failure, but you can even shut down entire regions to simulate a cloud provider outage.
One of the most popular testing solutions is the [ChaosMonkey][7] resiliency tool by Netflix.
### Outro
Implementing and running a reliable service is not easy. It takes a lot of effort from your side and also costs money to your company.
Reliability has many levels and aspects, so it is important to find the best solution for your team. You should make reliability a factor in your business decision processes and allocate enough budget and time for it.
### Key Takeaways
* Dynamic environments and distributed systems - like microservices - lead to a higher chance of failures.
* Services should fail separately, achieve graceful degradation to improve user experience.
* 70% of the outages are caused by changes, reverting code is not a bad thing.
* Fail fast and independently. Teams have no control over their service dependencies.
* Architectural patterns and techniques like caching, bulkheads, circuit breakers and rate-limiters help to build reliable microservices.
To learn more about running a reliable service check out our free [Node.js Monitoring, Alerting & Reliability 101 e-book][8]. In case you need help with implementing a microservices system, reach out to us at [@RisingStack][9] on Twitter, or enroll in our upcoming [Building Microservices with Node.js][10].
-------------
作者简介
[Péter Márton][2]
CTO at RisingStack, microservices and brewing beer with Node.js
[https://twitter.com/slashdotpeter][1]</footer>
--------------------------------------------------------------------------------
via: https://blog.risingstack.com/designing-microservices-architecture-for-failure/
作者:[ Péter Márton][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://blog.risingstack.com/author/peter-marton/
[1]:https://twitter.com/slashdotpeter
[2]:https://blog.risingstack.com/author/peter-marton/
[3]:https://risingstack.com/
[4]:https://blog.risingstack.com/training-building-microservices-node-js/?utm_source=rsblog&utm_medium=roadblock-new&utm_content=/designing-microservices-architecture-for-failure/
[5]:https://stripe.com/blog/rate-limiters
[6]:https://blog.risingstack.com/training-building-microservices-node-js/?utm_source=rsblog&utm_medium=roadblock-new
[7]:https://github.com/Netflix/chaosmonkey
[8]:https://trace.risingstack.com/monitoring-ebook
[9]:https://twitter.com/RisingStack
[10]:https://blog.risingstack.com/training-building-microservices-node-js/
[11]:https://blog.risingstack.com/author/peter-marton/

View File

@ -1,4 +1,5 @@
translate by hwlife
voidpainter is translating
---
[Betting on the Web][27]
============================================================

View File

@ -1,6 +1,8 @@
Create a Clean-Code App with Kotlin Coroutines and Android Architecture Components
============================================================
Translating by S9mtAt
### Full demo weather app included.
Android development is evolving fast. A lot of developers and companies are trying to address common problems and create some great tools or libraries that can totally change the way we structure our apps.

View File

@ -1,517 +0,0 @@
"Translating by syys96"
How to Install Software from Source Code… and Remove it Afterwards
============================================================
![How to install software from source code](https://itsfoss.com/wp-content/uploads/2017/10/install-software-from-source-code-linux-800x450.jpg)
_Brief: This detailed guide explains how to install a program from source code in Linux and how to remove the software installed from the source code._
One of the greatest strengths of your Linux distribution is its package manager and the associated software repository. With them, you have all the necessary tools and resources to download and install new software on your computer in a completely automated manner.
But despite all their efforts, the package maintainers cannot handle each and every use case. Nor can they package all the software available out there. So there are still situations where you will have to compile and install new software by yourself. As for myself, the most common reason by far that I have to compile some software is when I need to run a very specific version. Or because I want to modify the source code or use some fancy compilation options.
If your needs belong to that latter category, chances are you already know what you are doing. But for the vast majority of Linux users, compiling and installing software from the sources for the first time might look like an initiation ceremony: somewhat frightening, but with the promise of entering a new world of possibilities and of being part of a privileged community if you overcome it.
[Suggested read: How To Install And Remove Software In Ubuntu [Complete Guide]][8]
### A. Installing software from source code in Linux
And that's exactly what we will do here. For the purpose of this article, let's say I need to install [NodeJS][9] 8.1.1 on my system. That version exactly. A version which is not available from the Debian repository:
```
sh$ apt-cache madison nodejs | grep amd64
nodejs | 6.11.1~dfsg-1 | http://deb.debian.org/debian experimental/main amd64 Packages
nodejs | 4.8.2~dfsg-1 | http://ftp.fr.debian.org/debian stretch/main amd64 Packages
nodejs | 4.8.2~dfsg-1~bpo8+1 | http://ftp.fr.debian.org/debian jessie-backports/main amd64 Packages
nodejs | 0.10.29~dfsg-2 | http://ftp.fr.debian.org/debian jessie/main amd64 Packages
nodejs | 0.10.29~dfsg-1~bpo70+1 | http://ftp.fr.debian.org/debian wheezy-backports/main amd64 Packages
```
### Step 1: Getting the source code from GitHub
Like many open-source projects, the sources of NodeJS can be found on GitHub: [https://github.com/nodejs/node][10]
So, lets go directly there.
![The NodeJS official GitHub repository](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-account.png)
If you're not familiar with [GitHub][11], [git][12] or any other [version control system][13], it's worth mentioning that the repository contains the current source for the software, as well as a history of all the modifications made through the years to that software. Eventually up to the very first line written for that project. For the developers, keeping that history has many advantages. For us today, the main one is that we will be able to get the sources for the project as they were at any given point in time. More precisely, I will be able to get the sources as they were when the 8.1.1 version I want was released. Even if there were many modifications since then.
![Choose the v8.1.1 tag in the NodeJS GitHub repository](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-choose-revision-tag.png)
On GitHub, you can use the “branch” button to navigate between different versions of the software. [“Branches” and “tags” are somewhat related concepts in Git][14]. Basically, the developers create branches and tags to keep track of important events in the project history, like when they start working on a new feature or when they publish a release. I will not go into the details here; all you need to know is I'm looking for the version  _tagged_  “v8.1.1”.
![The NodeJS GitHub repository as it was at the time the v8.1.1 tag was created](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-revision-811.png)
After having chosen the “v8.1.1” tag, the page is refreshed, the most obvious change being that the tag now appears as part of the URL. In addition, you will notice the file change dates are different too. The source tree you are now seeing is the one that existed at the time the v8.1.1 tag was created. In some sense, you can think of a version control tool like git as a time travel machine, allowing you to go back and forth in a project's history.
![NodeJS GitHub repository download as a ZIP button](https://itsfoss.com/wp-content/uploads/2017/07/nodejs-github-revision-download-zip.png)
At this point, we can download the sources of NodeJS 8.1.1\. You can't miss the big blue button suggesting to download the ZIP archive of the project. As for myself, I will download and extract the ZIP from the command line for the sake of the explanation. But if you prefer using a [GUI][15] tool, don't hesitate to do that instead:
```
wget https://github.com/nodejs/node/archive/v8.1.1.zip
unzip v8.1.1.zip
cd node-8.1.1/
```
Downloading the ZIP archive works great. But if you want to do it “like a pro”, I would suggest using the `git` tool directly to download the sources. It is not complicated at all, and it will be a nice first contact with a tool you will often encounter:
```
# first ensure git is installed on your system
sh$ sudo apt-get install git
# Make a shallow clone of the NodeJS repository at v8.1.1
sh$ git clone --depth 1 \
--branch v8.1.1 \
https://github.com/nodejs/node
sh$ cd node/
```
By the way, if you have any issue, just consider the first part of this article as a general introduction. Later I have more detailed explanations for Debian- and Red Hat-based distributions in order to help you troubleshoot common issues.
Anyway, whether you downloaded the source using `git` or as a ZIP archive, you should now have exactly the same source files in the current directory:
```
sh$ ls
android-configure BUILDING.md common.gypi doc Makefile src
AUTHORS CHANGELOG.md configure GOVERNANCE.md node.gyp test
benchmark CODE_OF_CONDUCT.md CONTRIBUTING.md lib node.gypi tools
BSDmakefile COLLABORATOR_GUIDE.md deps LICENSE README.md vcbuild.bat
```
### Step 2: Understanding the Build System of the program
We usually talk about “compiling the sources”, but the compilation is only one of the phases required to produce working software from its source. A build system is a set of tools and practices used to automate and articulate those different tasks in order to build the software entirely just by issuing a few commands.
If the concept is simple, the reality is somewhat more complicated, because different projects or programming languages may have different requirements. Or because of the programmer's tastes. Or the supported platforms. Or for historical reasons. Or… or… There is an almost endless list of reasons to choose or create another build system. All that to say there are many different solutions used out there.
NodeJS uses a [GNU-style build system][16]. This is a popular choice in the open source community. And once again, a good way to start your journey.
Writing and tuning a build system is a pretty complex task. But for the “end user”, GNU-style build systems boil down to using two tools: `configure` and `make`.
The `configure` file is a project-specific script that will check the destination system configuration and the available features in order to ensure the project can be built, dealing along the way with the specificities of the current platform.
An important part of a typical `configure` job is to build the `Makefile`. That is the file containing the instructions required to effectively build the project.
The [`make` tool][17], on the other hand, is a POSIX tool available on any Unix-like system. It will read the project-specific `Makefile` and perform the required operations to build and install your program.
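Put together, for the end user a typical GNU-style build boils down to a sequence like this (a generic sketch; the NodeJS-specific invocation, with its custom options, comes a bit later in this article):
```
./configure        # check the system and generate the Makefile
make               # compile the sources
sudo make install  # copy the built files to their final location
```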
But, as always in the Linux world, you still have some latitude to customize the build for your specific needs.
```
./configure --help
```
The `configure --help` command will show you all the available configuration options. Once again, this is very project-specific. And to be honest, it is sometimes necessary to dig into the project before fully understanding the meaning of each and every configure option.
But there is at least one standard GNU Autotools option that you must know: the `--prefix` option. This has to do with the file system hierarchy and the place your software will be installed.
[Suggested read: 8 Vim Tips And Tricks That Will Make You A Pro User][18]
### Step 3: The FHS
The Linux file system hierarchy on a typical distribution mostly complies with the [Filesystem Hierarchy Standard (FHS)][19].
That standard explains the purpose of the various directories of your system: `/usr`, `/tmp`, `/var` and so on.
When using the GNU Autotools, and most other build systems, the default installation location for your new software will be `/usr/local`. Which is a good choice, as according to the FHS  _“The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr.”_
The `/usr/local` hierarchy somehow replicates the root directory, and you will find there `/usr/local/bin` for the executable programs, `/usr/local/lib` for the libraries, `/usr/local/share` for architecture-independent files and so on.
The only issue when using the `/usr/local` tree for custom software installation is that the files of all your software will be mixed there. Especially, after having installed a couple of programs, it will be hard to track exactly which file in `/usr/local/bin` and `/usr/local/lib` belongs to which software. That will not cause any issue to the system though. After all, `/usr/bin` is just about the same mess. But it will become an issue the day you want to remove a manually installed piece of software.
To solve that issue, I usually prefer installing custom software in the `/opt` sub-tree instead. Once again, to quote the FHS:
_“/opt is reserved for the installation of add-on application software packages.
A package to be installed in /opt must locate its static files in a separate /opt/<package> or /opt/<provider> directory tree, where <package> is a name that describes the software package and <provider> is the provider's LANANA registered name.”_
So we will create a sub-directory of `/opt` specifically for our custom NodeJS installation. And if someday I want to remove that software, I will simply have to remove that directory:
```
sh$ sudo mkdir /opt/node-v8.1.1
sh$ sudo ln -sT node-v8.1.1 /opt/node
# What is the purpose of the symbolic link above?
# Read the article till the end--then try to answer that
# question in the comment section!
sh$ ./configure --prefix=/opt/node-v8.1.1
sh$ make -j9 && echo ok
# -j9 means run up to 9 parallel tasks to build the software.
# As a rule of thumb, use -j(N+1) where N is the number of cores
# of your system. That will maximize the CPU usage (one task per
# CPU thread/core + a provision of one extra task when a process
# is blocked by an I/O operation).
```
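If you do not know the number of cores offhand, a small sketch using the `nproc` utility (part of GNU coreutils, available on virtually every distribution) can derive the `-j` value for you:
```
# nproc reports the number of available CPU threads;
# add one extra job, following the rule of thumb above.
sh$ make -j$(($(nproc) + 1)) && echo ok
```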
Anything but “ok” after the `make` command has completed would mean there was an error during the build process. As we ran a parallel build because of the `-j` option, it is not always easy to retrieve the error message given the large volume of output produced by the build system.
In case of an issue, just re-run `make`, but without the `-j` option this time, and the error should appear near the end of the output:
```
sh$ make
```
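As a side note, if the serial rebuild takes too long for your taste, another option is to keep the parallel build but capture its output into a log file you can search afterwards:
```
# Keep both stdout and stderr in build.log while still showing them on screen,
# then look for the first error messages.
sh$ make -j9 2>&1 | tee build.log
sh$ grep -n -i 'error' build.log | head
```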
Finally, once the compilation has completed, you can install your software to its location by running the command:
```
sh$ sudo make install
```
And test it:
```
sh$ /opt/node/bin/node --version
v8.1.1
```
### B. What if things go wrong while installing from source code?
What I've explained above is mostly what you can see on the “build instructions” page of a well-documented project. But given that this article's goal is to let you compile your first software from source, it might be worth taking the time to investigate some common issues. So, I will do the whole procedure again, but this time from fresh and minimal Debian 9.0 and CentOS 7.0 systems, so you can see the errors I encountered and how I solved them.
### From Debian 9.0 “Stretch”
```
itsfoss@debian:~$ git clone --depth 1 \
--branch v8.1.1 \
https://github.com/nodejs/node
-bash: git: command not found
```
This problem is quite easy to diagnose and solve. Just install the `git` package:
```
itsfoss@debian:~$ sudo apt-get install git
```
```
itsfoss@debian:~$ git clone --depth 1 \
--branch v8.1.1 \
https://github.com/nodejs/node && echo ok
[...]
ok
```
```
itsfoss@debian:~/node$ sudo mkdir /opt/node-v8.1.1
itsfoss@debian:~/node$ sudo ln -sT node-v8.1.1 /opt/node
```
No problem here.
```
itsfoss@debian:~/node$ ./configure --prefix=/opt/node-v8.1.1/
WARNING: failed to autodetect C++ compiler version (CXX=g++)
WARNING: failed to autodetect C compiler version (CC=gcc)
Node.js configure error: No acceptable C compiler found!
Please make sure you have a C compiler installed on your system and/or
consider adjusting the CC environment variable if you installed
it in a non-standard prefix.
```
Obviously, to compile a project, you need a compiler. NodeJS being written using the [C++ language][20], we need a C++ [compiler][21]. Here I will install `g++`, the GNU C++ compiler for that purpose:
```
itsfoss@debian:~/node$ sudo apt-get install g++
itsfoss@debian:~/node$ ./configure --prefix=/opt/node-v8.1.1/ && echo ok
[...]
ok
```
```
itsfoss@debian:~/node$ make -j9 && echo ok
-bash: make: command not found
```
One other missing tool. Same symptoms. Same solution:
```
itsfoss@debian:~/node$ sudo apt-get install make
itsfoss@debian:~/node$ make -j9 && echo ok
[...]
ok
```
```
itsfoss@debian:~/node$ sudo make install
[...]
itsfoss@debian:~/node$ /opt/node/bin/node --version
v8.1.1
```
Success!
Please notice: I've installed the various tools one by one to show how to diagnose the compilation issues and to show you the typical solution to solve those issues. But if you search more about that topic or read other tutorials, you will discover that most distributions have “meta-packages” acting as an umbrella to install some or all of the typical tools used for compiling software. On Debian-based systems, you will probably encounter the [build-essential][22] package for that purpose. And on Red Hat-based distributions, that will be the  _“Development Tools”_  group.
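For reference, here is how those meta-packages can be installed in one shot; this is a shortcut rather than the step-by-step approach used above:
```
# Debian/Ubuntu: the compiler, make and friends in one command
sh$ sudo apt-get install build-essential

# CentOS/RHEL (yum-based): the equivalent package group
sh$ sudo yum groupinstall "Development Tools"
```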
### From CentOS 7.0
```
[itsfoss@centos ~]$ git clone --depth 1 \
--branch v8.1.1 \
https://github.com/nodejs/node
-bash: git: command not found
```
Command not found? Just install it using the `yum` package manager:
```
[itsfoss@centos ~]$ sudo yum install git
```
```
[itsfoss@centos ~]$ git clone --depth 1 \
--branch v8.1.1 \
https://github.com/nodejs/node && echo ok
[...]
ok
```
```
[itsfoss@centos ~]$ sudo mkdir /opt/node-v8.1.1
[itsfoss@centos ~]$ sudo ln -sT node-v8.1.1 /opt/node
```
```
[itsfoss@centos ~]$ cd node
[itsfoss@centos node]$ ./configure --prefix=/opt/node-v8.1.1/
WARNING: failed to autodetect C++ compiler version (CXX=g++)
WARNING: failed to autodetect C compiler version (CC=gcc)
Node.js configure error: No acceptable C compiler found!
Please make sure you have a C compiler installed on your system and/or
consider adjusting the CC environment variable if you installed
it in a non-standard prefix.
```
You guessed it: NodeJS is written using the C++ language, but my system lacks the corresponding compiler. Yum to the rescue. As I'm not a regular CentOS user, I actually had to search the Internet for the exact name of the package containing the g++ compiler, which led me to this page: [https://superuser.com/questions/590808/yum-install-gcc-g-doesnt-work-anymore-in-centos-6-4][23]
```
[itsfoss@centos node]$ sudo yum install gcc-c++
[itsfoss@centos node]$ ./configure --prefix=/opt/node-v8.1.1/ && echo ok
[...]
ok
```
```
[itsfoss@centos node]$ make -j9 && echo ok
[...]
ok
```
```
[itsfoss@centos node]$ sudo make install && echo ok
[...]
ok
```
```
[itsfoss@centos node]$ /opt/node/bin/node --version
v8.1.1
```
Success. Again.
### C. Making changes to the software installed from source code
You may install software from source because you need a very specific version not available in your distribution repository, or because you want to  _modify_  the program, either to fix a bug or add a feature. After all, open source is all about that. So I will take that opportunity to give you a taste of the power you have at hand now that you are able to compile your own software.
Here, we will make a minor change to the sources of NodeJS. And we will see if our change will be incorporated into the compiled version of the software:
Open the file `node/src/node.cc` in your favorite [text editor][24] (vim, nano, gedit, … ). And try to locate that fragment of code:
```
if (debug_options.ParseOption(argv[0], arg)) {
// Done, consumed by DebugOptions::ParseOption().
} else if (strcmp(arg, "--version") == 0 || strcmp(arg, "-v") == 0) {
printf("%s\n", NODE_VERSION);
exit(0);
} else if (strcmp(arg, "--help") == 0 || strcmp(arg, "-h") == 0) {
PrintHelp();
exit(0);
}
```
It is around [line 3830 of the file][25]. Then modify the line containing `printf` to match that one instead:
```
printf("%s (compiled by myself)\n", NODE_VERSION);
```
Then head back to your terminal. Before going further, and to give you some more insight into the power behind git, you can check whether you've modified the right file by running `git diff`:
```
diff --git a/src/node.cc b/src/node.cc
index bbce1022..a5618b57 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -3828,7 +3828,7 @@ static void ParseArgs(int* argc,
if (debug_options.ParseOption(argv[0], arg)) {
// Done, consumed by DebugOptions::ParseOption().
} else if (strcmp(arg, "--version") == 0 || strcmp(arg, "-v") == 0) {
- printf("%s\n", NODE_VERSION);
+ printf("%s (compiled by myself)\n", NODE_VERSION);
exit(0);
} else if (strcmp(arg, "--help") == 0 || strcmp(arg, "-h") == 0) {
PrintHelp();
```
You should see a “-” (minus sign) before the line as it was before you changed it. And a “+” (plus sign) before the line after your changes.
It is now time to recompile and re-install your software:
```
make -j9 && sudo make install && echo ok
[...]
ok
```
This time, the only reason it might fail is that you've made a typo while changing the code. If this is the case, re-open the `node/src/node.cc` file in your text editor and fix the mistake.
Once you've managed to compile and install that new modified NodeJS version, you will be able to check if your modifications were actually incorporated into the software:
```
itsfoss@debian:~/node$ /opt/node/bin/node --version
v8.1.1 (compiled by myself)
```
Congratulations! You've made your first change to an open-source program!
### D. Let the shell locate our custom-built software
You may have noticed that until now, I have always launched my newly compiled NodeJS software by specifying the absolute path to the binary file.
```
/opt/node/bin/node
```
It works. But this is annoying, to say the least. There are actually two common ways of fixing that. But to understand them, you must first know that your shell locates executable files by looking for them only in the directories specified by the `PATH` [environment variable][26].
```
itsfoss@debian:~/node$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
```
Here, on that Debian system, if you do not explicitly specify any directory as part of a command name, the shell will first look for the executable program in `/usr/local/bin`; then, if not found, in `/usr/bin`; then, if not found, in `/bin`; then in `/usr/local/games`; then in `/usr/games`; and if it is still not found, the shell will report the error  _“command not found”_ .
Given that, we have two ways to make a command accessible to the shell: by adding it to one of the already configured `PATH` directories, or by adding the directory containing our executable file to the `PATH`.
### Adding a link from /usr/local/bin
Just  _copying_  the node binary executable from `/opt/node/bin` to `/usr/local/bin` would be a bad idea since, by doing so, the executable program would no longer be able to locate the other required components belonging to `/opt/node/` (it's a common practice for software to locate its resource files relative to its own location).
So, the traditional way of doing that is by using a symbolic link:
```
itsfoss@debian:~/node$ sudo ln -sT /opt/node/bin/node /usr/local/bin/node
itsfoss@debian:~/node$ which -a node || echo not found
/usr/local/bin/node
itsfoss@debian:~/node$ node --version
v8.1.1 (compiled by myself)
```
This is a simple and effective solution, especially if a software package is made of just a few well-known executable programs, since you have to create a symbolic link for each and every user-invokable command. For example, if you're familiar with NodeJS, you know about the `npm` companion application, which I should symlink from `/usr/local/bin` too. But I leave that to you as an exercise.
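If you want to check your answer to that exercise, a possible solution (assuming the layout used above, where `npm` also lives in `/opt/node/bin`) is simply:
```
itsfoss@debian:~/node$ sudo ln -sT /opt/node/bin/npm /usr/local/bin/npm
```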
### Modifying the PATH
First, if you tried the preceding solution, remove the node symbolic link created previously to start from a clear state:
```
itsfoss@debian:~/node$ sudo rm /usr/local/bin/node
itsfoss@debian:~/node$ which -a node || echo not found
not found
```
And now, here is the magic command to change your `PATH`:
```
itsfoss@debian:~/node$ export PATH="/opt/node/bin:${PATH}"
itsfoss@debian:~/node$ echo $PATH
/opt/node/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
```
Simply put, I replaced the content of the `PATH` environment variable with its previous content, prefixed by `/opt/node/bin`. So, as you can now imagine, the shell will look first in the `/opt/node/bin` directory for executable programs. We can confirm that using the `which` command:
```
itsfoss@debian:~/node$ which -a node || echo not found
/opt/node/bin/node
itsfoss@debian:~/node$ node --version
v8.1.1 (compiled by myself)
```
Whereas the “link” solution is permanent as soon as you've created the symbolic link in `/usr/local/bin`, the `PATH` change is effective only in the current shell. I will let you do some research by yourself to find out how to make changes to the `PATH` permanent. As a hint, it has to do with your “profile”. If you find the solution, don't hesitate to share it with the other readers by using the comment section below!
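As a starting point, here is a minimal sketch, assuming a Bash login shell that reads `~/.profile`; the exact file to edit varies with your shell and distribution:
```
# Append the PATH change to your profile so that new login shells pick it up
itsfoss@debian:~/node$ echo 'export PATH="/opt/node/bin:${PATH}"' >> ~/.profile
# Re-read the file in the current shell (or simply log out and back in)
itsfoss@debian:~/node$ . ~/.profile
```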
### E. How to remove that newly installed software from source code
Since our custom compiled NodeJS software sits completely in the `/opt/node-v8.1.1` directory, removing that software is not more work than using the `rm` command to remove that directory:
```
sudo rm -rf /opt/node-v8.1.1
```
BEWARE: `sudo` and `rm -rf` are a dangerous cocktail! Always check your command twice before pressing the “enter” key. You won't have any confirmation message and no undelete if you remove the wrong directory…
Then, if you've modified your `PATH`, you will have to revert those changes, which is not complicated at all.
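For the current shell, one way of doing that (assuming you only prepended `/opt/node/bin` as shown above) is simply to strip that entry again:
```
# Remove the /opt/node/bin prefix added earlier in this session
itsfoss@debian:~/node$ export PATH="${PATH#/opt/node/bin:}"
```
And if you made the change permanent in your “profile”, remove the corresponding line from that file as well.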
And if you've created links from `/usr/local/bin`, you will have to remove them all:
```
itsfoss@debian:~/node$ sudo find /usr/local/bin \
-type l \
-ilname "/opt/node/*" \
-print -delete
/usr/local/bin/node
```
### Wait? Where was the Dependency Hell?
As a final comment, if you read about compiling your own custom software, you might have heard about the [dependency hell][27]. This is a nickname for that annoying situation where, before being able to successfully compile a piece of software, you must first compile a prerequisite library, which in turn requires another library that might in turn be incompatible with some other software you've already installed.
Part of the job of the package maintainers of your distribution is to actually resolve that dependency hell and to ensure the various pieces of software on your system use compatible libraries and are installed in the right order.
In this article, I chose on purpose to install NodeJS as it virtually doesn't have dependencies. I said “virtually” because, in fact, it  _has_  dependencies. But the source code of those dependencies is present in the source repository of the project (in the `node/deps` subdirectory), so you don't have to download and install them manually beforehand.
But if you're interested in understanding more about that problem and learning how to deal with it, let me know using the comment section below: that would be a great topic for a more advanced article!
--------------------------------------------------------------------------------
作者简介:
Engineer by Passion, Teacher by Vocation. My goals : to share my enthusiasm for what I teach and prepare my students to develop their skills by themselves. You can find me on my website as well.
--------------------
via: https://itsfoss.com/install-software-from-source-code/
作者:[Sylvain Leroux ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://itsfoss.com/author/sylvain/
[1]:https://itsfoss.com/author/sylvain/
[2]:https://itsfoss.com/install-software-from-source-code/#comments
[3]:https://www.facebook.com/share.php?u=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3Dfacebook%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare
[4]:https://twitter.com/share?original_referer=/&text=How+to+Install+Software+from+Source+Code%E2%80%A6+and+Remove+it+Afterwards&url=https://itsfoss.com/install-software-from-source-code/%3Futm_source%3Dtwitter%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare&via=Yes_I_Know_IT
[5]:https://plus.google.com/share?url=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3DgooglePlus%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare
[6]:https://www.linkedin.com/cws/share?url=https%3A%2F%2Fitsfoss.com%2Finstall-software-from-source-code%2F%3Futm_source%3DlinkedIn%26utm_medium%3Dsocial%26utm_campaign%3DSocialWarfare
[7]:https://www.reddit.com/submit?url=https://itsfoss.com/install-software-from-source-code/&title=How+to+Install+Software+from+Source+Code%E2%80%A6+and+Remove+it+Afterwards
[8]:https://itsfoss.com/remove-install-software-ubuntu/
[9]:https://nodejs.org/en/
[10]:https://github.com/nodejs/node
[11]:https://en.wikipedia.org/wiki/GitHub
[12]:https://en.wikipedia.org/wiki/Git
[13]:https://en.wikipedia.org/wiki/Version_control
[14]:https://stackoverflow.com/questions/1457103/how-is-a-tag-different-from-a-branch-which-should-i-use-here
[15]:https://en.wikipedia.org/wiki/Graphical_user_interface
[16]:https://en.wikipedia.org/wiki/GNU_Build_System
[17]:https://en.wikipedia.org/wiki/Make_%28software
[18]:https://itsfoss.com/pro-vim-tips/
[19]:http://www.pathname.com/fhs/
[20]:https://en.wikipedia.org/wiki/C%2B%2B
[21]:https://en.wikipedia.org/wiki/Compiler
[22]:https://packages.debian.org/sid/build-essential
[23]:https://superuser.com/questions/590808/yum-install-gcc-g-doesnt-work-anymore-in-centos-6-4
[24]:https://en.wikipedia.org/wiki/List_of_text_editors
[25]:https://github.com/nodejs/node/blob/v8.1.1/src/node.cc#L3830
[26]:https://en.wikipedia.org/wiki/Environment_variable
[27]:https://en.wikipedia.org/wiki/Dependency_hell

View File

@ -1,3 +1,4 @@
fuzheng1998 translating
A Large-Scale Study of Programming Languages and Code Quality in GitHub
============================================================

View File

@ -1,3 +1,5 @@
HankChow Translating
In Device We Trust: Measure Twice, Compute Once with Xen, Linux, TPM 2.0 and TXT
============================================================

View File

@ -1,4 +1,5 @@
translating by sugarfillet
Translating by FelixYFZ
Linux Networking Hardware for Beginners: Think Software
============================================================

View File

@ -1,89 +0,0 @@
translating by 2ephaniah
Proxy Models in Container Environments
============================================================
### Most of us are familiar with how proxies work, but is it any different in a container-based environment? See what's changed.
Inline, side-arm, reverse, and forward. These used to be the terms we used to describe the architectural placement of proxies in the network.
Today, containers use some of the same terminology, but they are introducing new ones. That's an opportunity for me to extemporaneously expound* on my favorite of all topics: the proxy.
One of the primary drivers of cloud (once we all got past the pipedream of cost containment) has been scalability. Scale has challenged agility (and sometimes won) in various surveys over the past five years as the number one benefit organizations seek by deploying apps in cloud computing environments.
That's in part because in a digital economy (in which we now operate), apps have become the digital equivalent of brick-and-mortar “open/closed” signs and the manifestation of digital customer assistance. Slow, unresponsive apps have the same effect as turning out the lights or understaffing the store.
Apps need to be available and responsive to meet demand. Scale is the technical response to achieving that business goal. Cloud not only provides the ability to scale, but offers the ability to scale  _automatically_ . To do that requires a load balancer, because that's how we scale apps: with proxies that load balance traffic/requests.
Containers are no different with respect to expectations around scale. Containers must scale, and scale automatically, and that means the use of load balancers (proxies).
If you're using native capabilities, you're doing primitive load balancing based on TCP/UDP. Generally speaking, container-based proxy implementations aren't fluent in HTTP or other application layer protocols and don't offer capabilities beyond plain old load balancing ([POLB][1]). That's often good enough, as container scale operates on a cloned, horizontal premise: to scale an app, you add another copy and distribute requests across it. Layer 7 (HTTP) routing capabilities are found at the ingress (in [ingress controllers][2] and API gateways) and are used as much (or more) for app routing as they are to scale applications.
In some cases, however, this is not enough. If you want (or need) more application-centric scale or the ability to insert additional services, you'll graduate to more robust offerings that can provide programmability or application-centric scalability or both.
To do that means [plugging in proxies][3]. The container orchestration environment you're working in largely determines the deployment model of the proxy in terms of whether it's a reverse proxy or a forward proxy. Just to keep things interesting, there's also a third model, the sidecar, which is the foundation of scalability supported by emerging service mesh implementations.
### Reverse Proxy
[![Image title](https://devcentral.f5.com/Portals/0/Users/038/38/38/unavailable_is_closed_thumb.png?ver=2017-09-12-082119-957 "Image title")][4]
A reverse proxy is closest to a traditional model in which a virtual server accepts all incoming requests and distributes them across a pool (farm, cluster) of resources.
There is one proxy per application. Any client that wants to connect to the application is instead connected to the proxy, which then chooses and forwards the request to an appropriate instance. If the green app wants to communicate with the blue app, it sends a request to the blue proxy, which determines which of the two instances of the blue app should respond to the request.
In this model, the proxy is only concerned with the app it is managing. The blue proxy doesn't care about the instances associated with the orange proxy, and vice versa.
### Forward Proxy
[![Image title](https://devcentral.f5.com/Portals/0/Users/038/38/38/per-node_forward_proxy_thumb.jpg?ver=2017-09-14-072422-213)][5]
This mode more closely models that of a traditional outbound firewall.
In this model, each container **node** has an associated proxy. If a client wants to connect to a particular application or service, it is instead connected to the proxy local to the container node where the client is running. The proxy then chooses an appropriate instance of that application and forwards the client's request.
Both the orange and the blue app connect to the same proxy associated with their node. The proxy then determines which instance of the requested app should respond.
In this model, every proxy must know about every application to ensure it can forward requests to the appropriate instance.
### Sidecar Proxy
[![Image title](https://devcentral.f5.com/Portals/0/Users/038/38/38/per-pod_sidecar_proxy_thumb.jpg?ver=2017-09-14-072425-620)][6]
This mode is also referred to as a service mesh router. In this model, each **container** has its own proxy.
If a client wants to connect to an application, it instead connects to the sidecar proxy, which chooses an appropriate instance of that application and forwards the client's request. This behavior is the same as in the  _forward proxy_  model.
The difference between a sidecar and a forward proxy is that sidecar proxies do not need to modify the container orchestration environment. For example, in order to plug a forward proxy into k8s, you need both the proxy  _and_  a replacement for kube-proxy. Sidecar proxies do not require this modification because it is the app that automatically connects to its “sidecar” proxy instead of being routed through the proxy.
### Summary
Each model has its advantages and disadvantages. All three share a reliance on environmental data (telemetry and changes in configuration) as well as the need to integrate into the ecosystem. Some models are pre-determined by the environment you choose, so future needs such as service insertion, security, and networking complexity need to be carefully evaluated before settling on a model.
We're still in early days with respect to containers and their growth in the enterprise. As they continue to stretch into production environments, it's important to understand the needs of the applications delivered by containerized environments and how their proxy models differ in implementation.
*It was extemporaneous when I wrote it down. Now, not so much.
--------------------------------------------------------------------------------
via: https://dzone.com/articles/proxy-models-in-container-environments
作者:[Lori MacVittie ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://dzone.com/users/307701/lmacvittie.html
[1]:https://f5.com/about-us/blog/articles/go-beyond-polb-plain-old-load-balancing
[2]:https://f5.com/about-us/blog/articles/ingress-controllers-new-name-familiar-function-27388
[3]:http://clouddocs.f5.com/products/asp/v1.0/
[4]:https://devcentral.f5.com/Portals/0/Users/038/38/38/unavailable_is_closed.png?ver=2017-09-12-082118-160
[5]:https://devcentral.f5.com/Portals/0/Users/038/38/38/per-node_forward_proxy.jpg?ver=2017-09-14-072419-667
[6]:https://devcentral.f5.com/Portals/0/Users/038/38/38/per-pod_sidecar_proxy.jpg?ver=2017-09-14-072424-073
[7]:https://dzone.com/users/307701/lmacvittie.html
[8]:https://dzone.com/users/307701/lmacvittie.html
[9]:https://dzone.com/articles/proxy-models-in-container-environments#
[10]:https://dzone.com/cloud-computing-tutorials-tools-news
[11]:https://dzone.com/articles/proxy-models-in-container-environments#
[12]:https://dzone.com/go?i=243221&u=https%3A%2F%2Fget.platform9.com%2Fjzlp-kubernetes-deployment-models-the-ultimate-guide%2F

View File

@ -1,98 +0,0 @@
XYenChi is translating
Image Processing on Linux
============================================================
I've covered several scientific packages in this space that generate nice graphical representations of your data and work, but I've not gone in the other direction much. So in this article, I cover a popular image processing package called ImageJ. Specifically, I am looking at [Fiji][4], an instance of ImageJ bundled with a set of plugins that are useful for scientific image processing.
The name Fiji is a recursive acronym, much like GNU. It stands for "Fiji Is Just ImageJ". ImageJ is a useful tool for analyzing images in scientific research—for example, you may use it for classifying tree types in a landscape from aerial photography. ImageJ can do that type of categorization. It's built with a plugin architecture, and a very extensive collection of plugins is available to increase the available functionality.
The first step is to install ImageJ (or Fiji). Most distributions will have a package available for ImageJ. If you wish, you can install it that way and then install the individual plugins you need for your research. The other option is to install Fiji and get the most commonly used plugins at the same time. Unfortunately, most Linux distributions will not have a package available within their package repositories for Fiji. Luckily, however, an easy installation file is available from the main website. It's a simple zip file, containing a directory with all of the files required to run Fiji. When you first start it, you get only a small toolbar with a list of menu items (Figure 1).
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif1.png)
Figure 1\. You get a very minimal interface when you first start Fiji.
If you don't already have some images to use as you are learning to work with ImageJ, the Fiji installation includes several sample images. Click the File→Open Samples menu item for a dropdown list of sample images (Figure 2). These samples cover many of the potential tasks you might be interested in working on.
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif2.jpg)
Figure 2\. Several sample images are available that you can use as you learn how to work with ImageJ.
If you installed Fiji, rather than ImageJ alone, a large set of plugins already will be installed. The first one of note is the autoupdater plugin. This plugin checks the internet for updates to ImageJ, as well as the installed plugins, each time ImageJ is started.
All of the installed plugins are available under the Plugins menu item. Once you have installed a number of plugins, this list can become a bit unwieldy, so you may want to be judicious in your plugin selection. If you want to trigger the updates manually, click the Help→Update Fiji menu item to force the check and get a list of available updates (Figure 3).
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif3.png)
Figure 3\. You can force a manual check of what updates are available.
Now, what kind of work can you do with Fiji/ImageJ? One example is doing counts of objects within an image. You can load a sample by clicking File→Open Samples→Embryos.
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif4.jpg)
Figure 4\. With ImageJ, you can count objects within an image.
The first step is to set a scale to the image so you can tell ImageJ how to identify objects. First, select the line button on the toolbar and draw a line over the length of the scale legend on the image. You then can select Analyze→Set Scale, and it will set the number of pixels that the scale legend occupies (Figure 5). You can set the known distance to be 100 and the units to be "um".
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif5.png)
Figure 5\. For many image analysis tasks, you need to set a scale to the image.
The next step is to simplify the information within the image. Click Image→Type→8-bit to reduce the information to an 8-bit gray-scale image. To isolate the individual objects, click Process→Binary→Make Binary to threshold the image automatically (Figure 6).
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif6.png)
Figure 6\. There are tools to do automatic tasks like thresholding.
Before you can count the objects within the image, you need to remove artifacts like the scale legend. You can do that by using the rectangular selection tool to select it and then click Edit→Clear. Now you can analyze the image and see what objects are there.
Making sure that there are no areas selected in the image, click Analyze→Analyze Particles to pop up a window where you can select the minimum size, what results to display and what to show in the final image (Figure 7).
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif7.png)
Figure 7\. You can generate a reduced image with identified particles.
Figure 8 shows an overall look at what was discovered in the summary results window. There is also a detailed results window for each individual particle.
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif8.png)
Figure 8\. One of the output results includes a summary list of the particles identified.
Once you have an analysis worked out for a given image type, you often need to apply the exact same analysis to a series of images. This series may number into the thousands, so it's typically not something you will want to repeat manually for each image. In such cases, you can collect the required steps together into a macro so that they can be reapplied multiple times. Clicking Plugins→Macros→Record pops up a new window where all of your subsequent commands will be recorded. Once all of the steps are finished, you can save them as a macro file and rerun them on other images by clicking Plugins→Macros→Run.
If you have a very specific set of steps for your workflow, you simply can open the macro file and edit it by hand, as it is a simple text file. There is actually a complete macro language available to you to control the process that is being applied to your images more fully.
If you have a really large set of images that needs to be processed, however, this still might be too tedious for your workflow. In that case, go to Process→Batch→Macro to pop up a new window where you can set up your batch processing workflow (Figure 9).
![](http://www.linuxjournal.com/files/linuxjournal.com/ufiles/imagecache/large-550px-centered/u1000009/12172fijif9.png)
Figure 9\. You can run a macro on a batch of input image files with a single command.
From this window, you can select which macro file to apply, the source directory where the input images are located and the output directory where you want the output images to be written. You also can set the output file format and filter the list of images being used as input based on what the filename contains. Once everything is done, start the batch run by clicking the Process button at the bottom of the window.
If this is a workflow that will be repeated over time, you can save the batch process to a text file by clicking the Save button at the bottom of the window. You then can reload the same workflow by clicking the Open button, also at the bottom of the window. All of this functionality allows you to automate the most tedious parts of your research so you can focus on the actual science.
Considering that there are more than 500 plugins and more than 300 macros available from the main ImageJ website alone, it is an understatement that I've been able to touch on only the most basic of topics in this short article. Luckily, many domain-specific tutorials are available, along with the very good documentation for the core of ImageJ from the main project website. If you think this tool could be of use to your research, there is a wealth of information to guide you in your particular area of study.
--------------------------------------------------------------------------------
作者简介:
Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
--------------------------------
via: https://www.linuxjournal.com/content/image-processing-linux
作者:[Joey Bernard][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linuxjournal.com/users/joey-bernard
[1]:https://www.linuxjournal.com/tag/science
[2]:https://www.linuxjournal.com/tag/statistics
[3]:https://www.linuxjournal.com/users/joey-bernard
[4]:https://imagej.net/Fiji

View File

@ -0,0 +1,492 @@
Translating by qhwdw [Concurrent Servers: Part 4 - libuv][17]
============================================================
This is part 4 of a series of posts on writing concurrent network servers. In this part we're going to use libuv to rewrite our server once again, and also talk about handling time-consuming tasks in callbacks using a thread pool. Finally, we're going to look under the hood of libuv for a bit to study how it wraps blocking file-system operations with an asynchronous API.
All posts in the series:
* [Part 1 - Introduction][7]
* [Part 2 - Threads][8]
* [Part 3 - Event-driven][9]
* [Part 4 - libuv][10]
### Abstracting away event-driven loops with libuv
In [part 3][11], we've seen how similar select-based and epoll-based servers are, and I mentioned it's very tempting to abstract away the minor differences between them. Numerous libraries are already doing this, however, so in this part I'm going to pick one and use it. The library I'm picking is [libuv][12], which was originally designed to serve as the underlying portable platform layer for Node.js, and has since found use in additional projects. libuv is written in C, which makes it highly portable and very suitable for tying into high-level languages like JavaScript and Python.
While libuv has grown to be a fairly large framework for abstracting low-level platform details, it remains centered on the concept of an  _event loop_ . In our event-driven servers in part 3, the event loop was explicit in the main function; when using libuv, the loop is usually hidden inside the library itself, and user code just registers event handlers (as callback functions) and runs the loop. Furthermore, libuv will use the fastest event loop implementation for a given platform: for Linux this is epoll, etc.
![libuv loop](https://eli.thegreenplace.net/images/2017/libuvloop.png)
libuv supports multiple event loops, and thus an event loop is a first class citizen within the library; it has a handle - uv_loop_t, and functions for creating/destroying/starting/stopping loops. That said, I will only use the "default" loop in this post, which libuv makes available via uv_default_loop(); multiple loops are mostly useful for multi-threaded event-driven servers, a more advanced topic I'll leave for future parts in the series.
### A concurrent server using libuv
To get a better feel for libuv, let's jump to our trusty protocol server that we've been vigorously reimplementing throughout the series. The structure of this server is going to be somewhat similar to the select and epoll-based servers of part 3, since it also relies on callbacks. The full [code sample is here][13]; we start with setting up the server socket bound to a local port:
```
int portnum = 9090;
if (argc >= 2) {
portnum = atoi(argv[1]);
}
printf("Serving on port %d\n", portnum);
int rc;
uv_tcp_t server_stream;
if ((rc = uv_tcp_init(uv_default_loop(), &server_stream)) < 0) {
die("uv_tcp_init failed: %s", uv_strerror(rc));
}
struct sockaddr_in server_address;
if ((rc = uv_ip4_addr("0.0.0.0", portnum, &server_address)) < 0) {
die("uv_ip4_addr failed: %s", uv_strerror(rc));
}
if ((rc = uv_tcp_bind(&server_stream, (const struct sockaddr*)&server_address, 0)) < 0) {
die("uv_tcp_bind failed: %s", uv_strerror(rc));
}
```
Fairly standard socket fare here, except that it's all wrapped in libuv APIs. In return we get a portable interface that should work on any platform libuv supports.
This code also demonstrates conscientious error handling; most libuv functions return an integer status, with a negative number meaning an error. In our server we treat these errors as fatals, but one may imagine a more graceful recovery.
Now that the socket is bound, it's time to listen on it. Here we run into our first callback registration:
```
// Listen on the socket for new peers to connect. When a new peer connects,
// the on_peer_connected callback will be invoked.
if ((rc = uv_listen((uv_stream_t*)&server_stream, N_BACKLOG, on_peer_connected)) < 0) {
die("uv_listen failed: %s", uv_strerror(rc));
}
```
uv_listen registers a callback that the event loop will invoke when new peers connect to the socket. Our callback here is called on_peer_connected, and we'll examine it soon.
Finally, main runs the libuv loop until it's stopped (uv_run only returns when the loop has stopped or some error occurred).
```
// Run the libuv event loop.
uv_run(uv_default_loop(), UV_RUN_DEFAULT);
// If uv_run returned, close the default loop before exiting.
return uv_loop_close(uv_default_loop());
```
Note that only a single callback was registered by main prior to running the event loop; we'll soon see how additional callbacks are added. It's not a problem to add and remove callbacks throughout the runtime of the event loop - in fact, this is how most servers are expected to be written.
This is on_peer_connected, which handles new client connections to the server:
```
void on_peer_connected(uv_stream_t* server_stream, int status) {
if (status < 0) {
fprintf(stderr, "Peer connection error: %s\n", uv_strerror(status));
return;
}
// client will represent this peer; it's allocated on the heap and only
// released when the client disconnects. The client holds a pointer to
// peer_state_t in its data field; this peer state tracks the protocol state
// with this client throughout interaction.
uv_tcp_t* client = (uv_tcp_t*)xmalloc(sizeof(*client));
int rc;
if ((rc = uv_tcp_init(uv_default_loop(), client)) < 0) {
die("uv_tcp_init failed: %s", uv_strerror(rc));
}
client->data = NULL;
if (uv_accept(server_stream, (uv_stream_t*)client) == 0) {
struct sockaddr_storage peername;
int namelen = sizeof(peername);
if ((rc = uv_tcp_getpeername(client, (struct sockaddr*)&peername,
&namelen)) < 0) {
die("uv_tcp_getpeername failed: %s", uv_strerror(rc));
}
report_peer_connected((const struct sockaddr_in*)&peername, namelen);
// Initialize the peer state for a new client: we start by sending the peer
// the initial '*' ack.
peer_state_t* peerstate = (peer_state_t*)xmalloc(sizeof(*peerstate));
peerstate->state = INITIAL_ACK;
peerstate->sendbuf[0] = '*';
peerstate->sendbuf_end = 1;
peerstate->client = client;
client->data = peerstate;
// Enqueue the write request to send the ack; when it's done,
// on_wrote_init_ack will be called. The peer state is passed to the write
// request via the data pointer; the write request does not own this peer
// state - it's owned by the client handle.
uv_buf_t writebuf = uv_buf_init(peerstate->sendbuf, peerstate->sendbuf_end);
uv_write_t* req = (uv_write_t*)xmalloc(sizeof(*req));
req->data = peerstate;
if ((rc = uv_write(req, (uv_stream_t*)client, &writebuf, 1,
on_wrote_init_ack)) < 0) {
die("uv_write failed: %s", uv_strerror(rc));
}
} else {
uv_close((uv_handle_t*)client, on_client_closed);
}
}
```
This code is well commented, but there are a couple of important libuv idioms I'd like to highlight:
* Passing custom data into callbacks: since C has no closures, this can be challenging. libuv has a void* data field in all its handle types; these fields can be used to pass user data. For example, note how client->data is made to point to a peer_state_t structure so that the callbacks registered by uv_write and uv_read_start can know which peer data they're dealing with.
* Memory management: event-driven programming is much easier in languages with garbage collection, because callbacks usually run in a completely different stack frame from where they were registered, making stack-based memory management difficult. It's almost always necessary to pass heap-allocated data to libuv callbacks (except in main, which remains alive on the stack when all callbacks run), and to avoid leaks much care is required about when these data are safe to free(). This is something that comes with a bit of practice [[1]][6].
The peer state for this server is:
```
typedef struct {
ProcessingState state;
char sendbuf[SENDBUF_SIZE];
int sendbuf_end;
uv_tcp_t* client;
} peer_state_t;
```
It's fairly similar to the state in part 3; we no longer need sendptr, since uv_write will make sure to send the whole buffer it's given before invoking the "done writing" callback. We also keep a pointer to the client for other callbacks to use. Here's on_wrote_init_ack:
```
void on_wrote_init_ack(uv_write_t* req, int status) {
if (status) {
die("Write error: %s\n", uv_strerror(status));
}
peer_state_t* peerstate = (peer_state_t*)req->data;
// Flip the peer state to WAIT_FOR_MSG, and start listening for incoming data
// from this peer.
peerstate->state = WAIT_FOR_MSG;
peerstate->sendbuf_end = 0;
int rc;
if ((rc = uv_read_start((uv_stream_t*)peerstate->client, on_alloc_buffer,
on_peer_read)) < 0) {
die("uv_read_start failed: %s", uv_strerror(rc));
}
// Note: the write request doesn't own the peer state, hence we only free the
// request itself, not the state.
free(req);
}
```
Once we know for sure that the initial '*' was sent to the peer, we start listening to incoming data from this peer by calling uv_read_start, which registers a callback (on_peer_read) that will be invoked by the event loop whenever new data is received on the socket from the client:
```
void on_peer_read(uv_stream_t* client, ssize_t nread, const uv_buf_t* buf) {
if (nread < 0) {
if (nread != UV_EOF) {
fprintf(stderr, "Read error: %s\n", uv_strerror(nread));
}
uv_close((uv_handle_t*)client, on_client_closed);
} else if (nread == 0) {
// From the documentation of uv_read_cb: nread might be 0, which does not
// indicate an error or EOF. This is equivalent to EAGAIN or EWOULDBLOCK
// under read(2).
} else {
// nread > 0
assert(buf->len >= nread);
peer_state_t* peerstate = (peer_state_t*)client->data;
if (peerstate->state == INITIAL_ACK) {
// If the initial ACK hasn't been sent for some reason, ignore whatever
// the client sends in.
free(buf->base);
return;
}
// Run the protocol state machine.
for (int i = 0; i < nread; ++i) {
switch (peerstate->state) {
case INITIAL_ACK:
assert(0 && "can't reach here");
break;
case WAIT_FOR_MSG:
if (buf->base[i] == '^') {
peerstate->state = IN_MSG;
}
break;
case IN_MSG:
if (buf->base[i] == '$') {
peerstate->state = WAIT_FOR_MSG;
} else {
assert(peerstate->sendbuf_end < SENDBUF_SIZE);
peerstate->sendbuf[peerstate->sendbuf_end++] = buf->base[i] + 1;
}
break;
}
}
if (peerstate->sendbuf_end > 0) {
// We have data to send. The write buffer will point to the buffer stored
// in the peer state for this client.
uv_buf_t writebuf =
uv_buf_init(peerstate->sendbuf, peerstate->sendbuf_end);
uv_write_t* writereq = (uv_write_t*)xmalloc(sizeof(*writereq));
writereq->data = peerstate;
int rc;
if ((rc = uv_write(writereq, (uv_stream_t*)client, &writebuf, 1,
on_wrote_buf)) < 0) {
die("uv_write failed: %s", uv_strerror(rc));
}
}
}
free(buf->base);
}
```
The runtime behavior of this server is very similar to the event-driven servers of part 3: all clients are handled concurrently in a single thread. Also similarly, a certain discipline has to be maintained in the server's code: the server's logic is implemented as an ensemble of callbacks, and long-running operations are a big no-no since they block the event loop. Let's explore this issue a bit further.
### Long-running operations in event-driven loops
The single-threaded nature of event-driven code makes it very susceptible to a common issue: long-running code blocks the entire loop. Consider this program:
```
void on_timer(uv_timer_t* timer) {
uint64_t timestamp = uv_hrtime();
printf("on_timer [%" PRIu64 " ms]\n", (timestamp / 1000000) % 100000);
// "Work"
if (random() % 5 == 0) {
printf("Sleeping...\n");
sleep(3);
}
}
int main(int argc, const char** argv) {
uv_timer_t timer;
uv_timer_init(uv_default_loop(), &timer);
uv_timer_start(&timer, on_timer, 0, 1000);
return uv_run(uv_default_loop(), UV_RUN_DEFAULT);
}
```
It runs a libuv event loop with a single registered callback: on_timer, which is invoked by the loop every second. The callback reports a timestamp, and once in a while simulates some long-running task by sleeping for 3 seconds. Here's a sample run:
```
$ ./uv-timer-sleep-demo
on_timer [4840 ms]
on_timer [5842 ms]
on_timer [6843 ms]
on_timer [7844 ms]
Sleeping...
on_timer [11845 ms]
on_timer [12846 ms]
Sleeping...
on_timer [16847 ms]
on_timer [17849 ms]
on_timer [18850 ms]
...
```
on_timer dutifully fires every second, until the random sleep hits in. At that point, on_timer is not invoked again until the sleep is over; in fact,  _no other callbacks_  will be invoked in this time frame. The sleep call blocks the current thread, which is the only thread involved and is also the thread the event loop uses. When this thread is blocked, the event loop is blocked.
This example demonstrates why it's so important for callbacks to never block in event-driven calls, and applies equally to Node.js servers, client-side Javascript, most GUI programming frameworks, and many other asynchronous programming models.
But sometimes running time-consuming tasks is unavoidable. Not all tasks have asynchronous APIs; for example, we may be dealing with some library that only has a synchronous API, or just have to perform a potentially long computation. How can we combine such code with event-driven programming? Threads to the rescue!
### Threads for "converting" blocking calls into asynchronous calls
A thread pool can be used to turn blocking calls into asynchronous calls, by running alongside the event loop and posting events onto it when tasks are completed. Here's how it works, for a given blocking function do_work():
1. Instead of directly calling do_work() in a callback, we package it into a "task" and ask the thread pool to execute the task. We also register a callback for the loop to invoke when the task has finished; let's call it on_work_done().
2. At this point our callback can return and the event loop keeps spinning; at the same time, a thread in the pool is executing the task.
3. Once the task has finished executing, the main thread (the one running the event loop) is notified and on_work_done() is invoked by the event loop.
Let's see how this solves our previous timer/sleep example, using libuv's work scheduling API:
```
void on_after_work(uv_work_t* req, int status) {
free(req);
}
void on_work(uv_work_t* req) {
// "Work"
if (random() % 5 == 0) {
printf("Sleeping...\n");
sleep(3);
}
}
void on_timer(uv_timer_t* timer) {
uint64_t timestamp = uv_hrtime();
printf("on_timer [%" PRIu64 " ms]\n", (timestamp / 1000000) % 100000);
uv_work_t* work_req = (uv_work_t*)malloc(sizeof(*work_req));
uv_queue_work(uv_default_loop(), work_req, on_work, on_after_work);
}
int main(int argc, const char** argv) {
uv_timer_t timer;
uv_timer_init(uv_default_loop(), &timer);
uv_timer_start(&timer, on_timer, 0, 1000);
return uv_run(uv_default_loop(), UV_RUN_DEFAULT);
}
```
Instead of calling sleep directly in on_timer, we enqueue a task, represented by a handle of type work_req [[2]][14], the function to run in the task (on_work) and the function to invoke once the task is completed (on_after_work). on_work is where the "work" (the blocking/time-consuming operation) happens. Note a crucial difference between the two callbacks passed into uv_queue_work: on_work runs in the thread pool, while on_after_work runs on the main thread which also runs the event loop - just like any other callback.
Let's see this version run:
```
$ ./uv-timer-work-demo
on_timer [89571 ms]
on_timer [90572 ms]
on_timer [91573 ms]
on_timer [92575 ms]
Sleeping...
on_timer [93576 ms]
on_timer [94577 ms]
Sleeping...
on_timer [95577 ms]
on_timer [96578 ms]
on_timer [97578 ms]
...
```
The timer ticks every second, even though the sleeping function is still invoked; sleeping is now done on a separate thread and doesn't block the event loop.
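As an aside, the work request can also carry per-task context through its user-controlled `data` pointer (footnote [2] below touches on this). The following is a minimal sketch of mine with hypothetical names, not code from the post; it assumes the same includes as the example above:

```
typedef struct {
  int input;    // hypothetical task input
  int result;   // filled in by the worker thread
} task_ctx_t;

void on_ctx_work(uv_work_t* req) {
  task_ctx_t* ctx = (task_ctx_t*)req->data;
  ctx->result = ctx->input * 2;            // stand-in for real blocking work
}

void on_ctx_done(uv_work_t* req, int status) {
  task_ctx_t* ctx = (task_ctx_t*)req->data;
  printf("result = %d\n", ctx->result);    // runs on the event loop thread
  free(ctx);
  free(req);
}

void submit_task(int input) {
  uv_work_t* req = malloc(sizeof(*req));
  task_ctx_t* ctx = malloc(sizeof(*ctx));
  ctx->input = input;
  req->data = ctx;                         // stash the context on the request
  uv_queue_work(uv_default_loop(), req, on_ctx_work, on_ctx_done);
}
```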
### A primality-testing server, with exercises
Since sleep isn't a very exciting way to simulate work, I've prepared a more comprehensive example - a server that accepts numbers from clients over a socket, checks whether these numbers are prime and sends back either "prime" or "composite". The full [code for this server is here][15] - I won't post it here since it's long, but will rather give readers the opportunity to explore it on their own with a couple of exercises.
The server deliberately uses a naive primality test algorithm, so for large primes it can take quite a while to return an answer. On my machine it takes ~5 seconds to compute the answer for 2305843009213693951, but YMMV.
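For reference, a naive trial-division primality test looks roughly like the sketch below; this is my own illustration of the idea, not necessarily the exact algorithm used by the linked server:

```
#include <stdbool.h>
#include <stdint.h>

// Naive trial division: O(sqrt(n)) iterations, so a large prime input keeps a
// CPU core busy for a noticeable amount of time.
bool is_prime_naive(uint64_t n) {
  if (n < 2) return false;
  if (n % 2 == 0) return n == 2;
  for (uint64_t d = 3; d * d <= n; d += 2) {
    if (n % d == 0) return false;
  }
  return true;
}
```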
Exercise 1: the server has a setting (via an environment variable named MODE) to either run the primality test in the socket callback (meaning on the main thread) or in the libuv work queue. Play with this setting to observe the server's behavior when multiple clients are connecting simultaneously. In blocking mode, the server will not answer other clients while it's computing a big task; in non-blocking mode it will.
Exercise 2: libuv has a default thread-pool size, and it can be configured via an environment variable. Can you use multiple clients to discover experimentally what the default size is? Having found the default thread-pool size, play with different settings to see how it affects the server's responsiveness under heavy load.
### Non-blocking file-system operations using work queues
Delegating potentially-blocking operations to a thread pool isn't useful just for silly demos and CPU-intensive computations; libuv itself makes heavy use of this capability in its file-system APIs. This way, libuv accomplishes the superpower of exposing the file system with an asynchronous API, in a portable way.
Let's take uv_fs_read(), for example. This function reads from a file (represented by a uv_fs_t handle) into a buffer [[3]][16], and invokes a callback when the reading is completed. That is, uv_fs_read() always returns immediately, even if the file sits on an NFS-like system and it may take a while for the data to get to the buffer. In other words, this API is asynchronous in the way other libuv APIs are. How does this work?
At this point we're going to look under the hood of libuv; the internals are actually fairly straightforward, and it's a good exercise. Being a portable library, libuv has different implementations of many of its functions for Windows and Unix systems. We're going to be looking at src/unix/fs.c in the libuv source tree.
The code for uv_fs_read is:
```
int uv_fs_read(uv_loop_t* loop, uv_fs_t* req,
uv_file file,
const uv_buf_t bufs[],
unsigned int nbufs,
int64_t off,
uv_fs_cb cb) {
if (bufs == NULL || nbufs == 0)
return -EINVAL;
INIT(READ);
req->file = file;
req->nbufs = nbufs;
req->bufs = req->bufsml;
if (nbufs > ARRAY_SIZE(req->bufsml))
req->bufs = uv__malloc(nbufs * sizeof(*bufs));
if (req->bufs == NULL) {
if (cb != NULL)
uv__req_unregister(loop, req);
return -ENOMEM;
}
memcpy(req->bufs, bufs, nbufs * sizeof(*bufs));
req->off = off;
POST;
}
```
It may seem puzzling at first, because it defers the real work to the INIT and POST macros, with some local variable setup for POST. This is done to avoid too much code duplication within the file.
The INIT macro is:
```
#define INIT(subtype) \
do { \
req->type = UV_FS; \
if (cb != NULL) \
uv__req_init(loop, req, UV_FS); \
req->fs_type = UV_FS_ ## subtype; \
req->result = 0; \
req->ptr = NULL; \
req->loop = loop; \
req->path = NULL; \
req->new_path = NULL; \
req->cb = cb; \
} \
while (0)
```
It sets up the request, and most importantly sets the req->fs_type field to the actual FS request type. Since uv_fs_read invokes INIT(READ), it means req->fs_type gets assigned the constant UV_FS_READ.
The POST macro is:
```
#define POST \
do { \
if (cb != NULL) { \
uv__work_submit(loop, &req->work_req, uv__fs_work, uv__fs_done); \
return 0; \
} \
else { \
uv__fs_work(&req->work_req); \
return req->result; \
} \
} \
while (0)
```
What it does depends on whether the callback is NULL. In libuv file-system APIs, a NULL callback means we actually want to perform the operation  _synchronously_ . In this case POST invokes uv__fs_work directly (we'll get to what this function does in just a bit), whereas for a non-NULL callback, it submits uv__fs_work as a work item to the work queue (which is the thread pool), and registers uv__fs_done as the callback; that function does a bit of book-keeping and invokes the user-provided callback.
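To make the distinction concrete, here is a minimal sketch of mine (not from the post) calling the same API both ways; it assumes a readable file named `data.txt` in the current directory:

```
#include <fcntl.h>
#include <stdio.h>
#include <uv.h>

static char buffer[1024];

void on_read(uv_fs_t* req) {
  // Invoked on the loop thread after uv__fs_work has run on the thread pool.
  printf("async read returned %zd bytes\n", (ssize_t)req->result);
  uv_fs_req_cleanup(req);
}

int main(void) {
  uv_loop_t* loop = uv_default_loop();
  uv_fs_t open_req, read_req;
  uv_buf_t iov = uv_buf_init(buffer, sizeof(buffer));

  // NULL callback: POST takes the synchronous branch, so the result (here,
  // the file descriptor) is available as soon as the call returns.
  int fd = uv_fs_open(loop, &open_req, "data.txt", O_RDONLY, 0, NULL);
  uv_fs_req_cleanup(&open_req);

  // Non-NULL callback: the read is submitted to the thread pool and on_read
  // fires later, driven by the event loop.
  uv_fs_read(loop, &read_req, fd, &iov, 1, -1, on_read);
  return uv_run(loop, UV_RUN_DEFAULT);
}
```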
If we look at the code of uv__fs_work, we'll see it uses more macros to route work to the actual file-system call as needed. In our case, for UV_FS_READ the call will be made to uv__fs_read, which (at last!) does the reading using regular POSIX APIs. This function can be safely implemented in a  _blocking_  manner, since it's placed on a thread-pool when called through the asynchronous API.
In Node.js, the fs.readFile function is mapped to uv_fs_read. Thus, reading files can be done in a non-blocking fashion even though the underlying file-system API is blocking.
* * *
[[1]][1] To ensure that this server doesn't leak memory, I ran it under Valgrind with the leak checker enabled. Since servers are often designed to run forever, this was a bit challenging; to overcome this issue I've added a "kill switch" to the server - a special sequence received from a client makes it stop the event loop and exit. The code for this is in the on_wrote_buf handler.
[[2]][2] Here we don't use work_req for much; the primality testing server discussed next will show how it's used to pass context information into the callback.
[[3]][3] uv_fs_read() provides a generalized API similar to the preadv Linux system call: it takes multiple buffers which it fills in order, and supports an offset into the file. We can ignore these features for the sake of our discussion.
--------------------------------------------------------------------------------
via: https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/
作者:[Eli Bendersky ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://eli.thegreenplace.net/
[1]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id1
[2]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id2
[3]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id3
[4]:https://eli.thegreenplace.net/tag/concurrency
[5]:https://eli.thegreenplace.net/tag/c-c
[6]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id4
[7]:http://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/
[8]:http://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/
[9]:http://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/
[10]:http://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/
[11]:http://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/
[12]:http://libuv.org/
[13]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/uv-server.c
[14]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id5
[15]:https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/uv-isprime-server.c
[16]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/#id6
[17]:https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/

View File

@ -1,99 +0,0 @@
translating----geekpi
# [File better bugs with coredumpctl][1]
![](https://fedoramagazine.org/wp-content/uploads/2017/11/coredump.png-945x400.jpg)
An unfortunate fact of life is that all software has bugs, and some bugs can make the system crash. When it does, it often leaves a data file called a  _core dump_  on disk. This file contains data about your system when it crashed, and may help determine why the crash occurred. Often developers request data in the form of a  _backtrace_ , showing the flow of instructions that led to the crash. The developer can use it to fix the bug and improve the system. Here's how to easily generate a backtrace if you have a system crash.
### Getting started with coredumpctl
Most Fedora systems use the [Automatic Bug Reporting Tool (ABRT)][2] to automatically capture dumps and file bugs for crashes. However, if you have disabled this service or removed the package, this method may be helpful.
If you experience a system crash, first ensure that you're running the latest updated software. Updates often contain fixes for bugs that have already been found to cause critical errors and crashes. Once you update, try to recreate the situation that led to the bug.
If the crash still happens, or if you're already running the latest software, it's time to use the helpful  _coredumpctl_  utility. This utility helps locate and process crashes. To see a list of all core dumps on your system, run this command:
```
coredumpctl list
```
Don't be surprised if you see a longer list than expected. Sometimes system components crash silently behind the scenes, and recover on their own. An easy way to quickly find a dump from today is to use the  _since_  option:
```
coredumpctl list --since=today
```
The  _PID_  column contains the process ID used to identify the dump. Note that number, since you'll use it again along the way. Or, if you don't want to remember it, assign it to a variable you can use in the rest of the commands below:
```
MYPID=<PID>
```
To see information about the core dump, use this command (either use the  _$MYPID_  variable, or substitute the PID number):
```
coredumpctl info $MYPID
```
### Install debuginfo packages
Debugging symbols translate between data in the core dump and the instructions found in original source code. This symbol data can be quite large. Therefore, symbols are shipped in  _debuginfo_  packages separately from the packages most users run on Fedora systems. To determine which debuginfo packages you must install, start by running this command:
```
coredumpctl gdb $MYPID
```
This may result in a large amount of information to the screen. The last line may tell you to use  _dnf_  to install more debuginfo packages. Run that command [with sudo][3] to continue:
```
sudo dnf debuginfo-install <packages...>
```
Then try the  _coredumpctl gdb $MYPID_  command again. **You may need to do this repeatedly,** as other symbols are unwound in the trace.
### Capturing the backtrace
Run the following commands to log information in the debugger:
```
set logging file mybacktrace.txt
set logging on
```
You may find it helpful to turn off the pagination. For long backtraces this saves time.
```
set pagination off
```
Now run the backtrace:
```
thread apply all bt full
```
Now you can type  _quit_  to quit the debugger. The  _mybacktrace.txt_  file includes backtrace information you can attach to a bug or issue. Or if you're working with someone in real time, you can upload the text to a pastebin. Either way, you can now provide more assistance to the developer to fix the problem.
---------------------------------
作者简介:
Paul W. Frields
Paul W. Frields has been a Linux user and enthusiast since 1997, and joined the Fedora Project in 2003, shortly after launch. He was a founding member of the Fedora Project Board, and has worked on documentation, website publishing, advocacy, toolchain development, and maintaining software. He joined Red Hat as Fedora Project Leader from February 2008 to July 2010, and remains with Red Hat as an engineering manager. He currently lives with his wife and two children in Virginia.
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/file-better-bugs-coredumpctl/
作者:[Paul W. Frields ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://fedoramagazine.org/author/pfrields/
[1]:https://fedoramagazine.org/file-better-bugs-coredumpctl/
[2]:https://github.com/abrt/abrt
[3]:https://fedoramagazine.org/howto-use-sudo/

View File

@ -0,0 +1,130 @@
### Unleash Your Creativity Linux Programs for Drawing and Image Editing
By: [chabowski][1]
The following article is part of a series of articles that provide tips and tricks for Linux newbies or Desktop users that are not yet experienced with regard to certain topics. This series intends to complement the special edition #30 “[Getting Started with Linux][2]” based on [openSUSE Leap][3], recently published by the [Linux Magazine,][4] with valuable additional information.
![](https://www.suse.com/communities/blog/files/2017/11/DougDeMaio-450x450.jpeg)
This article has been contributed by Douglas DeMaio, openSUSE PR Expert at SUSE.
Both macOS and Windows offer several popular programs for graphics editing, vector drawing and creating and manipulating Portable Document Format (PDF) files. The good news: users familiar with the Adobe Suite can transition with ease to free, open-source programs available on Linux.
Programs like [GIMP][5], [InkScape][6] and [Okular][7] are cross platform programs that are available by default in Linux/GNU distributions and are persuasive alternatives to expensive Adobe programs like [Photoshop][8], [Illustrator][9] and [Acrobat][10].
These creativity programs on Linux distributions are just as powerful as those for macOS or Windows. This article will explain some of the differences and how the programs can be used to make your transition to Linux comfortable.
### Krita
The KDE desktop environment comes with tons of cool applications. [Krita][11] is a professional open source painting program. It gives users the freedom to create any artistic image they desire. Krita features tools that are much more extensive than the tool sets of most proprietary programs you might be familiar with. From creating textures to comics, Krita is a must have application for Linux users.
![](https://www.suse.com/communities/blog/files/2017/11/krita-450x267.png)
### GIMP
GNU Image Manipulation Program (GIMP) is a cross-platform image editor. Users of Photoshop will find the User Interface of GIMP to be similar to that of Photoshop. The drop down menu offers colors, layers, filters and tools to help the user with editing graphics. Rulers are located both horizontally and vertically, and guides can be dragged across the screen to give exact measurements. The drop down menu gives tool options for resizing or cropping photos; adjustments can be made to the color balance, color levels, brightness and contrast as well as hue and saturation.
![](https://www.suse.com/communities/blog/files/2017/11/gimp-450x281.png)
There are multiple filters in GIMP to enhance or distort your images. Filters for artistic expression and animation are available and are more powerful tool options than those found in some proprietary applications. Gradients can be applied through additional layers and the Text Tool offers many fonts, which can be altered in shape and size through the Perspective Tool.
The cloning tool works exactly like those in other graphics editors, so manipulating images is simple and accurate given the selection of brush sizes to do the job.
Perhaps one of the best options available with GIMP is that the images can be saved in a variety of formats like .jpg, .png, .pdf, .eps and .svg. These image options provide high-quality images in a small file.
### InkScape
Designing vector imagery with InkScape is simple and free. This cross-platform program allows for the creation of logos and illustrations that are highly scalable. Whether designing cartoons or creating images for branding, InkScape is a powerful application to get the job done. Like GIMP, InkScape lets you save files in various formats and allows for object manipulation like moving, rotating and skewing text and objects. Shape tools are available with InkScape so making stars, hexagons and other elements will meet the needs of your creative mind.
![](https://www.suse.com/communities/blog/files/2017/11/inkscape-450x273.png)
InkScape offers a comprehensive tool set, including a drawing tool, a pen tool and the freehand calligraphy tool that allows for object creation with your own personal style. The color selector gives you the choice of RGB, CMYK and RGBA; using specific colors for branding logos, icons and advertisements is definitely convincing.
Shortcut commands are similar to what users experience in Adobe Illustrator. Making layers and grouping or ungrouping the design elements can turn a blank page into a full-fledged image that can be used for designing technical diagrams for presentations, importing images into a multimedia program or for creating web graphics and software design.
Inkscape can import vector graphics from multiple other programs. It can even import bitmap images. Inkscape is one of those cross platform, open-source programs that allow users to operate across different operating systems, no matter if they work with macOS, Windows or Linux.
### Okular and LibreOffice
LibreOffice, which is a free, open-source Office Suite, allows users to collaborate and interact with documents and important files on Linux, but also on macOS and Windows. You can also create PDF files via LibreOffice, and LibreOffice Draw lets you view (and edit) PDF files as images.
![](https://www.suse.com/communities/blog/files/2017/11/draw-450x273.png)
However, the Portable Document Format (PDF) is quite different on the three Operating Systems. macOS offers [Preview][12] by default; Windows has [Edge][13]. Of course, Adobe Reader can also be used on both macOS and Windows. With Linux, and especially the desktop selection of KDE, [Okular][14] is the default program for viewing PDF files.
![](https://www.suse.com/communities/blog/files/2017/11/okular-450x273.png)
The functionality of Okular supports different types of documents, like PDF, Postscript, [DjVu][15], [CHM][16], [XPS][17], [ePub][18] and others. Yet the universal document viewer also offers some powerful features that make interacting with a document different from other programs on macOS and Windows. Okular provides selection and search tools that make accessing the text in PDFs fluid, and its magnification tool allows for a quick look at small text in a document.
Okular also provides users with the option to configure it to use more memory if the document is too large and freezes the Operating System. This functionality is convenient for users accessing high-quality print documents, for example for advertising.
For those who want to change locked images and documents, it's rather easy to do so with LibreOffice Draw. A hypothetical situation would be to take a locked IRS (or tax) form and change it to make the uneditable document editable. Imagine how much fun it could be to transform it to some humorous kind of tax form …
And indeed, the sky's the limit on how creative a user wants to be when using programs that are available on Linux distributions.
Tags: [drawing][19], [Getting Started with Linux][20], [GIMP][21], [image editing][22], [Images][23], [InkScape][24], [KDE][25], [Krita][26], [Leap 42.3][27], [LibreOffice][28], [Linux Magazine][29], [Okular][30], [openSUSE][31], [PDF][32] Categories: [Desktop][33], [Expert Views][34], [LibreOffice][35], [openSUSE][36]
--------------------------------------------------------------------------------
via: https://www.suse.com/communities/blog/unleash-creativity-linux-programs-drawing-image-editing/
作者:[chabowski ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[1]:https://www.suse.com/communities/blog/author/chabowski/
[2]:http://www.linux-magazine.com/Resources/Special-Editions/30-Getting-Started-with-Linux
[3]:https://en.opensuse.org/Portal:42.3
[4]:http://www.linux-magazine.com/
[5]:https://www.gimp.org/
[6]:https://inkscape.org/en/
[7]:https://okular.kde.org/
[8]:http://www.adobe.com/products/photoshop.html
[9]:http://www.adobe.com/products/illustrator.html
[10]:https://acrobat.adobe.com/us/en/acrobat/acrobat-pro-cc.html
[11]:https://krita.org/en/
[12]:https://en.wikipedia.org/wiki/Preview_(macOS)
[13]:https://en.wikipedia.org/wiki/Microsoft_Edge
[14]:https://okular.kde.org/
[15]:http://djvu.org/
[16]:https://fileinfo.com/extension/chm
[17]:https://fileinfo.com/extension/xps
[18]:http://idpf.org/epub
[19]:https://www.suse.com/communities/blog/tag/drawing/
[20]:https://www.suse.com/communities/blog/tag/getting-started-with-linux/
[21]:https://www.suse.com/communities/blog/tag/gimp/
[22]:https://www.suse.com/communities/blog/tag/image-editing/
[23]:https://www.suse.com/communities/blog/tag/images/
[24]:https://www.suse.com/communities/blog/tag/inkscape/
[25]:https://www.suse.com/communities/blog/tag/kde/
[26]:https://www.suse.com/communities/blog/tag/krita/
[27]:https://www.suse.com/communities/blog/tag/leap-42-3/
[28]:https://www.suse.com/communities/blog/tag/libreoffice/
[29]:https://www.suse.com/communities/blog/tag/linux-magazine/
[30]:https://www.suse.com/communities/blog/tag/okular/
[31]:https://www.suse.com/communities/blog/tag/opensuse/
[32]:https://www.suse.com/communities/blog/tag/pdf/
[33]:https://www.suse.com/communities/blog/category/desktop/
[34]:https://www.suse.com/communities/blog/category/expert-views/
[35]:https://www.suse.com/communities/blog/category/libreoffice/
[36]:https://www.suse.com/communities/blog/category/opensuse/

View File

@ -0,0 +1,59 @@
### System Logs: Understand Your Linux System
![chabowski](https://www.suse.com/communities/blog/files/2016/03/chabowski_avatar_1457537819-100x100.jpg)
By: [chabowski][1]
The following article is part of a series of articles that provide tips and tricks for Linux newbies or Desktop users that are not yet experienced with regard to certain topics. This series intends to complement the special edition #30 “[Getting Started with Linux][2]” based on [openSUSE Leap][3], recently published by the [Linux Magazine,][4] with valuable additional information.
This article has been contributed by Romeo S. Romeo is a PDX-based enterprise Linux professional specializing in scalable solutions for innovative corporations looking to disrupt the marketplace.
System logs are incredibly important files in Linux. Special programs that run in the background (usually called daemons or servers) handle most of the tasks on your Linux system. Whenever these daemons do anything, they write the details of the task to a log file as a sort of “history” of what they've been up to. These daemons perform actions ranging from syncing your clock with an atomic clock to managing your network connection. All of this is written to log files so that if something goes wrong, you can look into the specific log file and see what happened.
![](https://www.suse.com/communities/blog/files/2017/11/markus-spiske-153537-300x450.jpg)
Photo by Markus Spiske on Unsplash
There are many different logs on your Linux computer. Historically, they were mostly stored in the /var/log directory in a plain text format. Quite a few still are, and you can read them easily with the less pager. On your freshly installed openSUSE Leap 42.3 system, and on most modern systems, important logs are stored by the systemd init system. This is the system that handles starting up daemons and getting the computer ready for use on startup. The logs handled by systemd are stored in a binary format, which means that they take up less space and can more easily be viewed or exported in various formats, but the downside is that you need a special tool to view them. Luckily, this tool comes installed on your system: it's called journalctl and by default, it records all of the logs from every daemon to one location.
To take a look at your systemd log, just run the journalctl command. This will open up the combined logs in the less pager. To get a better idea of what you're looking at, see a single log entry from journalctl here:
```
Jul 06 11:53:47 aaathats3as pulseaudio[2216]: [pulseaudio] alsa-util.c: Disabling timer-based scheduling because running inside a VM.
```
This individual log entry contains (in order) the date and time of the entry, the hostname of the computer, the name of the process that logged the entry, the PID (process ID number) of the process that logged the entry, and then the log entry itself.
If a program running on your system is misbehaving, look at the log file and search (with the “/” key followed by the search term) for the name of the program. Chances are that if the program is reporting errors that are causing it to malfunction, then the errors will show up in the system log. Sometimes errors are verbose enough for you to be able to fix them yourself. Other times, you have to search for a solution on the Web. Google is usually the most convenient search engine to use for weird Linux problems.
![](https://www.suse.com/communities/blog/files/2017/09/Sunglasses_Emoji-450x450.png)
However, be sure that you only enter the actual log entry, because the rest of the information at the beginning of the line (date, host name, PID) is unnecessary and could return false positives.
After you search for the problem, the first few results are usually pages containing various things that you can try for solutions. Of course, you shouldn't just follow random instructions that you find on the Internet: always be sure to do additional research into what exactly you will be doing and what the effects of it are before following any instructions. With that being said, the results for a specific entry from the system's log file are usually much more useful than results from searching more generic terms that describe the malfunctioning of the program directly. This is because many different things could cause a program to misbehave, and multiple problems could cause identical misbehaviors.
For example, a lack of audio on the system could be due to a massive amount of different reasons, ranging from speakers not being plugged in, to back end sound systems misbehaving, to a lack of the proper drivers. If you search for a general problem, you're likely to see a lot of irrelevant solutions and you'll end up wasting your time on a wild goose chase. With a specific search of an actual line from a log file, you can see other people who have had the same log entry. See Picture 1 and Picture 2 to compare and contrast between the two types of searching.
![](https://www.suse.com/communities/blog/files/2017/11/picture1-450x450.png)
Picture 1 shows generic, unspecific Google results for a general misbehavior of the system. This type of searching generally doesn't help much.
![](https://www.suse.com/communities/blog/files/2017/11/picture2-450x450.png)
Picture 2 shows more specific, helpful Google results for a particular log file line. This type of searching is generally very helpful.
There are some systems that log their actions outside of journalctl. The most important ones that you may find yourself dealing with on a desktop system are /var/log/zypper.log for openSUSE's package manager, /var/log/boot.log for those messages that scroll by too fast to be read when you turn your system on, and /var/log/ntp if your Network Time Protocol Daemon is having trouble syncing time. One more important place to look for errors if you're having problems with specific hardware is the Kernel Ring Buffer, which you can read by typing the dmesg -H command (this opens in the less pager as well). The Kernel Ring Buffer is stored in RAM, so you lose it when you reboot your system, but it contains important messages from the Linux kernel about important events, such as hardware being added, modules being loaded, or strange network errors.
Hopefully you are prepared now to understand your Linux system better! Have a lot of fun!
--------------------------------------------------------------------------------
via: https://www.suse.com/communities/blog/system-logs-understand-linux-system/
作者:[chabowski]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[1]:https://www.suse.com/communities/blog/author/chabowski/
[2]:http://www.linux-magazine.com/Resources/Special-Editions/30-Getting-Started-with-Linux
[3]:https://en.opensuse.org/Portal:42.3
[4]:http://www.linux-magazine.com/

View File

@ -1,3 +1,4 @@
Translating by ValoniaKim
Language engineering for great justice
============================================================

View File

@ -0,0 +1,75 @@
translating by zrszrszrs
# [Mark McIntyre: How Do You Fedora?][1]
![](https://fedoramagazine.org/wp-content/uploads/2017/11/mock-couch-945w-945x400.jpg)
We recently interviewed Mark McIntyre on how he uses Fedora. This is [part of a series][2] on the Fedora Magazine. The series profiles Fedora users and how they use Fedora to get things done. Contact us on the [feedback form][3] to express your interest in becoming an interviewee.
### Who is Mark McIntyre?
Mark McIntyre is a geek by birth and Linux by choice. “I started coding at the early age of 13 learning BASIC on my own and finding the excitement of programming which led me down a path of becoming a professional coder,” he says. McIntyre and his niece are big fans of pizza. “My niece and I started a quest last fall to try as many of the pizza joints in Knoxville. You can read about our progress at [https://knox-pizza-quest.blogspot.com/][4]” Mark is also an amateur photographer and [publishes his images][5] on Flickr.
![](https://fedoramagazine.org/wp-content/uploads/2017/11/31456893222_553b3cac4d_k-1024x575.jpg)
Mark has a diverse background as a developer. He has worked with Visual Basic for Applications, LotusScript, Oracle's PL/SQL, Tcl/Tk and Python with Django as the framework. His strongest skill is Python which he uses in his current job as a systems engineer. “I am using Python on a regular basis. As my job is morphing into more of an automation engineer, that became more frequent.”
McIntyre is a self-described nerd and loves sci-fi movies, but his favorite movie falls out of that genre. “As much as I am a nerd and love the Star Trek and Star Wars and related movies, the movie Glory is probably my favorite of all time.” He also mentioned that Serenity was a fantastic follow-up to a great TV series.
Mark values humility, knowledge and graciousness in others. He appreciates people who act based on understanding the situation that other people are in. “If you add a decision to serve another, you have the basis for someone you'd want to be around instead of someone who you have to tolerate.”
McIntyre works for [Scripps Networks Interactive][6], which is the parent company for HGTV, Food Network, Travel Channel, DIY, GAC, and several other cable channels. “Currently, I function as a systems engineer for the non-linear video content, which is all the media purposed for online consumption.” He supports a few development teams who write applications to publish the linear video from cable TV into the online formats such as Amazon and Hulu. The systems include both on-premise and cloud systems. Mark also develops automation tools for deploying these applications primarily to a cloud infrastructure.
### The Fedora community
Mark describes the Fedora community as an active community filled with people who enjoy life as Fedora users. “From designers to packagers, this group is still very active and feels alive.” McIntyre continues, “That gives me a sense of confidence in the operating system.”
He started frequenting the #fedora channel on IRC around 2002: “Back then, Wi-Fi functionality was still done a lot by hand in starting the adapter and configuring the modules.” In order to get his Wi-Fi working he had to recompile the Fedora kernel. Shortly after, he started helping others in the #fedora channel.
McIntyre encourages others to get involved in the Fedora Community. “There are many different areas of opportunity in which to be involved. Front-end design, testing deployments, development, packaging of applications, and new technology implementation.” He recommends picking an area of interest and asking questions of that group. “There are many opportunities available to jump in to contribute.”
He credits a fellow community member with helping him get started: “Ben Williams was very helpful in my first encounters with Fedora, helping me with some of my first installation rough patches in the #fedora support channel.” Ben also encouraged Mark to become an [Ambassador][7].
### What hardware and software?
McIntyre uses Fedora Linux on all his laptops and desktops. On servers he chooses CentOS, due to the longer support lifecycle. His current desktop is self-built and equipped with an Intel Core i5 processor, 32 GB of RAM and 2 TB of disk space. “I have a 4K monitor attached which gives me plenty of room for viewing all my applications at once.” His current work laptop is a Dell Inspiron 2-in-1 13-inch laptop with 16 GB RAM and a 525 GB m.2 SSD.
![](https://fedoramagazine.org/wp-content/uploads/2017/11/Screenshot-from-2017-10-26-08-51-41-1024x640.png)
Mark currently runs Fedora 26 on any box he has set up in the past few months. When it comes to new versions he likes to avoid the rush when the version is officially released. “I usually try to get the latest version as soon as it goes gold, with the exception of one of my workstations running the next version's beta when it is closer to release.” He usually upgrades in place: “The in-place upgrade using  _dnf system-upgrade_  works very well these days.”
To handle his photography, McIntyre uses [GIMP][8] and [Darktable][9], along with a few other photo viewing and quick editing packages. When not using web-based email, he uses [Geary][10] along with [GNOME Calendar][11]. Mark's IRC client of choice is [HexChat][12] connecting to a [ZNC bouncer][13] running on a Fedora Server instance. His department's communication is handled via Slack.
“I have never really been a big IDE fan, so I spend time in [vim][14] for most of my editing.” Occasionally, he opens up a simple text editor like [gedit][15] or [xed][16]. Mark uses [GPaste][17] for copying and pasting. “I have become a big fan of [Tilix][18] for my terminal choice.” McIntyre manages the podcasts he likes with [Rhythmbox][19], and uses [Epiphany][20] for quick web lookups.
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/mark-mcintyre-fedora/
作者:[Charles Profitt][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://fedoramagazine.org/author/cprofitt/
[1]:https://fedoramagazine.org/mark-mcintyre-fedora/
[2]:https://fedoramagazine.org/tag/how-do-you-fedora/
[3]:https://fedoramagazine.org/submit-an-idea-or-tip/
[4]:https://knox-pizza-quest.blogspot.com/
[5]:https://www.flickr.com/photos/mockgeek/
[6]:http://www.scrippsnetworksinteractive.com/
[7]:https://fedoraproject.org/wiki/Ambassadors
[8]:https://www.gimp.org/
[9]:http://www.darktable.org/
[10]:https://wiki.gnome.org/Apps/Geary
[11]:https://wiki.gnome.org/Apps/Calendar
[12]:https://hexchat.github.io/
[13]:https://wiki.znc.in/ZNC
[14]:http://www.vim.org/
[15]:https://wiki.gnome.org/Apps/Gedit
[16]:https://github.com/linuxmint/xed
[17]:https://github.com/Keruspe/GPaste
[18]:https://fedoramagazine.org/try-tilix-new-terminal-emulator-fedora/
[19]:https://wiki.gnome.org/Apps/Rhythmbox
[20]:https://wiki.gnome.org/Apps/Web

View File

@ -0,0 +1,73 @@
translating---geekpi
# LibreOffice Is Now Available on Flathub, the Flatpak App Store
![LibreOffice on Flathub](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/libroffice-on-flathub-750x250.jpeg)
LibreOffice is now available to install from [Flathub][3], the centralised Flatpak app store.
Its arrival allows anyone running a modern Linux distribution to install the latest stable release of LibreOffice in a click or two, without having to hunt down a PPA, tussle with tarballs or wait for a distro provider to package it up.
A [LibreOffice Flatpak][5] has been available for users to download and install since August of last year and the [LibreOffice 5.2][6] release.
What's “new” here is the distribution method. Rather than release updates through their own dedicated server, The Document Foundation has opted to use Flathub.
This is  _great_  news for end users as it means there's one less repo to worry about adding on a fresh install, but it's also good news for Flatpak advocates too: LibreOffice is open-source software's most popular productivity suite. Its support for both format and app store is sure to be warmly welcomed.
At the time of writing you can install LibreOffice 5.4.2 from Flathub. New stable releases will be added as and when they're released.
### Enable Flathub on Ubuntu
![](http://www.omgubuntu.co.uk/wp-content/uploads/2017/11/flathub-750x495.png)
Fedora, Arch, and Linux Mint 18.3 users have Flatpak installed, ready to go, out of the box. Mint even comes with the Flathub remote pre-enabled.
[Install LibreOffice from Flathub][7]
To get Flatpak up and running on Ubuntu you first have to install it:
```
sudo apt install flatpak gnome-software-plugin-flatpak
```
To be able to install apps from Flathub you need to add the Flathub remote server:
```
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
```
That's pretty much it. Just log out and back in (so that Ubuntu Software refreshes its cache) and you  _should_  be able to find any Flatpak apps available on Flathub through the Ubuntu Software app.
In this instance, search for “LibreOffice” and locate the result that has a line of text underneath mentioning Flathub. (Do bear in mind that Ubuntu has tweaked the Software client to show Snap app results above everything else, so you may need to scroll down the list of results to see it).
There is a [bug with installing Flatpak apps][8] from a flatpakref file, so if the above method doesn't work you can also install Flatpak apps from Flathub using the command line.
The Flathub website lists the command needed to install each app. Switch to the “Command Line” tab to see them.
#### More apps on Flathub
If you read this site regularly enough you'll know that I  _love_  Flathub. It's home to some of my favourite apps (Corebird, Parlatype, GNOME MPV, Peek, Audacity, GIMP… etc). I get the latest, stable versions of these apps (plus any dependencies they need) without compromise.
And, as I tweeted a week or so back, most Flatpak apps now look great with GTK themes — no more [workarounds][9] required!
--------------------------------------------------------------------------------
via: http://www.omgubuntu.co.uk/2017/11/libreoffice-now-available-flathub-flatpak-app-store
作者:[ JOEY SNEDDON ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://plus.google.com/117485690627814051450/?rel=author
[1]:https://plus.google.com/117485690627814051450/?rel=author
[2]:http://www.omgubuntu.co.uk/category/news
[3]:http://www.flathub.org/
[4]:http://www.omgubuntu.co.uk/2017/11/libreoffice-now-available-flathub-flatpak-app-store
[5]:http://www.omgubuntu.co.uk/2016/08/libreoffice-5-2-released-whats-new
[6]:http://www.omgubuntu.co.uk/2016/08/libreoffice-5-2-released-whats-new
[7]:https://flathub.org/repo/appstream/org.libreoffice.LibreOffice.flatpakref
[8]:https://bugs.launchpad.net/ubuntu/+source/gnome-software/+bug/1716409
[9]:http://www.omgubuntu.co.uk/2017/05/flatpak-theme-issue-fix

View File

@ -0,0 +1,143 @@
HankChow Translating
How do groups work on Linux?
============================================================
Hello! Last week, I thought I knew how users and groups worked on Linux. Here is what I thought:
1. Every process belongs to a user (like `julia`)
2. When a process tries to read a file owned by a group, Linux a) checks if the user `julia` can access the file, and b) checks which groups `julia` belongs to, and whether any of those groups owns & can access that file
3. If either of those is true (or if the any bits are set right) then the process can access the file
So, for example, if a process is owned by the `julia` user and `julia` is in the `awesome` group, then the process would be allowed to read this file.
```
r--r--r-- 1 root awesome 6872 Sep 24 11:09 file.txt
```
I had not thought carefully about this, but if pressed I would have said that it probably checks the `/etc/group` file at runtime to see what groups you're in.
### that is not how groups work
I found out at work last week that, no, what I describe above is not how groups work. In particular Linux does **not** check which groups a process's user belongs to every time that process tries to access a file.
Here is how groups actually work! I learned this by reading Chapter 9 (“Process Credentials”) of [The Linux Programming Interface][1] which is an incredible book. As soon as I realized that I did not understand how users and groups worked, I opened up the table of contents with absolute confidence that it would tell me what's up, and I was right.
### how user and group checks are done
The key new insight for me was pretty simple! The chapter starts out by saying that user and group IDs are **attributes of the process**:
* real user ID and group ID;
* effective user ID and group ID;
* saved set-user-ID and saved set-group-ID;
* file-system user ID and group ID (Linux-specific); and
* supplementary group IDs.
This means that the way Linux **actually** does group checks to see if a process can read a file is:
* look at the process's group IDs & supplementary group IDs (from the attributes on the process, **not** by looking them up in `/etc/group`)
* look at the group on the file
* see if they match
Generally when doing access control checks it uses the **effective** user/group ID, not the real user/group ID. Technically when accessing a file it actually uses the **file-system** ids but those are usually the same as the effective uid/gid.
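To see these attributes on a live process, a small C program (my own sketch, not from the book or the post) can print the real and effective IDs plus the supplementary groups:

```
#include <grp.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  gid_t groups[64];
  int n = getgroups(64, groups);  // supplementary group IDs of this process

  printf("real uid=%d effective uid=%d\n", (int)getuid(), (int)geteuid());
  printf("real gid=%d effective gid=%d\n", (int)getgid(), (int)getegid());
  printf("supplementary groups:");
  for (int i = 0; i < n; i++) {
    printf(" %d", (int)groups[i]);
  }
  printf("\n");
  return 0;
}
```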
### Adding a user to a group doesn't put existing processes in that group
Here's another fun example that follows from this: if I create a new `panda` group and add myself (bork) to it, then run `groups` to check my group memberships, I'm not in the panda group!
```
bork@kiwi~> sudo addgroup panda
Adding group `panda' (GID 1001) ...
Done.
bork@kiwi~> sudo adduser bork panda
Adding user `bork' to group `panda' ...
Adding user bork to group panda
Done.
bork@kiwi~> groups
bork adm cdrom sudo dip plugdev lpadmin sambashare docker lxd
```
no `panda` in that list! To double check, let's try making a file owned by the `panda` group and see if I can access it:
```
$ touch panda-file.txt
$ sudo chown root:panda panda-file.txt
$ sudo chmod 660 panda-file.txt
$ cat panda-file.txt
cat: panda-file.txt: Permission denied
```
Sure enough, I can't access `panda-file.txt`. No big surprise there. My shell didn't have the `panda` group as a supplementary GID before, and running `adduser bork panda` didn't do anything to change that.
### how do you get your groups in the first place?
So this raises kind of a confusing question, right: if processes have groups baked into them, how do you get assigned your groups in the first place? Obviously you can't assign yourself more groups (that would defeat the purpose of access control).
It's relatively clear how processes I **execute** from my shell (bash/fish) get their groups: my shell runs as me, and it has a bunch of group IDs on it. Processes I execute from my shell are forked from the shell so they get the same groups as the shell had.
So there needs to be some “first” process that has your groups set on it, and all the other processes you start inherit their groups from that. That process is called your **login shell** and it's run by the `login` program (`/bin/login`) on my laptop. `login` runs as root and calls a C function called `initgroups` to set up your groups (by reading `/etc/group`). It's allowed to set up your groups because it runs as root.
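For illustration, here is a rough sketch (mine, heavily simplified, with a hypothetical user name) of the `initgroups`/`setgid`/`setuid` sequence a login-style program performs while it still runs as root:

```
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  const char* user = "bork";            // hypothetical user to "log in" as
  struct passwd* pw = getpwnam(user);   // look up uid/gid from /etc/passwd
  if (pw == NULL) { fprintf(stderr, "unknown user\n"); return 1; }

  // Attach all of the user's supplementary groups (read from /etc/group)
  // to this process; only root can do this.
  if (initgroups(user, pw->pw_gid) != 0) { perror("initgroups"); return 1; }
  if (setgid(pw->pw_gid) != 0) { perror("setgid"); return 1; }
  if (setuid(pw->pw_uid) != 0) { perror("setuid"); return 1; }

  // Anything exec'd from here inherits those group IDs, e.g. a login shell.
  execl("/bin/bash", "bash", "--login", (char*)NULL);
  perror("execl");
  return 1;
}
```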
### let's try logging in again!
So! Let's say I am running in a shell, and I want to refresh my groups! From what we've learned about how groups are initialized, I should be able to run `login` to refresh my groups and start a new login shell!
Let's try it:
```
$ sudo login bork
$ groups
bork adm cdrom sudo dip plugdev lpadmin sambashare docker lxd panda
$ cat panda-file.txt # it works! I can access the file owned by `panda` now!
```
Sure enough, it works! Now the new shell that `login` spawned is part of the `panda` group! Awesome! This won't affect any other shells I already have running. If I really want the new `panda` group everywhere, I need to restart my login session completely, which means quitting my window manager and logging in again.
### newgrp
Somebody on Twitter told me that if you want to start a new shell with a new group that you've been added to, you can use `newgrp`. Like this:
```
sudo addgroup panda
sudo adduser bork panda
newgrp panda # starts a new shell, and you don't have to be root to run it!
```
You can accomplish the same(ish) thing with `sg panda bash` which will start a `bash` shell that runs with the `panda` group.
### setuid sets the effective user ID
I've also always been a little vague about what it means for a process to run as “setuid root”. It turns out that setuid sets the effective user ID! So if I (`julia`) run a setuid root process (like `passwd`), then the **real** user ID will be set to `julia`, and the **effective** user ID will be set to `root`.
`passwd` needs to run as root, but it can look at its real user ID to see that `julia` started the process, and prevent `julia` from editing any passwords except for `julia`'s password.
### that's all!
There are a bunch more details about all the edge cases and exactly how everything works in The Linux Programming Interface so I will not get into all the details here. That book is amazing. Everything I talked about in this post is from Chapter 9, which is a 17-page chapter inside a 1300-page book.
The thing I love most about that book is that reading 17 pages about how users and groups work is really approachable, self-contained, super useful, and I don't have to tackle all 1300 pages of it at once to learn helpful things :)
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2017/11/20/groups/
作者:[Julia Evans ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://jvns.ca/
[1]:http://man7.org/tlpi/

View File

@ -0,0 +1,185 @@
Translating by filefi
How to Install and Use Wireshark on Debian 9 / Ubuntu 16.04 / 17.10
============================================================
by [Pradeep Kumar][1] · Published November 29, 2017 · Updated November 29, 2017
[![wireshark-Debian-9-Ubuntu 16.04 -17.10](https://www.linuxtechi.com/wp-content/uploads/2017/11/wireshark-Debian-9-Ubuntu-16.04-17.10.jpg)][2]
Wireshark is a free and open source, cross-platform, GUI-based network packet analyzer that is available for Linux, Windows, macOS, Solaris etc. It captures network packets in real time & presents them in a human-readable format. Wireshark allows us to monitor network packets down to a microscopic level. Wireshark also has a command line utility called tshark that performs the same functions as Wireshark, but through the terminal & not through a GUI.
Wireshark can be used for network troubleshooting, analyzing, software & communication protocol development & also for educational purposes. Wireshark uses a library called pcap for capturing the network packets.
Wireshark comes with a lot of features & some of those features are:
* Support for hundreds of protocols for inspection,
* Ability to capture packets in real time & save them for later offline analysis,
* A number of filters for analyzing data,
* Data captured can be compressed & uncompressed on the fly,
* Various file formats for data analysis are supported; output can also be saved to XML, CSV and plain text formats,
* Data can be captured from a number of interfaces like Ethernet, Wi-Fi, Bluetooth, USB, Frame Relay, Token Ring etc.
In this article, we will discuss how to install Wireshark on Ubuntu/Debian machines & will also learn to use Wireshark for capturing network packets.
#### Installation of Wireshark on Ubuntu 16.04 / 17.10
Wireshark is available in the default Ubuntu repositories & can be simply installed using the following command, but you might not get the latest version of Wireshark.
```
linuxtechi@nixworld:~$ sudo apt-get update
linuxtechi@nixworld:~$ sudo apt-get install wireshark -y
```
So to install the latest version of Wireshark, we have to enable or configure the official Wireshark repository.
Use the commands below one after another to configure the repository and to install the latest version of the Wireshark utility:
```
linuxtechi@nixworld:~$ sudo add-apt-repository ppa:wireshark-dev/stable
linuxtechi@nixworld:~$ sudo apt-get update
linuxtechi@nixworld:~$ sudo apt-get install wireshark -y
```
Once Wireshark is installed, execute the command below so that non-root users can capture live packets on the interfaces:
```
linuxtechi@nixworld:~$ sudo setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/dumpcap
```
#### Installation of Wireshark on Debian 9
The Wireshark package and its dependencies are already present in the default Debian 9 repositories, so to install the latest stable version of Wireshark on Debian 9, use the following command:
```
linuxtechi@nixhome:~$ sudo apt-get update
linuxtechi@nixhome:~$ sudo apt-get install wireshark -y
```
During the installation, it will prompt us to configure dumpcap for non-superusers,
Select yes and then hit enter.
[![Configure-Wireshark-Debian9](https://www.linuxtechi.com/wp-content/uploads/2017/11/Configure-Wireshark-Debian9-1024x542.jpg)][3]
Once the installation is completed, execute the command below so that non-root users can also capture live packets on the interfaces.
```
linuxtechi@nixhome:~$ sudo chmod +x /usr/bin/dumpcap
```
We can also use the latest source package to install Wireshark on Ubuntu/Debian & many other Linux distributions.
#### Installing Wireshark using source code on Debian / Ubuntu Systems
First, download the latest source package (which is 2.4.2 at the time of writing this article) using the following command:
```
linuxtechi@nixhome:~$ wget https://1.as.dl.wireshark.org/src/wireshark-2.4.2.tar.xz
```
Next, extract the package & enter the extracted directory:
```
linuxtechi@nixhome:~$ tar -xf wireshark-2.4.2.tar.xz -C /tmp
linuxtechi@nixhome:~$ cd /tmp/wireshark-2.4.2
```
Now we will compile the code with the following commands,
```
linuxtechi@nixhome:/tmp/wireshark-2.4.2$ ./configure --enable-setcap-install
linuxtechi@nixhome:/tmp/wireshark-2.4.2$ make
```
Lastly install the compiled packages to install Wireshark on the system,
```
linuxtechi@nixhome:/tmp/wireshark-2.4.2$ sudo make install
linuxtechi@nixhome:/tmp/wireshark-2.4.2$ sudo ldconfig
```
Upon installation, a separate group for Wireshark will also be created. We will now add our user to that group so that it can work with Wireshark; otherwise you might get a permission denied error when starting Wireshark.
To add the user to the wireshark group, execute the following command,
```
linuxtechi@nixhome:~$ sudo usermod -a -G wireshark linuxtechi
```
Now we can start wireshark either from GUI Menu or from terminal with this command,
```
linuxtechi@nixhome:~$ wireshark
```
#### Access Wireshark on Debian 9 System
[![Access-wireshark-debian9](https://www.linuxtechi.com/wp-content/uploads/2017/11/Access-wireshark-debian9-1024x664.jpg)][4]
Click on Wireshark icon
[![Wireshark-window-debian9](https://www.linuxtechi.com/wp-content/uploads/2017/11/Wireshark-window-debian9-1024x664.jpg)][5]
#### Access Wireshark on Ubuntu 16.04 / 17.10
[![Access-wireshark-Ubuntu](https://www.linuxtechi.com/wp-content/uploads/2017/11/Access-wireshark-Ubuntu-1024x664.jpg)][6]
Click on Wireshark icon
[![Wireshark-window-Ubuntu](https://www.linuxtechi.com/wp-content/uploads/2017/11/Wireshark-window-Ubuntu-1024x664.jpg)][7]
#### Capturing and Analyzing packets
Once Wireshark has been started, we should be presented with the Wireshark window; an example is shown above for the Ubuntu and Debian systems.
[![wireshark-Linux-system](https://www.linuxtechi.com/wp-content/uploads/2017/11/wireshark-Linux-system.jpg)][8]
All these are the interfaces from where we can capture the network packets. Based on the interfaces you have on your system, this screen might be different for you.
We are selecting enp0s3 for capturing the network traffic for that interface. After selecting the interface, network packets for all the devices on our network start to populate the screen (refer to the screenshot below):
[![Capturing-Packet-from-enp0s3-Ubuntu-Wireshark](https://www.linuxtechi.com/wp-content/uploads/2017/11/Capturing-Packet-from-enp0s3-Ubuntu-Wireshark-1024x727.jpg)][9]
The first time we see this screen, we might be overwhelmed by the data presented & wonder how to sort it out, but worry not: one of the best features of Wireshark is its filters.
We can sort/filter the data based on IP address, port number, source & destination, packet size etc., & can also combine 2 or more filters together to create more comprehensive searches. We can either write our filters in the Apply a Display Filter tab, or we can select one of the already created rules. To select a pre-built filter, click on the flag icon next to the Apply a Display Filter tab,
[![Filter-in-wireshark-Ubuntu](https://www.linuxtechi.com/wp-content/uploads/2017/11/Filter-in-wireshark-Ubuntu-1024x727.jpg)][10]
We can also filter data based on the color coding. By default, light purple is TCP traffic, light blue is UDP traffic, and black identifies packets with errors. To see what these codes mean, click View -> Coloring Rules; we can also change these codes.
[![Packet-Colouring-Wireshark](https://www.linuxtechi.com/wp-content/uploads/2017/11/Packet-Colouring-Wireshark-1024x682.jpg)][11]
After we have the results that we need, we can then click on any of the captured packets to get more details about that packet; this will show all the data about that network packet.
Wireshark is an extremely powerful tool that takes some time to get used to & master; this tutorial will help you get started. Please feel free to drop in your queries or suggestions in the comment box below.
--------------------------------------------------------------------------------
via: https://www.linuxtechi.com
作者:[Pradeep Kumar][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linuxtechi.com/author/pradeep/
[1]:https://www.linuxtechi.com/author/pradeep/
[2]:https://www.linuxtechi.com/wp-content/uploads/2017/11/wireshark-Debian-9-Ubuntu-16.04-17.10.jpg
[3]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Configure-Wireshark-Debian9.jpg
[4]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Access-wireshark-debian9.jpg
[5]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Wireshark-window-debian9.jpg
[6]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Access-wireshark-Ubuntu.jpg
[7]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Wireshark-window-Ubuntu.jpg
[8]:https://www.linuxtechi.com/wp-content/uploads/2017/11/wireshark-Linux-system.jpg
[9]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Capturing-Packet-from-enp0s3-Ubuntu-Wireshark.jpg
[10]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Filter-in-wireshark-Ubuntu.jpg
[11]:https://www.linuxtechi.com/wp-content/uploads/2017/11/Packet-Colouring-Wireshark.jpg

View File

@ -0,0 +1,115 @@
Excellent Business Software Alternatives For Linux
-------
Many business owners choose to use Linux as the operating system for their operations for a variety of reasons.
1. Firstly, they don't have to pay anything for the privilege, and that is a massive bonus during the early stages of a company where money is tight.
2. Secondly, Linux is a light alternative compared to Windows and other popular operating systems available today.
Of course, lots of entrepreneurs worry they won't have access to some of the essential software packages if they make that move. However, as you will discover throughout this post, there are plenty of similar tools that will cover all the bases.
[![](https://4.bp.blogspot.com/-xwLuDRdB6sw/Whxx0Z5pI5I/AAAAAAAADhU/YWHID8GU9AgrXRfeTz4HcDZkG-XWZNbSgCLcBGAs/s400/4444061098_6eeaa7dc1a_z.jpg)][3]
### Alternatives to Microsoft Word
All company bosses will require access to a word processing tool if they want to ensure the smooth running of their operation according to
[the latest article from Fareed Siddiqui][4]
. You'll need that software to write business plans, letters, and many other jobs within your firm. Thankfully, there are a variety of alternatives you might like to select if you opt for the Linux operating system. Some of the most popular ones include:
* LibreOffice Writer
* AbiWord
* KWord
* LaTeX
So, you just need to read some online reviews and then download the best word processor based on your findings. Of course, if you're not satisfied with the solution, you should take a look at some of the other ones on that list. In many instances, any of the programs mentioned above should work well.
### Alternatives to Microsoft Excel
[![](https://4.bp.blogspot.com/-XdS6bSLQbOU/WhxyeWZeeCI/AAAAAAAADhc/C3hGY6rgzX4m2emunot80-4URu9-aQx8wCLcBGAs/s400/28929069495_e85d2626ba_z.jpg)][5]
You need a spreadsheet tool if you want to ensure your business doesn't get into trouble when it comes to bookkeeping and inventory control. There are specialist software packages on the market for both of those tasks, but
[open-source alternatives][6]
to Microsoft Excel will give you the most amount of freedom when creating your spreadsheets and editing them. While there are other packages out there, some of the best ones for Linux users include:
* [LibreOffice Calc][1]
* KSpread
* Gnumeric
Those programs work in much the same way as Microsoft Excel, and so you can use them for issues like accounting and stock control. You might also use that software to monitor employee earnings or punctuality. The possibilities are endless and only limited by your imagination.
### Alternatives to Adobe Photoshop
[![](https://3.bp.blogspot.com/-Id9Dm3CIXmc/WhxzGIlv3zI/AAAAAAAADho/VfIRCAbJMjMZzG2M97-uqLV9mOhqN7IWACLcBGAs/s400/32206185926_c69accfcef_z.jpg)][7]
Company bosses require access to design programs when developing their marketing materials and creating graphics for their websites. You might also use software of that nature to come up with a new business logo at some point. Lots of entrepreneurs spend a fortune on
[Training Connections Photoshop classes][8]
and those available from other providers. They do that in the hope of educating their teams and getting the best results. However, people who use Linux can still benefit from that expertise if they select one of the following
[alternatives][9]
:
* GIMP
* Krita
* Pixel
* LightZone
The last two suggestions on that list require a substantial investment. Still, they function in much the same way as Adobe Photoshop, and so you should manage to achieve the same quality of work.
### Other software solutions that you'll want to consider
Alongside those alternatives to some of the most widely-used software packages around today, business owners should take a look at the full range of products they could use with the Linux operating system. Here are some tools you might like to research and consider:
* Inkscape - similar to Coreldraw
* LibreOffice Base - similar to Microsoft Access
* LibreOffice Impress - similar to Microsoft PowerPoint
* File Roller - similar to WinZip
* Linphone - similar to Skype
There are
[lots of other programs][10]
you'll also want to research, and so the best solution is to use the internet to learn more. You will find lots of reviews from people who've used the software in the past, and many of them will compare the tool to its Windows or iOS alternative. So, you shouldn't have to work too hard to identify the best ones and sort the wheat from the chaff.
Now you have all the right information; it's time to weigh all the pros and cons of Linux and work out if it's suitable for your operation. In most instances, that operating system does not place any limits on your business activities. It's just that you need to use different software compared to some of your competitors. People who use Linux tend to benefit from improved security, speed, and performance. Also, the solution gets regular updates, and so it's growing every single day. Unlike Windows and other solutions; you can customize Linux to meet your requirements. With that in mind, do not make the mistake of overlooking this fantastic system!
--------------------------------------------------------------------------------
via: http://linuxblog.darkduck.com/2017/11/excellent-business-software.html
作者:[DarkDuck][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:http://linuxblog.darkduck.com/
[1]:http://linuxblog.darkduck.com/2015/08/pivot-tables-in-libreoffice-calc.html
[3]:https://4.bp.blogspot.com/-xwLuDRdB6sw/Whxx0Z5pI5I/AAAAAAAADhU/YWHID8GU9AgrXRfeTz4HcDZkG-XWZNbSgCLcBGAs/s1600/4444061098_6eeaa7dc1a_z.jpg
[4]:https://www.linkedin.com/pulse/benefits-using-microsoft-word-fareed/
[5]:https://4.bp.blogspot.com/-XdS6bSLQbOU/WhxyeWZeeCI/AAAAAAAADhc/C3hGY6rgzX4m2emunot80-4URu9-aQx8wCLcBGAs/s1600/28929069495_e85d2626ba_z.jpg
[6]:http://linuxblog.darkduck.com/2014/03/why-open-software-and-what-are-benefits.html
[7]:https://3.bp.blogspot.com/-Id9Dm3CIXmc/WhxzGIlv3zI/AAAAAAAADho/VfIRCAbJMjMZzG2M97-uqLV9mOhqN7IWACLcBGAs/s1600/32206185926_c69accfcef_z.jpg
[8]:https://www.trainingconnection.com/photoshop-training.php
[9]:http://linuxblog.darkduck.com/2011/10/photoshop-alternatives-for-linux.html
[10]:http://www.makeuseof.com/tag/best-linux-software/

View File

@ -0,0 +1,156 @@
translating---geekpi
Undistract-me : Get Notification When Long Running Terminal Commands Complete
============================================================
by [sk][2] · November 30, 2017
![Undistract-me](https://www.ostechnix.com/wp-content/uploads/2017/11/undistract-me-2-720x340.png)
A while ago, we published how to [get notification when a Terminal activity is done][3]. Today, I found a similar utility called “undistract-me” that notifies you when long-running terminal commands complete. Picture this scenario: you run a command that takes a while to finish. In the meantime, you check Facebook and get absorbed in it. After a while, you remember that you ran a command a few minutes ago. You go back to the Terminal and notice that the command has already finished, but you have no idea when it completed. Have you ever been in this situation? I bet most of you have, many times. This is where “undistract-me” comes in handy. You don't need to constantly check the terminal to see whether a command has completed; the undistract-me utility will notify you when a long-running command finishes. It works on Arch Linux, Debian, Ubuntu and other Ubuntu derivatives.
#### Installing Undistract-me
Undistract-me is available in the default repositories of Debian and its variants such as Ubuntu. All you have to do is run the following command to install it.
```
sudo apt-get install undistract-me
```
Arch Linux users can install it from the AUR using any AUR helper program.
Using [Pacaur][4]:
```
pacaur -S undistract-me-git
```
Using [Packer][5]:
```
packer -S undistract-me-git
```
Using [Yaourt][6]:
```
yaourt -S undistract-me-git
```
Then, run the following command to add “undistract-me” to your Bash.
```
echo 'source /etc/profile.d/undistract-me.sh' >> ~/.bashrc
```
Alternatively you can run this command to add it to your Bash:
```
echo "source /usr/share/undistract-me/long-running.bash\nnotify_when_long_running_commands_finish_install" >> .bashrc
```
If you are in Zsh shell, run this command:
```
echo "source /usr/share/undistract-me/long-running.bash\nnotify_when_long_running_commands_finish_install" >> .zshrc
```
Finally update the changes:
For Bash:
```
source ~/.bashrc
```
For Zsh:
```
source ~/.zshrc
```
#### Configure Undistract-me
By default, Undistract-me will consider any command that takes more than 10 seconds to complete as a long-running command. You can change this time interval by editing the /usr/share/undistract-me/long-running.bash file.
```
sudo nano /usr/share/undistract-me/long-running.bash
```
Find the “LONG_RUNNING_COMMAND_TIMEOUT” variable and change the default value (10 seconds) to a value of your choice.
[![](http://www.ostechnix.com/wp-content/uploads/2017/11/undistract-me-1.png)][7]
Save and close the file. Do not forget to update the changes:
```
source ~/.bashrc
```
Also, you can disable notifications for particular commands. To do so, find the “LONG_RUNNING_IGNORE_LIST” variable and add the commands to it, space-separated, as in the example below.
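For instance, after editing, the line might look something like the following. The variable takes a space-separated list of command names; the commands shown here are only an illustration, and the exact line in your version of the script may differ:

```
LONG_RUNNING_IGNORE_LIST="vim top htop ssh man"
```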
By default, the notification will only show if the active window is not the window the command is running in. That means it will notify you only if the command is running in a background Terminal window; if the command is running in the active Terminal window, you will not be notified. If you want undistract-me to send notifications whether the Terminal window is visible or in the background, set IGNORE_WINDOW_CHECK to 1 to skip the window check.
The other cool feature of Undistract-me is that you can get an audio notification along with the visual notification when a command is done. By default, it only sends a visual notification. You can change this behavior by setting the variable UDM_PLAY_SOUND to a non-zero integer on the command line. However, your Ubuntu system should have the pulseaudio-utils and sound-theme-freedesktop utilities installed to enable this functionality.
Please remember that you need to run the following command to update the changes made.
For Bash:
```
source ~/.bashrc
```
For Zsh:
```
source ~/.zshrc
```
It is time to verify if this really works.
#### Get Notification When Long Running Terminal Commands Complete
Now, run any command that takes longer than 10 seconds, or whatever duration you defined in the Undistract-me script.
I ran the following command on my Arch Linux desktop.
```
sudo pacman -Sy
```
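If you would rather not touch your package manager, any harmless command that exceeds the threshold also works as a quick test, provided it is not on your ignore list, for example:

```
sleep 15
```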
This command took 32 seconds to complete. After the completion of the above command, I got the following notification.
[![](http://www.ostechnix.com/wp-content/uploads/2017/11/undistract-me-2.png)][8]
Please remember that the Undistract-me script notifies you only if the given command took more than 10 seconds to complete. If the command completes in less than 10 seconds, you will not be notified. Of course, you can change this time interval setting as I described in the configuration section above.
I find this tool very useful. It helped me get back to business after I got completely lost in other tasks. I hope this tool will be helpful to you too.
More good stuff to come. Stay tuned!
Cheers!
Resource:
* [Undistract-me GitHub Repository][1]
--------------------------------------------------------------------------------
via: https://www.ostechnix.com/undistract-get-notification-long-running-terminal-commands-complete/
作者:[sk][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.ostechnix.com/author/sk/
[1]:https://github.com/jml/undistract-me
[2]:https://www.ostechnix.com/author/sk/
[3]:https://www.ostechnix.com/get-notification-terminal-task-done/
[4]:https://www.ostechnix.com/install-pacaur-arch-linux/
[5]:https://www.ostechnix.com/install-packer-arch-linux-2/
[6]:https://www.ostechnix.com/install-yaourt-arch-linux/
[7]:http://www.ostechnix.com/wp-content/uploads/2017/11/undistract-me-1.png
[8]:http://www.ostechnix.com/wp-content/uploads/2017/11/undistract-me-2.png

View File

@ -0,0 +1,132 @@
Wake up and Shut Down Linux Automatically
============================================================
### [banner.jpg][1]
![time keeper](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/banner.jpg?itok=zItspoSb)
Learn how to configure your Linux computers to watch the time for you, then wake up and shut down automatically.
[Creative Commons Attribution][6][The Observatory at Delhi][7]
Don't be a watt-waster. If your computers don't need to be on then shut them down. For convenience and nerd creds, you can configure your Linux computers to wake up and shut down automatically.
### Precious Uptimes
Some computers need to be on all the time, which is fine as long as it's not about satisfying an uptime compulsion. Some people are very proud of their lengthy uptimes, and now that we have kernel hot-patching, only hardware failures require shutdowns. I think it's better to be practical. Save electricity as well as wear on your moving parts, and shut them down when they're not needed. For example, you can wake up a backup server at a scheduled time, run your backups, and then shut it down until it's time for the next backup. Or, you can configure your Internet gateway to be on only at certain times. Anything that doesn't need to be on all the time can be configured to turn on, do a job, and then shut down.
### Sleepies
For computers that don't need to be on all the time, good old cron will shut them down reliably. Use either root's cron, or /etc/crontab. This example creates a root cron job to shut down every night at 11:15 p.m.
```
# crontab -e -u root
# m h dom mon dow command
15 23 * * * /sbin/shutdown -h now
```
To shut the machine down only on weekdays (Monday through Friday), use:
```
15 23 * * 1-5 /sbin/shutdown -h now
```
You may also use /etc/crontab, which is fast and easy, and everything is in one file. You have to specify the user:
```
15 23 * * 1-5 root shutdown -h now
```
Auto-wakeups are very cool; most of my SUSE colleagues are in Nuremberg, so I am crawling out of bed at 5 a.m. to have a few hours of overlap with their schedules. My work computer turns itself on at 5:30 a.m., and then all I have to do is drag my coffee and myself to my desk to start work. It might not seem like pressing a power button is a big deal, but at that time of day every little thing looms large.
Waking up your Linux PC can be less reliable than shutting it down, so you may want to try different methods. You can use wakeonlan, RTC wakeups, or your PC's BIOS to set scheduled wakeups. These all work because, when you power off your computer, it's not really all the way off; it is in an extremely low-power state and can receive and respond to signals. You need to use the power supply switch to turn it off completely.
### BIOS Wakeup
A BIOS wakeup is the most reliable. My system BIOS has an easy-to-use wakeup scheduler (Figure 1). Chances are yours does, too. Easy peasy.
### [fig-1.png][2]
![wake up](https://www.linux.com/sites/lcom/files/styles/floated_images/public/fig-1_11.png?itok=8qAeqo1I)
Figure 1: My system BIOS has an easy-to-use wakeup scheduler.
[Used with permission][8]
### wakeonlan
wakeonlan is the next most reliable method. This requires sending a signal from a second computer to the computer you want to power on. You could use an Arduino or Raspberry Pi to send the wakeup signal, a Linux-based router, or any Linux PC. First, look in your system BIOS to see if wakeonlan is supported -- which it should be -- and then enable it, as it should be disabled by default.
Then, you'll need an Ethernet network adapter that supports wakeonlan; wireless adapters won't work. Verify that your Ethernet card supports wakeonlan:
```
# ethtool eth0 | grep -i wake-on
Supports Wake-on: pumbg
Wake-on: g
```
* d -- all wake ups disabled
* p -- wake up on physical activity
* u -- wake up on unicast messages
* m -- wake up on multicast messages
* b -- wake up on broadcast messages
* a -- wake up on ARP messages
* g -- wake up on magic packet
* s -- set the Secure On password for the magic packet
man ethtool is not clear on what the p switch does; it suggests that any signal will cause a wake up. In my testing, however, it doesn't do that. The one that must be enabled is g -- wake up on magic packet, and the Wake-on line shows that it is already enabled. If it is not enabled, you can use ethtool to enable it, using your own device name, of course:
```
# ethtool -s eth0 wol g
```
On some systems this setting does not survive a reboot, so to be safe you can add a root cron job that re-applies it at every boot:
```
@reboot /usr/bin/ethtool -s eth0 wol g
```
### [fig-2.png][3]
![wakeonlan](https://www.linux.com/sites/lcom/files/styles/floated_images/public/fig-2_7.png?itok=XQAwmHoQ)
Figure 2: Enable Wake on LAN.
[Used with permission][9]
Another option is that recent Network Manager versions have a nice little checkbox to enable wakeonlan (Figure 2).
There is a field for setting a password, but if your network interface doesn't support the Secure On password, it won't work.
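If you prefer the command line, recent NetworkManager versions can usually set the same thing via nmcli. Treat the connection name below ("Wired connection 1") as a placeholder for your own connection, and note that the property name may vary slightly between versions:

```
nmcli connection modify "Wired connection 1" 802-3-ethernet.wake-on-lan magic
```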
Now you need to configure a second PC to send the wakeup signal. You don't need root privileges, so create a cron job for your user. You need the MAC address of the network interface on the machine you're waking up:
```
30 08 * * * /usr/bin/wakeonlan D0:50:99:82:E7:2B
```
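If you need to look up that MAC address first, one way to do it on the target machine (assuming its interface is named eth0) is:

```
ip link show eth0 | awk '/ether/ {print $2}'
```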
Using the real-time clock for wakeups is the least reliable method. Check out [Wake Up Linux With an RTC Alarm Clock][4]; this is a bit outdated as most distros use systemd now. Come back next week to learn more about updated ways to use RTC wakeups.
Learn more about Linux through the free ["Introduction to Linux"][5] course from The Linux Foundation and edX.
--------------------------------------------------------------------------------
via: https://www.linux.com/learn/intro-to-linux/2017/11/wake-and-shut-down-linux-automatically
作者:[Carla Schroder]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[1]:https://www.linux.com/files/images/bannerjpg
[2]:https://www.linux.com/files/images/fig-1png-11
[3]:https://www.linux.com/files/images/fig-2png-7
[4]:https://www.linux.com/learn/wake-linux-rtc-alarm-clock
[5]:https://training.linuxfoundation.org/linux-courses/system-administration-training/introduction-to-linux
[6]:https://www.linux.com/licenses/category/creative-commons-attribution
[7]:http://www.columbia.edu/itc/mealac/pritchett/00routesdata/1700_1799/jaipur/delhijantarearly/delhijantarearly.html
[8]:https://www.linux.com/licenses/category/used-permission
[9]:https://www.linux.com/licenses/category/used-permission

View File

@ -0,0 +1,71 @@
### [Fedora Classroom Session: Ansible 101][2]
### By Sachin S Kamath
![](https://fedoramagazine.org/wp-content/uploads/2017/07/fedora-classroom-945x400.jpg)
Fedora Classroom sessions continue this week with an Ansible session. The general schedule for sessions appears [on the wiki][3]. You can also find [resources and recordings from previous sessions][4] there. Here are details about this week's session on [Thursday, 30th November at 1600 UTC][5]. That link allows you to convert the time to your timezone.
### Topic: Ansible 101
As the Ansible [documentation][6] explains, Ansible is an IT automation tool. It's primarily used to configure systems, deploy software, and orchestrate more advanced IT tasks. Examples include continuous deployments or zero-downtime rolling updates.
This Classroom session covers the topics listed below:
1. Introduction to SSH
2. Understanding different terminologies
3. Introduction to Ansible
4. Ansible installation and setup
5. Establishing password-less connection
6. Ad-hoc commands
7. Managing inventory
8. Playbooks examples
There will also be a follow-up Ansible 102 session later. That session will cover complex playbooks, roles, dynamic inventory files, control flow and Galaxy.
### Instructors
We have two experienced instructors handling this session.
[Geoffrey Marr][7], also known by his IRC name as “coremodule,” is a Red Hat employee and Fedora contributor with a background in Linux and cloud technologies. While working, he spends his time lurking in the [Fedora QA][8] wiki and test pages. Away from work, he enjoys RaspberryPi projects, especially those focusing on software-defined radio.
[Vipul Siddharth][9] is an intern at Red Hat who also works on Fedora. He loves to contribute to open source and seeks opportunities to spread the word of free and open source software.
### Joining the session
This session takes place on [BlueJeans][10]. The following information will help you join the session:
* URL: [https://bluejeans.com/3466040121][1]
* Meeting ID (for Desktop App): 3466040121
We hope you attend, learn from, and enjoy this session! If you have any feedback about the sessions, have ideas for a new one or want to host a session, please feel free to comment on this post or edit the [Classroom wiki page][11].
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/fedora-classroom-session-ansible-101/
作者:[Sachin S Kamath]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[1]:https://bluejeans.com/3466040121
[2]:https://fedoramagazine.org/fedora-classroom-session-ansible-101/
[3]:https://fedoraproject.org/wiki/Classroom
[4]:https://fedoraproject.org/wiki/Classroom#Previous_Sessions
[5]:https://www.timeanddate.com/worldclock/fixedtime.html?msg=Fedora+Classroom+-+Ansible+101&iso=20171130T16&p1=%3A
[6]:http://docs.ansible.com/ansible/latest/index.html
[7]:https://fedoraproject.org/wiki/User:Coremodule
[8]:https://fedoraproject.org/wiki/QA
[9]:https://fedoraproject.org/wiki/User:Siddharthvipul1
[10]:https://www.bluejeans.com/downloads
[11]:https://fedoraproject.org/wiki/Classroom

View File

@ -0,0 +1,168 @@
translating---imquanquan
How to Manage Users with Groups in Linux
============================================================
### [group-of-people-1645356_1920.jpg][1]
![groups](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/group-of-people-1645356_1920.jpg?itok=rJlAxBSV)
Learn how to work with users, via groups and access control lists in this tutorial.
[Creative Commons Zero][4]
Pixabay
When you administer a Linux machine that houses multiple users, there might be times when you need to take more control over those users than the basic user tools offer. This idea comes to the fore especially when you need to manage permissions for certain users. Say, for example, you have a directory that needs to be accessed with read/write permissions by one group of users and only read permissions for another group. With Linux, this is entirely possible. To make this happen, however, you must first understand how to work with users, via groups and access control lists (ACLs).
We'll start from the beginning with users and work our way up to the more complex ACLs. Everything you need to make this happen will be included in your Linux distribution of choice. We won't touch on the basics of users, as the focus of this article is on groups.
For the purpose of this piece, I'm going to assume the following:
You need to create two users with usernames:
* olivia
* nathan
You need to create two groups:
* readers
* editors
Olivia needs to be a member of the group editors, while nathan needs to be a member of the group readers. The group readers needs to only have read permission to the directory /DATA, whereas the group editors needs to have both read and write permission to the /DATA directory. This, of course, is very minimal, but it will give you the basic information you need to expand the tasks to fit your much larger needs.
I'll be demonstrating on the Ubuntu 16.04 Server platform. The commands will be universal—the only difference would be if your distribution of choice doesn't make use of sudo. If this is the case, you'll have to first su to the root user to issue the commands that require sudo in the demonstrations.
### Creating the users
The first thing we need to do is create the two users for our experiment. User creation is handled with the useradd command. Instead of simply creating the users, we'll create them both with their own home directories and then give them passwords.
The first thing we do is create the users. To do this, issue the commands:
```
sudo useradd -m olivia
sudo useradd -m nathan
```
Next, each user must have a password. To add passwords into the mix, you'd issue the following commands:
```
sudo passwd olivia
sudo passwd nathan
```
That's it, your users are created.
### Creating groups and adding users
Now we're going to create the groups readers and editors and then add users to them. The commands to create our groups are:
```
sudo addgroup readers
sudo addgroup editors
```
### [groups_1.jpg][2]
![groups](https://www.linux.com/sites/lcom/files/styles/rendered_file/public/groups_1.jpg?itok=BKwL89BB)
Figure 1: Our new groups ready to be used.
[Used with permission][5]
With our groups created, we need to add our users. We'll add user nathan to group readers with the command:
```
sudo usermod -a -G readers nathan
```
Then we'll add user olivia to the group editors with the command:
```
sudo usermod -a -G editors olivia
```
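To double-check that the memberships took effect, you can list each user's groups:

```
groups nathan
groups olivia
```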
### Giving groups permissions to directories
Let's say you have the directory /READERS and you need to allow all members of the readers group access to that directory. First, change the group of the folder with the command:
```
sudo chown -R :readers /READERS
```
Next, remove write permission from the group, since members of readers should only be able to read:
```
sudo chmod -R g-w /READERS
```
Then remove the execute bit for others, so that users outside the readers group cannot enter the directory:
```
sudo chmod -R o-x /READERS
```
Let's say you have the directory /EDITORS and you need to give members of the editors group read and write permission to its contents. To do that, the following commands would be necessary:
```
sudo chown -R :editors /EDITORS
sudo chmod -R g+w /EDITORS
sudo chmod -R o-x /EDITORS
```
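A quick way to verify the resulting group ownership and permissions on both directories is:

```
ls -ld /READERS /EDITORS
```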
The problem with using this method is you can only add one group to a directory at a time. This is where access control lists come in handy.
### Using access control lists
Now, let's get tricky. Say you have a single folder—/DATA—and you want to give members of the readers group read permission and members of the group editors read/write permissions. To do that, you must take advantage of the setfacl command. The setfacl command sets file access control lists for files and folders.
The structure of this command looks like this:
```
setfacl OPTION X:NAME:Y /DIRECTORY
```
Here OPTION is one or more setfacl options, X is either u (for a user) or g (for a group), NAME is the name of that user or group, Y is the permissions to grant, and DIRECTORY is the directory to act on. Using the -m (modify) option, we can give members of the readers group read access to the /DATA directory (the x bit lets them traverse into it) with:
```
sudo setfacl -m g:readers:rx -R /DATA
```
To give members of the editors group read/write permissions (while retaining read permissions for the readers group), we'd issue the command:
```
sudo setfacl -m g:editors:rwx -R /DATA
```
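To review the ACLs now set on the directory, you can use the companion getfacl command:

```
getfacl /DATA
```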
### All the control you need
And there you have it. You can now add members to groups and control those groups' access to various directories with all the power and flexibility you need. To read more about the above tools, issue the commands:
* man useradd
* man addgroup
* man usermod
* man setfacl
* man chown
* man chmod
Learn more about Linux through the free ["Introduction to Linux"][3] course from The Linux Foundation and edX.
--------------------------------------------------------------------------------
via: https://www.linux.com/learn/intro-to-linux/2017/12/how-manage-users-groups-linux
作者:[Jack Wallen ]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[1]:https://www.linux.com/files/images/group-people-16453561920jpg
[2]:https://www.linux.com/files/images/groups1jpg
[3]:https://training.linuxfoundation.org/linux-courses/system-administration-training/introduction-to-linux
[4]:https://www.linux.com/licenses/category/creative-commons-zero
[5]:https://www.linux.com/licenses/category/used-permission

View File

@ -0,0 +1,72 @@
Translating by filefi
# Scrot: Linux command-line screen grabs made simple
by [Scott Nesbitt][a] · November 30, 2017
> Scrot is a basic, flexible tool that offers a number of handy options for taking screen captures from the Linux command line.
[![Original photo by Rikki Endsley. CC BY-SA 4.0](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/community-penguins-osdc-lead.png?itok=BmqsAF4A)][1]
There are great tools on the Linux desktop for taking screen captures, such as [KSnapshot][2] and [Shutter][3]. Even the simple utility that comes with the GNOME desktop does a pretty good job of capturing screens. But what if you rarely need to take screen captures? Or you use a Linux distribution without a built-in capture tool, or an older computer with limited resources?
Turn to the command line and a little utility called [Scrot][4]. It does a fine job of taking simple screen captures, and it includes a few features that might surprise you.
### Getting started with Scrot
Many Linux distributions come with Scrot already installed—to check, type `which scrot`. If it isn't there, you can install Scrot using your distro's package manager. If you're willing to compile the code, grab it [from GitHub][5].
To take a screen capture, crack open a terminal window and type `scrot [filename]`, where `[filename]` is the name of the file to which you want to save the image (for example, `desktop.png`). If you don't include a name for the file, Scrot will create one for you, such as `2017-09-24-185009_1687x938_scrot.png`. (That filename isn't as descriptive as it could be, is it? That's why it's better to add one to the command.)
Running Scrot with no options takes a screen capture of your entire desktop. If you don't want to do that, Scrot lets you focus on smaller portions of your screen.
### Taking a screen capture of a single window
Tell Scrot to take a screen capture of a single window by typing `scrot -u [filename]`.
The `-u` option tells Scrot to grab the window currently in focus. That's usually the terminal window you're working in, which might not be the one you want.
To grab another window on your desktop, type `scrot -s [filename]`.
The `-s` option lets you do one of two things:
* select an open window, or
* draw a rectangle around a window or a portion of a window to capture it.
You can also set a delay, which gives you a little more time to select the window you want to capture. To do that, type `scrot -u -d [num] [filename]`.
The `-d` option tells Scrot to wait before grabbing the window, and `[num]` is the number of seconds to wait. Specifying `-d 5` (wait five seconds) should give you enough time to choose a window.
### More useful options
Scrot offers a number of additional features (most of which I never use). The ones I find most useful include:
* `-b` also grabs the window's border
* `-t` grabs a window and creates a thumbnail of it. This can be useful when you're posting screen captures online.
* `-c` creates a countdown in your terminal when you use the `-d` option (see the combined example below).
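These options can be combined. For example, the following command (an illustration, with an arbitrary output filename) counts down for five seconds, then grabs the currently focused window including its border:

```
scrot -u -b -c -d 5 window.png
```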
To learn about Scrot's other options, check out its documentation by typing `man scrot` in a terminal window, or [read it online][6]. Then start snapping images of your screen.
It's basic, but Scrot gets the job done nicely.
--------------------------------------------------------------------------------
via: https://opensource.com/article/17/11/taking-screen-captures-linux-command-line-scrot
作者:[Scott Nesbitt][a]
译者:[filefi](https://github.com/filefi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://opensource.com/users/scottnesbitt
[1]:https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/community-penguins-osdc-lead.png?itok=BmqsAF4A
[2]:https://www.kde.org/applications/graphics/ksnapshot/
[3]:https://launchpad.net/shutter
[4]:https://github.com/dreamer/scrot
[5]:https://github.com/dreamer/scrot
[6]:http://manpages.ubuntu.com/manpages/precise/man1/scrot.1.html

View File

@ -0,0 +1,128 @@
成为你所在社区的美好力量
============================================================
>明白如何传递美好,了解积极意愿的力量,以及更多。
![Be a force for good in your community](https://opensource.com/sites/default/files/styles/image-full-size/public/images/life/people_remote_teams_world.png?itok=wI-GW8zX "Be a force for good in your community")
>图片来自opensource.com
激烈的争论是开源社区和开放组织的标志特征之一。在我们最好的日子里,这些争论充满活力和建设性。他们面红耳赤的背后其实是幽默和善意。各方实事求是,共同解决问题,推动持续改进。对我们中的许多人来说,他们只是单纯的娱乐而已。
然而在我们最糟糕的日子里,这些争论演变成了对旧话题的反复争吵。或者我们用各种方式来传递伤害和相互攻击,或是使用卑劣的手段,而这些侵蚀着我们社区的激情、信任和生产力。
我们茫然四顾,束手无策,因为社区的对话开始变得有毒。然而,正如 [DeLisa Alexander最近的分享][1],我们每个人都有很多方法可以成为我们社区的一种力量。
在这个“开源文化”系列的第一篇文章中,我将分享一些策略,教你如何在这个关键时刻进行干预,引导每个人走向更积极、更有效率的方向。
### 不要将人推开,而是将人推向前方
最近,我和我的朋友和同事 [Mark Rumbles][2] 一起吃午饭。多年来,我们在许多支持开源文化和引领 Red Hat 的项目中合作。在这一天,马克问我是怎么坚持的,当我看到辩论变得越来越丑陋的时候,他看到我最近介入了一个邮件列表的对话。
幸运的是,这事早已尘埃落定,事实上我几乎忘记了谈话的内容。然而,它让我们开始讨论如何在一个拥有数千名成员的社区里,公开和坦率的辩论。
>在我们的社区里,我们成为一种美好力量的最好的方法之一就是:在回应冲突时,以一种迫使每个人提升他们的行为,而不是使冲突升级的方式。
Mark 说了一些让我印象深刻的话。他说:“你知道,作为一个社区,我们真的很擅长将人推开。但我想看到的是,我们更多的是互相扶持 _向前_ 。”
Mark 是绝对正确的。在我们的社区里,我们成为一种美好力量的最好的方法之一就是:在回应冲突时,以一种迫使每个人提升他们的行为,而不是使冲突升级的方式。
### 积极意愿假想
我们可以从一个简单的假想开始,当我们在一个激烈的对话中观察到不良行为时:完全有可能该不良行为其实有着积极意愿。
诚然这不是一件容易的事情。当我看到一场辩论正在变得肮脏的迹象时我停下来问自己史蒂芬·科维Steven Covey所说的人性化问题是什么
“为什么一个理性、正直的人会做这样的事情?”
现在,如果他是你的一个“普通的观察对象”——一个有消极行为倾向的社区成员——也许你的第一个想法是,“嗯,也许这个人是个不靠谱,不理智的人”
回过头来说。我并不是说你让你自欺欺人。这其实就是人性化的问题,不仅是因为它让你理解别人的立场,它还让你变得人性化。
而这反过来又能帮助你做出反应,或者从最有效率的地方进行干预。
### 寻求了解社区异议的原因
当我再一次问自己为什么一个理性的、正直的人可能会做这样的事情时,归结为几个原因:
* 他认为没人聆听他
* 他认为没人尊重他
* 他认为没人理解他
一个简单的积极意愿假想,我们可以适用于几乎所有的不良行为,其实就是那个人想要被聆听,被尊重,或被理解。我想这是相当合理的。
通过站在这个更客观、更有同情心的角度,我们可以看到他们的行为几乎肯定 **_不_** 会帮助他们得到他们想要的东西,而社区也会因此而受到影响。如果没有我们的帮助的话。
对我来说,这激发了一个愿望:帮助每个人从我们所处的这个丑陋的地方“摆脱困境”。
在我介入之前,我问自己一个后续的问题:是否有其他积极的意图可能会驱使这种行为
容易想到的例子包括:
* 他们担心我们错过了一些重要的东西,或者我们犯了一个错误,没有人能看到它。
* 他们想为自己的贡献感到有价值。
* 他们精疲力竭,因为在社区里工作过度或者在他们的个人生活中发生了一些事情。
* 他们讨厌一些东西被破坏,并感到沮丧,因为没有人能看到造成的伤害或不便。
* ……诸如此类。
有了这些,我就有了丰富的积极的意图假想,我可以为他们的行为找到原因。我准备伸出援助之手,向他们提供一些帮助。
### 传递美好,挣脱泥潭
什么是 an out类似与佛家“解脱法门”的意思把它想象成一个逃跑的门。这是一种退出对话的方式或者放弃不良的行为恢复表现得像一个体面的人而不是丢面子。是叫某人振作向上而不是叫他走开。
你可能经历过这样的事情,在你的生活中,当 _你_ 在一次谈话中表现不佳时,咆哮着,大喊大叫,对某事大惊小怪,而有人慷慨地给 _你_ 提供了一个台阶下。也许他们选择不去和你“抬杠”,相反,他们说了一些表明他们相信你是一个理性、正直的人,他们采用积极意愿假想,比如:
> _所以我听到的是你真的很担心你很沮丧因为似乎没有人在听。或者你担心我们忽略了它的重要性。是这样对吧?_
于是乎:即使这不是完全正确的(也许你的意图不那么高尚),在那一刻,你可能抓住了他们提供给你的台阶,并欣然接受了重新定义你的不良行为的机会。你几乎可以肯定地转向一个更富有成效的角度,甚至你自己可能都没有意识到。
也许你这样说,“哦,虽然不完全是这样,但我只是担心,我们这样会走向歧途,我明白你说的,作为社区,我们不能同时解决所有问题,但如果我们不尽快解决这个问题,会有更多不好的事情要发生……”
最后,谈话几乎可以肯定地开始转移到一个更有效率的方向。
我们都有机会让一个沮丧的人挣脱泥潭,而这就是方法。
### 坏行为还是坏人?
如果这个人特别激动,他们可能不会听到或者接受你给出的第一台阶。没关系。最可能的是,他们迟钝的大脑已经被史前曾经对人类生存至关重要的杏仁核接管了,他们需要更多的时间来认识到你并不是一个威胁。只是需要你保持温和的态度,坚定地对待他们,就好像他们 _曾经是_ 一个理性、正直的人,看看会发生什么。
根据我的经验,这些社区干预以三种方式结束:
大多数情况下,这个人实际上 _是_ 一个理性的人,很快,他们就感激地接受了这个事实。在这个过程中,每个人都跳出了“黑与白”,“赢或输”的心态。人们开始思考创造性的选择和“双赢”的结果,每个人都将受益。
> 为什么一个理性、正直的人会做这样的事呢?
有时候,这个人天生不是特别理性或正直的,但当他被你以如此一致的、不知疲倦的、耐心的慷慨和善良的对待的时候,他们就会羞愧地从谈话中撤退。这听起来像是,“嗯,我想我已经说了所有要说的了。谢谢你听我的意见”。或者,对于不那么开明的人来说,“嗯,我厌倦了这种谈话。让我们结束吧。”(好的,谢谢)。
更少的情况是这个人是一个“_坏人_”或者在社区管理圈子里是一个“搅屎棍”。这些人确实存在而且他们在演戏方面很有发展。你猜怎么着通过持续地以一种友善、慷慨、以社区为中心的方式完全无视所有试图使局势升级的尝试你有效地将谈话变成了一个对他们没有兴趣的领域。他们别无选择只能放弃它。你成为赢家。
这就是积极意愿假想的力量。通过对愤怒和充满敌意的言辞做出回应,优雅而有尊严地回应,你就能化解一场战争,理清混乱,解决棘手的问题,而且在这个过程中很有可能会交到一个新朋友。
我每次应用这个原则都成功吗?见鬼,不。但我从不后悔选择了积极意愿。但是我能生动的回想起,当我采用消极意愿假想时,将问题变得更糟糕的场景。
现在轮到你了。我很乐意听到你提出的一些策略和原则,当你的社区里的对话变得激烈的时候,要成为一股好力量。在下面的评论中分享你的想法。
下次,我们将探索更多的方法,在你的社区里成为一个美好力量,我将分享一些处理“坏脾气先生”的技巧。
--------------------------------------------------------------------------------
作者简介:
![](https://opensource.com/sites/default/files/styles/profile_pictures/public/pictures/headshot-square_0.jpg?itok=FS97b9YD)
丽贝卡·费尔南德斯Rebecca Fernandez是红帽公司Red Hat的首席就业品牌 + 通讯专家是《开源组织》书籍的贡献者也是开源决策框架的维护者。她的兴趣是开源和业务管理模型的开源方式。Twitter@ruhbehka
--------------------------------------------------------------------------------
via: https://opensource.com/open-organization/17/1/force-for-good-community
作者:[Rebecca Fernandez][a]
译者:[chao-zhi](https://github.com/chao-zhi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://opensource.com/users/rebecca
[1]:https://opensource.com/business/15/5/5-ways-promote-inclusive-environment
[2]:https://twitter.com/leadership_365

View File

@ -0,0 +1,167 @@
程序员的学习之路
============================================================
*2016 年 10 月,当我从微软离职时,我已经在微软工作了近 21 年,在工业界也快 35 年了。我花了一些时间反思我这些年来学到的东西,这些文字是那篇帖子稍加修改后得到。请见谅,文章有一点长。*
要成为一名专业的程序员你需要知道的事情多得令人吃惊语言的细节API算法数据结构系统和工具。这些东西一直在随着时间变化——新的语言和编程环境不断出现似乎总有热门的新工具或新语言是“每个人”都在使用的。紧跟潮流保持专业这很重要。木匠需要知道如何为工作选择合适的锤子和钉子并且要有能力笔直精准地钉入钉子。
与此同时,我也发现有一些理论和方法有着广泛的应用场景,它们能使用几十年。底层设备的性能和容量在这几十年来增长了几个数量级,但系统设计的思考方式还是互相有关联的,这些思考方式比具体的实现更根本。理解这些重复出现的主题对分析与设计我们所负责的系统大有帮助。
谦卑和自我
这不仅仅局限于编程但在编程这个持续发展的领域一个人需要在谦卑和自我中保持平衡。总有新的东西需要学习并且总有人能帮助你学习——如果你愿意学习的话。一个人即需要保持谦卑认识到自己不懂并承认它也要保持自我相信自己能掌握一个新的领域并且能运用你已经掌握的知识。我见过的最大的挑战就是一些人在某个领域深入专研了很长时间“忘记”了自己擅长学习新的东西。最好的学习来自放手去做建造一些东西即便只是一个原型或者 hack。我知道的最好的程序员对技术有广泛的认识但同时他们对某个技术深入研究成为了专家。而深入的学习来自努力解决真正困难的问题。
端到端观点
1981 年Jerry Saltzer, Dave Reed 和 Dave Clark 在做因特网和分布式系统的早期工作,他们提出了端到端观点,并作出了[经典的阐述][4]。网络上的文章有许多误传,所以更应该阅读论文本身。论文的作者很谦虚,没有声称这是他们自己的创造——从他们的角度看,这只是一个常见的工程策略,不只在通讯领域中,在其他领域中也有运用。他们只是将其写下来并收集了一些例子。下面是文章的一个小片段:
> 当我们设计系统的一个功能时,仅依靠端点的知识和端点的参与,就能正确地完整地实现这个功能。在一些情况下,系统的内部模块局部实现这个功能,可能会对性能有重要的提升。
论文称这是一个“观点”,虽然在维基百科和其他地方它已经被上升成“原则”。实际上,还是把它看作一个观点比较好,正如作者们所说,系统设计者面临的最难的问题之一就是如何在系统组件之间划分责任,这会引发不断的讨论:怎样在划分功能时权衡利弊,怎样隔离复杂性,怎样设计一个灵活的高性能系统来满足不断变化的需求。没有简单的原则可以直接遵循。
互联网上的大部分讨论集中在通信系统上,但端到端观点的适用范围其实更广泛。分布式系统中的“最终一致性”就是一个例子。一个满足“最终一致性”的系统,可以让系统中的元素暂时进入不一致的状态,从而简化系统,优化性能,因为有一个更大的端到端过程来解决不一致的状态。我喜欢横向拓展的订购系统的例子(例如亚马逊),它不要求每个请求都通过中央库存的控制点。缺少中央控制点可能允许两个终端出售相同的最后一本书,所以系统需要用某种方法来解决这个问题,如通知客户该书会延期交货。不论怎样设计,想购买的最后一本书在订单完成前都有可能被仓库中的叉车运出厍(译者注:比如被其他人下单购买)。一旦你意识到你需要一个端到端的解决方案,并实现了这个方案,那系统内部的设计就可以被优化,并利用这个解决方案。
事实上,这种设计上的灵活性可以优化系统的性能,或者提供其他的系统功能,从而使得端到端的方法变得如此强大。端到端的思考往往允许内部进行灵活的操作,使整个系统更加健壮,并且能适应每个组件特性的变化。这些都让端到端的方法变得健壮,并能适应变化。
端到端方法意味着添加会牺牲整体性能灵活性的抽象层和功能时要非常小心也可能是其他的灵活性但性能特别是延迟往往是特殊的。如果你展示出底层的原始性能performance, 也可能指操作),端到端的方法可以根据这个性能(操作)来优化,实现特定的需求。如果你破坏了底层性能(操作),即使你实现了重要的有附加价值的功能,你也牺牲了设计灵活性。
如果系统足够庞大而且足够复杂,需要把整个开发团队分配给系统内部的组件,那么端到端观点可以和团队组织相结合。这些团队自然要扩展这些组件的功能,他们通常从牺牲设计上的灵活性开始,尝试在组件上实现端到端的功能。
应用端到端方法面临的挑战之一是确定端点在哪里。 俗话说,“大跳蚤上有小跳蚤,小跳蚤上有更少的跳蚤……等等”。
关注复杂性
编程是一门精确的艺术,每一行代码都要确保程序的正确执行。但这是带有误导的。编程的复杂性不在于各个部分的整合,也不在于各个部分之间如何相互交互。最健壮的程序将复杂性隔离开,让最重要的部分变的简单直接,通过简单的方式与其他部分交互。虽然隐藏复杂性和信息隐藏、数据抽象等其他设计方法一样,但我仍然觉得,如果你真的要定位出系统的复杂所在,并将其隔离开,那你需要对设计特别敏锐。
在我的[文章][5]中反复提到的例子是早期的终端编辑器 VI 和 Emacs 中使用的屏幕重绘算法。早期的视频终端实现了控制序列,来控制绘制字符核心操作,也实现了附加的显示功能,来优化重新绘制屏幕,如向上向下滚动当前行,或者插入新行,或在当前行中移动字符。这些命令都具有不同的开销,并且这些开销在不同制造商的设备中也是不同的。(参见[TERMCAP][6]以获取代码链接和更完整的历史记录。)像文本编辑器这样的全屏应用程序希望尽快更新屏幕,因此需要优化使用这些控制序列来转换屏幕从一个状态到另一个状态。
这些程序在设计上隐藏了底层的复杂性。系统中修改文本缓冲区的部分(功能上大多数创新都在这里)完全忽略了这些改变如何被转换成屏幕更新命令。这是可以接受的,因为针对*任何*内容的改变计算最佳命令所消耗的性能代价,远不及被终端本身实际执行这些更新命令的性能代价。在确定如何隐藏复杂性,以及隐藏哪些复杂性时,性能分析扮演着重要的角色,这一点在系统设计中非常常见。屏幕的更新与底层文本缓冲区的更改是异步的,并且可以独立于缓冲区的实际历史变化顺序。缓冲区*怎样*改变的并不重要,重要的是改变了*什么*。异步耦合,在组件交互时消除组件对历史路径依赖的组合,以及用自然的交互方式以有效地将组件组合在一起是隐藏耦合复杂度的常见特征。
隐藏复杂性的成功不是由隐藏复杂性的组件决定的,而是由使用该模块的使用者决定的。这就是为什么组件的提供者至少要为组件的某些端到端过程负责。他们需要清晰的知道系统的其他部分如何与组件相互作用,复杂性是如何泄漏出来的(以及是否泄漏出来)。这常常表现为“这个组件很难使用”这样的反馈——这通常意味着它不能有效地隐藏内部复杂性,或者没有选择一个隐藏复杂性的功能边界。
分层与组件化
系统设计人员的一个基本工作是确定如何将系统分解成组件和层;决定自己要开发什么,以及从别的地方获取什么。开源项目在决定自己开发组件还是购买服务时,大多会选择自己开发,但组件之间交互的过程是一样的。在大规模工程中,理解这些决策将如何随着时间的推移而发挥作用是非常重要的。从根本上说,变化是程序员所做的一切的基础,所以这些设计决定不仅在当下被评估,还要随着产品的不断发展而在未来几年得到评估。
以下是关于系统分解的一些事情,它们最终会占用大量的时间,因此往往需要更长的时间来学习和欣赏。
* 层泄漏。层(或抽象)[基本上是泄漏的][1]。这些泄漏会立即产生后果,也会随着时间的推移而产生两方面的后果。其中一方面就是该抽象层的特性渗透到了系统的其他部分,渗透的程度比你意识到得更深入。这些渗透可能是关于具体的性能特征的假设,以及抽象层的文档中没有明确的指出的行为发生的顺序。这意味着假如内部组件的行为发生变化,你的系统会比想象中更加脆弱。第二方面是你比表面上看起来更依赖组件内部的行为,所以如果你考虑改变这个抽象层,后果和挑战可能超出你的想象。
* 层具有太多功能了。您所采用的组件具有比实际需要更多的功能这几乎是一个真理。在某些情况下你决定采用这个组件是因为你想在将来使用那些尚未用到的功能。有时你采用组件是想“上快车”利用组件完成正在进行的工作。在功能强大的抽象层上开发会带来一些后果。1) 组件往往会根据你并不需要的功能作出取舍。 2) 为了实现那些你并不没有用到的功能组件引入了复杂性和约束这些约束将阻碍该组件的未来的演变。3) 层泄漏的范围更大。一些泄漏是由于真正的“抽象泄漏”另一些是由于明显的逐渐增加的对组件全部功能的依赖但这些依赖通常都没有处理好。Office 太大了我们发现对于我们建立的任何抽象层我们最终都在系统的某个部分完全运用了它的功能。虽然这看起来是积极的我们完全地利用了这个组件但并不是所用的使用都有同样的价值。所以我们最终要付出巨大的代价才能从一个抽象层往另一个抽象层迁移这种“长尾巴”没什么价值并且对使用场景认识不足。4) 附加的功能会增加复杂性,并增加功能滥用的可能。如果将验证 XML 的 API 指定为 XML 树的一部分,那这个 API 可以选择动态下载 XML 的模式定义。这在我们的基本文件解析代码中被错误地执行,导致 w3c.org 服务器上的大量性能下降以及无意分布式拒绝服务攻击。这些被通俗地称为“地雷”API
* 抽象层被更换。需求发展,系统发展,组件被放弃。您最终需要更换该抽象层或组件。不管是对外部组件的依赖还是对内部组件的依赖都是如此。这意味着上述问题将变得重要起来。
* 自己构建还是购买的决定将会改变。这是上面几方面的必然结果。这并不意味着自己构建还是购买的决定在当时是错误的。一开始时往往没有合适的组件,一段时间之后才有合适的组件出现。或者,也可能你使用了一个组件,但最终发现它不符合您不断变化的要求,而且你的要求非常窄,能被理解,或着对你的价值体系来说是非常重要的,以至于拥有自己的模块是有意义的。这意味着你像关心自己构造的模块一样,关心购买的模块,关心它们是怎样泄漏并深入你的系统中的。
* 抽象层会变臃肿。一旦你定义了一个抽象层,它就开始增加功能。层是对使用模式优化的自然分界点。臃肿的层的困难在于,它往往会降低您利用底层的不断创新的能力。从某种意义上说,这就是操作系统公司憎恨构建在其核心功能之上的臃肿的层的原因——采用创新的速度放缓了。避免这种情况的一种比较规矩的方法是禁止在适配器层中进行任何额外的状态存储。微软基础类在 Win32 上采用这个一般方法。在短期内,将功能集成到现有层(最终会导致上述所有问题)而不是重构和重新推导是不可避免的。理解这一点的系统设计人员寻找分解和简化组件的方法,而不是在其中增加越来越多的功能。
爱因斯坦宇宙
几十年来我一直在设计异步分布式系统但是在微软内部的一次演讲中SQL 架构师 Pat Helland 的一句话震惊了我。 “我们生活在爱因斯坦的宇宙中,没有同时性。”在构建分布式系统时(基本上我们构建的都是分布式系统),你无法隐藏系统的分布式特性。这是物理的。我一直感到远程过程调用在根本上错误的,这是一个原因,尤其是那些“透明的”远程过程调用,它们就是想隐藏分布式的交互本质。你需要拥抱系统的分布式特性,因为这些意义几乎总是需要通过系统设计和用户体验来完成。
拥抱分布式系统的本质则要遵循以下几个方面:
* 一开始就要思考设计对用户体验的影响,而不是试图在处理错误,取消请求和报告状态上打补丁。
* 使用异步技术来耦合组件。同步耦合是*不可能*的。如果某些行为看起来是同步的,是因为某些内部层尝试隐藏异步,这样做会遮蔽(但绝对不隐藏)系统运行时的基本行为特征。
* 认识到并且明确设计了交互状态机,这些状态表示长期的可靠的内部系统状态(而不是由深度调用堆栈中的变量值编码的临时,短暂和不可发现的状态)。
* 认识到失败是在所难免的。要保证能检测出分布式系统中的失败,唯一的办法就是直接看你的等待时间是否“太长”。这自然意味着[取消的等级最高][2]。系统的某一层(可能直接通向用户)需要决定等待时间是否过长,并取消操作。取消只是为了重建局部状态,回收局部的资源——没有办法在系统内广泛使用取消机制。有时用一种低成本,不可靠的方法广泛使用取消机制对优化性能可能有用。
* 认识到取消不是回滚,因为它只是回收本地资源和状态。如果回滚是必要的,它必须实现成一个端到端的功能。
* 承认永远不会真正知道分布式组件的状态。只要你发现一个状态,它可能就已经改变了。当你发送一个操作时,请求可能在传输过程中丢失,也可能被处理了但是返回的响应丢失了,或者请求需要一定的时间来处理,这样远程状态最终会在未来的某个任意的时间转换。这需要像幂等操作这样的方法,并且要能够稳健有效地重新发现远程状态,而不是期望可靠地跟踪分布式组件的状态。“[最终一致性][3]”的概念简洁地捕捉了这其中大多数想法。
我喜欢说你应该“陶醉在异步”。与其试图隐藏异步,不如接受异步,为异步而设计。当你看到像幂等性或不变性这样的技术时,你就认识到它们是拥抱宇宙本质的方法,而不仅仅是工具箱中的一个设计工具。
性能
我确信 Don Knuth 会对人们怎样误解他的名言“过早的优化是一切罪恶的根源”而感到震惊。事实上性能及性能持续超过60年的指数增长或超过10年取决于您是否愿意将晶体管真空管和机电继电器的发展算入其中为所有行业内的惊人创新和影响经济的“软件吃遍世界”的变化打下了基础。
要认识到这种指数变化的一个关键是,虽然系统的所有组件正在经历指数变化,但这些指数是不同的。硬盘容量的增长速度与内存容量的增长速度不同,与 CPU 的增长速度不同,与内存 CPU 之间的延迟的性能改善速度也不用。即使性能发展的趋势是由相同的基础技术驱动的,增长的指数也会有分歧。[延迟的改进从根本上改善了带宽][7]。指数变化在近距离或者短期内看起来是线性的,但随着时间的推移可能是压倒性的。系统不同组件的性能的增长不同,会出现压倒性的变化,并迫使对设计决策定期进行重新评估。
这样做的结果是,几年后,一度有意义的设计决定就不再有意义了。或者在某些情况下,二十年前有意义的方法又开始变成一个好的决定。现代内存映射的特点看起来更像是早期分时的进程切换,而不像分页那样。 (这样做有时会让我这样的老人说“这就是我们在 1975 年时用的方法”——忽略了这种方法在 40 年都没有意义,但现在又重新成为好的方法,因为两个组件之间的关系——可能是闪存和 NAND 而不是磁盘和核心内存——已经变得像以前一样了)。
当这些指数超越人自身的限制时,重要的转变就发生了。你能从 2 的 16 次方个字符(一个人可以在几个小时打这么多字)过渡到 2 的 3 次方个字符(远超出了一个人打字的范围)。你可以捕捉比人眼能感知的分辨率更高的数字图像。或者你可以将整个音乐专辑存在小巧的磁盘上,放在口袋里。或者你可以将数字化视频录制存储在硬盘上。再通过实时流式传输的能力,可以在一个地方集中存储一次,不需要在数千个本地硬盘上重复记录。
但有的东西仍然是根本的限制条件,那就是空间的三维和光速。我们又回到了爱因斯坦的宇宙。内存的分级结构将始终存在——它是物理定律的基础。稳定的存储和 IO内存计算和通信也都将一直存在。这些模块的相对容量延迟和带宽将会改变但是系统始终要考虑这些元素如何组合在一起以及它们之间的平衡和折衷。Jim Gary 是这方面的大师。
空间和光速的根本限制造成的另一个后果是,性能分析主要是关于三件事:局部化 (locality),局部化,局部化。无论是将数据打包在磁盘上,管理处理器缓存的层次结构,还是将数据合并到通信数据包中,数据如何打包在一起,如何在一段时间内从局部获取数据,数据如何在组件之间传输数据是性能的基础。把重点放在减少管理数据的代码上,增加空间和时间上的局部性,是消除噪声的好办法。
Jon Devaan 曾经说过:“设计数据,而不是设计代码”。这也通常意味着当查看系统结构时,我不太关心代码如何交互——我想看看数据如何交互和流动。如果有人试图通过描述代码结构来解释一个系统,而不理解数据流的速率和数量,他们就不了解这个系统。
内存的层级结构也意味着我缓存将会一直存在——即使某些系统层正在试图隐藏它。缓存是根本的,但也是危险的。缓存试图利用代码的运行时行为,来改变系统中不同组件之间的交互模式。它们需要对运行时行为进行建模,即使模型填充缓存并使缓存失效,并测试缓存命中。如果模型由于行为改变而变差或变得不佳,缓存将无法按预期运行。一个简单的指导方针是,缓存必须被检测——由于应用程序行为的改变,事物不断变化的性质和组件之间性能的平衡,缓存的行为将随着时间的推移而退化。每一个老程序员都有缓存变糟的经历。
我很幸运,我的早期职业生涯是在互联网的发源地之一 BBN 度过的。 我们很自然地将将异步组件之间的通信视为系统连接的自然方式。流量控制和队列理论是通信系统的基础,更是任何异步系统运行的方式。流量控制本质上是资源管理(管理通道的容量),但资源管理是更根本的关注点。流量控制本质上也应该由端到端的应用负责,所以用端到端的方式思考异步系统是自然的。[缓冲区膨胀][8]的故事在这种情况下值得研究,因为它展示了当对端到端行为的动态性以及技术“改进”(路由器中更大的缓冲区)缺乏理解时,在整个网络基础设施中导致的长久的问题。
我发现“光速”的概念在分析任何系统时都非常有用。光速分析并不是从当前的性能开始分析,而是问“这个设计理论上能达到的最佳性能是多少?”真正传递的信息是什么,以什么样的速度变化?组件之间的底层延迟和带宽是多少?光速分析迫使设计师深入思考他们的方法能否达到性能目标,或者否需要重新考虑设计的基本方法。它也迫使人们更深入地了解性能在哪里损耗,以及损耗是由固有的,还是由于一些不当行为产生的。从构建的角度来看,它迫使系统设计人员了解其构建的模块的真实性能特征,而不是关注其他功能特性。
我的职业生涯大多花费在构建图形应用程序上。用户坐在系统的一端,定义关键的常量和约束。人类的视觉和神经系统没有经历过指数性的变化。它们固有地受到限制,这意味着系统设计者可以利用(必须利用)这些限制,例如,通过虚拟化(限制底层数据模型需要映射到视图数据结构中的数量),或者通过将屏幕更新的速率限制到人类视觉系统的感知限制。
复杂性的本质
我的整个职业生涯都在与复杂性做斗争。为什么系统和应用变得复杂呢?为什么在一个应用领域内进行开发并没有随着时间变得简单,而基础设施却没有变得更复杂,反而变得更强大了?事实上,管理复杂性的一个关键方法就是“走开”然后重新开始。通常新的工具或语言迫使我们从头开始,这意味着开发人员将工具的优点与从新开始的优点结合起来。从新开始是重要的。这并不是说新工具,新平台,或新语言可能不好,但我保证它们不能解决复杂性增长的问题。控制复杂性的最简单的方法就是用更少的程序员,建立一个更小的系统。
当然很多情况下“走开”并不是一个选择——Office 建立在有巨大的价值的复杂的资源上。通过 OneNote Office 从 Word 的复杂性上“走开”从而在另一个维度上进行创新。Sway 是另一个例子, Office 决定从限制中跳出来,利用关键的环境变化,抓住机会从底层上采取全新的设计方案。我们有 WordExcelPowerPoint 这些应用,它们的数据结构非常有价值,我们并不能完全放弃这些数据结构,它们成为了开发中持续的显著的限制条件。
我受到 Fred Brook 讨论软件开发中的意外和本质的文章[《没有银子弹》][9]的影响,他希望用两个趋势来尽可能地推动程序员的生产力:一是在选择自己开发还是购买时,更多地关注购买——这预示了开源社区和云架构的改变;二是从单纯的构建方法转型到更“有机”或者“生态”的增量开发方法。现代的读者可以认为是向敏捷开发和持续开发的转型。但那篇文章可是写于 1986 年!
我很欣赏 Stuart Kauffman 的在复杂性的基本性上的研究工作。Kauffman 从一个简单的布尔网络模型(“[NK 模型][10]”)开始建立起来,然后探索这个基本的数学结构在相互作用的分子,基因网络,生态系统,经济系统,计算机系统(以有限的方式)等系统中的应用,来理解紧急有序行为的数学基础及其与混沌行为的关系。在一个高度连接的系统中,你固有地有一个相互冲突的约束系统,使得它(在数学上)很难向前发展(这被看作是在崎岖景观上的优化问题)。控制这种复杂性的基本方法是将系统分成独立元素并限制元素之间的相互连接(实质上减少 NK 模型中的“N”和“K”。当然对那些使用复杂隐藏信息隐藏和数据抽象并且使用松散异步耦合来限制组件之间的交互的技术的系统设计者来说这是很自然的。
我们一直面临的一个挑战是,我们想到的许多拓展系统的方法,都跨越了所有的方面。实时共同编辑是 Office 应用程序最近的一个非常具体的(也是最复杂的)例子。
我们的数据模型的复杂性往往等同于“能力”。设计用户体验的固有挑战是我们需要将有限的一组手势,映射到底层数据模型状态空间的转换。增加状态空间的维度不可避免地在用户手势中产生模糊性。这是“[纯数学][11]”,这意味着确保系统保持“易于使用”的最基本的方式常常是约束底层的数据模型。
管理
我从高中开始着手一些领导角色(学生会主席!),对承担更多的责任感到理所当然。同时,我一直为自己在每个管理阶段都坚持担任全职程序员而感到自豪。但 Office 的开发副总裁最终还是让我从事管理,离开了日常的编程工作。当我在去年离开那份工作时,我很享受重返编程——这是一个出奇地充满创造力的充实的活动(当修完“最后”的 bug 时,也许也会有一点令人沮丧)。
尽管在我加入微软前已经做了十多年的“主管”,但是到了 1996 年我加入微软才真正了解到管理。微软强调“工程领导是技术领导”。这与我的观点一致,帮助我接受并承担更大的管理责任。
主管的工作是设计项目并透明地推进项目。透明并不简单,它不是自动的,也不仅仅是有好的意愿就行。透明需要被设计进系统中去。透明工作的最好方式是能够记录每个工程师每天活动的产出,以此来追踪项目进度(完成任务,发现 bug 并修复,完成一个情景)。留意主观上的红/绿/黄,点赞或踩的仪表板。
我过去说我的工作是设计反馈回路。独立工程师,经理,行政人员,每一个项目的参与者都能通过分析记录的项目数据,推进项目,产出结果,了解自己在整个项目中扮演的角色。最终,透明化最终成为增强能力的一个很好的工具——管理者可以将更多的局部控制权给予那些最接近问题的人,因为他们对所取得的进展有信心。这样的话,合作自然就会出现。
关键需要确定目标框架(包括关键资源的约束,如发布的时间表)。如果决策需要在管理链上下不断流动,那说明管理层对目标和约束的框架不好。
当我在 Beyond Software 工作时,我真正理解了一个项目拥有一个唯一领导的重要性。原来的项目经理离职了(后来从 FrontPage 雇佣了我)。我们四个主管在是否接任这个岗位上都有所犹豫,这不仅仅由于我们都不知道要在这家公司坚持多久。我们都技术高超,并且相处融洽,所以我们决定以同级的身份一起来领导这个项目。然而这槽糕透了。有一个显而易见的问题,我们没有相应的战略用来在原有的组织之间分配资源——这应当是管理者的首要职责之一!当你知道你是唯一的负责人时,你会有很深的责任感,但在这个例子中,这种责任感缺失了。我们没有真正的领导来负责统一目标和界定约束。
我有清晰地记得,我第一次充分认识到*倾听*对一个领导者的重要性。那时我刚刚担任了 WordOneNotePublisher 和 Text Services 团队的开发经理。关于我们如何组织文本服务团队,我们有一个很大的争议,我走到了每个关键参与者身边,听他们想说的话,然后整合起来,写下了我所听到的一切。当我向其中一位主要参与者展示我写下的东西时,他的反应是“哇,你真的听了我想说的话”!作为一名管理人员,我所经历的所有最大的问题(例如,跨平台和转型持续工程)涉及到仔细倾听所有的参与者。倾听是一个积极的过程,它包括:尝试以别人的角度去理解,然后写出我学到的东西,并对其进行测试,以验证我的理解。当一个关键的艰难决定需要发生的时候,在最终决定前,每个人都知道他们的想法都已经被听到并理解(不论他们是否同意最后的决定)。
在 FrontPage 担任开发经理的工作,让我理解了在只有部分信息的情况下做决定的“操作困境”。你等待的时间越长,你就会有更多的信息做出决定。但是等待的时间越长,实际执行的灵活性就越低。在某个时候,你仅需要做出决定。
设计一个组织涉及类似的两难情形。您希望增加资源领域以便可以在更大的一组资源上应用一致的优先级划分框架。但资源领域越大越难获得作出决定所需要的所有信息。组织设计就是要平衡这两个因素。软件复杂化因为软件的特点可以在任意维度切入设计。Office 已经使用[共享团队][12]来解决这两个问题(优先次序和资源),让跨领域的团队能与需要产品的团队分享工作(增加资源)。
随着管理阶梯的提升,你会懂一个小秘密:你和你的新同事不会因为你现在承担更多的责任,就突然变得更聪明。这强调了整个组织比顶层领导者更聪明。赋予每个级别在一致框架下拥有自己的决定是实现这一目标的关键方法。听取并使自己对组织负责,阐明和解释决策背后的原因是另一个关键策略。令人惊讶的是,害怕做出一个愚蠢的决定可能是一个有用的激励因素,以确保你清楚地阐明你的推理,并确保你听取所有的信息。
结语
我离开大学寻找第一份工作时,面试官在最后一轮面试时问我对做“系统”和做“应用”哪一个更感兴趣。我当时并没有真正理解这个问题。在软件技术栈的每一个层面都会有趣的难题,我很高兴深入研究这些问题。保持学习。
--------------------------------------------------------------------------------
via: https://hackernoon.com/education-of-a-programmer-aaecf2d35312
作者:[ Terry Crowley][a]
译者:[explosic4](https://github.com/explosic4)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://hackernoon.com/@terrycrowley
[1]:https://medium.com/@terrycrowley/leaky-by-design-7b423142ece0#.x67udeg0a
[2]:https://medium.com/@terrycrowley/how-to-think-about-cancellation-3516fc342ae#.3pfjc5b54
[3]:http://queue.acm.org/detail.cfm?id=2462076
[4]:http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
[5]:https://medium.com/@terrycrowley/model-view-controller-and-loose-coupling-6370f76e9cde#.o4gnupqzq
[6]:https://en.wikipedia.org/wiki/Termcap
[7]:http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf
[8]:https://en.wikipedia.org/wiki/Bufferbloat
[9]:http://worrydream.com/refs/Brooks-NoSilverBullet.pdf
[10]:https://en.wikipedia.org/wiki/NK_model
[11]:https://medium.com/@terrycrowley/the-math-of-easy-to-use-14645f819201#.untmk9eq7
[12]:https://medium.com/@terrycrowley/breaking-conways-law-a0fdf8500413#.gqaqf1c5k

Some files were not shown because too many files have changed in this diff