Merge pull request #9891 from stephenxs/patch-3

20180427 An Official Introduction to the Go Compiler.md tranlasted.
This commit is contained in:
Chang Liu 2018-08-20 12:41:24 +08:00 committed by GitHub
commit 7ec7c6191c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 82 additions and 131 deletions

View File

@ -1,131 +0,0 @@
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
## Introduction to the Go compiler
`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split in four phases, which we will briefly describe alongside
the list of packages that contain their code.
You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the second phase.
Note that the `go/*` family of packages, such as `go/parser` and `go/types`,
have no relation to the compiler. Since the compiler was initially written in C,
the `go/*` packages were developed to enable writing tools working with Go code,
such as `gofmt` and `vet`.
It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase GC, which stands for garbage collection.
### 1. Parsing
* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)
In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntactic analyses), and a syntax tree is constructed for each source
file.
Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.
### 2. Type-checking and AST transformations
* `cmd/compile/internal/gc` (create compiler AST, type checking, AST transformations)
The gc package includes an AST definition carried over from when it was written
in C. All of its code is written in terms of it, so the first thing that the gc
package must do is convert the syntax package's syntax tree to the compiler's
AST representation. This extra step may be refactored away in the future.
The AST is then type-checked. The first steps are name resolution and type
inference, which determine which object belongs to which identifier, and what
type each expression has. Type-checking includes certain extra checks, such as
"declared and not used" as well as determining whether or not a function
terminates.
Certain transformations are also done on the AST. Some nodes are refined based
on type information, such as string additions being split from the arithmetic
addition node type. Some other examples are dead code elimination, function call
inlining, and escape analysis.
### 3. Generic SSA
* `cmd/compile/internal/gc` (converting to SSA)
* `cmd/compile/internal/ssa` (SSA passes and rules)
In this phase, the AST is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.
During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.
Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.
Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
Some examples of these generic passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.
### 4. Generating machine code
* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)
The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.
Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.
Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.
Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.
At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.
### Further reading
To dig deeper into how the SSA package works, including its passes and rules,
head to `cmd/compile/internal/ssa/README.md`.
--------------------------------------------------------------------------------
via: https://github.com/golang/go/blob/master/src/cmd/compile/README.md
作者:[mvdan ][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://github.com/mvdan

View File

@ -0,0 +1,82 @@
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
## Go编译器介绍
`cmd/compile` 包含构成 Go 编译器主要的包。编译器在逻辑上可以被分为四个阶段,我们将简要介绍这几个阶段以及包含相应代码的包的列表。
在谈到编译器时,有时可能会听到“前端”和“后端”这两个术语。粗略地说,这些对应于我们将在此列出的前两个和后两个阶段。第三个术语“中间端”通常指的是第二阶段执行的大部分工作。
请注意,`go/parser` 和 `go/types``go/*` 系列包与编译器无关。由于编译器最初是用C编写的所以这些 `go/*` 包被开发出来以便于能够写出和 `Go` 代码一起工作的工具,例如 `gofmt``vet`
需要澄清的是名称“gc”代表“Go 编译器”,与大写 GC 无关,后者代表垃圾收集。
### 1. 解析
* `cmd/compile/internal/syntax` (词法分析器、解析器、语法树)
在编译的第一阶段源代码被标记化词法分析解析语法分析并为每个源文件构造语法树译注这里标记指token它是一组预定义、能够识别的字符串通常由名字和值构成其中名字一般是词法的类别如标识符、关键字、分隔符、操作符、文字和注释等语法树以及下文提到的ASTAbstract Syntax Tree抽象语法树是指用树来表达程序设计语言的语法结构通常叶子节点是操作数其它节点是操作码
每棵语法树都是相应源文件的确切表示,其中节点对应于源文件的各种元素,例如表达式,声明和语句。语法树还包括位置信息,用于错误报告和创建调试信息。
### 2. 类型检查和AST变形
* `cmd/compile/internal/gc` 创建编译器AST类型检查AST变形
gc 包中包含一个继承自早期C 语言实现的版本的 AST 定义。所有代码都是基于该 AST 编写的,所以 gc 包必须做的第一件事就是将 syntax 包(定义)的语法树转换为编译器的 AST 表示法。这个额外步骤可能会在将来重构。
然后对 AST 进行类型检查。第一步是名字解析和类型推断,它们确定哪个对象属于哪个标识符,以及每个表达式具有的类型。类型检查包括特定的额外检查,例如“声明但未使用”以及确定函数是否会终止。
特定转换也基于 AST 上完成。一些节点被基于类型信息而细化,例如把字符串加法从算术加法的节点类型中拆分出来。其他一些例子是死代码消除,函数调用内联和逃逸分析(译注:逃逸分析是一种分析指针有效范围的方法)。
### 3. 通用SSA
* `cmd/compile/internal/gc` (转换成 SSA
* `cmd/compile/internal/ssa` SSA 相关的 pass 和规则)
(译注:许多常见高级语言的编译器无法通过一次扫描源代码或 AST 就完成所有编译工作,取而代之的做法是多次扫描,每次完成一部分工作,并将输出结果作为下次扫描的输入,直到最终产生目标代码。这里每次扫描称作一遍,即 pass最后一遍之前所有的 pass 得到的结果都可称作中间表示法,本文中 AST、SSA 等都属于中间表示法。SSA静态单赋值形式是中间表示法的一种性质它要求每个变量只被赋值一次且在使用前被定义
在此阶段AST 将被转换为静态单赋值形式SSA形式这是一种具有特定属性的低级中间表示法可以更轻松地实现优化并最终从它生成机器代码。
在这个转换过程中,将完成内置函数的处理。 这些是特殊的函数,编译器被告知逐个分析这些函数并决定是否用深度优化的代码替换它们(译注:内置函数指由语言本身定义的函数,通常编译器的处理方式是使用相应实现函数的指令序列代替对函数的调用指令,有点类似内联函数)。
在 AST 转化成 SSA 的过程中特定节点也被低级化为更简单的组件以便于剩余的编译阶段可以基于它们工作。例如内建的拷贝被替换为内存移动range循环被改写为for循环。由于历史原因目前这里面有些在转化到 SSA 之前发生,但长期计划则是把它们都移到这里(转化 SSA
然后一系列机器无关的规则和pass会被执行。这些并不考虑特定计算机体系结构因此对所有 `GOARCH` 变量的值都会运行。
这类通用的 pass 的一些例子包括,死代码消除,移除不必要的空指针检查,以及移除无用的分支等。通用改写规则主要考虑表达式,例如将一些表达式替换为常量,优化乘法和浮点操作。
### 4. 生成机器码
* `cmd/compile/internal/ssa` SSA 低级化和体系结构特定的pass
* `cmd/internal/obj` (机器代码生成)
编译器中机器相关的阶段开始于“低级”的 pass该阶段将通用变量改写为它们的机器相关变形形式。例如在 amd64 体系结构中操作数可以在内存中,这样许多装载-存储操作就可以被合并。
注意低级的 pass 运行所有机器特定的重写规则,因此它也应用了很多优化。
一旦 SSA 被“低级化”并且更具体地针对目标体系结构,就要运行最终代码优化的 pass 了。这包含了另外一个死代码消除的 pass它将变量移动到更靠近它们使用的地方移除从来没有被读过的局部变量以及寄存器分配。
本步骤中完成的其它重要工作包括堆栈布局,它将指定局部变量在堆栈中的偏移位置,以及指针活性分析,后者计算每个垃圾收集安全点上的哪些堆栈上的指针仍然是活动的。
在 SSA 生成阶段结束时Go 函数已被转换为一系列 obj.Prog 指令。它们被传递给汇编程序(`cmd/internal/obj`),后者将它们转换为机器代码并输出最终的目标文件。目标文件还将包含反射数据,导出数据和调试信息。
### 后续读物
要深入了解 SSA 包的工作方式,包括它的 pass 和规则,请转到 cmd/compile/internal/ssa/README.md。
--------------------------------------------------------------------------------
via: https://github.com/golang/go/blob/master/src/cmd/compile/README.md
作者:[mvdan ][a]
译者:[stephenxs](https://github.com/stephenxs)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://github.com/mvdan