// Copyright 2018 The Go Authors. All rights reserved. // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file.

Introduction to the Go compiler

cmd/compile contains the main packages that form the Go compiler. The compiler may be logically split into four phases, which we will briefly describe alongside the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these translate to the first two and last two phases we are going to list here. A third term, "middle-end", often refers to much of the work that happens in the second phase.

Note that the go/* family of packages, such as go/parser and go/types, have no relation to the compiler. Since the compiler was initially written in C, the go/* packages were developed to enable writing tools working with Go code, such as gofmt and vet.

It should be clarified that the name "gc" stands for "Go compiler", and has little to do with uppercase GC, which stands for garbage collection.

1. Parsing

  • cmd/compile/internal/syntax (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis) and parsed (syntactic analysis), and a syntax tree is constructed for each source file.

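Translator's note: a token here is a predefined, recognizable string, usually made up of a name and a value. The name is typically a lexical category such as identifier, keyword, separator, operator, literal, or comment.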

Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source such as expressions, declarations, and statements. The syntax tree also includes position information which is used for error reporting and the creation of debugging information.
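As a rough illustration of tokenizing, parsing, and position tracking, the sketch below uses the standard go/scanner and go/parser packages. This is a stand-in: cmd/compile/internal/syntax is internal to the toolchain and cannot be imported from ordinary programs, and the go/* packages are separate from the compiler. The source snippet and the helper names (`tokenize`, `funcPositions`) are invented for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/scanner"
	"go/token"
)

// src is a hypothetical one-function source file used only for illustration.
const src = `package p

func add(a, b int) int { return a + b }
`

// tokenize performs lexical analysis and returns the kind of each token in order.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("p.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	var kinds []string
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			return kinds
		}
		kinds = append(kinds, tok.String())
	}
}

// funcPositions parses src into a syntax tree and maps each declared
// function name to the source position recorded for it in the tree.
func funcPositions(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	positions := map[string]string{}
	ast.Inspect(f, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			positions[fn.Name.Name] = fset.Position(fn.Pos()).String()
		}
		return true
	})
	return positions, nil
}

func main() {
	fmt.Println(tokenize(src))
	positions, err := funcPositions(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(positions)
}
```

The positions stored on tree nodes are what make messages of the form `file:line:column` possible later on, both for errors and for debugging information.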

2. Type-checking and AST transformations

  • cmd/compile/internal/gc (create compiler AST, type checking, AST transformations)

The gc package includes an AST definition carried over from when the compiler was written in C. All of the package's code is written in terms of that AST, so the first thing that the gc package must do is convert the syntax package's syntax tree to the compiler's AST representation. This extra step may be refactored away in the future.

Translator's note: an AST (Abstract Syntax Tree) uses a tree to express the syntactic structure of a programming language. Typically, leaf nodes are operands and the other nodes are operators.

The AST is then type-checked. The first steps are name resolution and type inference, which determine which object belongs to which identifier, and what type each expression has. Type-checking includes certain extra checks, such as "declared and not used" as well as determining whether or not a function terminates.
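The two extra checks mentioned above can be observed with the standard go/types package, which is a separate type checker from the one in cmd/compile/internal/gc but reports the same classes of error. The following is a minimal sketch under that assumption; the snippet `badSrc` and the helper `typeErrors` are invented for illustration, and the exact wording of the messages varies between Go releases.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// badSrc is a hypothetical package that parses correctly but fails two checks:
// an unused local variable, and a function that can fall off its end.
const badSrc = `package p

func unused() int {
	x := 1
	return 2
}

func noReturn(b bool) int {
	if b {
		return 1
	}
}
`

// typeErrors parses src, type-checks it, and returns every reported error.
func typeErrors(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	var errs []string
	conf := types.Config{
		// Collect all errors instead of stopping at the first one.
		Error: func(err error) { errs = append(errs, err.Error()) },
	}
	// The returned error is only the first problem; errs holds all of them.
	_, _ = conf.Check("p", fset, []*ast.File{f}, nil)
	return errs, nil
}

func main() {
	errs, err := typeErrors(badSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

Both problems are invisible to the parser: the file is syntactically valid, and only name resolution and the termination analysis reveal them.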

Certain transformations are also done on the AST. Some nodes are refined based on type information, such as string additions being split from the arithmetic addition node type. Some other examples are dead code elimination, function call inlining, and escape analysis.
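To see inlining and escape analysis decisions for a concrete program, one option is to build with `go build -gcflags=-m`, which asks the compiler to print its optimization decisions. The program below is a small sketch written for this purpose; the type and function names are invented, and the comments describe what one would expect to be reported rather than guaranteed output.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns the address of a local value. The value must outlive
// the call, so escape analysis is expected to report that p is moved
// to the heap (unless the call is inlined into a caller where it does not escape).
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sum uses p only locally, so p is expected to stay on the stack.
// A function this small is also a likely candidate for inlining.
func sum(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

func main() {
	fmt.Println(newPoint(1, 2).x, sum(1, 2)) // prints "1 3"
}
```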

3. Generic SSA

  • cmd/compile/internal/gc (converting to SSA)
  • cmd/compile/internal/ssa (SSA passes and rules)

In this phase, the AST is converted into Static Single Assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.
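The defining property of SSA form is that every name is assigned exactly once. The toy below illustrates only that renaming idea for straight-line code; it is not how cmd/compile builds SSA, it has no notion of control flow (real SSA needs phi values where branches join), and the `stmt` type and `toSSA` function are hypothetical.

```go
package main

import "fmt"

// stmt is a hypothetical three-address statement: dst = a op b.
type stmt struct {
	dst, a, op, b string
}

// toSSA renames variables in straight-line code so that each name is
// assigned exactly once. Operands that were never assigned (inputs or
// literals) are left unchanged.
func toSSA(prog []stmt) []string {
	version := map[string]int{}
	// cur returns the current SSA name for an operand.
	cur := func(name string) string {
		if v, ok := version[name]; ok {
			return fmt.Sprintf("%s%d", name, v)
		}
		return name
	}
	var out []string
	for _, s := range prog {
		a, b := cur(s.a), cur(s.b) // read operands before defining dst
		version[s.dst]++           // each assignment creates a new version
		out = append(out, fmt.Sprintf("%s = %s %s %s", cur(s.dst), a, s.op, b))
	}
	return out
}

func main() {
	prog := []stmt{
		{"x", "a", "+", "b"},
		{"x", "x", "*", "2"},
		{"y", "x", "-", "a"},
	}
	for _, line := range toSSA(prog) {
		fmt.Println(line)
	}
	// Output:
	// x1 = a + b
	// x2 = x1 * 2
	// y1 = x2 - a
}
```

Because each value has a single definition, a pass can tell where a value comes from without tracking reassignments, which is part of what makes optimizations easier to implement. The real SSA for a function can be inspected by setting the `GOSSAFUNC` environment variable to the function's name when building.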

During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis.
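Functions in the standard math/bits package are a commonly cited example of intrinsics. In the sketch below, the calls look like ordinary function calls in source, but on architectures with suitable instructions the compiler may replace them with a short instruction sequence instead of a call. Whether that happens depends on the target and the compiler version; the wrapper names are invented for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBit returns the index of the lowest set bit of x, or 64 if x is 0.
func lowestSetBit(x uint64) int {
	return bits.TrailingZeros64(x)
}

// popCount returns the number of set bits in x.
func popCount(x uint64) int {
	return bits.OnesCount64(x)
}

func main() {
	// 40 is 0b101000: three trailing zeros, two set bits.
	fmt.Println(lowestSetBit(40), popCount(40)) // prints "3 2"
}
```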

Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. Some of these currently happen before the conversion to SSA due to historical reasons, but the long-term plan is to move all of them here.
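The range-loop rewrite can be pictured at the source level. The sketch below shows a range loop over a slice next to a hand-written for loop with the same behavior; the second function is only an approximation of what the compiler produces internally, and both function names are invented.

```go
package main

import "fmt"

// sumRange uses a range loop, as a programmer would normally write it.
func sumRange(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumLowered is a hand-written approximation of the simpler loop that a
// range loop over a slice is rewritten into: the length is read once,
// and each element is loaded by index.
func sumLowered(xs []int) int {
	total := 0
	n := len(xs)
	for i := 0; i < n; i++ {
		x := xs[i]
		total += x
	}
	return total
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sumRange(xs), sumLowered(xs)) // prints "6 6"
}
```

After this kind of rewrite, later passes only need to understand plain for loops, not every looping construct in the language.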

Then, a series of machine-independent passes and rules are applied. These do not concern any single computer architecture, and thus run on all GOARCH variants.

Some examples of these generic passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values, and optimizing multiplications and float operations.
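The flavor of such expression rewrite rules can be shown with a toy simplifier. This is a sketch only: the real rules are written in a dedicated rule language under cmd/compile/internal/ssa and operate on SSA values, whereas the `expr` type and the handful of rules below are invented for illustration.

```go
package main

import "fmt"

// expr is a hypothetical expression node: a constant, a variable,
// or a binary operation.
type expr struct {
	op   string // "const", "var", "+", or "*"
	val  int    // used when op == "const"
	name string // used when op == "var"
	l, r *expr  // used for binary operations
}

func konst(v int) *expr                 { return &expr{op: "const", val: v} }
func variable(n string) *expr           { return &expr{op: "var", name: n} }
func binary(op string, l, r *expr) *expr { return &expr{op: op, l: l, r: r} }

// simplify applies a few machine-independent rewrite rules bottom-up.
func simplify(e *expr) *expr {
	if e.op == "const" || e.op == "var" {
		return e
	}
	l, r := simplify(e.l), simplify(e.r)
	switch {
	case e.op == "+" && l.op == "const" && r.op == "const":
		return konst(l.val + r.val) // constant folding
	case e.op == "*" && l.op == "const" && r.op == "const":
		return konst(l.val * r.val) // constant folding
	case e.op == "*" && r.op == "const" && r.val == 1:
		return l // x * 1 => x
	case e.op == "+" && r.op == "const" && r.val == 0:
		return l // x + 0 => x
	}
	return binary(e.op, l, r)
}

// String renders the expression with explicit parentheses.
func (e *expr) String() string {
	switch e.op {
	case "const":
		return fmt.Sprint(e.val)
	case "var":
		return e.name
	}
	return "(" + e.l.String() + " " + e.op + " " + e.r.String() + ")"
}

func main() {
	e := binary("+",
		binary("*", variable("x"), binary("+", konst(3), konst(-2))),
		binary("*", konst(2), konst(0)))
	fmt.Println(e, "=>", simplify(e)) // prints "((x * (3 + -2)) + (2 * 0)) => x"
}
```

None of these rules mention a particular instruction set, which is why rules of this kind can run for every GOARCH.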

4. Generating machine code

  • cmd/compile/internal/ssa (SSA lowering and arch-specific passes)
  • cmd/internal/obj (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64, memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. These include yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into a series of obj.Prog instructions. These are passed to the assembler (cmd/internal/obj), which turns them into machine code and writes out the final object file. The object file will also contain reflect data, export data, and debugging information.
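One way to look at the result of this last phase is to ask the toolchain for the assembly it generates, for example with `go build -gcflags=-S`, which prints an assembly listing while compiling. The program below is a small sketch meant to be built that way; the function `scale` is invented, and the `//go:noinline` directive is used only so that the function keeps its own body in the listing.

```go
package main

import "fmt"

// scale is a small function whose generated code is easy to find in an
// assembly listing. The directive below asks the compiler not to inline it.
//
//go:noinline
func scale(x, k int) int {
	return x*k + 1
}

func main() {
	fmt.Println(scale(6, 7)) // prints "43"
}
```

The listing differs between architectures and compiler versions, so it is best read as a snapshot of what the machine-dependent passes produced for one particular build.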

Further reading

To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.


via: https://github.com/golang/go/blob/master/src/cmd/compile/README.md

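Author: mvdan. Translator: (translator ID). Proofreader: (proofreader ID).

This article was originally compiled by LCTT and is proudly presented by Linux China.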