@wxy
https://linux.cn/article-16430-1.html
This commit is contained in:
Xingyu Wang 2023-12-01 00:43:15 +08:00
parent 9545ba8abd
commit dab07be9eb
2 changed files with 279 additions and 302 deletions

View File

@ -0,0 +1,279 @@
[#]: subject: "git branches: intuition & reality"
[#]: via: "https://jvns.ca/blog/2023/11/23/branches-intuition-reality/"
[#]: author: "Julia Evans https://jvns.ca/"
[#]: collector: "lujun9972/lctt-scripts-1700446145"
[#]: translator: "ChatGPT"
[#]: reviewer: "wxy"
[#]: publisher: "wxy"
[#]: url: "https://linux.cn/article-16430-1.html"
Git 分支:直觉与现实
======
![][0]
你好!我一直在投入写作一本关于 Git 的小册,因此我对 Git 分支投入了许多思考。我不断从他人那里听说他们觉得 Git 分支的操作方式违反直觉。这使我开始思考:直觉上的分支概念可能是什么样,以及它如何与 Git 的实际操作方式区别开来?
在这篇文章中,我想简洁地讨论以下几点内容:
* 我认为许多人可能有的一个直觉性的思维模型
* Git 如何在内部实现分支的表示(例如,“分支是对提交的指针”)
* 这种“直觉模型”与实际操作方式之间的紧密关联
* 直觉模型的某些局限性,以及为何它可能引发问题
本文无任何突破性内容,我会尽量保持简洁。
### 分支的直观模型
当然,人们对分支有许多不同的直觉。我自己认为最符合“苹果树的一个分支”这一物理比喻的可能是下面这个。
我猜想许多人可能会这样理解 Git 分支:在下图中,两个红色的提交就代表一个“分支”。
![][1]
我认为在这个示意图中有两点很重要:
1. 分支上有两个提交
2. 分支有一个“父级”(`main`),它是这个“父级”的分支
虽然这个观点看似合理,但实际上它并不符合 Git 对于分支的定义 — 最重要的是Git 并没有一个分支的“父级”的概念。那么Git 又是如何定义分支的呢?
### 在 Git 里,分支是完整的历史
在 Git 中,一个分支是每个过去提交的完整历史记录,而不仅仅是那个“分支”提交。因此,在我们上述的示意图中,所有的分支(`main` 和 `branch`)都包含了 4 次提交。
我创建了一个示例仓库,地址为:<https://github.com/jvns/branch-example>。它设置的分支方式与前图一样。现在,我们来看看这两个分支:
`main` 分支包含了 4 次提交:
```
$ git log --oneline main
70f727a d
f654888 c
3997a46 b
a74606f a
```
`mybranch` 分支也有 4 次提交。最后两次提交在这两个分支里都存在。
```
$ git log --oneline mybranch
13cb960 y
9554dab x
3997a46 b
a74606f a
```
因此,`mybranch` 中的提交次数为 4而不仅仅是 2 次“分支”提交,即 `13cb960``9554dab`
你可以用以下方式让 Git 绘制出这两个分支的所有提交:
```
$ git log --all --oneline --graph
* 70f727a (HEAD -> main, origin/main) d
* f654888 c
| * 13cb960 (origin/mybranch, mybranch) y
| * 9554dab x
|/
* 3997a46 b
* a74606f a
```
### 分支以提交 ID 的形式存储
在 Git 的内部,分支会以一种微小的文本文件的形式存储下来,其中包含了一个提交 ID。这就是我一开始提及到的“技术上正确”的定义。这个提交就是分支上最新的提交。
我们来看一下示例仓库中 `main``mybranch` 的文本文件:
```
$ cat .git/refs/heads/main
70f727acbe9ea3e3ed3092605721d2eda8ebb3f4
$ cat .git/refs/heads/mybranch
13cb960ad86c78bfa2a85de21cd54818105692bc
```
这很好理解:`70f727` 是 `main` 上的最新提交,而 `13cb96``mybranch` 上的最新提交。
这样做的原因是,每个提交都包含一种指向其父级的指针,所以 Git 可以通过追踪这些指针链来找到分支上所有的提交。
正如我前文所述,这里遗漏的一个重要因素是这两个分支间的任何关联关系。从这里能看出,`mybranch` 是 `main` 的一个分支——这一点并没有被表明出来。
既然我们已经探讨了直观理解的分支概念是如何不成立的,我接下来想讨论的是,为何它在某些重要的方面又是如何成立的。
### 人们的直观感觉通常并非全然错误
我发现,告诉人们他们对 Git 的直觉理解是“错误的”的说法颇为流行。我觉得这样的说法有些可笑——总的来说,即使人们关于某个题目的直觉在某些方面在技术上不精确,但他们通常会有完全合理的理由来支持他们的直觉!即使是“不正确的”模型也可能极其有用。
现在,我们来讨论三种情况,其中直觉上的“分支”概念与我们实际在操作中如何使用 Git 非常相符。
### 变基操作使用的是“直观”的分支概念
现在,让我们回到最初的图片。
![][1]
当你在 `main` 上对 `mybranch` 执行 <ruby>变基<rt>rebase</rt></ruby> 操作时,它将取出“直观”分支上的提交(只有两个红色的提交)然后将它们应用到 `main` 上。
执行结果就是,只有两次提交(`x` 和 `y`)被复制。以下是相关操作的样子:
```
$ git switch mybranch
$ git rebase main
$ git log --oneline mybranch
952fa64 (HEAD -> mybranch) y
7d50681 x
70f727a (origin/main, main) d
f654888 c
3997a46 b
a74606f a
```
在此,`git rebase` 创建了两个新的提交(`952fa64` 和 `7d50681`),这两个提交的信息来自之前的两个 `x``y` 提交。
所以直觉上的模型并不完全错误!它很精确地告诉你在变基中发生了什么。
但因为 Git 不知道 `mybranch``main` 的一个分叉,你需要显式地告诉它在何处进行变基。
### 合并操作也使用了“直观”的分支概念
合并操作并不复制提交,但它们确实需要一个“<ruby>基础<rt>base</rt></ruby>”提交:合并的工作原理是查看两组更改(从共享基础开始),然后将它们合并。
我们撤销刚才完成的变基操作,然后看看合并基础是什么。
```
$ git switch mybranch
$ git reset --hard 13cb960 # 撤销 rebase
$ git merge-base main mybranch
3997a466c50d2618f10d435d36ef12d5c6f62f57
```
这里我们获得了分支分离出来的“基础”提交,也就是 `3997a4`。这正是你可能会基于我们的直观图片想到的提交。
### GitHub 的拉取请求也使用了直观的概念
如果我们在 GitHub 上创建一个拉取请求,打算将 `mybranch` 合并到 `main`,这个请求会展示出两次提交:也就是 `x``y`。这完全符合我们的预期,也和我们对分支的直观认识相符。
![][2]
我想,如果你在 GitLab 上发起一个合并请求,那显示的内容应该会与此类似。
### 直观理解颇为精准,但它有一定局限性
这使我们的对分支直观定义看起来相当准确!这个“直观”的概念和合并、变基操作以及 GitHub 拉取请求的工作方式完全吻合。
当你在进行合并、变基或创建拉取请求时,你需要明确指定另一个分支(如 `git rebase main`),因为 Git 不知道你的分支是基于哪个分支的。
然而,关于分支的直观理解有一个比较严重的问题:你直觉上认为 `main` 分支和某个分离的分支有很大的区别,但 Git 并不清楚这点。
所以,现在我们要来讨论一下 Git 分支的不同种类。
### 主干和派生分支
对于人类来说,`main` 和 `mybranch` 有着显著的区别,你可能针对如何使用它们,有着截然不同的意图。
通常,我们会将某些分支视为“<ruby>主干<rt>trunk</rt></ruby>”分支,同时将其他一些分支看作是“派生”。你甚至可能有派生的派生分支。
当然Git 自身并没有这样的区分(“派生”是我刚刚构造的术语!),但是分支的种类确实会影响你如何处理它。
例如:
* 你可能会想将 `mybranch` 变基到 `main`,但你大概不会想将 `main` 变基到 `mybranch` —— 那就太奇怪了!
* 一般来说,人们在重写“主干”分支的历史时比短期存在的派生分支更为谨慎。
### Git 允许你进行“反向”的变基
我认为人们经常对 Git 感到困惑的一点是 —— 由于 Git 并没有分支是否是另一个分支的“派生”的概念,它不会给你任何关于何时合适将分支 X 变基到分支 Y 的指引。这一切需要你自己去判断。
例如,你可以执行以下命令:
```
$ git checkout main
$ git rebase mybranch
```
或者
```
$ git checkout mybranch
$ git rebase main
```
Git 将会欣然允许你进行任一操作,尽管在这个案例中 `git rebase main` 是极其正常的,而 `git rebase mybranch` 则显得格外奇怪。许多人表示他们对此感到困惑,所以我提供了一个展示两种变基类型的图片以供参考:
![][3]
相似地,你可以进行“反向”的合并,尽管这相较于反向变基要正常得多——将 `mybranch` 合并到 `main` 和将 `main` 合并到 `mybranch` 都有各自的益处。
下面是一个展示你可以进行的两种合并方式的示意图:
![][4]
### Git 对于分支之间缺乏层次结构感觉有些奇怪
我经常听到 “`main` 分支没什么特别的” 的表述,而这令我感到困惑——对于我来说,我处理的大部分仓库里,`main` 无疑是非常特别的!那么人们为何会称其为不特别呢?
我觉得,重点在于:尽管分支确实存在彼此间的关系(`main` 通常是非常特别的!),但 Git 并不知情这些关系。
每当你执行如 `git rebase``git merge` 这样的 `git` 命令时,你都必须明确地告诉 Git 分支间的关系,如果你出错,结果可能会相当混乱。
我不知道 Git 在此方面的设计究竟“对”还是“错”(无疑它有利有弊,而我已对无休止的争论感到厌倦),但我认为,这对于许多人来说,原因在于它有些出人意料。
### Git 关于分支的用户界面也同样怪异
假设你只想查看某个分支上的“派生”提交,正如我们之前讨论的,这是完全正常的需求。
下面是用 `git log` 查看我们分支上的两次派生提交的方法:
```
$ git switch mybranch
$ git log main..mybranch --oneline
13cb960 (HEAD -> mybranch, origin/mybranch) y
9554dab x
```
你可以用 `git diff` 这样查看同样两次提交的合并差异:
```
$ git diff main...mybranch
```
因此,如果你想使用 `git log` 查看 `x``y` 这两次提交,你需要用到两个点(`..`),但查看同样的提交使用 `git diff`,你却需要用到三个点(`...`)。
我个人从来都记不住 `..``...` 的具体用意,所以我通常虽然它们在原则上可能很有用,但我选择尽量避免使用它们。
### 在 GitHub 上,默认分支具有特殊性
同样值得一提的是,在 GitHub 上存在一种“特殊的分支”:每一个 GitHub 仓库都有一个“默认分支”(在 Git 术语中,就是 `HEAD` 所指向的地方),具有以下的特别之处:
* 初次克隆仓库时,默认会检出这个分支
* 它作为拉取请求的默认接收分支
* GitHub 建议应该保护这个默认分支,防止被强制推送,等等。
很可能还有许多我未曾想到的场景。
### 总结
这些说法在回顾时看似是显而易见的,但实际上我花费了大量时间去搞清楚一个更“直观”的分支概念,这是因为我已经习惯了技术性的定义,“分支是对某次提交的引用”。
同样,我也没有真正去思索过如何在每次执行 `git rebase``git merge` 命令时,让 Git 明确理解你分支之间的层次关系——对我而言,这已经成为第二天性,并没有觉得有何困扰。但当我反思这个问题时,可以明显看出,这很容易导致某些人混淆。
*题图MJ/a5a52832-fac8-4190-b3bd-fec70166aa16*
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2023/11/23/branches-intuition-reality/
作者:[Julia Evans][a]
选题:[lujun9972][b]
译者:[ChatGPT](https://linux.cn/lctt/ChatGPT)
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://jvns.ca/
[b]: https://github.com/lujun9972
[1]: https://jvns.ca/images/git-branch.png
[2]: https://jvns.ca/images/gh-pr.png
[3]: https://jvns.ca/images/backwards-rebase.png
[4]: https://jvns.ca/images/merge-two-ways.png
[0]: https://img.linux.net.cn/data/attachment/album/202312/01/004025i72vi4t0o7027cyf.png

View File

@ -1,302 +0,0 @@
[#]: subject: "git branches: intuition & reality"
[#]: via: "https://jvns.ca/blog/2023/11/23/branches-intuition-reality/"
[#]: author: "Julia Evans https://jvns.ca/"
[#]: collector: "lujun9972/lctt-scripts-1700446145"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
git branches: intuition & reality
======
Hello! Ive been working on writing a zine about git so Ive been thinking about git branches a lot. I keep hearing from people that they find the way git branches work to be counterintuitive. It got me thinking: what might an “intuitive” notion of a branch be, and how is it different from how git actually works?
So in this post I want to briefly talk about
* an intuitive mental model I think many people have
* how git actually represents branches internally (“branches are a pointer to a commit” etc)
* how the “intuitive model” and the real way it works are actually pretty closely related
* some limits of the intuitive model and why it might cause problems
Nothing in this post is remotely groundbreaking so Im going to try to keep it pretty short.
### an intuitive model of a branch
Of course, people have many different intuitions about branches. Heres the one that I think corresponds most closely to the physical “a branch of an apple tree” metaphor.
My guess is that a lot of people think about a git branch like this: the 2 commits in pink in this picture are on a “branch”.
![][1]
I think there are two important things about this diagram:
1. the branch has 2 commits on it
2. the branch has a “parent” (`main`) which its an offshoot of
That seems pretty reasonable, but thats not how git defines a branch most importantly, git doesnt have any concept of a branchs “parent”. So how does git define a branch?
### in git, a branch is the full history
In git, a branch is the full history of every previous commit, not just the “offshoot” commits. So in our picture above both branches (`main` and `branch`) have 4 commits on them.
I made an example repository at <https://github.com/jvns/branch-example> which has its branches set up the same way as in the picture above. Lets look at the 2 branches:
`main` has 4 commits on it:
```
$ git log --oneline main
70f727a d
f654888 c
3997a46 b
a74606f a
```
and `mybranch` has 4 commits on it too. The bottom two commits are shared between both branches.
```
$ git log --oneline mybranch
13cb960 y
9554dab x
3997a46 b
a74606f a
```
So `mybranch` has 4 commits on it, not just the 2 commits `13cb960` and `9554dab` that are “offshoot” commits.
You can get git to draw all the commits on both branches like this:
```
$ git log --all --oneline --graph
* 70f727a (HEAD -> main, origin/main) d
* f654888 c
| * 13cb960 (origin/mybranch, mybranch) y
| * 9554dab x
|/
* 3997a46 b
* a74606f a
```
### a branch is stored as a commit ID
Internally in git, branches are stored as tiny text files which have a commit ID in them. That commit is the latest commit on the branch. This is the “technically correct” definition I was talking about at the beginning.
Lets look at the text files for `main` and `mybranch` in our example repo:
```
$ cat .git/refs/heads/main
70f727acbe9ea3e3ed3092605721d2eda8ebb3f4
$ cat .git/refs/heads/mybranch
13cb960ad86c78bfa2a85de21cd54818105692bc
```
This makes sense: `70f727` is the latest commit on `main` and `13cb96` is the latest commit on `mybranch`.
The reason this works is that every commit contains a pointer to its parent(s), so git can follow the chain of pointers to get every commit on the branch.
Like I mentioned before, the thing thats missing here is any relationship at all between these two branches. Theres no indication that `mybranch` is an offshoot of `main`.
Now that weve talked about how the intuitive notion of a branch is “wrong”, I want to talk about how its also right in some very important ways.
### peoples intuition is usually not that wrong
I think its pretty popular to tell people that their intuition about git is “wrong”. I find that kind of silly in general, even if peoples intuition about a topic is technically incorrect in some ways, people usually have the intuition they do for very legitimate reasons! “Wrong” models can be super useful.
So lets talk about 3 ways the intuitive “offshoot” notion of a branch matches up very closely with how we actually use git in practice.
### rebases use the “intuitive” notion of a branch
Now lets go back to our original picture.
![][1]
When you rebase `mybranch` on `main`, it takes the commits on the “intuitive” branch (just the 2 pink commits) and replays them onto `main`.
The result is that just the 2 (`x` and `y`) get copied. Heres what that looks like:
```
$ git switch mybranch
$ git rebase main
$ git log --oneline mybranch
952fa64 (HEAD -> mybranch) y
7d50681 x
70f727a (origin/main, main) d
f654888 c
3997a46 b
a74606f a
```
Here `git rebase` has created two new commits (`952fa64` and `7d50681`) whose information comes from the previous two `x` and `y` commits.
So the intuitive model isnt THAT wrong! It tells you exactly what happens in a rebase.
But because git doesnt know that `mybranch` is an offshoot of `main`, you need to tell it explicitly where to rebase the branch.
### merges use the “intuitive” notion of a branch too
Merges dont copy commits, but they do need a “base” commit: the way merges work is that it looks at two sets of changes (starting from the shared base) and then merges them.
Lets undo the rebase we just did and then see what the merge base is.
```
$ git switch mybranch
$ git reset --hard 13cb960 # undo the rebase
$ git merge-base main mybranch
3997a466c50d2618f10d435d36ef12d5c6f62f57
```
This gives us the “base” commit where our branch branched off, `3997a4`. Thats exactly the commit you would think it might be based on our intuitive picture.
### github pull requests also use the intuitive idea
If we create a pull request on GitHub to merge `mybranch` into `main`, itll also show us 2 commits: the commits `x` and `y`. That makes sense and also matches our intuitive notion of a branch.
![][2]
I assume if you make a merge request on GitLab it shows you something similar.
### intuition is pretty good, but it has some limits
This leaves our intuitive definition of a branch looking pretty good actually! The “intuitive” idea of what a branch is matches exactly with how merges and rebases and GitHub pull requests work.
You do need to explicitly specify the other branch when merging or rebasing or making a pull request (like `git rebase main`), because git doesnt know what branch you think your offshoot is based on.
But the intuitive notion of a branch has one fairly serious problem: the way you intuitively think about `main` and an offshoot branch are very different, and git doesnt know that.
So lets talk about the different kinds of git branches.
### trunk and offshoot branches
To a human, `main` and `mybranch` are pretty different, and you probably have pretty different intentions around how you want to use them.
I think its pretty normal to think of some branches as being “trunk” branches, and some branches as being “offshoots”. Also you can have an offshoot of an offshoot.
Of course, git itself doesnt make any such distinctions (the term “offshoot” is one I just made up!), but what kind of a branch it is definitely affects how you treat it.
For example:
* you might rebase `mybranch` onto `main` but you probably wouldnt rebase `main` onto `mybranch` that would be weird!
* in general people are much more careful around rewriting the history on “trunk” branches than short-lived offshoot branches
### git lets you do rebases “backwards”
One thing I think throws people off about git is because git doesnt have any notion of whether a branch is an “offshoot” of another branch, it wont give you any guidance about if/when its appropriate to rebase branch X on branch Y. You just have to know.
for example, you can do either:
```
$ git checkout main
$ git rebase mybranch
```
or
```
$ git checkout mybranch
$ git rebase main
```
Git will happily let you do either one, even though in this case `git rebase main` is extremely normal and `git rebase mybranch` is pretty weird. A lot of people said they found this confusing so heres a picture of the two kinds of rebases:
![][3]
Similarly, you can do merges “backwards”, though thats much more normal than doing a backwards rebase merging `mybranch` into `main` and `main` into `mybranch` are both useful things to do for different reasons.
Heres a diagram of the two ways you can merge:
![][4]
### gits lack of hierarchy between branches is a little weird
I hear the statement “the `main` branch is not special” a lot and Ive been puzzled about it in most of the repositories I work in, `main` **is** pretty special! Why are people saying its not?
I think the point is that even though branches **do** have relationships between them (`main` is often special!), git doesnt know anything about those relationships.
You have to tell git explicitly about the relationship between branches every single time you run a git command like `git rebase` or `git merge`, and if you make a mistake things can get really weird.
I dont know whether gits design here is “right” or “wrong” (it definitely has some pros and cons, and Im very tired of reading endless arguments about it), but I do think its surprising to a lot of people for good reason.
### gits UI around branches is weird too
Lets say you want to look at just the “offshoot” commits on a branch, which as weve discussed is a completely normal thing to want.
Heres how to see just the 2 offshoot commits on our branch with `git log`:
```
$ git switch mybranch
$ git log main..mybranch --oneline
13cb960 (HEAD -> mybranch, origin/mybranch) y
9554dab x
```
You can look at the combined diff for those same 2 commits with `git diff` like this:
```
$ git diff main...mybranch
```
So to see the 2 commits `x` and `y` with `git log`, you need to use 2 dots (`..`), but to look at the same commits with `git diff`, you need to use 3 dots (`...`).
Personally I can never remember what `..` and `...` mean so I just avoid them completely even though in principle they seem useful.
### in GitHub, the default branch is special
Also, its worth mentioning that GitHub does have a “special branch”: every github repo has a “default branch” (in git terms, its what `HEAD` points at), which is special in the following ways:
* its what you check out when you `git clone` the repository
* its the default destination for pull requests
* github will suggest that you protect the default branch from force pushes
and probably even more that Im not thinking of.
### thats all!
This all seems extremely obvious in retrospect, but it took me a long time to figure out what a more “intuitive” idea of a branch even might be because I was so used to the technical “a branch is a reference to a commit” definition.
I also hadnt really thought about how git makes you tell it about the hierarchy between your branches every time you run a `git rebase` or `git merge` command for me its second nature to do that and its not a big deal, but now that Im thinking about it, its pretty easy to see how somebody could get mixed up.
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2023/11/23/branches-intuition-reality/
作者:[Julia Evans][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://jvns.ca/
[b]: https://github.com/lujun9972
[1]: https://jvns.ca/images/git-branch.png
[2]: https://jvns.ca/images/gh-pr.png
[3]: https://jvns.ca/images/backwards-rebase.png
[4]: https://jvns.ca/images/merge-two-ways.png