git rebase: what can go wrong?

Hello! While talking with folks about Git, I’ve been seeing a comment over and over to the effect of “I hate rebase”. People seemed to feel pretty strongly about this, and I was really surprised because I don’t run into a lot of problems with rebase and I use it all the time.

I’ve found that if many people have a very strong opinion that’s different from mine, it’s usually because their experiences with that thing have been different from mine.

So I asked on Mastodon:

today I’m thinking about the tradeoffs of using git rebase a bit. I think the goal of rebase is to have a nice linear commit history, which is something I like.

but what are the costs of using rebase? what problems has it caused for you in practice? I’m really only interested in specific bad experiences you’ve had here, not opinions or general statements like “rewriting history is bad”

I got a huge number of incredible answers to this, and I’m going to do my best to summarize them here. I’ll also mention solutions or workarounds to those problems in cases where I know of one. Each problem gets its own section below.

My goal with this isn’t to convince anyone that rebase is bad and you shouldn’t use it (I’m certainly going to keep using rebase!). But seeing all these problems made me want to be more cautious about recommending rebase to newcomers without explaining how to use it safely. It also makes me wonder if there’s an easier workflow for cleaning up your commit history that’s harder to accidentally mess up.

my git workflow assumptions

First, I know that people use a lot of different Git workflows. I’m going to be talking about the workflow I’m used to when working on a team, which is:

  • the team uses a central GitHub/GitLab repo to coordinate
  • there’s one central main branch. It’s protected from force pushes.
  • people write code in feature branches and make pull requests to main
  • the web service is deployed from main every time a pull request is merged
  • the only way to make a change to main is by making a pull request on GitHub/GitLab and merging it

This is not the only “correct” git workflow (it’s a very “we run a web service” workflow, and open source projects or desktop software with releases generally use a slightly different workflow). But it’s what I know so that’s what I’ll talk about.

two kinds of rebase

Also before we start: one big thing I noticed is that there were 2 different kinds of rebase that kept coming up, and only one of them requires you to deal with merge conflicts.

  1. rebasing on an ancestor, like git rebase -i HEAD^^^^^^^ to squash many small commits into one. As long as you’re just squashing commits, you’ll never have to resolve a merge conflict while doing this.
  2. rebasing onto a branch that has diverged, like git rebase main. This can cause merge conflicts.

I think it’s useful to make this distinction because sometimes I’m thinking about rebase type 1 (which is a lot less likely to cause problems), but people who are struggling with it are thinking about rebase type 2.
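Here’s a rough sketch of rebase type 1 in a throwaway repository. The file name and commit messages are made up, and the sed command passed through GIT_SEQUENCE_EDITOR is only a stand-in for editing the todo list by hand in your editor.

```shell
set -eu
# Throwaway identity and repository, purely for illustration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
echo base > base.txt && git add base.txt && git commit -q -m "base"

# Three small commits that we want to turn into one.
for n in 1 2 3; do
  echo "$n" >> notes.txt
  git add notes.txt
  git commit -q -m "wip $n"
done

# Rebase onto an ancestor (HEAD~3 is the "base" commit). The sed command
# keeps the first "pick" in the todo list and turns the rest into "fixup",
# which is what you would otherwise do by hand in the editor.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i HEAD~3

git log --oneline
```

Because every commit is replayed on top of the same history it was written against, there is nothing for it to conflict with.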

Now let’s move on to all the problems!

fixing the same conflict repeatedly is annoying

If you make many tiny commits, sometimes you end up in a hellish loop where you have to fix the same merge conflict 10 times. You can also end up fixing merge conflicts totally unnecessarily (like dealing with a merge conflict in code that a future commit deletes).

There are a few ways to make this better:

  • first do a git rebase -i HEAD^^^^^^^^^^^ to squash all of the tiny commits into 1 big commit and then a git rebase main to rebase onto a different branch. Then you only have to fix the conflicts once.
  • use git rerere to automate repeatedly resolving the same merge conflicts (“rerere” stands for “reuse recorded resolution”; it’ll record your previous merge conflict resolutions and replay them). I’ve never tried this but I think you need to set git config rerere.enabled true and then it’ll automatically help you.

Also if I find myself resolving merge conflicts more than once in a rebase, I’ll usually run git rebase --abort to stop it and then squash my commits into one and try again.
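The “squash first, then rebase” idea can be sketched like this. It’s a minimal example with a made-up file and branch name. To stay non-interactive it squashes with git reset --soft instead of git rebase -i (a different technique with the same end result here), and it turns on rerere only to show where that setting would go.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
echo base > file.txt && git add file.txt && git commit -q -m "base"

# Three tiny commits on a feature branch, all touching the same line.
git switch -q -c feature
for n in 1 2 3; do
  echo "feature v$n" > file.txt
  git commit -q -a -m "wip $n"
done

# Meanwhile main changes that same line, so the branches have diverged.
git switch -q main
echo "main change" > file.txt && git commit -q -a -m "change on main"
git switch -q feature

# Optional: let git record conflict resolutions so it can reuse them later.
git config rerere.enabled true

# Step 1: squash the tiny commits into one (stand-in for `git rebase -i`).
git reset -q --soft "$(git merge-base main HEAD)"
git commit -q -m "feature: one squashed commit"

# Step 2: rebase onto main. There is now a single commit to replay,
# so the conflict only has to be resolved once.
git rebase main >/dev/null 2>&1 || true
echo "feature v3 on top of main change" > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null

git log --oneline
```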

rebasing a lot of commits is hard

Generally when I’m doing a rebase onto a different branch, I’m rebasing 1-2 commits. Maybe sometimes 5! Usually there are no conflicts and it works fine.

Some people described rebasing hundreds of commits by many different people onto a different branch. That sounds really difficult and I don’t envy that task.

undoing a rebase is hard

I heard from several people that when they were new to rebase, they messed up a rebase and permanently lost a week of work that they then had to redo.

The problem here is that undoing a rebase that went wrong is much more complicated than undoing a merge that went wrong (you can undo a bad merge with something like git reset --hard HEAD^). Many newcomers to rebase don’t realize that undoing a rebase is even possible, and I think it’s pretty easy to understand why.

That said, it is possible to undo a rebase that went wrong. Here’s an example of how to undo a rebase using git reflog.

step 1 : Do a bad rebase (for example run git rebase -i HEAD^^^^^ and just delete 3 commits)

step 2 : Run git reflog. You should see something like this:


    ee244c4 (HEAD -> main) HEAD@{0}: rebase (finish): returning to refs/heads/main
    ee244c4 (HEAD -> main) HEAD@{1}: rebase (pick): test
    fdb8d73 HEAD@{2}: rebase (start): checkout HEAD^^^^^^^
    ca7fe25 HEAD@{3}: commit: 16 bits by default
    073bc72 HEAD@{4}: commit: only show tooltips on desktop

step 3 : Find the entry immediately before rebase (start). In my case that’s ca7fe25

step 4 : Run git reset --hard ca7fe25

Another solution folks mentioned to “undoing a rebase is hard” that avoids having to use the reflog is to make a “backup branch” with git switch -c backup before rebasing, so you can easily get back to the old commit.
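Both recovery routes can be tried safely in a scratch repository. The sketch below is only an illustration: the file names are invented, the “bad rebase” is simulated by deleting todo lines with sed, and it uses git branch backup rather than git switch -c backup so that the backup branch is created without switching away from main.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
for n in 1 2 3 4 5; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
before=$(git rev-parse HEAD)

# Safety net: a backup branch pointing at the current commit.
git branch backup

# A "bad" rebase: delete the first two lines of the todo list, which
# drops "commit 3" and "commit 4" from the history.
GIT_SEQUENCE_EDITOR='sed -i.bak 1,2d' git rebase -i HEAD~3 >/dev/null 2>&1
git log --oneline

# Route 1: look in the reflog for the entry just before the rebase started
# (the reflog lists newest entries first, so it is the line after "(start)").
git reflog | grep -A1 '(start)'

# Route 2: reset to the backup branch made before rebasing.
git reset -q --hard backup
git log --oneline
```

Either route ends with git reset --hard pointing the branch back at the pre-rebase commit; the backup branch just saves you from having to read the reflog.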

force pushing to shared branches can cause lost work

A few people mentioned the following situation:

  1. You’re collaborating on a branch with someone
  2. You push some changes
  3. They rebase the branch and run git push --force (maybe by accident)
  4. Now when you run git pull, it’s a mess: you get a fatal: Need to specify how to reconcile divergent branches error
  5. While trying to deal with the fallout you might lose some commits, especially if some of the people involved aren’t very comfortable with git

This is an even worse situation than the “undoing a rebase is hard” situation because the missing commits might be split across many different people’s reflogs, and the only thing worse than having to hunt through the reflog is multiple different people having to hunt through the reflog.

This has never happened to me because the only branch I’ve ever collaborated on is main, and main has always been protected from force pushing (in my experience the only way you can get something into main is through a pull request). So I’ve never even really been in a situation where this could happen. But I can definitely see how this would cause problems.

The main tools I know to avoid this are:

  • don’t rebase on shared branches
  • use --force-with-lease when force pushing, to make sure that nobody else has pushed to the branch since you last fetched it
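To see what --force-with-lease protects against, here’s a small local simulation. Everything in it is hypothetical: remote.git stands in for the central repo, and alice and bob are two clones of it sharing a branch called shared.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A stand-in for the central repo: a bare repository with one shared branch.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/shared
echo readme > seed/README && git -C seed add README
git -C seed commit -q -m "base"
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# Alice pushes a new commit to the shared branch.
echo alice > alice/alice.txt && git -C alice add alice.txt
git -C alice commit -q -m "alice's work"
git -C alice push -q origin shared

# Bob has not fetched since then. He rewrites his local history
# and tries to force push over the shared branch.
echo bob > bob/bob.txt && git -C bob add bob.txt
git -C bob commit -q --amend -m "base (rewritten by bob)"
if git -C bob push -q --force-with-lease origin shared 2>/dev/null; then
  result="pushed"
else
  result="rejected"    # the remote branch moved since bob last fetched
fi
echo "bob's push was $result"
```

A plain git push --force in the same spot would have replaced Alice’s commit on the remote.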

I was curious about why people would run git push --force on a shared branch. Some reasons people gave were:

  • they’re working on a collaborative feature branch, and the feature branch needs to be rebased onto main. The idea here is that you’re just really careful about coordinating the rebase so nothing gets lost.
  • as an open source maintainer, sometimes they need to rebase a contributor’s branch to fix a merge conflict
  • they’re new to git, read some instructions online that suggested git rebase and git push --force as a solution, and followed them without understanding the consequences
  • they’re used to doing git push --force on a personal branch and ran it on a shared branch by accident

force pushing makes code reviews harder

The situation here is:

  • You make a pull request on GitHub
  • People leave some comments
  • You update the code to address the comments, rebase to clean up your commits, and force push
  • Now when the reviewer comes back, it’s hard for them to tell what you changed since the last time they saw it: all the commits show up as “new”.

One way to avoid this is to push new commits addressing the review comments, and then after the PR is approved do a rebase to reorganize everything.
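One possible way to do that, sketched below, is to record each review fix with git commit --fixup and fold the fixups in with git rebase --autosquash once the PR is approved. This is just one option, not something the people who replied necessarily use, and the file names and messages are invented.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "base"

git switch -q -c feature
echo "draft" > parser.txt && git add parser.txt && git commit -q -m "add parser"
echo "docs" > docs.txt && git add docs.txt && git commit -q -m "add docs"

# During review: address a comment on "add parser" as a separate commit.
# --fixup names it "fixup! add parser" so it can be folded in later.
# Reviewers see this as one small new commit on top of what they read.
echo "draft, with review fix" > parser.txt
git commit -q -a --fixup=HEAD~1

# After approval: fold the fixup into its target commit.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list as-is.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..feature
```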

I think some reviewers are more annoyed by this problem than others; it’s kind of a personal preference. Also this might be a GitHub-specific issue: other code review tools might have better support for this.

losing commit metadata

If you’re rebasing to squash commits, you can lose important commit metadata like Co-Authored-By. Also if you GPG sign your commits, rebase loses the signatures.

There’s probably other commit metadata that you can lose that I’m not thinking of.

I haven’t run into this one so I’m not sure how to avoid it. I think GPG signing commits isn’t as popular as it used to be.

rebasing can break intermediate commits

If you’re trying to have a very clean commit history where the tests pass on every commit (very admirable!), rebasing can result in some intermediate commits that are broken and don’t pass the tests, even if the final commit passes the tests.

Apparently you can avoid this by using git rebase -x to run the test suite at every step of the rebase and make sure that the tests are still passing. I’ve never done that though.
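A minimal sketch of git rebase -x, assuming a made-up one-line “test suite” (test.sh) that fails whenever value.txt contains the word BROKEN. In a real project the command would be whatever runs your tests.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main

# Hypothetical test suite: fails whenever value.txt contains "BROKEN".
printf '%s\n' '! grep -q BROKEN value.txt' > test.sh
echo ok > value.txt
git add test.sh value.txt && git commit -q -m "base"

# The final commit passes the tests, but the intermediate one does not.
git switch -q -c feature
echo BROKEN > value.txt && git commit -q -a -m "refactor (breaks tests)"
echo "ok again" > value.txt && git commit -q -a -m "finish refactor"
tip=$(git rev-parse HEAD)

# Give main a new commit so the branches have diverged.
git switch -q main
echo notes > notes.txt && git add notes.txt && git commit -q -m "notes on main"
git switch -q feature

# Run the tests after each replayed commit; the rebase stops at the
# first commit where the command exits with a non-zero status.
if git rebase -x "sh ./test.sh" main >/dev/null 2>&1; then
  outcome="every commit passed"
else
  outcome="stopped at: $(git log -1 --format=%s)"
  git rebase --abort
fi
echo "$outcome"
```

When the rebase stops like this you could also fix the commit in place and run git rebase --continue instead of aborting.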

accidentally run git commit --amend instead of git rebase --continue

A couple of people mentioned issues with running git commit --amend instead of git rebase --continue when resolving a merge conflict.

The reason this is confusing is that there are two situations where you might want to edit files during a rebase:

  1. editing a commit (by using edit in git rebase -i), where you need to run git commit --amend when you’re done
  2. a merge conflict, where you need to run git rebase --continue when you’re done

It’s very easy to get these two cases mixed up because they feel very similar. I think what goes wrong here is that you:

  • Start a rebase
  • Run into a merge conflict
  • Resolve the merge conflict, and run git add file.txt
  • Run git commit because that’s what you’re used to doing after you run git add
  • But you were supposed to run git rebase --continue! Now you have a weird extra commit, and maybe it has the wrong commit message and/or author
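The conflict case, done the intended way, looks roughly like this in a scratch repository (file and branch names are made up). The directory check in the middle is one way to confirm that you are in the middle of a rebase before deciding which command to run.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
echo base > file.txt && git add file.txt && git commit -q -m "base"

git switch -q -c feature
echo "feature line" > file.txt && git commit -q -a -m "feature change"
git switch -q main
echo "main line" > file.txt && git commit -q -a -m "main change"
git switch -q feature

git rebase main >/dev/null 2>&1 || true   # stops with a conflict in file.txt

# A rebase in progress keeps its state in one of these directories.
if [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; then
  echo "rebase in progress: finish with git rebase --continue, not git commit"
fi

# Resolve the conflict, stage the file, then continue the rebase.
printf '%s\n' "main line" "feature line" > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null

git log --oneline
```

git status also reports a rebase in progress, which is a quick thing to check if you’re about to type git commit out of habit.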

splitting commits in an interactive rebase is hard

The whole point of rebase is to clean up your commit history, and combining commits with rebase is pretty easy. But what if you want to split up a commit into 2 smaller commits? It’s not as easy, especially if the commit you want to split is a few commits back! I actually don’t really know how to do it even though I feel very comfortable with rebase. I’d probably just do git reset HEAD^^^ or something and use git add -p to redo all my commits from scratch.

One person shared their workflow for splitting commits with rebase.
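For the simplest case, where the commit to split is the most recent one, the “reset and redo” approach can be sketched as follows. The file names are invented, and the example stages whole files so it can run without prompts; with changes inside a single file you would reach for git add -p instead.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "base"

# One commit that mixes two unrelated changes.
echo "parser" > parser.txt
echo "docs" > docs.txt
git add parser.txt docs.txt && git commit -q -m "add parser and docs"

# Undo the commit but keep its changes in the working tree.
git reset -q HEAD^

# Re-commit the same changes in two pieces.
git add parser.txt && git commit -q -m "add parser"
git add docs.txt && git commit -q -m "add docs"

git log --oneline
```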

complex rebases are hard

If you try to do too many things in a single git rebase -i (reorder commits AND combine commits AND modify a commit), it can get really confusing.

To avoid this, I personally prefer to only do 1 thing per rebase, and if I want to do 2 different things I’ll do 2 rebases.

rebasing long lived branches can be annoying

If your branch is long-lived (like for 1 month), having to rebase repeatedly gets painful. It might be easier to just do 1 merge at the end and only resolve the conflicts once.

The dream is to avoid this problem by not having long-lived branches but it doesn’t always work out that way in practice.

miscellaneous problems

A few more issues that I think are not that common:

  • Stopping a rebase wrong: If you try to abort a rebase that’s going badly with git reset --hard instead of git rebase --abort, things will behave weirdly until you stop it properly
  • Weird interactions with merge commits: A couple of quotes about this: “If you rebase your working copy to keep a clean history for a branch, but the underlying project uses merges, the result can be ugly. If you do rebase -i HEAD~4 and the fourth commit back is a merge, you can see dozens of commits in the interactive editor.”, “I’ve learned the hard way to never rebase if I’ve merged anything from another branch”

rebase and commit discipline

I’ve seen a lot of people arguing about rebase. I’ve been thinking about why this is and I’ve noticed that people work at a few different levels of “commit discipline”:

  1. Literally anything goes, “wip”, “fix”, “idk”, “add thing”
  2. When you make a pull request (on GitHub/GitLab), squash all of your crappy commits into a single commit with a reasonable message (usually the PR title)
  3. Atomic Beautiful Commits: every change is split into the appropriate number of commits, where each one has a nice commit message and where they all tell a story around the change you’re making

Often I think different people inside the same company have different levels of commit discipline, and I’ve seen people argue about this a lot. Personally I’m mostly a Level 2 person. I think Level 3 might be what people mean when they say “clean commit history”.

I think Level 1 and Level 2 are pretty easy to achieve without rebase: for Level 1, you don’t have to do anything, and for Level 2, you can either press “squash and merge” in GitHub or run git switch main; git merge --squash mybranch on the command line (and then commit the result).

But for Level 3, you either need rebase or some other tool (like GitUp) to help you organize your commits to tell a nice story.

I’ve been wondering if when people argue about whether people “should” use rebase or not, they’re really arguing about which minimum level of commit discipline should be required.

I think how this plays out also depends on how big the changes folks are making are: if folks are usually making pretty small pull requests anyway, squashing them into 1 commit isn’t a big deal, but if you’re making a 6000-line change you probably want to split it up into multiple commits.

a “squash and merge” workflow

A couple of people mentioned using this workflow that doesn’t use rebase:

  • make commits
  • Run git merge main to merge main into the branch periodically (and fix conflicts if necessary)
  • When you’re done, use GitHub’s “squash and merge” feature (which is roughly the equivalent of running git checkout main; git merge --squash mybranch and then committing) to squash all of the changes into 1 commit. This gets rid of all the “ugly” merge commits.

I originally thought this would make the log of commits on my branch too ugly, but apparently git log main..mybranch will just show you the changes on your branch, like this:


    $ git log main..mybranch
    756d4af (HEAD -> mybranch) Merge branch 'main' into mybranch
    20106fd Merge branch 'main' into mybranch
    d7da423 some commit on my branch
    85a5d7d some other commit on my branch

Of course, the goal here isn’t to force people who have made beautiful atomic commits to squash their commits. It’s just to provide an easy option for folks to clean up a messy commit history (“add new feature; wip; wip; fix; fix; fix; fix; fix;”) without having to use rebase.
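The whole workflow can be tried out locally. In this sketch the branch name mybranch comes from the example above, the file names and commit messages are invented, and the final git merge --squash plus git commit stands in for pressing “squash and merge” on GitHub.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "base"

# Messy commits on the branch.
git switch -q -c mybranch
echo one > feature.txt && git add feature.txt && git commit -q -m "wip"
echo two > feature.txt && git commit -q -a -m "fix"

# main moves on in the meantime.
git switch -q main
echo other > other.txt && git add other.txt && git commit -q -m "other work on main"

# Periodically merge main into the branch instead of rebasing.
git switch -q mybranch
git merge -q --no-edit main

# When done: squash everything on the branch into one commit on main.
# `git merge --squash` only stages the changes, so a commit follows.
git switch -q main
git merge -q --squash mybranch
git commit -q -m "add new feature"

git log --oneline
```

The branch keeps its “wip”, “fix”, and merge commits, but main only ever sees the single squashed commit.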

I’d be curious to hear about other people who use a workflow like this and if it works well.

there are more problems than I expected

I went into this really feeling like “rebase is fine, what could go wrong?” But many of these problems actually have happened to me in the past; it’s just that over the years I’ve learned how to avoid or fix all of them.

And I’ve never really seen anyone share best practices for rebase, other than “never force push to a shared branch”. All of these honestly make me a lot more reluctant to recommend using rebase.

To recap, these are the personal rebase rules I follow:

  • stop a rebase if it’s going badly instead of letting it finish (with git rebase --abort)
  • know how to use git reflog to undo a bad rebase
  • don’t rebase a million tiny commits (instead do it in 2 steps: git rebase -i HEAD^^^^ and then git rebase main)
  • don’t do more than one thing in a git rebase -i. Keep it simple.
  • never force push to a shared branch
  • never rebase commits that have already been pushed to main

Thanks to Marco Rogers for encouraging me to think about the problems people have with rebase, and to everyone on Mastodon who helped with this.


via: https://jvns.ca/blog/2023/11/06/rebasing-what-can-go-wrong-/

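Author: Julia Evans. Topic selected by: lujun9972. Translator: (translator ID). Proofreader: (proofreader ID).

This article was originally compiled by LCTT and is proudly presented by Linux中国 (Linux China).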