How to rebase a feature branch with multiple merge commits due to git pull to be able to squash them

Time:08-18

On several occasions I have run git pull on a feature branch and ended up with multiple annoying "merge commits". I understand why they happen, but I want to get rid of them.

I tried to use git rebase -i --rebase-merges HEAD~4 but could not figure out how to squash the merge commits.

I researched further and, after a lot of digging, was able to do the following to remove the unwanted merge commits using rebase and then squash them if needed:

git checkout feature
git pull  # will create merge commits
git branch feature_backup  # to create a backup
git switch --orphan empty_commit
git commit -m "First empty commit for the feature branch" --allow-empty
git switch feature
git rebase empty_commit
git rebase -i --root  # this allows you to squash commits
git branch -D empty_commit

Is there a better way?

CodePudding user response:

To rebase by default on git pull

git config --global pull.rebase true

or to rebase on specific git pull

git pull --rebase
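
To see the effect, here is a minimal sketch in a throwaway directory. The repository names (upstream, clone), file names, and commit messages are all made up for illustration, and the identity variables exist only so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory, nothing real is touched
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# "upstream" stands in for the shared remote.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "base"
git checkout -q -b feature

# Clone it and make a local commit on feature.
git clone -q -b feature "$work/upstream" "$work/clone"
cd "$work/clone"
echo local > local.txt
git add local.txt
git commit -q -m "local work"

# Meanwhile a teammate adds a commit to feature on the remote.
cd "$work/upstream"
echo remote > remote.txt
git add remote.txt
git commit -q -m "teammate work"

# Pull with rebase: fetch, then replay "local work" on top of origin/feature.
cd "$work/clone"
git pull -q --rebase
merges=$(git rev-list --merges --count HEAD)   # number of merge commits in the history
git log --oneline --graph
```

The log should come out as a straight line, with "local work" sitting on top of "teammate work" and no merge commit in between.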

Using git fetch

Fetch the remote status

git fetch

Then if you want to rebase

git rebase origin/main

or if you want to merge

git merge origin/main

If you did a merge you didn't want

Cancel the merge commit and the modifications it brought in with:

git reset --hard HEAD~1

then rebase

git rebase origin/main
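
The whole sequence (fetch, unwanted merge, reset, rebase) can be tried safely in a scratch directory. This is only a sketch: the repository names, file names, and commit messages are invented, and the identity variables are there purely so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# "upstream" plays the role of the remote.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Clone it and start a feature branch with one commit.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Someone else advances main on the remote.
cd "$work/upstream"
echo more > main.txt
git add main.txt
git commit -q -m "main work"

# Back in the clone: fetch, then merge (this is the unwanted merge commit).
cd "$work/clone"
git fetch -q origin
git merge -q --no-edit origin/main
merges_before=$(git rev-list --merges --count HEAD)

# Undo the merge (HEAD~1 follows the first parent, i.e. "feature work"), then rebase.
git reset -q --hard HEAD~1
git rebase -q origin/main
merges_after=$(git rev-list --merges --count HEAD)
git log --oneline --graph
```

After the rebase the history is base, "main work", "feature work" in a straight line. Note that a plain git rebase leaves merge commits out when it replays commits, so the reset step mainly makes the intent explicit.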

CodePudding user response:

Ôrel's answer has a recipe. You might not want origin/master directly—what you do want is up to you and you may wish to think about this and experiment, once you've read through and are working out the bits of this explanation—but rebase does work.

"I still don't understand how the git rebase you mentioned will work? My problem is that when I do a git pull on a feature branch, it'll end up having merge commit so how the git rebase origin/master will work?"

The key here is to understand a number of things simultaneously. (This is often the case with Git.) You need to know:

  • how individual commits work;
  • how branch names work;
  • how git fetch works;
  • how git merge works;
  • that git pull means run git fetch, then run git merge or some other second command (you have been using git merge); and
  • how git rebase works.

This is a lot of stuff! We cannot hope to cover it all, not even in one of my famously[1] long answers, so we'll race through a few key items.


[1] Or other adverb of your choice.


Commits, branches, and branch names

A commit:

  • is numbered: it has a unique hash ID. That number means that commit, not only in your repository, but in every repository, even all the Git repositories that don't have your commit. (This is the main deep magic that powers Git.)

  • is read-only: no part of any commit can ever be changed. This is required by the magic numbering system.

  • contains two things: a full snapshot of all files (stored indirectly, in a special Git-ized form where they're compressed and de-duplicated, so it is a good thing when a new commit re-uses almost all the files from a previous commit: that means they take no space), plus metadata: information about the commit.

The metadata contains stuff like your name and email address and the date-and-time at which you made the commit. For Git's own purposes, Git stores, in the metadata in any one commit, a list of previous commit hash IDs. Most commits—"ordinary" commits, we tend to call them—have exactly one hash ID here. This forms the ordinary commits into a simple chain, except that the arrows connecting commits are backwards instead of forwards. Humans like to think of the arrows as going forwards, but that can't work in Git, because commits are strictly read-only.

What this means is that given a string of commits in a row, each with its own hash ID, we can draw that string of commits, newer commits towards the right, like this:

... <-F <-G <-H

Here H stands for the hash ID of the last commit in the chain. Commit H contains a full snapshot of every file, plus some metadata. In H's metadata, Git has stored the hash ID of some earlier commit G, which we (and Git) call the parent of commit H. So commit H points to earlier commit G.

Of course, G is also a commit, so it has a snapshot and metadata, and that metadata holds the hash ID of some still-earlier parent F. But F is a commit too, so F points backwards as well. This goes on forever, or rather, until we get to the very first commit ever, which, being the first, has no parent (a weird sort of "virgin birth" as it were; Git calls it a root commit).

The commits are all stored in a big database (of "all Git objects", including commit objects—the other kinds of objects are mostly supporting things for commits, namely tree and blob objects, plus annotated tag objects). This database is a simple key-value store where the keys are the hash IDs. So Git needs a hash ID to find a commit.
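
If you want to poke at this yourself, the following sketch looks up commit objects by hash ID in a scratch repository. The file name and the commit messages F and G are made up, and the identity variables exist only so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/repo"
cd "$work/repo"
echo one > file.txt
git add file.txt
git commit -q -m "F"                   # the root commit: no parent
echo two > file.txt
git commit -q -am "G"                  # an ordinary commit: exactly one parent

tip=$(git rev-parse HEAD)              # hash ID of G
root=$(git rev-parse HEAD~1)           # hash ID of F

# The hash ID is the key; the object's type and contents are the value.
tip_type=$(git cat-file -t "$tip")
git cat-file -p "$tip"                 # tree, parent, author, committer, message

# The parent recorded in G's metadata is F's hash ID; F records no parent at all.
stored_parent=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
root_parent_lines=$(git cat-file -p "$root" | grep -c '^parent ' || true)
```

The printed commit object shows a tree line (the snapshot), a parent line (the backwards-pointing arrow), and the author and committer metadata.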

All of this has two important implications:

  • Commits don't store diffs. We get diffs—we see a commit as a change—by having Git compare adjacent commits. We pick some parent/child pair, such as G and H, and have Git compare the two snapshots. The de-duplication trick that Git uses makes it easy for Git to throw out all exactly-the-same files right away, so Git only has to figure out what changed in two files that are actually different. This usually does not take too long, so git show or git log -p can show a "patch" even though Git has stored only snapshots.

  • Git can find every commit on its own except the last one, using the parents stored in each commit. We have to tell Git the hash ID of commit H. From there, Git works backwards all on its own. But we have to provide a raw hash ID here, and that's horrible for humans: the hash IDs look random and there's no way to figure one out. You'd have to memorize them, or write them down, or something.
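
Both points can be checked in a scratch repository. In this sketch (file names and commit messages are invented, and the identity variables are only there so the demo commits succeed), an unchanged file resolves to the same stored object in both commits, and the diff is computed on demand from the two snapshots.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/repo"
cd "$work/repo"
echo alpha > a.txt
echo beta > b.txt
git add a.txt b.txt
git commit -q -m "G"
echo beta2 > b.txt                     # change only b.txt
git commit -q -am "H"

# a.txt is unchanged, so both snapshots refer to the very same blob object.
a_in_g=$(git rev-parse HEAD~1:a.txt)
a_in_h=$(git rev-parse HEAD:a.txt)

# The "change" is derived by comparing the two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
git diff HEAD~1 HEAD
```

Only b.txt shows up in the diff, even though both commits hold a full snapshot of a.txt and b.txt.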

To handle that last problem, Git provides, as a separate key-value database, one that's keyed by names: branch names, tag names, and many other kinds of names. In this database, the value associated with any given name is one hash ID. You get just one hash ID—not two or three or many, just one—but that's all we need, because we only need to store the hash ID of the latest commit, e.g., commit H.

Since we say that commit H "points to" its parent G, we likewise say that a branch name points to the last commit in the branch. Git's term for this is that H is the tip commit, and we can add it to our drawing like this:

...--G--H   <-- main     # or master or whatever
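
The names database can be inspected directly. This is a small sketch in a scratch repository; the tag name v0, the file name, and the commit messages are made up, and the identity variables exist only so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
echo g > f.txt
git add f.txt
git commit -q -m "G"
echo h > f.txt
git commit -q -am "H"
git tag v0                             # a lightweight tag: another name for the same hash ID

# Each name resolves to exactly one hash ID.
name_hash=$(git rev-parse main)
tag_hash=$(git rev-parse v0)

# List every name alongside the hash ID it stores.
git for-each-ref --format='%(refname) -> %(objectname)'
```

Both refs/heads/main and refs/tags/v0 should list the hash ID of commit H.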

When we're "on" some branch, we attach a special name—HEAD—to that branch name. (Git literally just stores the branch's name in a file in .git named HEAD, at least for the main working tree, though you're not supposed to depend on this in case a future version of Git comes up with a better / fancier way to do it.) This means that if we have multiple branch names, all pointing to commit H—which is a perfectly normal thing to do in Git—then as we add commits, only the HEAD branch-name gets updated. We might start with, e.g.:

...--G--H   <-- feature (HEAD), main, zorg

and then make one new commit—it gets some new, unique hash ID, but we'll just call it "commit I"—and that git commit command makes Git store the new hash ID in the current branch name, so that we get:

...--G--H   <-- main, zorg
         \
          I   <-- feature (HEAD)

The special name HEAD remains attached to the current branch name. The name feature now selects I as its tip commit; I points backwards to H, because H was the tip of feature when we made I; and H and G and so on are all unchanged (they must be, because they are all read-only). The next commit updates the current branch name yet again:

...--G--H   <-- main, zorg
         \
          I--J   <-- feature (HEAD)

New commit J points back to I, which points back to H, and so on. I cannot change once we've made it: no part of any commit can ever change. Note that, while some people will refer to commits I-J as "what's on feature", in fact, all the commits are on feature: it's just that commits up through and including H are also on other branches, while I-J are only on feature at the moment.
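
Here is a sketch of that picture in a scratch repository, using the same branch names as the drawings (main, zorg, feature). The file name is invented and the identity variables are only there so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
echo g > f.txt
git add f.txt
git commit -q -m "G"
echo h > f.txt
git commit -q -am "H"

# Three names, all pointing to commit H; HEAD is attached to feature.
git branch zorg
git checkout -q -b feature
h=$(git rev-parse HEAD)

# A new commit moves only the branch name that HEAD is attached to.
echo i > f.txt
git commit -q -am "I"
head_ref=$(git symbolic-ref HEAD)

git log --oneline --graph --decorate --all
git branch --contains "$h"             # H is on all three branches
```

The decorated log shows main and zorg still at H, with feature (HEAD) one commit ahead at I.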

git fetch, or, commits are universal but branch names are not

When we clone a repository, we copy all of its commits[2] and none of its branch names. Instead of copying its branch names, we take each of their (the other repository's) branch names and change it into a remote-tracking name: their main becomes our origin/main, and their feature becomes our origin/feature, for instance. Then our Git creates one branch name in our new repository, which by now contains all of their commits and these modified names. The new branch name matches one of their branch names and selects the same tip commit as their name, so our picture might look like this:

...--G--H   <-- main (HEAD), origin/main
         \
          I--J   <-- origin/feature

depending on what they had in their Git repository when we ran git clone to make our Git repository.

We can now create our own feature name as well, and switch to it:

...--G--H   <-- main, origin/main
         \
          I--J   <-- feature (HEAD), origin/feature

and if and when we make new commits, they add on to our feature. Our memory of their feature still points to commit J though:

...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/feature
              \
               K--L   <-- feature (HEAD)

These look exactly like branches, because they are exactly like branches, depending on what we mean by branch (see also What exactly do we mean by "branch"?). To the extent that "branch" means "set of commits found by starting from some name and working backwards", a remote-tracking name works just fine as a branch. (But it's also not a branch, because you cannot git switch to it. So is it a branch? That depends on what you mean by branch.) This problem with the word branch is why you must be careful when saying "branch": you may know what you mean, but will someone else? Are you even sure you know what you mean?
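
The cloning behavior described above can be reproduced with two scratch repositories. In this sketch the directory names theirs and ours, the file names, and the commit messages are all invented, and the identity variables exist only so the demo commits succeed.

```shell
set -e
work=$(mktemp -d)                      # throwaway directory
# Identity for the demo commits only (hypothetical values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# "theirs": a repository with two branch names, main and feature.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
echo h > main.txt
git add main.txt
git commit -q -m "H"
git checkout -q -b feature
echo j > feature.txt
git add feature.txt
git commit -q -m "J"
git checkout -q main                   # their HEAD is main, so our clone starts on main

# "ours": the clone gets all commits, but only one branch name of its own.
git clone -q "$work/theirs" "$work/ours"
cd "$work/ours"
local_names=$(git for-each-ref --format='%(refname:short)' refs/heads)
git branch -r                          # remote-tracking names: origin/feature, origin/main

# Create our own feature from origin/feature, then commit on it.
git checkout -q -b feature origin/feature
their_tip=$(git rev-parse origin/feature)
echo k > k.txt
git add k.txt
git commit -q -m "K"

git log --oneline --graph --decorate --all
```

Right after cloning, main is the only local branch name. After the new commit, feature has moved ahead while origin/feature still remembers commit J.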
