Does git merge make changes only in local?

Time:05-10

If you run

git merge branchA branchB

but you intended to do exactly the opposite, that is,

git merge branchB branchA

does this merge affect only your local repository or, on the contrary, does it also affect the remote?

CodePudding user response:

git merge creates a merge commit: a special commit that has two parents instead of one. In all other ways it behaves like a normal commit. To sync it with a remote, you need to push it.

CodePudding user response:

TL;DR

The new commit is purely local. But you have your commands wrong.

Long

If you run:

git merge branchA branchB

you're merging three things (your current branch plus branchA and branchB), in what Git calls an octopus merge. Don't do that![1] The syntax you want is actually:

git switch branchA
git merge branchB

The reason for this is that git merge, like git commit, generally adds a new commit to the current branch. So you must pick which of the two branches should be "current" before you invoke git merge on the other branch.


[1] You can do it once you're a Git Guru, or whatever you'd like to call it.


Branch names and commits

There are a number of key concepts you must understand pretty well before any of this will make sense. We should start by defining a repository and commits:

  • A Git repository is, at its heart, a big database of commits. There's a second, usually much smaller, database of names, which we'll see in a moment, as well.

  • Each Git commit stores two things:

    • A commit has a full snapshot of every file, frozen in time, like an archive. The files in the commit are stored in a special, read-only, Git-only, compressed and de-duplicated form. The de-duplication takes care of the fact that most commits mostly re-use files from earlier commits: this way, the re-used files take no extra space.

    • A commit also stores some metadata. The metadata in a commit tell you and Git about that particular commit: who made it (name and email address), for instance, and when (date-and-time stamps). The metadata include the committer's log message, telling you why they made that commit. There's also some information strictly for Git itself.

  • Each commit is numbered, with a big ugly hash ID (or more formally, object ID or OID). This number is unique to this one particular commit: once it's used up for that commit, it can never be used for any other commit. That's why the number is so big. Git uses the number, as expressed in hexadecimal, to find the commit in the big all-objects database. Git needs that number to find the commit, so in principle you would have to memorize every commit hash ID, which would be horrible.

To avoid having to memorize the hash IDs, Git saves the hash ID of the latest commit in a second database. There is one entry per branch, plus one per tag, and so on: the second database holds names, whether they're branch names, tag names, remote-tracking names, or any other kind of name. Each name remembers one (1) hash ID.
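To see the names database in action, here is a minimal sketch in a throwaway repository. The temporary directory, the identity, the commit message and the tag name v0 are all made up for the illustration.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first commit"
git tag v0                              # a second kind of name for the same commit

# Each name resolves to exactly one hash ID.
branch_id=$(git rev-parse refs/heads/main)
tag_id=$(git rev-parse refs/tags/v0)

# List every name in the names database, with the hash ID it remembers.
git for-each-ref --format='%(objectname) %(refname)'
```

Both names resolve to the same hash ID here, because both were created while that one commit was the latest.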

You might (should, really) wonder what good it does to remember only one hash ID, when a branch is (sometimes[2]) many commits. The answer is in the commit metadata.

Each commit, stored in the objects database, holds some metadata, and the metadata in any given commit includes a list of previous commit hash IDs. Git calls these previous-commit-hash-IDs the parents of the commit. Most commits have exactly one parent.

If we draw a chain of commits, all in a row, with newer commits towards the right, these hash IDs mean that each commit points backwards to exactly one earlier commit, which then points backwards to a still-earlier commit:

... <-a123456 <-b789abc <-def01234   <--latest

Here, the branch name latest holds the hash ID def01234: the latest commit on our branch named latest. Commit def01234 holds the hash ID b789abc, which is its parent commit: that commit comes one commit earlier. Commit b789abc in turn holds the hash ID a123456.

We say that the branch name points to the tip commit of the branch, and the commits themselves point to their parents. As long as this is a nice simple line, it's all pretty sweet:

... <-F <-G <-H   <-- branch

which we can get lazy and draw like this:

...--F--G--H   <-- branch

(The numbering scheme for commits means that no part of any commit, including any of its snapshot or metadata, can ever change, so because commit H points backwards to commit G, it does so forever. Commit G points backwards to F forever, and so on. A commit can record only its parents, never its children, because the hash IDs of future commits are unpredictable,[3] so the arrows always point backwards. That allows for the laziness here.)
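The backwards-pointing links can be inspected directly. The following is only a sketch in a scratch repository (directory, identity and the one-letter commit messages are invented for the example): it makes three commits and then walks from the tip to its parent and grandparent.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m F
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

# The raw commit object: tree, parent, author, committer, then the log message.
git cat-file -p main

# Follow the parent links by hand: H -> G -> F.
h=$(git rev-parse main)
g=$(git rev-parse 'main^')      # parent of H
f=$(git rev-parse 'main^^')     # parent of G
recorded_parent=$(git cat-file -p "$h" | sed -n 's/^parent //p')

# The same walk, done by Git: each line shows a commit and its parent.
git log --format='%h %s  (parent: %p)' main
```

The parent line stored inside H is G's hash ID, and the very first commit, F, has no parent line at all.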


[2] The word branch in Git is badly overused, almost to the point of losing any meaning at all. See also What exactly do we mean by "branch"?

[3] The hash ID of a commit depends on, among other things, the exact second at which you make it. Unless you know the time of each future commit you'll make, you don't know what hash IDs they will have.
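The effect of the timestamp can be checked with the low-level git commit-tree command. This is a sketch only: make_commit is a hypothetical helper written for this example, and the dates are arbitrary Unix timestamps.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"
tree=$(git rev-parse 'HEAD^{tree}')
parent=$(git rev-parse HEAD)

# make_commit DATE: hypothetical helper that writes a commit object with the
# same tree, parent, identity and message every time; only the date varies.
make_commit() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit-tree "$tree" -p "$parent" -m "same message"
}

id_a=$(make_commit "946684800 +0000")
id_b=$(make_commit "946684800 +0000")   # identical input, identical hash ID
id_c=$(make_commit "946684801 +0000")   # one second later, different hash ID
printf '%s\n%s\n%s\n' "$id_a" "$id_b" "$id_c"
```

None of these objects is attached to a branch name; the point is only that changing a single second changes the hash ID.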


Making a new commit

When you make a new commit, you do this on some branch. Git remembers which branch name you're "on" (so that git status can say On branch main or On branch br1 or whatever) by stuffing the branch's name in a special name, HEAD.[4]

The way Git makes a new commit is simple enough. Suppose we're on the third commit ever, in our very small repository with these three commits:

A--B--C   <-- main (HEAD)

We do the usual edit-and-add-and-git commit. Git packages up some metadata, using user.name and user.email and the current date and time and any log message you enter, along with the source snapshot. Git stores all of these as a new commit, which gets a new, unique, random-looking hash ID, but we'll just call this commit D. Git makes sure that the parent for new commit D is existing commit C: Git gets C's hash ID by reading HEAD, which says main, then reading branch name main, which says whatever the hash ID is for C.

The result at this point is this:

A--B--C   <-- main (HEAD)
       \
        D

New commit D points backwards to C. But now commit D is the latest commit. So Git uses the name HEAD to find the branch name main, and writes D's hash ID, whatever it is, into that branch name. The result is:

A--B--C
       \
        D   <-- main (HEAD)

and there's no real reason to bother drawing the kink in the graph any more.
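The same sequence can be replayed in a scratch repository. In this sketch the directory, identity and commit messages are made up, and empty commits stand in for the usual edit-and-add step.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git commit -q --allow-empty -m C

before=$(git rev-parse main)            # hash ID of C, read through the name main
git commit -q --allow-empty -m D        # make new commit D
after=$(git rev-parse main)             # the name main now holds D's hash ID
parent_of_new=$(git rev-parse 'main^')  # D points backwards to C
current=$(git symbolic-ref --short HEAD)

echo "main moved from $before to $after"
```

The name main moved, HEAD still says main, and nothing about commit C changed.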


[4] HEAD is traditionally stored in a file, .git/HEAD; you can look at it if you like. It has the magic string ref:, a space, and then the full name of the current branch, and then a newline. Note that added working trees from git worktree add get a different HEAD though, so you should not count on this.
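For the curious, a short sketch of both ways to read HEAD, assuming a freshly made scratch repository with Git's default file-based storage for names (the directory, identity and branch name develop are illustrative). git symbolic-ref is the supported way to ask; reading the file is just a peek.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m A

# Peek at the file (works in the main working tree with file-based storage).
head_file=$(cat .git/HEAD)
echo "$head_file"

# Ask Git instead of reading the file.
current=$(git symbolic-ref HEAD)

# Creating and switching to another branch rewrites HEAD, not the commits.
git checkout -q -b develop
after_switch=$(git symbolic-ref --short HEAD)
```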


Using more than one branch name

Let's look now at what happens with our tiny four-commit repository when we create a new branch name, develop:

A--B--C--D   <-- develop, main (HEAD)

We now have two names, both of which point to commit D.

If we make a new commit now, staying on main, we'll get this:

           E   <-- main (HEAD)
          /
A--B--C--D   <-- develop

(and this time there's a reason to keep the kink in the graph drawing, so that the name develop can still point to commit D).

Exercise: Which commits are on which branch? Commits A-B-C-D were all on main before, right? Now E is on main. Are A-B-C-D off of main? They're clearly on develop. I'll put the answer in a footnote, so that you can think about this before jumping down to the spoiler.[5]

Meanwhile, we can now git checkout develop or git switch develop. We'll get this:

           E   <-- main
          /
A--B--C--D   <-- develop (HEAD)

The set of commits in the repository has not changed at all, but we are now using commit D again, instead of commit E. The name main remembers E's hash ID for us, so that we don't have to; the name develop remembers D's, and commit D is the commit we're using now.

If we make another new commit now, we get this:

           E   <-- main
          /
A--B--C--D
          \
           F   <-- develop (HEAD)

Exercise (easy after doing the previous one): which commits are on which branches? Does it make any difference if we draw this as:

           E   <-- main
          /
A--B--C--D--F   <-- develop (HEAD)

or:

A--B--C--D--E   <-- main
          \
           F   <-- develop (HEAD)

? (The answers should be obvious; if they're not, have a run through Think Like (a) Git.)


[5] The answer is that all the commits are on main, and A-B-C-D are on develop. In Git, a commit is very often "on" many branches at the same time! Any given commit is "on" any branch where, by starting with the branch-tip commit, we can work our way backwards, through the parent linkages, to reach that commit.

A good way to think of this is that commits aren't so much "on" some branch as "contained in" some branches, plural. The git branch --contains sub-command/option can tell you which names "contain" a commit (you give the command the commit's hash ID, and it finds all the branch names). (This same concept works with tag names and git tag --contains, although it's often not as useful here.)
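Here is a sketch of that check, rebuilding the five-commit picture from the exercise in a scratch repository (directory, identity and messages are made up; empty commits are used for brevity).

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for msg in A B C D; do
    git commit -q --allow-empty -m "$msg"
done
git branch develop                     # develop and main both point to D
git commit -q --allow-empty -m E       # only main moves on to E

d=$(git rev-parse develop)
e=$(git rev-parse main)

# Which branch names can reach each commit by walking backwards?
on_d=$(git branch --contains "$d" --format='%(refname:short)' | sort | xargs)
on_e=$(git branch --contains "$e" --format='%(refname:short)' | sort | xargs)
echo "D is contained in: $on_d"
echo "E is contained in: $on_e"
```

Commit D is contained in both develop and main, while E is contained in main only.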


Merging

Suppose we start with:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

That is, we're "on" branch br1, whose latest ("tip") commit is J; the tip commit of br2 is L.

We now run:

git merge br2

to combine work. The work we'll combine is defined by doing git diffs from the merge base commit—the "best" commit that's on both branches, which in this case is commit H—against the two tip commits. These diffs show what we did on br1, for the first diff, and what they did on br2, for the second diff. Git's job is to combine the changes and then apply the combined changes to the snapshot in H.

This keeps our changes and adds theirs, or, if you prefer to think of it this way, keeps their changes and adds ours. By adding the two sets of changes together, and applying the sum to the base, Git combines the work. Conflicts, if there are any, occur because the two diffs don't "add up" smoothly.
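The merge base and the two diffs can be inspected before merging anything. This is a sketch with a shortened graph (one commit per branch instead of two) in a scratch repository; the file names and identity are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m H
h=$(git rev-parse HEAD)
git checkout -q -b br1
echo ours > ours.txt && git add ours.txt && git commit -q -m J
git checkout -q -b br2 "$h"
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m L
git checkout -q br1

# The best commit that is on both branches.
base=$(git merge-base br1 br2)

# What we changed, and what they changed, each measured from the merge base.
ours=$(git diff --name-only "$base" br1)
theirs=$(git diff --name-only "$base" br2)
echo "base=$base ours=$ours theirs=$theirs"
```

Because the two diffs touch different files, they add up without conflicts.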

Assuming all goes well, though, Git takes the snapshot it gets by applying these combined-changes to the snapshot in H and makes a new commit M. This new commit is a merge commit. The definition of a merge commit is a commit with two or more parents. (Most merges just have the two.) We can draw this new commit like this:

          I--J
         /    \
...--G--H      M
         \    /
          K--L

I've left the branch names out on purpose, since your question involves (in part) what happens to the branch names. So: What happens to the branch names? Well, remember what happened when we used git commit. Git made the new commit, then stuffed its hash ID—whatever that was—into the current branch name. Merges work the same as regular commits; they just have an extra parent. So commit M's hash ID goes into the name br1, like this:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

We were "on" br1 before we ran git merge, and we're still "on" br1. We have merely added a commit to it, and the kind of commit we added is a merge commit.
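A sketch of this true merge in a scratch repository, again with a shortened graph (one commit per branch) and with made-up file names and identity. It records where each name pointed before the merge and looks at the parents of the new commit afterwards.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m H
h=$(git rev-parse HEAD)
git checkout -q -b br1
echo ours > ours.txt && git add ours.txt && git commit -q -m J
git checkout -q -b br2 "$h"
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m L
git checkout -q br1

j=$(git rev-parse br1)
l=$(git rev-parse br2)
git merge -q -m M br2                        # we are on br1; merge br2 into it

parents=$(git log -1 --format=%P br1)        # the two parents of M
br2_after=$(git rev-parse br2)               # br2 did not move
current=$(git symbolic-ref --short HEAD)     # still on br1
git log --graph --oneline --all
```

The first parent of M is the old tip of br1 and the second is the tip of br2; only the name br1 moved.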

But wait: there's a special case

We started the merge above with two branches that had diverged. Let's see what happens if we start with a less-complicated situation:

...--G--H   <-- main (HEAD)
         \
          I--J   <-- develop

We now run git merge develop. Git needs to find the merge base—the best commit that's on both branches—and does so, and it turns out that this is commit H. But we're using commit H right now. Git would have to diff commit H's snapshot against commit H's snapshot to see what we changed. This would be, well, nothing at all.

Meanwhile Git would then have to diff H-vs-J to see what they changed. Then Git would add their changes (something) to our changes (nothing) and get ... well, exactly whatever their changes are. There can't be any conflicts, so this will always work. Then Git would have to apply their changes to commit H and make a new merge commit:

...--G--H------M   <-- main (HEAD)
         \    /
          I--J   <-- develop

You can make Git do this, but it won't do it on its own. Instead, Git says to itself: Aha, I can cheat! I can use a short-cut! The snapshot I'd make for a new merge commit M would exactly match the snapshot in existing commit J! I think I'll just do this...

...--G--H
         \
          I--J   <-- develop, main (HEAD)

That is, Git "slides the branch name" main forward, and checks out whatever files are in commit J. It does this instead of doing a merge. It then, rather deceptively, calls this a fast-forward merge, but it's not a merge at all, it's just a check-out! It's a check-out operation that dragged the current branch name forward.

In this case, you don't get any new commit. You get a fast-forward instead of a merge.
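Both behaviors can be compared side by side. The sketch below assumes Git's default merge settings and a scratch repository with invented file names and identity: it first lets Git fast-forward, then moves main back and uses the --no-ff option to force a real merge commit.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m H
h=$(git rev-parse main)
git checkout -q -b develop
echo one > one.txt && git add one.txt && git commit -q -m I
echo two > two.txt && git add two.txt && git commit -q -m J
j=$(git rev-parse develop)
git checkout -q main

# Default behavior: main simply slides forward to J; no commit is created.
count_before=$(git rev-list --all --count)
git merge -q develop
count_after=$(git rev-list --all --count)
ff_tip=$(git rev-parse main)

# Put main back on H, then insist on a real merge commit.
git reset -q --hard "$h"
git merge -q --no-ff -m M develop
noff_parents=$(git log -1 --format=%P main)
```

After the fast-forward the repository holds the same three commits as before; after the --no-ff merge there is a fourth, with H and J as its parents.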

This is all local

No matter which kind of merge—the real one, or the fast-forward fake—git merge performs, the result is only in your repository. You either have a new merge commit, or you don't, and your current branch name points to the new commit, or points to the now-tip-of-both-branches shared commit.

Nothing has happened in any other Git repository. You've just updated your own repository's names database, so that the current branch name selects the new or other commit, and maybe added a new commit (and supporting objects) to the objects database.
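This can be verified against a stand-in remote. In the sketch below, a bare repository in a temporary directory plays the part of the remote; the directory names remote.git and local, the file names and the identity are all assumptions of the example.

```shell
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git                 # stand-in for the remote
git clone -q remote.git local 2>/dev/null     # cloning an empty repository warns
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m H
git checkout -q -b br2
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m L
git checkout -q main
echo ours > ours.txt && git add ours.txt && git commit -q -m J
git push -q origin main                       # the remote now has main at J

remote_before=$(git ls-remote origin refs/heads/main | cut -f1)
git merge -q -m M br2                         # purely local
remote_after_merge=$(git ls-remote origin refs/heads/main | cut -f1)
local_tip=$(git rev-parse main)

git push -q origin main                       # only now does the remote change
remote_after_push=$(git ls-remote origin refs/heads/main | cut -f1)
```

The remote's main is untouched by the merge and catches up only when the push runs.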

Octopus merges

To show what an octopus merge looks like, let's draw a main-line and three features:

        _-I   <-- feature/two
       /
      /   J   <-- feature/three
     /   /
...--G--H   <-- main (HEAD)
      \
       K--L   <-- feature/one

We can now run:

git merge feature/one feature/two feature/three

Git will do its best to combine all the work—the details here are very messy and complicated and Git sidesteps the merge conflict issue entirely by requiring that there be no merge conflicts at all—and if all goes well, Git will make a new merge commit with, in this case, four parents:

        _-I_  <-- feature/two
       /    \
      /   J | <-- feature/three
     /   / \|
...--G--H---M   <-- main (HEAD)
      \    /
       K--L   <-- feature/one

Commit M points back to commit H, which is the one we were using when we ran git merge. But then it also points back to commits L (from feature/one), I (from feature/two), and J (from feature/three).

It's now possible to delete all three of the feature/* names, as we can find those commits by working backwards (with an appropriate sideways jink) through commit M. This doesn't do anything you couldn't have done with multiple git merge commands: you could merge feature/one, then merge feature/two, then merge feature/three. In fact, it can't do something those can do, namely resolve merge conflicts. Oddly, the fact that it's weaker is the main reason to use it: since Git only completes an octopus merge when there are no conflicts, seeing one in the history suggests the branches combined cleanly. The fact that it's confusing is the main reason not to use it.
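For experimenting, here is a sketch of an octopus merge in a scratch repository. It deliberately uses a simpler layout than the drawing above (an assumption of this example): all three features fork from G, and main has one commit of its own, H, on top of G. Each branch touches a different file, so there are no conflicts. File names and identity are invented.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m G
g=$(git rev-parse HEAD)

# Three features, each forking from G and adding its own file.
for n in one two three; do
    git checkout -q -b "feature/$n" "$g"
    echo "$n" > "$n.txt" && git add "$n.txt" && git commit -q -m "feature $n"
done

git checkout -q main
echo main > main.txt && git add main.txt && git commit -q -m H

# One merge command, three other branch tips.
git merge -q -m M feature/one feature/two feature/three

parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "M has $parent_count parents"
git log --graph --oneline --all
```

In this layout the merge commit records main's old tip plus one parent per feature.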

CodePudding user response:

git merge branchA branchB

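is purely local. Moreover, if it was the wrong command and you've only just run it, you can undo it immediately (provided it actually created a merge commit, and bearing in mind that --hard also throws away any uncommitted changes) by saying: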

git reset --hard @~1

Note that your command probably does not do what you think it does: it does not mean "merge branchA into branchB". So having corrected your mistake, you might need to think a little harder before giving the "right" command.
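A sketch of that undo in a scratch repository (branch names, file names and identity are made up). It assumes the merge produced a new merge commit and that there is no uncommitted work to lose.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m H
h=$(git rev-parse HEAD)
git checkout -q -b br1
echo ours > ours.txt && git add ours.txt && git commit -q -m J
git checkout -q -b br2 "$h"
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m L
git checkout -q br1

before=$(git rev-parse br1)
git merge -q -m M br2                 # the merge we decide we did not want
merged=$(git rev-parse br1)

# Step back to the first parent of the merge commit, i.e. the old tip of br1.
# (git merge also records the pre-merge tip as ORIG_HEAD, so
#  git reset --hard ORIG_HEAD is an alternative right after the merge.)
git reset -q --hard @~1
after=$(git rev-parse br1)
```

After the reset, br1 points at the same commit as before the merge and the files from br2 are gone from the working tree again.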

Tags: git