How can I merge a branch in a local repository to a branch that is in another local repository

Time:11-04

Suppose I have 2 local repositories.

Local Repository A //Cloned from production remote repository.

Local Repository B //Cloned from development remote repository.

From the Local Repository A, I created my own feature branch called

FeatureA //In Local Repository A

After finishing the FeatureA from the Local Repository A, I want to merge this FeatureA branch into a branch called

Developer //In Local Repository B 

Notice how the two branches, FeatureA and Developer, are in different local repositories.

How can I merge the FeatureA branch into the Developer branch?

CodePudding user response:

Git does not actually merge branches. In fact, like most things in Git, branches don't matter at all here: only the commits matter. Git merges commits.

What this means for you is that you can only get a proper result if all the commits are related. (They probably are, but we can't see that. You can find that out.)

Long

There are a few important things to know about commits (this is not a complete list, but important here for merging):

  • They're found via hash IDs.
  • The hash IDs are large, ugly, and random-looking (not actually random, but quite unpredictable, and unusable by humans).
  • Commits refer to other, earlier commits by hash ID. That's fine for the commits themselves, which are after all read by a computer program (or series of computer programs: we call these programs git), but not much good for the hapless humans supposedly in charge of these programs and computers.

Because of these particular bullet points, Git makes a big concession to humans: it will let us use branch names to find commits. Not only that, it has a special feature just for humans. When we:

  1. first direct Git to be "on" some particular branch, then
  2. direct Git to make a new commit,

Git will update the branch name that we're "on" so that this name now refers to the new commit, instead of referring to whichever specific commit it meant just a moment ago.
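To watch the name move, here is a minimal sketch in a throwaway repository. The temporary directory, the identity, and the commit messages are placeholders invented for this illustration; the two empty commits just stand in for real work.

```shell
# Scratch repository; path, identity, and messages are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main" on any Git version
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "first"
before=$(git rev-parse main)               # the hash ID that main names right now

git commit -q --allow-empty -m "second"    # we are "on" main, so main gets updated
after=$(git rev-parse main)
parent_of_after=$(git rev-parse main^)     # the new commit points back at the old one

echo "main named $before"
echo "main names $after"
```

The name main now resolves to a different hash ID, and the old hash ID is still reachable as the parent of the new commit.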

For ordinary, single-parent commits, we can draw this situation like this. We start by replacing the real hash IDs that the real commits have with fake, single-letter pseudo-IDs that we pick to work with our feeble human brains. Then we draw each commit with an arrow coming out of it, pointing backwards to one earlier commit. This puts the latest commit on the right:

... <-F <-G <-H

So here H stands in for the hash ID of the latest commit. Somewhere inside the internal representation of commit H, Git has saved away the true hash ID of earlier commit G. We draw this as an arrow coming out of our representation of H, pointing to our representation of G.

Of course, G itself is also a commit, so it also has a stored previous-commit hash ID: G points to F. F likewise points to some earlier commit, and so on. This repeats forever, or rather, until we get back to the very first commit ever: commit A in this formulation. (Our repository thus has a paltry eight commits.)
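The stored previous-commit hash ID can be inspected directly. A small sketch, assuming a scratch repository with three empty commits whose messages F, G, and H are made up to match the drawing:

```shell
# Scratch repository; path, identity, and messages are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "F"; f=$(git rev-parse HEAD)
git commit -q --allow-empty -m "G"; g=$(git rev-parse HEAD)
git commit -q --allow-empty -m "H"; h=$(git rev-parse HEAD)

git cat-file -p "$h"      # raw contents of commit H, including a "parent" line

# Pull out the parent hash ID stored inside H, and count the parents of F.
stored_parent=$(git cat-file -p "$h" | sed -n 's/^parent //p')
root_parents=$(git cat-file -p "$f" | grep -c '^parent ')
```

Here H stores G's hash ID, while F (the very first commit in this scratch repository) has no parent line at all, which is where the backwards walk stops.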

For StackOverflow purposes, due to laziness and/or font issues, I tend to stop drawing the commit-to-commit arrows as arrows and do this instead:

A--B--...--G--H

but in fact each connection from commit to commit goes only one way: from the later commit, e.g., H, to the earlier one. This is because commits, once made, are completely, totally, 100% read-only. Not a single bit in a commit can ever change.1

When we add branch names to these drawings, their workings become much clearer. Let's say we have two names, main and develop, and both names point to commit H, like this:

...--G--H   <-- main, develop

This means all commits up through and including H are on both branches. We must now pick one branch to be "on", using git checkout or git switch:3

git switch main

To remember which branch we're using, we add the special name HEAD, written in all uppercase like this, attached to just one of these branch names:

...--G--H   <-- main (HEAD), develop

This indicates that we are using commit H via the name main.
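One way to ask Git which name HEAD is attached to is git symbolic-ref. This is only a sketch in a scratch repository with invented names; git checkout is used so that it also runs on Git versions that predate git switch.

```shell
# Scratch repository; path, identity, and messages are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "H"
git branch develop                          # a second name for the same commit
git checkout -q main                        # or: git switch main

current=$(git symbolic-ref --short HEAD)    # the branch name HEAD is attached to
main_commit=$(git rev-parse main)
develop_commit=$(git rev-parse develop)
echo "on $current at $main_commit"
```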

The checkout or switch command works by (very roughly):

  • removing from the work area—which Git calls the working tree—all the files that came out of some other commit, if / as needed;
  • filling in this work area with all the files from the commit we've just switched to.

We'll see this in action in a moment, but for now, let's switch to develop, or even create a new branch name feature.


1This read-only property is required to make the hashing scheme work. The hash ID of a commit is simply a cryptographic checksum of all the bits stored in that commit. If you take a commit out of a Git database, turning it into ordinary data, then modify that data in some way and put it back into the Git database, what you get is a new and different commit with a new and different hash ID. The old commit remains, unchanged, under the old ID.

Git verifies, at object-extraction time, that all the bits that come out of the database still checksum to the original value. If they don't, Git declares the database corrupt, and ceases to function. Since file contents are also stored using this same hashing trick, that's how we can be sure that none of our files are ever damaged. Once they're in the repository, they're in there forever2 and can never be changed.

2Technically, it's possible to strip commits out of a repository database, but it's tricky and we won't cover it here.

3There's no difference here between these uses of checkout and switch. Certain historical mistakes with git checkout were eventually cleaned up by splitting that one command, checkout, into two separate commands, switch and restore, and it makes sense to learn the new ones as long as you are not forced to use an old version of Git that lacks the new ones. (I have been using Git for more than 15 years at this point though so I have old and sometimes not-so-good habits here. If I use git checkout, it's by habit, or because someone gave me a Git 1.7 version to update, perhaps.)


Making new commits on a branch

If we now switch to existing branch develop, we get:

...--G--H   <-- main, develop (HEAD)

To do this, Git would need to remove all the files that came out of commit H, and instead, put in all the files from commit H. This kind of remove-and-replace-with-sameness is obviously stupid, so Git skips this step for this particular case.4 Git doesn't remove or replace any files at all this time. So if we start making changes but forget to switch to a different branch (or create a new branch; see below) it's generally safe to do that as soon as you notice your error.
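A quick sketch of that "forgot to switch first" situation, assuming a scratch repository and a made-up file notes.txt: the uncommitted edit survives the switch because both names point to the same commit.

```shell
# Scratch repository; path, identity, and file name are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo "v1" > notes.txt && git add notes.txt && git commit -q -m "H"
git branch develop                       # develop and main both name commit H

echo "work in progress" >> notes.txt     # edit made while still on main
git checkout -q develop                  # or: git switch develop

current=$(git symbolic-ref --short HEAD)
status=$(git status --porcelain)         # non-empty: the edit is still uncommitted
```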

Anyway, now that we're on develop, let's make a new commit in the usual way. I will skip over a lot of important detail (in particular, I won't mention Git's index, aka the staging area) and will just assume that you know everything there is to know here.5 However we do it, we now have Git make a new commit, which we will call I:

          I
         /
...--G--H

New commit I points backwards to existing commit H. But now the special magic trick happens: Git writes the new commit's hash ID into the current branch name, i.e., the one that has HEAD attached. So to complete our drawing—and see why I put I on a line by itself here—we draw this:

          I   <-- develop (HEAD)
         /
...--G--H   <-- main

Note how the name develop now points to the new commit. All the other branch names are untouched: only the name develop moved.

If we make a second new commit J, we get:

          I--J   <-- develop (HEAD)
         /
...--G--H   <-- main

Commits up through H are on both branches, while new commits I-J are only on develop.
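The "which commits are on which branch" claim can be checked with git rev-list. A sketch in a scratch repository, with empty commits standing in for H, I, and J:

```shell
# Scratch repository; path, identity, and messages are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b develop               # or: git switch -c develop
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

git log --oneline --graph --all          # the drawing above, as Git renders it

# X..Y means "commits reachable from Y but not from X".
only_on_develop=$(git rev-list --count main..develop)
only_on_main=$(git rev-list --count develop..main)
```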

If we now run git checkout main or git switch main, we get this:

          I--J   <-- develop
         /
...--G--H   <-- main (HEAD)

This time, Git really does have to remove some files—those specific to commit J—and replace them with the right files for commit H. So Git does that, and if we now examine our files, we'll see that we have the files from commit H.6
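The remove-and-replace step is easy to observe. In this sketch (scratch repository, invented file names) the file added on develop disappears from the working tree when we go back to main:

```shell
# Scratch repository; path, identity, and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "H"
git checkout -q -b develop
echo "new" > extra.txt && git add extra.txt && git commit -q -m "J"

git checkout -q main                     # or: git switch main
files_on_main=$(ls)                      # extra.txt has been removed

git checkout -q develop
files_on_develop=$(ls)                   # and here it is back again
```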

Now that we're back on commit H via the name main, let's make a new branch name. We have to pick some commit for this new name to point-to, and the usual choice is "the commit we're on now", i.e., commit H:

git switch -c feature    # or git checkout -b feature

This leaves us in this state:

          I--J   <-- develop
         /
...--G--H   <-- feature (HEAD), main

If we now make two more commits, we get:

          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K--L   <-- feature (HEAD)

We now have a state in which merging makes sense.


4This skipping is achieved in a smarter way than described here, but for the "switch branches without switching commits" case, the effect is that the change is always allowed. In more complicated cases, you get odd effects; see Checkout another branch when there are uncommitted changes on the current branch for the gory details.

5There is a lot to know! For more details, see other answers or Git tutorials.

6This is where the complications mentioned in footnote 4 can come in. Files that we forgot to commit, or deliberately didn't commit, can be carried along in the working tree and/or in Git's index aka staging-area. But if we made sure to commit everything in I and J, we'll be in a "clean" state, as reported by git status, before and after the checkout—unless things get really complicated, but let's not go there.


Merging is about combining work

Let's draw this again but change some names, switch branches around, and drop the name main entirely (it's in the way):

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

We're now using commit J via name br1. We can run:

git merge <hash-of-L>

or:

git merge br2

to have Git locate commit L and do the work of a merge. Or, if we aren't ready to merge commit L yet, but want to merge in commit K, we can run:

git merge <hash-of-K>

That is, we'd run git log br2 and see commit L, then see commit K. It has some big ugly hash ID, b789abc... or whatever, and we'd grab that with the mouse, cut-and-paste style, and produce a command like:

git merge b789abc

(Abbreviated hash IDs work too, so you can retype the first 4 or 7 or 15 characters and stop, but it's way too easy to make a mistake here: I always use cut-and-paste for this.)
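As a sketch of merging by hash ID rather than by name: the scratch repository below has K and L on br2, and merges only K into br1. The file names are invented, and br2~1 ("one step back from br2") is used to look up K's hash ID instead of the mouse.

```shell
# Scratch repository; path, identity, and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "H"
git checkout -q -b br1
echo i > i.txt && git add i.txt && git commit -q -m "I"
git checkout -q -b br2 main
echo k > k.txt && git add k.txt && git commit -q -m "K"
echo l > l.txt && git add l.txt && git commit -q -m "L"

k=$(git rev-parse br2~1)                 # hash ID of K
git checkout -q br1
git merge -q -m "merge K only" "$k"      # a raw hash ID works as well as a name

# K is now part of br1's history; L is not.
if git merge-base --is-ancestor "$k" HEAD; then k_merged=yes; else k_merged=no; fi
if git merge-base --is-ancestor br2 HEAD; then l_merged=yes; else l_merged=no; fi
```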

We generally don't bother to merge with some number of commits back like this, but in some complicated cases—e.g., if we have:

          o--o--...--o   <-- br1 (HEAD)
         /
...--o--*
         \
          o--...--(thousands of commits)--...--o   <-- br2

we might want to break the merge up into smaller chunks, picking some commit somewhere along the very long line of br2 to merge in first:

          o--...--o---M   <-- br1 (HEAD)
         /           /
...--o--*           /
         \         /
          o--...--o--(hundreds of commits)--...--o   <-- br2

Having merged in just 500 commits, we took an original 1400 commits down to 900 left to merge; we can do another 500, leaving only 400 left to merge, etc.

In any case, regardless of how many commits we are merging, the merge operation works the same way. Given:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

and git merge br2, Git:

  • finds the current commit J (that's easy: Git uses HEAD);
  • finds the other commit L (that's easy: Git uses br2, which we gave it); and
  • finds the merge base commit, commit H: that's harder.

Git finds that merge base commit through an algorithm, but we can just describe it as the best common ancestor, and in this case it's simply commit H.7

Git now uses the snapshots in each commit (we haven't described this properly, but each commit holds a full snapshot of every file) to figure out "what we changed" on "our branch" br1, by diffing commit H vs commit J, and to figure out "what they changed" on "their branch" br2, by diffing commit H vs commit L. The three commits to diff here are:

  • the merge base, on the left of both git diff commands;
  • our current or HEAD commit, on the right of the "ours" diff;
  • the commit we chose to merge, on the right of the "theirs" diff.

The output from the two diffs determines the set of changes to merge.
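You can reproduce both inputs to the merge by hand. A sketch, assuming a scratch repository in which br1 touches only a.txt and br2 touches only b.txt (both file names are invented): git merge-base finds H, and two git diff commands show each side's changes.

```shell
# Scratch repository; path, identity, and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo base > a.txt && echo base > b.txt
git add a.txt b.txt && git commit -q -m "H"
h=$(git rev-parse HEAD)
git checkout -q -b br1
echo ours >> a.txt && git commit -q -am "J"
git checkout -q -b br2 main
echo theirs >> b.txt && git commit -q -am "L"
git checkout -q br1

base=$(git merge-base HEAD br2)              # best common ancestor: commit H
ours=$(git diff --name-only "$base" HEAD)    # what we changed: H vs J
theirs=$(git diff --name-only "$base" br2)   # what they changed: H vs L
echo "base=$base ours=$ours theirs=$theirs"
```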

The merge algorithm now combines these changes, applies the combined changes to the snapshot in the merge base commit—commit H here—and thus keeps our changes while adding their changes, which is what we want.

Having successfully combined these two sets of changes and applied them to the merge base, Git now makes a new commit from the result.8 That new commit is a merge commit, which is special in exactly one way: instead of pointing back to a single parent, it points back to two parents, like this:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

The first parent of new merge commit M is the commit we were using when we ran git merge, i.e., commit J. The other parent is the commit we named on the command line, commit L in this case. Merge commit M has, as its snapshot of all files, the result of the combining-and-applying-to-H's-snapshot.
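The two parents can be read back with HEAD^1 and HEAD^2. This sketch rebuilds the drawing in a scratch repository (invented file names, non-conflicting changes) and runs the merge:

```shell
# Scratch repository; path, identity, and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo base > a.txt && echo base > b.txt
git add a.txt b.txt && git commit -q -m "H"
git checkout -q -b br1
echo ours >> a.txt && git commit -q -am "J"
git checkout -q -b br2 main
echo theirs >> b.txt && git commit -q -am "L"
git checkout -q br1
j=$(git rev-parse br1); l=$(git rev-parse br2)

git merge -q -m "M" br2                  # -m avoids opening an editor

first_parent=$(git rev-parse HEAD^1)     # the commit we were on: J
second_parent=$(git rev-parse HEAD^2)    # the commit we named: L
merged_a=$(tail -n 1 a.txt)              # our change is kept
merged_b=$(tail -n 1 b.txt)              # their change is added
```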


7As the Wikipedia article on lowest common ancestors notes, there is not necessarily a single unique LCA node in a DAG. For these cases, the merge algorithm gets trickier; we won't cover those here.

8If Git fails to combine the changes, it deliberately stops in the middle of the merge, leaving us a mess to clean up. We won't cover that case here either.


What this means for you

For git merge to do its job, the commits you give it must:

  • all be in one repository; and
  • be related, in terms of having some best shared commit: H in our example above.
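The second condition can be tested before merging: git merge-base exits with a non-zero status when two commits share no ancestor. In this sketch the branch island is made with git checkout --orphan purely to manufacture an unrelated history inside one scratch repository; all names are invented.

```shell
# Scratch repository; path, identity, and branch names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b br1 && git commit -q --allow-empty -m "J"
git checkout -q -b br2 main && git commit -q --allow-empty -m "L"

# A branch whose first commit has no parent, so it shares nothing with br1.
git checkout -q --orphan island && git commit -q --allow-empty -m "X"

if git merge-base br1 br2 >/dev/null; then related_pair=yes; else related_pair=no; fi
if git merge-base br1 island >/dev/null; then orphan_pair=yes; else orphan_pair=no; fi
echo "br1/br2 related: $related_pair, br1/island related: $orphan_pair"
```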

When you have a single repository, the first condition is trivially satisfied—all the commits are in the (single) repository—and the second is usually the case because we normally make a new branch by growing it from some starting point that's on some existing branch. That shared starting point commit is a common starting point and therefore a shared commit. If there have been merges since then, there may be some better shared commit, but otherwise this is the shared commit:

 ...--*--*--*--o--o   <-- br1 (HEAD)
             \
              o----o   <-- br2

The starred commits * are on both branches, so the rightmost one works as the merge base. Or:

 ...--*--*--*--o--o--M1--o--o   <-- br1 (HEAD)
             \      /
              *----*----o----o   <-- br2

Again, all the starred commits are on both branches. The extra parent of merge commit M1 joins br2 back into br1, so once again, the rightmost starred commit works as the merge base. Once we make merge M2 we have:

 ...--*--*--*--o--o--M1--o--o--M2   <-- br1 (HEAD)
             \      /         /
              *----*----*----*   <-- br2

Note how merging "adds" all the other branch's commits to br1.
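A sketch of how an earlier merge shifts the merge base, using a scratch repository with invented file names: after M1, nothing on br2 is missing from br1, and once br2 grows again the merge base is br2's old tip K rather than H.

```shell
# Scratch repository; path, identity, and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b br1
echo i > i.txt && git add i.txt && git commit -q -m "I"
git checkout -q -b br2 main
echo k > k.txt && git add k.txt && git commit -q -m "K"
k=$(git rev-parse HEAD)

git checkout -q br1
git merge -q -m "M1" br2
left_to_merge=$(git rev-list --count br1..br2)   # commits on br2 that br1 lacks

git checkout -q br2
echo n > n.txt && git add n.txt && git commit -q -m "N"
base=$(git merge-base br1 br2)                   # now K, not H
```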

When you have two separate repositories, though, are the commits related? Now we get into one of the complications with any distributed version control system, like Git.

When you clone a Git repository, you literally copy the commits. A git clone of some repository R makes some clone C, but C has all the commits from R.9 In Git, cloning copies the commits, but doesn't copy the branches,10 which in some sense is weird—Mercurial copies the branches too, for instance—but the important thing for your case is that the commits get copied.
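Here is a small sketch of "commits yes, branches no". The directories R and C and the branch feature are made up; a local path stands in for a real URL, and the identity is supplied through environment variables so the snippet does not depend on your Git configuration.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

# Repository R: one commit on main, one more on feature.
mkdir "$work/R" && cd "$work/R" && git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "B"
git checkout -q main
commits_in_r=$(git rev-list --count --all)

# Clone C: every commit arrives, but feature shows up only as origin/feature.
git clone -q "$work/R" "$work/C" && cd "$work/C"
commits_in_c=$(git rev-list --count --all)
git branch -a
if git show-ref --verify --quiet refs/heads/feature; then local_feature=yes; else local_feature=no; fi
if git show-ref --verify --quiet refs/remotes/origin/feature; then tracked_feature=yes; else tracked_feature=no; fi
```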

Now, after the commits are copied into C, someone can make more commits in R, and/or someone else can make more commits in C. But if they both follow the same sort of standard procedure—of starting with the commits they have, and merely adding on—these commits will all "join up in the past", in exactly the same way we get with a single repository.

All you have to do, in this case, is:

  • clone either of R or C into a third Git repository, then
  • add to that repository all the commits that are in the other of these two repositories, that aren't already in your third clone.

That second step—"add commits that we don't have"—might seem like a big thing. In some ways, it is ... but we already have to do that because of cloning. That is, suppose there's some "source of truth" repository Rcentral that everyone clones. You make your clone Cyou. Alice makes clone Calice, Bob makes clone Cbob, and so on.

At some point, somebody makes new commits, and eventually—somehow—gets their commits into Rcentral. And now everybody with a clone has to get those new commits into their clones, if they want to see them and use them. So we have git fetch.

We run git fetch name. Git calls the name we use here a remote. When you clone Rcentral, your Git, in your clone C, adds a standard remote name, origin. Your Git stores the URL of Rcentral under this standard name, and from now on, you can just run:

git fetch origin

to have your Git call up the Rcentral Git. They will list out their commits (by hash ID) and their branch names (by name), and your Git will figure out if any of those commits are new to you, and if so, obtain them. Your Git will then set up remote-tracking names by taking their branch names, main and feature and whatever, and sticking origin/ in front of them: the origin part comes from the remote. These names "track"11 the branch names over on origin, so they are the remote-tracking names for origin.
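A sketch of what git fetch origin does to your names, with a local directory central standing in for Rcentral (all paths and messages are invented): the new commit arrives and origin/main moves, while your own main stays where it was.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

mkdir "$work/central" && cd "$work/central" && git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"
git clone -q "$work/central" "$work/C"

git commit -q --allow-empty -m "B"       # someone adds a commit to central later
central_tip=$(git rev-parse main)

cd "$work/C"
git fetch -q origin                      # obtain the new commit, update origin/*
tracking=$(git rev-parse origin/main)    # remembers central's main as of this fetch
local_main=$(git rev-parse main)         # our own branch name is untouched
```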

You can add more Git repositories as additional remotes. That is, using:

git remote add repo-xyz <url>

you add a second remote, using the name repo-xyz, to store the given url. Now you can run:

git fetch repo-xyz

Your Git will call up the Git at the URL you just saved, ask them about their branch names and commit hash IDs, and bring over any commits they have, that you don't. Your Git will then create, in your clone C, remote-tracking names of the form repo-xyz/*. You'll have a repo-xyz/main if they have a main. If they have a develop, you'll have a repo-xyz/develop.
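The same thing with a second remote, as a hedged sketch: central and xyz are two made-up local repositories, and a file path is used where a real setup would have a URL.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

mkdir "$work/central" && cd "$work/central" && git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"

# A second repository that has main plus its own develop branch.
git clone -q "$work/central" "$work/xyz" && cd "$work/xyz"
git checkout -q -b develop
git commit -q --allow-empty -m "B"
xyz_develop=$(git rev-parse develop)

# Your clone C of central, with xyz added as a second remote.
git clone -q "$work/central" "$work/C" && cd "$work/C"
git remote add repo-xyz "$work/xyz"      # a local path stands in for <url>
git fetch -q repo-xyz
git branch -r                            # lists origin/* and repo-xyz/*
tracked=$(git rev-parse repo-xyz/develop)
tracked_main=$(git rev-parse repo-xyz/main)
origin_main=$(git rev-parse origin/main)
```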

Each of these remote-tracking names will remember exactly one commit hash ID, just as each branch name in C, Rcentral, or this added remote remembers exactly one commit hash ID. Because git fetch reads their current state, your remote-tracking names will now remember their branch state as of the time you ran git fetch.

So, having run git fetch origin and git fetch repo-xyz, you now have:

  • all the commits you had, plus
  • all the commits origin had that you didn't, plus
  • all the commits repo-xyz had that you didn't, plus
  • remote-tracking names origin/* and repo-xyz/* to remember branch names and commit hash IDs from origin and repo-xyz.

Remote-tracking names, which locate specific commits, work just as well as branch names for locating specific commits. So you can pass a remote-tracking name to git merge. The only thing they don't work for here is that you cannot get "on" a remote-tracking name. That's because it is not a branch name, and Git will only let you attach HEAD to a branch name. If you want a branch name to point to some commit that some existing remote-tracking name locates, you can use git checkout or git switch to do this:

git switch -c update-some-abc-branch repo-xyz/abc-branch
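To see the difference between a remote-tracking name and a branch name, consider this sketch (scratch repositories, invented names). Checking out origin/main directly leaves HEAD detached, that is, attached to no name at all; creating a branch from it gives HEAD something to attach to.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

mkdir "$work/central" && cd "$work/central" && git init -q
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"
git clone -q "$work/central" "$work/C" && cd "$work/C"

git checkout -q origin/main              # uses the commit, but HEAD is detached
if git symbolic-ref -q HEAD >/dev/null; then attached=yes; else attached=no; fi

# Create a real branch name at that commit and get "on" it.
git checkout -q -b update-main origin/main   # or: git switch -c update-main origin/main
now_on=$(git symbolic-ref --short HEAD)
```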

Because you have two remotes (origin and repo-xyz), you may run into the annoyance that you can't make one name, like main, that you work with when working with both origin/main and repo-xyz/main. You may need to use some funky mismatched branch names, like I just did above. That works fine: there's no need to use the same names in each repository.12

This gives you all the information you need to:

  • create branch names in your repository C to locate specific commits while getting "on" those branches;
  • run git merge with the commits identified by your branch names and/or your remote-tracking names.

As long as you remember that what Git really cares about are the commits and their hash IDs, and that your branch names are just there to let you find your chosen commits, you'll be fine.
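Putting it together for the two-repository situation in the question, here is a minimal sketch rather than a definitive recipe. It assumes both local repositories share history; to guarantee that, the sketch clones both from one stand-in repository called upstream, which is not quite the production/development pair in the question. The remote name repo-a, the file names, and all paths are invented.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

# Stand-in for the shared history behind both local repositories.
mkdir "$work/upstream" && cd "$work/upstream" && git init -q
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "shared base"
git clone -q "$work/upstream" "$work/A"
git clone -q "$work/upstream" "$work/B"

# Local Repository A: work on FeatureA.
cd "$work/A" && git checkout -q -b FeatureA
echo feature > feature.txt && git add feature.txt && git commit -q -m "FeatureA work"
feature_tip=$(git rev-parse FeatureA)

# Local Repository B: work on Developer, then fetch A's commits and merge them.
cd "$work/B" && git checkout -q -b Developer
echo dev > dev.txt && git add dev.txt && git commit -q -m "Developer work"
git remote add repo-a "$work/A"          # repository A becomes a remote of B
git fetch -q repo-a                      # creates repo-a/FeatureA
git merge -q -m "Merge FeatureA from repository A" repo-a/FeatureA

merged_parent=$(git rev-parse HEAD^2)    # second parent: A's FeatureA commit
on_branch=$(git symbolic-ref --short HEAD)
```

If git merge-base Developer repo-a/FeatureA finds nothing after the fetch, the two histories are unrelated and this approach will not give a proper result.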


9It's possible to make clones that omit some commits, but we'll consider the usual case where we don't do that.

10Git uses, instead, remote-tracking names. Copying branch names would be possible but would lead to confusion. The remote-tracking name technique leads, instead, to ... confusion. I'm not sure there's that much of an improvement here.
