Difference between merging develop into master and master into develop?
I tried to merge develop into master and it showed 34 files changed, with 1,312 additions and 324 removals. When I tried to merge master into develop, it showed 251 files changed, with 87,123 additions and 1,321 removals. My guess is that Git takes the point where the branch split off from master, collects all the changes made since then, and compares them with the files in the branch we want to merge into. Am I correct?
Does this mean that, for both branches to be the same, we need to merge master into develop and then merge develop into master every time, when both branches have been changed daily for a month by a dozen developers?
What does git diff give us? Does it give us all the differences between both branches, or what we would get if we tried to merge branch 1 into branch 2?
To understand the answer to that question, let's start with some facts about Git:
Git stores commits, rather than files or branches. The commits are the history in a repository: each one has a unique number (a hash ID or object ID aka OID), and each one holds two things: a full snapshot of every file, plus some metadata. The metadata in any one commit includes a list of previous commit hash IDs, which lets Git relate the later commits back to earlier ones. Most (all "ordinary") commits have just one previous hash ID in them, which links the commit to its parent. This allows Git to work backwards, from the latest commit to the earliest.
Branch names like master or main, develop, br1, feature/tall, etc., just contain one commit hash ID. By definition, whatever hash ID is stored in the name is the latest commit on that branch.
From these two facts alone we can start to visualize commits:
    ... <-F <-G <-H   <-- br1
Here we have a branch name like br1 that selects, or points to, the last commit that is on that branch. That commit has some hash ID that we'll just call H, so that we don't have to generate some random-looking thing and try to remember it.
Commit H holds a snapshot of all files, but also holds metadata to say who made commit H, why (their log message), and so on. The metadata for commit H stores the hash ID of one previous commit, which we'll just call G. So commit H points backwards to earlier commit G.
Commit G, being a commit, stores a full snapshot and metadata. The metadata in G makes G point backwards to earlier commit F, which is also a commit, so it points backwards to another, even earlier commit.
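To make the backwards-pointing chain concrete, here is a minimal Python sketch. It is only a toy model, not how Git actually stores objects: single letters stand in for real hash IDs, a snapshot is a plain dict from path to contents, and the names `Commit`, `commits`, `branches`, and `walk_back` are made up for illustration. F is treated as a root commit (no parent) so that the walk has somewhere to stop.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    """Toy commit (hypothetical model): a full snapshot plus a little metadata."""
    snapshot: dict[str, str]       # path -> full file contents, not a diff
    parents: tuple[str, ...] = ()  # hash IDs of the previous commit(s)
    message: str = ""


# Letters stand in for hash IDs. Each commit records its parent, never its children.
commits = {
    "F": Commit({"README": "hi\n"}, (), "start"),
    "G": Commit({"README": "hello\n"}, ("F",), "reword README"),
    "H": Commit({"README": "hello\n", "main.c": "int main(void) { return 0; }\n"},
                ("G",), "add main.c"),
}

# A branch name holds exactly one hash ID: the latest commit on that branch.
branches = {"br1": "H"}


def walk_back(name: str) -> list[str]:
    """Start at the commit the name points to and follow parent links backwards."""
    oid = branches[name]
    history = []
    while True:
        history.append(oid)
        parents = commits[oid].parents
        if not parents:      # root commit: nothing earlier to visit
            return history
        oid = parents[0]     # ordinary commits have exactly one parent


print(walk_back("br1"))  # ['H', 'G', 'F']
```

Note that the only way to get from F to H is to start at H: the links point one way, which is why Git always works from the latest commit backwards.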
When we look at a commit with git diff or git show, we're really asking Git to compare two snapshots, even if we name only one commit. We start with the commit itself, such as H, maybe using the branch name br1:
    git show br1
Git uses that to locate H, then uses H to locate its parent commit G. Git then extracts both snapshots to a temporary area in memory and compares them. We are, after all, interested only in the files that changed. (This is assisted by the fact that commit snapshots de-duplicate file contents, so if H and G share most of their files, Git can tell that instantly and not even bother extracting those files.)
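For files that did change, Git figures out a "change recipe" (a diff) and shows that as "what happened". This works great for ordinary commits like commit H. But it breaks down with merges, because a merge commit has two parents, so there is no single previous snapshot to compare against.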
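Here is a rough Python sketch of what git show does for an ordinary commit: compare the parent's snapshot with the commit's snapshot, skip files whose contents are identical, and produce a line-level recipe only for files that differ. The snapshots and the function name `show_changes` are hypothetical, and Python's standard `difflib.unified_diff` stands in for Git's own diff algorithm, so the exact hunks may not match what Git prints.

```python
import difflib


def show_changes(parent: dict[str, str], child: dict[str, str]) -> list[str]:
    """Compare two full snapshots (path -> contents) and return a unified-diff recipe."""
    recipe = []
    for path in sorted(parent.keys() | child.keys()):
        old, new = parent.get(path, ""), child.get(path, "")
        if path in parent and path in child and old == new:
            continue  # identical contents: nothing to show (Git skips these cheaply)
        recipe.extend(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}" if path in parent else "/dev/null",
            tofile=f"b/{path}" if path in child else "/dev/null",
        ))
    return recipe


# Hypothetical snapshots for parent G and child H: only README differs.
G = {"README": "hi\n", "main.c": "int main(void) { return 0; }\n"}
H = {"README": "hello\n", "main.c": "int main(void) { return 0; }\n"}

print("".join(show_changes(G, H)))
```

The recipe mentions only README; main.c is identical in both snapshots, so it never shows up.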
Merges
To understand git merge, we start with the goal of a merge: to combine work. Let's draw a picture of some commits where there's been a fork of some sort, so that we have two different chains of work, like this:
              I--J   <-- br1
             /
    ...--G--H
             \
              K--L   <-- br2
That is, there are at this point two "latest" commits. Commit J is latest; its parent is I, whose parent is H, whose parent is G, and so on. But commit L is also latest; its parent is K, whose parent is H, whose parent is G, and so on. Here J is the latest br1 commit and L is the latest br2 commit.
As always, every commit holds a full snapshot of all files. To combine work on both branches, we need to find changes. How do we do that? Well, we already know an answer: we can use git diff or git show to pick two commits and compare them.
The thing everyone tries first, which doesn't work, is to pick commits J and L and compare them. But that shows what's different between these two "latest" commits, which usually isn't what we want. For instance, maybe Alice made br1 by fixing a typo in the README and adding feature 1, and Bob made br2 by fixing a different typo in some other documentation and adding feature 2. If we diff J vs L, the recipe Git will give us is: remove Alice's fix and feature, and add Bob's fix and feature. What we want is to add Alice's fix and feature, and also add Bob's fix and feature.
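The Alice and Bob example can be checked with a small Python sketch. The file names, contents, and the helper `changed_paths` are invented for illustration, and the comparison is only file-level (which paths were added, deleted, or modified), not line-level.

```python
def changed_paths(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """File-level diff of two snapshots: path -> 'added' / 'deleted' / 'modified'."""
    result = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            result[path] = "deleted"
        elif path not in old:
            result[path] = "added"
        elif old[path] != new[path]:
            result[path] = "modified"
    return result


# Hypothetical snapshots: H is shared, J is Alice's br1 tip, L is Bob's br2 tip.
H = {"README": "teh docs\n", "guide.txt": "recieve\n"}
J = {"README": "the docs\n", "guide.txt": "recieve\n", "feature1.py": "def feature1(): pass\n"}
L = {"README": "teh docs\n", "guide.txt": "receive\n", "feature2.py": "def feature2(): pass\n"}

print(changed_paths(J, L))  # tip vs tip: reverts Alice's README fix, deletes feature1.py
print(changed_paths(H, J))  # what Alice actually did
print(changed_paths(H, L))  # what Bob actually did
```

Comparing J with L reports feature1.py as deleted and README as modified back to the typo, which is exactly the "remove Alice's work" recipe; diffing each tip against H gives the two recipes we actually want.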
To cut to the chase, the trick here is to start from commit H. That's the best shared commit: a commit that is literally on both branches (G is also on both branches, but H comes after G, so H is the better choice). Git calls this commit the merge base. By starting at J and working backwards, and also starting at L and working backwards, we find that H is the best shared commit. So we diff H vs J to see what Alice did, and then, separately, diff H vs L to see what Bob did. That gets us the two sets of changes to combine.
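Finding that best shared commit can be sketched in Python: collect everything reachable from each tip by walking backwards, intersect the two sets, and keep only the shared commits that are not ancestors of some other shared commit. This mirrors what git merge-base reports for the picture above, but it is a naive illustration with a hypothetical `parents` table and made-up function names, not Git's actual (much more efficient) algorithm.

```python
from __future__ import annotations

# Hypothetical parent links for the picture above; G is treated as a root.
parents: dict[str, tuple[str, ...]] = {
    "G": (),
    "H": ("G",),
    "I": ("H",), "J": ("I",),   # br1
    "K": ("H",), "L": ("K",),   # br2
}


def reachable(start: str) -> set[str]:
    """Every commit found by walking backwards from start (start included)."""
    seen, todo = set(), [start]
    while todo:
        oid = todo.pop()
        if oid not in seen:
            seen.add(oid)
            todo.extend(parents[oid])
    return seen


def best_shared(a: str, b: str) -> set[str]:
    """Commits on both branches that are not ancestors of another shared commit."""
    shared = reachable(a) & reachable(b)
    return {c for c in shared
            if not any(c != d and c in reachable(d) for d in shared)}


print(best_shared("J", "L"))  # {'H'}: G is shared too, but H comes after it
```

The result is a set because, in more tangled histories, there can be more than one equally good shared commit.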
Git will then do its best to combine these change-recipes, using some very simple rules:
- If nobody touched a file, use any version of it: all three are the same.
- If one branch touches some file and the other branch doesn't touch it at all, use the changed file from the one branch that changed it.
- If both touched some file, try to combine the changes line by line. If they're on different, non-overlapping lines, they can be combined, but Git adds the rule that the changed regions must not abut (sit right next to each other) either. If they do overlap, the changes must be 100% identical. Otherwise, Git will declare a merge conflict and make you, the programmer, clean up the mess.
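The three rules can be sketched in Python. This is a simplified model under stated assumptions, not Git's implementation: snapshots are dicts from path to contents, `merge_snapshots` and `merge_lines` are hypothetical names, and rule 3 only handles edits that replace lines in place. Whenever either side inserts or deletes lines inside a file that both sides changed, this sketch simply reports a conflict, even though real Git can usually combine such changes. Changes on the same or adjacent lines conflict unless they are identical, following the "must not abut" rule.

```python
from __future__ import annotations


def merge_lines(base: list[str], ours: list[str], theirs: list[str]) -> list[str] | None:
    """Rule 3, simplified (hypothetical): combine in-place line edits, or None on conflict."""
    if not (len(base) == len(ours) == len(theirs)):
        return None  # inserted/deleted lines are out of scope for this sketch
    ours_changed = {i for i, (b, o) in enumerate(zip(base, ours)) if b != o}
    theirs_changed = {i for i, (b, t) in enumerate(zip(base, theirs)) if b != t}
    for i in ours_changed:
        for j in theirs_changed:
            identical = i == j and ours[i] == theirs[j]
            if abs(i - j) <= 1 and not identical:
                return None  # overlapping or abutting changes that differ
    return [ours[i] if i in ours_changed else theirs[i] for i in range(len(base))]


def merge_snapshots(base, ours, theirs):
    """Three-way merge of snapshots (path -> contents). Returns (snapshot, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)  # None = file absent
        if o == t:
            result = o            # rule 1 (untouched), or both made the same change
        elif o == b:
            result = t            # rule 2: only theirs touched it
        elif t == b:
            result = o            # rule 2: only ours touched it
        elif None in (b, o, t):
            conflicts.append(path)  # e.g. deleted on one side, modified on the other
            continue
        else:                     # rule 3: both touched it, try line by line
            lines = merge_lines(b.splitlines(keepends=True),
                                o.splitlines(keepends=True),
                                t.splitlines(keepends=True))
            if lines is None:
                conflicts.append(path)
                continue
            result = "".join(lines)
        if result is not None:    # None here means the file was deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots: base is commit H, alice is J (br1), bob is L (br2).
base = {"README": "teh docs\n", "notes.txt": "one\ntwo\nthree\n"}
alice = {"README": "the docs\n", "notes.txt": "ONE\ntwo\nthree\n", "feature1.py": "pass\n"}
bob = {"README": "teh docs\n", "notes.txt": "one\ntwo\nTHREE\n", "feature2.py": "pass\n"}

snapshot, conflicts = merge_snapshots(base, alice, bob)
print(snapshot, conflicts)
print(merge_snapshots(base, bob, alice) == (snapshot, conflicts))  # True
```

Swapping the two sides produces the same snapshot, which is the commutativity property of the default merge. Changing line 1 on one side and line 2 on the other would instead be reported as a conflict, because those changes abut.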
These rules work surprisingly often, so git merge can get both sets of changes out of the two branches, apply the combined changes to the snapshot in H, and use the result as the new snapshot. Depending on how you like to view this, the result is that we keep the br1 changes while adding the br2 changes, or we keep the br2 changes while adding the br1 changes. Note that, like ordinary mathematical addition, the result is the same regardless of the order of the addends (that is, we don't need to distinguish a separate "augend" from the "addend", because the operation is commutative).[1]
Having come up with a snapshot for the new commit, Git then makes the new commit. You supply a log message in which you explain why you did the merge (or you use the crappy default message, Merge branch 'br2' for instance, which is what most people really do), and Git makes a new commit that is just like any commit: it has a snapshot and metadata. What makes the new commit special is that instead of just one parent, it has two:
              I--J
             /    \
    ...--G--H      M
             \    /
              K--L
Note that I have filed the branch names off this picture. Whenever you make any new commit, whether with git commit, git merge, git cherry-pick, git revert, or whatever, Git will update the current branch name automatically for you, so that M is now the latest commit. But which branch name gets updated? Well, that depends on which branch git status said you were on:
    $ git status
    On branch br1
    ...
    $ git merge br2
results in:
              I--J
             /    \
    ...--G--H      M   <-- br1 (HEAD)
             \    /
              K--L   <-- br2
That is, you were "on" br1 (that's what the HEAD attached to br1 means here), and you still are, so the new commit is now the latest commit on br1. But if git status had said that you were On branch br2 and you had run git merge br1, it would be the name br2 that is updated.
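Which name moves can be sketched with a tiny Python model. Everything here is hypothetical (`Repo`, `head`, `merge`, `example`), snapshots are left out entirely, and the fast-forward case is ignored; the point is only that a merge commit gets two parents and that the branch HEAD is attached to is the one that advances. As in real Git, the first parent is the commit the current branch pointed to before the merge.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical toy repository: commit graph, branch names, and HEAD."""
    parents: dict[str, tuple[str, ...]]  # commit ID -> parent IDs
    branches: dict[str, str]             # branch name -> latest commit ID
    head: str                            # the branch name HEAD is attached to

    def merge(self, other: str, new_oid: str) -> None:
        """Make a two-parent commit and advance only the current branch name."""
        ours, theirs = self.branches[self.head], self.branches[other]
        self.parents[new_oid] = (ours, theirs)  # first parent: where we were
        self.branches[self.head] = new_oid      # the other name does not move


def example(head: str, other: str) -> Repo:
    repo = Repo(
        parents={"H": (), "I": ("H",), "J": ("I",), "K": ("H",), "L": ("K",)},
        branches={"br1": "J", "br2": "L"},
        head=head,
    )
    repo.merge(other, "M")
    return repo


on_br1 = example("br1", "br2")  # "On branch br1", then git merge br2
print(on_br1.branches)           # br1 now names M; br2 still names L
on_br2 = example("br2", "br1")  # "On branch br2", then git merge br1
print(on_br2.branches)           # br2 now names M; br1 still names J
```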
[1] Note that adding options to git merge, such as -s ours or -X theirs, changes this: the operation is no longer commutative.
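A rough Python sketch shows why such options break the symmetry. Both functions are hypothetical simplifications: `strategy_ours` follows the documented behavior of -s ours (the result is simply the current branch's snapshot), while `favor_theirs` settles every file that both sides changed by taking the other side's whole file. That is much cruder than -X theirs, which works per conflicting hunk and still merges non-conflicting changes normally.

```python
def strategy_ours(base, ours, theirs):
    """Like -s ours: keep the current branch's snapshot and ignore the other side."""
    return dict(ours)


def favor_theirs(base, ours, theirs):
    """Hypothetical whole-file merge where 'theirs' wins whenever both sides changed a file."""
    merged = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        result = o if (o == t or t == b) else t  # theirs wins any disagreement
        if result is not None:                   # None means the file is absent
            merged[path] = result
    return merged


base, alice, bob = {"f": "a\n"}, {"f": "alice\n"}, {"f": "bob\n"}

# Swapping which branch is "ours" now changes the answer.
print(favor_theirs(base, alice, bob))  # {'f': 'bob\n'}
print(favor_theirs(base, bob, alice))  # {'f': 'alice\n'}
print(strategy_ours(base, alice, bob) == strategy_ours(base, bob, alice))  # False
```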
This answers the first part of your question:
[What is the d]ifference between merging develop into master and master into develop?
One will advance the name master, and the other will advance the name develop. That is, you'll have either:
           o--...--o
          /         \
    ...--o           M   <-- master (HEAD)
          \         /
           o--...--o   <-- develop
or:
           o--...--o   <-- master
          /         \
    ...--o           M   <-- develop (HEAD)
          \         /
           o--...--o
when you're done. The snapshot in M will be the same either way (barring options like those in note 1, and assuming any conflicts are resolved the same way).
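Putting the pieces together, here is a hedged Python sketch of the two directions. It uses simplified whole-file rules (no line-level merging, and it raises an error instead of modeling conflicts), with hypothetical commit IDs (B for the merge base, D and T for the develop and master tips) and invented file contents. It shows that the merge commit's snapshot comes out identical, while the name that moves and the order of the two parents differ.

```python
from __future__ import annotations


def merge_files(base, ours, theirs):
    """Hypothetical whole-file three-way merge (rules 1 and 2 only)."""
    merged = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            result = o
        elif o == b:
            result = t
        else:
            raise ValueError(f"conflict in {path}")  # this sketch doesn't merge lines
        if result is not None:
            merged[path] = result
    return merged


def merge_into(current: str, other: str) -> tuple[dict, dict[str, str], tuple[str, str]]:
    """Simulate 'git switch current; git merge other' on a fixed toy history."""
    snapshots = {
        "B": {"app.py": "v1\n", "README": "old\n"},                      # merge base
        "D": {"app.py": "v2\n", "README": "old\n"},                      # develop tip
        "T": {"app.py": "v1\n", "README": "new\n", "CHANGELOG": "x\n"},  # master tip
    }
    branches = {"develop": "D", "master": "T"}
    ours, theirs = branches[current], branches[other]
    snapshot = merge_files(snapshots["B"], snapshots[ours], snapshots[theirs])
    branches[current] = "M"  # only the checked-out branch advances
    return snapshot, branches, (ours, theirs)


snap1, names1, parents1 = merge_into("master", "develop")
snap2, names2, parents2 = merge_into("develop", "master")
print(snap1 == snap2)    # True: same snapshot in M either way
print(names1, parents1)  # master moved; first parent is the old master tip
print(names2, parents2)  # develop moved; first parent is the old develop tip
```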