I had a feature branch that I needed to merge into master, so I created a pull request through GitHub and merged it.
I was expecting that all changes in the feature branch that were different from master would overwrite that particular file in master, but after the merge I noticed the opposite: the feature branch got updated with the code (the one which was different in master) and the master file remained the same.
Please correct my understanding if I am wrong. When we merge branch A into branch B, branch A should not get changed, should it?
Thanks,
CodePudding user response:
I’m guessing you ran git merge master. This merged master into the current branch.
To merge the feature branch into master, you need to switch to the master branch, and then run git merge feature. That being said, you can create a PR without merging feature into master.
CodePudding user response:
I was expecting that all changes in feature branch that were different from master will overwrite that particular file in master ...
This expectation is wrong.
Merge does not mean make same. Merge means combine work.
This is a bit complicated by one thing: Git doesn't store the work you did. Git stores, instead, full snapshots of every file. That is:
Each commit is numbered, with a big, ugly, unique hash ID. This number is how Git actually retrieves the commit from Git's main database. (This database contains commits and other supporting objects, all of which have these big ugly hash IDs.)
Each commit stores two things:
- A commit has a full snapshot of every file. The files inside each commit are stored in a special, read-only, Git-only, compressed and de-duplicated format. If you have a repository with lots of commits, and you check out one commit that has some really big files and some really small ones and you change just one of the small ones and add-and-commit, Git has to store all the files all over again, but it can re-use all but the one changed file. So only the changed file actually takes any space. But each commit still has every file!
- A commit also stores some metadata, or information about the commit itself. This includes the stuff you see in git log output: who made the commit, when, and why (their log message), for instance. For Git's own purposes, every commit stores a list of previous commit hash IDs in this metadata, too.
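To make the "every commit has every file, but unchanged files cost nothing" idea concrete, here is a rough Python sketch of a content-addressed store. It is only a toy model: ObjectStore and make_snapshot are names invented for this illustration, and real Git hashes a small header along with the content and compresses what it stores.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}  # hash ID -> raw bytes

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so the same bytes
        # always land under the same hash ID (de-duplication).
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data
        return key


def make_snapshot(store, files):
    """A snapshot lists every path, each mapped to its content's hash ID."""
    return {path: store.put(content) for path, content in files.items()}


store = ObjectStore()

files_v1 = {"big.bin": b"x" * 100_000, "small.txt": b"hello\n"}
snap1 = make_snapshot(store, files_v1)

# Change only the small file, then take a second full snapshot.
files_v2 = {**files_v1, "small.txt": b"hello, world\n"}
snap2 = make_snapshot(store, files_v2)

# Both snapshots list both files, but only three objects exist in total.
print(len(snap1), len(snap2), len(store.objects))
```

Both snapshots are complete, yet the big file is stored once because its hash ID did not change.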
This list-of-previous-commit-hash-IDs usually has exactly one entry. We call that one entry the commit's parent. It means that each child commit "points to" its parent:
... <-F <-G <-H
Here H stands for the hash ID of the latest commit. This commit is a commit, so it has a snapshot (all of the files, in the form they had when you made H) and metadata. The metadata says that you made it last August, or whatever. It also says that the parent of commit H is earlier commit G (whatever the real, random-looking hash ID really is: git log will show it).
Commit G, though, is a commit, so it has a full snapshot of every file, and some metadata. Git can extract both commits (to a temporary area in memory) and compare the files. When the files exactly match (and are therefore de-duplicated), that's pretty boring, and Git generally will say nothing at all. When the files don't match, Git can come up with a recipe, a set of changes to apply to the older file, that will make it match the newer file. That's what you see in git log -p.
Having shown commit H as a change from earlier commit G, git log now steps back one hop. Now its job is to show commit G. For this, it needs commit F, G's parent, to get the snapshot to figure out what changed, if you want to see that, but that's easy because G holds F's hash ID.
So Git can show you the author and message and so forth for G, and then a diff from F to G to see what changed, and now Git can move back one hop yet again and show you F. That's a commit, so it has a snapshot and a parent, and ... well, the idea should be obvious now.
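The backwards walk described above can be sketched in a few lines of Python. This is a toy model, not how git log is implemented: the commits dictionary, changed_paths and log are all made up for the illustration, and the "diff" here only lists which paths differ rather than producing a line-by-line recipe.

```python
# Each commit: one parent (or None for the very first) plus a full snapshot.
commits = {
    "F": {"parent": None, "files": {"a.txt": "one\n"}},
    "G": {"parent": "F", "files": {"a.txt": "one\n", "b.txt": "two\n"}},
    "H": {"parent": "G", "files": {"a.txt": "ONE\n", "b.txt": "two\n"}},
}


def changed_paths(old, new):
    """Paths whose content differs between two snapshots."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


def log(commits, start):
    """Walk from `start` back through the parents, comparing as we go."""
    history = []
    current = start
    while current is not None:
        commit = commits[current]
        parent = commit["parent"]
        # The first commit has no parent: compare against an empty snapshot.
        parent_files = commits[parent]["files"] if parent else {}
        history.append((current, changed_paths(parent_files, commit["files"])))
        current = parent  # step back one hop
    return history


for commit_id, changed in log(commits, "H"):
    print(commit_id, changed)
```

Starting from H, this visits H, then G, then F, each time comparing the commit's snapshot with its parent's.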
That's all well and good for simple cases, but we have one problem: we have to somehow give the hash ID for commit H to Git. We could scribble it onto a whiteboard, or jot it down on paper. But why should we do that, when we have a computer? Let's have Git store this hash ID for us, in (say) a branch name. And that's just what we do.
Now, if we have a simple chain of commits like this:
...--F--G--H <-- master (HEAD)
and we create a new branch name feature, we get:
...--F--G--H <-- feature, master (HEAD)
We're currently using commit H, via the name master. If we switch to feature, with git checkout feature or git switch feature, we switch to using commit H via the name feature:
...--F--G--H <-- feature (HEAD), master
Nothing else has to change, so Git takes a shortcut and changes nothing else: we're still using commit H, just through another name.
If we now make new commits, these new commits get:
- a new snapshot, from what we tell Git to use;
- new metadata: your name as author/committer and "now" as the date and time, with the parent commit being the current commit.
By writing out the new commit, Git acquires a new hash ID. There's some fairly deep magic here, using cryptographic hashes, which also explains why nothing about any commit can ever change once you've made it, but we'll just call our new commit I. New commit I will point back to existing commit H:
             I
            /
...--F--G--H
and now Git pulls its clever little trick: it writes I's hash ID into the current branch name, i.e., feature:
             I   <-- feature (HEAD)
            /
...--F--G--H   <-- master
If we make yet another new commit before we do anything else, we get:
             I--J   <-- feature (HEAD)
            /
...--F--G--H   <-- master
If we now switch back to the name master, here's what happens:
             I--J   <-- feature
            /
...--F--G--H   <-- master (HEAD)
We've now had Git attach the special name HEAD to the name master, which points to commit H, not commit J. Since we're changing commits, Git will now remove, from our working tree, all the files from J and put in all the files from H instead (using the snapshots).¹
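If it helps, the whole "branch name is just a movable pointer" business so far fits in a small Python model. Everything here (the Repo class and its method names) is hypothetical and exists only to mirror the drawings; it ignores snapshots and the working tree entirely.

```python
class Repo:
    """Toy model: commits point to parents, branch names point to commits."""

    def __init__(self, first_branch="master"):
        self.parents = {}         # commit ID -> list of parent commit IDs
        self.branches = {}        # branch name -> commit ID
        self.head = first_branch  # HEAD is attached to a branch *name*

    def commit(self, commit_id):
        # The new commit's parent is whatever the current branch points to...
        parent = self.branches.get(self.head)
        self.parents[commit_id] = [parent] if parent else []
        # ...and then the current branch name is moved to the new commit.
        self.branches[self.head] = commit_id

    def branch(self, name):
        # A new name pointing at the current commit; nothing else changes.
        self.branches[name] = self.branches[self.head]

    def switch(self, name):
        self.head = name


repo = Repo()
for commit_id in "FGH":
    repo.commit(commit_id)      # ...--F--G--H  <-- master (HEAD)

repo.branch("feature")          # feature and master both name H
repo.switch("feature")
repo.commit("I")
repo.commit("J")                # feature now names J, master still names H
repo.switch("master")

print(repo.branches, repo.head)
```

Only the name HEAD is attached to ever moves when you commit, which is why master still points to H after I and J are made on feature.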
If we were to run git merge feature now, you'd get the thing you expected. But before we do, let's make more commits. We'll modify some file that also got modified in I and/or J, as we make two new commits K and L:
             I--J   <-- feature
            /
...--F--G--H
            \
             K--L   <-- master (HEAD)
If we compare the files in J vs the files in L (in either order), we'll get a recipe that would change what's in one of those two commits to match what's in the other. But if we run git merge feature, we don't want to lose the good stuff we did in commits K-L. So that's not what git merge does.
Instead, git merge:
- locates the current commit: that's easy, it's whatever HEAD is attached to, and we already have that commit checked out too, as --ours;
- locates the other commit, in this case commit J, from the name we supply: we can actually give a raw hash ID as Git just needs the commit; this becomes the --theirs commit;
- and, last, uses the graph we've been drawing to find the best shared starting-point commit.
That last commit is obvious from the drawing here: it's commit H. Commit H is on both branches. Commits I-J are only on feature, and commits K-L are only on master, but all commits up to and including H are on both branches. So all those commits are shared, and H is obviously the best one.²
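For the curious, "best shared commit" can be spelled out as: a commit reachable from both tips that is not an ancestor of any other such commit. Here is a brute-force Python sketch over the graph drawn above. The function names are invented for the illustration, and Git's real merge-base search is far more efficient than this.

```python
# The graph from the drawing: commit ID -> list of parent IDs.
parents = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],   # only on feature
    "K": ["H"], "L": ["K"],   # only on master
}


def ancestors(parents, start):
    """Every commit reachable from `start`, including `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, ours, theirs):
    """Shared commits that are not ancestors of another shared commit."""
    shared = ancestors(parents, ours) & ancestors(parents, theirs)
    return {
        c for c in shared
        if not any(c != d and c in ancestors(parents, d) for d in shared)
    }


print(merge_bases(parents, "L", "J"))  # ours = L, theirs = J
print(merge_bases(parents, "H", "J"))  # the simple case: ours is the base
```

F, G and H are all shared between L and J, but F and G are ancestors of H, so H is the only one left standing.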
Now, to do the merge, Git will:
- compare the snapshot in H to that in L: that shows what you did, on branch master, to get to the --ours commit;
- compare the snapshot in H to that in J: that shows what they did, on branch feature, to get to the --theirs commit;
- combine the two sets of changes.
The combined changes then get applied to the snapshot from H. That keeps our changes and adds theirs, or, equivalently, keeps their changes and adds ours. Either way, what ends up in the files doesn't necessarily match either commit J or commit L, because we took two sets of changes.
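Here is a deliberately simplified Python sketch of that combining step. It works on whole files only (real Git also merges line by line inside a file that both sides touched), and three_way_merge plus the sample snapshots are made up for the illustration.

```python
def three_way_merge(base, ours, theirs):
    """Combine two sets of changes made since `base`, one path at a time.

    Snapshots are dicts of path -> content. Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only they changed it: take theirs
            result = t
        elif t == b:      # only we changed it: keep ours
            result = o
        else:             # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the file was deleted
            merged[path] = result
    return merged, conflicts


H = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}                 # merge base
L = {"a.txt": "ours", "b.txt": "1", "c.txt": "1"}              # --ours
J = {"a.txt": "1", "b.txt": "theirs", "c.txt": "1", "d.txt": "new"}  # --theirs

merged, conflicts = three_way_merge(H, L, J)
print(merged, conflicts)
```

The result keeps our a.txt, picks up their b.txt and d.txt, and so matches neither L nor J, which is the point being made above.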
If all goes well, Git makes a new merge commit. A merge commit is exactly the same as any other commit: it has a snapshot, and a list of parent hash IDs. What makes it a merge commit is that the list of parent hash IDs has not just one hash ID, but two:³
             I--J   <-- feature
            /    \₂
...--F--G--H      M   <-- master (HEAD)
            \    /¹
             K--L
There's still just the one snapshot. The first parent, marked with a tiny 1 here, leads back to the commit we were on when we started the git merge. The second parent leads back to the commit we told Git to merge.
Note that in the much simpler case of:
             I--J   <-- feature
            /
...--F--G--H   <-- master (HEAD)
Git still does all the fancy merge-base calculation. This time, however, our commit is H, theirs is J, and the merge base is ... H again. If Git were to compare the snapshot in H vs the snapshot in H, the list of changes it would find would be empty.
We can force Git to do that anyway, with git merge --no-ff. The result is a new merge commit, with two parents as usual:
             I--J   <-- feature
            /    \
...--F--G--H------M   <-- master (HEAD)
but now the snapshot in M really does match the snapshot in J. That's because Git compared H to J to see what they changed, then applied those changes to H, and used no changes from our side, to make the snapshot. Algebraically,⁴ H + (J - H) is just J again.
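That bit of algebra can be checked with a toy Python model, where "J - H" is a dictionary of per-path changes and "+" applies them. The helper names diff and apply_changes are invented here, and the changes are whole-file replacements rather than real line-based diffs.

```python
def diff(old, new):
    """'new - old': what to change in `old` to get `new` (None = delete)."""
    return {
        p: new.get(p)
        for p in set(old) | set(new)
        if old.get(p) != new.get(p)
    }


def apply_changes(snapshot, changes):
    """'snapshot + changes': a new snapshot with the changes applied."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


H = {"a.txt": "one\n", "b.txt": "two\n"}
J = {"a.txt": "ONE\n", "c.txt": "three\n"}

our_changes = diff(H, H)      # ours is H, the base is H: nothing changed
their_changes = diff(H, J)    # everything that happened on feature

M = apply_changes(apply_changes(H, our_changes), their_changes)
print(our_changes, M == J)
```

With nothing on our side to combine, the merge snapshot M comes out identical to J, which is the one case where a merge really does "make same".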
If we don't force a real merge, Git will do a sort of fake not-actually-a-merge, called a fast forward. But GitHub won't let you do a fast-forward, if you use their clicky buttons: they have MERGE, REBASE AND MERGE, and SQUASH AND MERGE and none of those is Git's internal fast-forward. (The REBASE AND MERGE option comes close, but isn't the same.)
¹ This explanation deliberately skips a lot of weird corner cases that Git allows.
² Proof by vigorous hand waving, number 8 on the list. More seriously, see Lowest Common Ancestor of a DAG.
³ Technically, a Git merge can have more than two parents, but that's rare (and mainly used for showing off