Backstory:
- My professor gave me a programming assignment with some template boilerplate code → I completed it and submitted it.
- My professor then added code to his original template code and told everyone to please redo the assignment.
I figured this would be a good opportunity to use a git merge:
What I did:
- I ran
    git checkout -b linkedListUpdate
- Then I pasted his new code template over my original code.
- Then I ran
    git add .
- I ran
    git commit -m "Added professor's update to a new branch"
- I ran
    git checkout main
- Then I made a minor adjustment to the code (I changed a random comment).
- Then I ran
    git add .
- I ran
    git commit -m "separating branches..."
- FINALLY I tried to merge them:
    git merge linkedListUpdate
What I expected: a bunch of merge conflicts to pop up for me to resolve.
What actually happened: linkedListUpdate overwrote what I had in my main branch, leaving me with just my professor's template code.
Side Question:
What is the better way to separate main from the other branch so that merge forces conflict resolution instead of a fast-forward?
(committing a comment change just to adjust the 'geometry' of the branches seems kinda wrong)
Answer:
The only real error here is in your expectations: you expected merge conflicts, but there were not going to be any, and there were in fact none. It was definitely a good exercise for you though, as you've just hit on several very important questions!
There are a few background facts worth starting with when you set your expectations here. Not all of them are strictly required, but they help you build an accurate mental picture:
A Git repository is mostly a collection of commits and other supporting objects. I say "mostly" because there's also a collection of names (branch, tag, etc., names), which help Git (and you) find particularly-interesting commits, and when working with a usable repository—as opposed to a server-side one that you might find on GitHub, for instance—there's also an area in which you do your work, and then there's a whole host of smaller auxiliary items that are useful for all kinds of things.
A commit holds two things:
- Directly, each commit has some metadata, or information about the commit itself.
- Indirectly (not that you need to care about this part), the commit stores a full snapshot of every file, in a frozen-for-all-time format that only Git can read, and literally nothing can write. The file contents stored in this format are compressed and, importantly, de-duplicated. So if you make a commit, then change just one file and make another commit, it's true that both commits store all the files, but it's also true that the new commit has re-used the files from the earlier commit, except for the one you changed.
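To make the de-duplication point concrete, here is a minimal Python sketch of a toy content-addressed store. It is not Git's real storage format: the names `store` and `save_snapshot` are hypothetical, and it hashes raw file bytes rather than Git's typed object format. It only shows why two full snapshots can share the storage of an unchanged file.

```python
import hashlib

# Hypothetical toy object store: content ID -> file bytes.
# Real Git hashes a typed header plus the content; this sketch hashes raw bytes only.
store: dict[str, bytes] = {}

def save_snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Store every file of a snapshot and return a mapping of name -> content ID."""
    snapshot = {}
    for name, data in files.items():
        key = hashlib.sha1(data).hexdigest()
        store.setdefault(key, data)  # identical content is stored only once
        snapshot[name] = key
    return snapshot

# Two "commits": only main.py changes between them.
first = save_snapshot({"list.py": b"class Node: ...\n", "main.py": b"print('v1')\n"})
second = save_snapshot({"list.py": b"class Node: ...\n", "main.py": b"print('v2')\n"})

print(len(first) + len(second))               # 4 file entries across two snapshots
print(len(store))                             # 3 distinct contents actually stored
print(first["list.py"] == second["list.py"])  # True: the unchanged file is shared
```

Both snapshots still list every file; the saving comes entirely from keying storage by content.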
All objects, and especially commit objects (you'll rarely if ever deal with the other kinds directly), have a hash ID. This is the key under which Git stores the object in its big key-value database, and hence the key by which Git actually retrieves commits: Git needs the key to look up the commit.[1]
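As a small illustration of "the key is derived from the content", this Python sketch computes the object ID Git assigns to a file's contents (a blob) in a default SHA-1 repository: Git hashes the header `blob <size>` plus a NUL byte, followed by the raw bytes. The function name `git_blob_id` is hypothetical; commits get their IDs the same way, just from a different kind of object body.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Compute the object ID Git gives a file's contents in a SHA-1 repository.

    Git hashes a header "blob <size>" and a NUL byte, followed by the raw bytes,
    so the key depends only on the content, never on the file name.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Should match: echo hello | git hash-object --stdin
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```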
A branch name, in Git, is a distinguished kind of name (kept in a separate namespace, apart from tag names for instance, so that you could have both a branch xyz and a tag xyz, though that's still a bad idea anyway). All of Git's names are stored in a second key-value database with the full name as the key (the full name of branch B is refs/heads/B) and one hash ID as the value. You only get one hash ID, but that's all Git needs.
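A rough Python model of that second database: a dictionary from full ref names to hash IDs. The short hash IDs are made up, and `resolve` is a hypothetical helper that follows a simplified version of the lookup order described in the gitrevisions documentation, where `refs/tags/<name>` is tried before `refs/heads/<name>`. That ordering is one concrete reason a branch and a tag sharing the name xyz is a bad idea.

```python
from typing import Optional

# Hypothetical name database: full ref name -> (made-up, shortened) hash ID.
refs = {
    "refs/heads/main": "a1b2c3",
    "refs/heads/xyz": "d4e5f6",     # branch named xyz
    "refs/tags/xyz": "0718aa",      # tag named xyz, in a separate namespace
    "refs/remotes/origin/main": "a1b2c3",
}

# Simplified lookup order modeled on gitrevisions; some rules are omitted.
SEARCH_ORDER = ["refs/{}", "refs/tags/{}", "refs/heads/{}", "refs/remotes/{}"]

def resolve(short_name: str) -> Optional[str]:
    """Return the hash ID for a short name, trying each namespace in turn."""
    for pattern in SEARCH_ORDER:
        full = pattern.format(short_name)
        if full in refs:
            return refs[full]
    return None

print(resolve("main"))         # a1b2c3, found as refs/heads/main
print(resolve("xyz"))          # 0718aa: the tag shadows the branch
print(resolve("origin/main"))  # a1b2c3, found as refs/remotes/origin/main
```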
The graph you see in your image has round dots representing commits, and labels in oblong (rectangular with rounded corners) boxes representing names. The names point to the commits by storing the commit hash IDs.
As a plain-text image, I would draw the same thing like this:
          J   <-- main
         /
...--G--H   <-- origin/main
         \
          I   <-- linkedListUpdate
where each of these uppercase letters stands in for a raw commit hash ID (we avoid typing those in because they're kind of unusable, e.g., 9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c). The metadata for any given commit, such as H here, contains the raw hash ID of the commit that comes right before it, e.g., G. So these are actually backwards-pointing arrows:
... <-F <-G <-H <-- origin/main
with the name origin/main giving us quick direct access to commit H, and commit H itself giving us (and Git) indirect access to commit G, which in turn gives us access to earlier commit F, and so on.
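The backwards walk can be sketched in a few lines of Python. This is a toy model, not Git's data structures: single letters stand in for hash IDs, each commit records only its parent, and the hypothetical `log` function does roughly what `git log` does for a linear history: one direct lookup through the name, then parent links the rest of the way.

```python
# Toy commit graph: each commit records its parent's ID (None for the root).
# Single letters stand in for real hash IDs, as in the drawing.
parents = {"F": None, "G": "F", "H": "G"}
names = {"origin/main": "H"}

def log(name: str) -> list[str]:
    """Follow parent links backwards from the commit a name points to."""
    history = []
    commit = names[name]          # one direct lookup via the name
    while commit is not None:
        history.append(commit)
        commit = parents[commit]  # each commit leads to the one before it
    return history

print(log("origin/main"))  # ['H', 'G', 'F']
```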
Git says that the commits that are reachable from a name, working backwards, are "on" the branch. So commit H and the earlier commits here are on all the branches.[2] Commit I is only on linkedListUpdate, and commit J is only on main.
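"On a branch" is just reachability, which this Python sketch computes for the drawing above. The graph and names mirror the picture; `reachable` and `names_containing` are hypothetical helpers, and the output is only roughly analogous to what `git branch -a --contains <commit>` reports.

```python
# Toy graph from the drawing: parent lists plus the three names.
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["H"]}
names = {"main": "J", "origin/main": "H", "linkedListUpdate": "I"}

def reachable(start: str) -> set[str]:
    """All commits found by walking backwards from start (inclusive)."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def names_containing(commit: str) -> list[str]:
    """Names whose commit can reach the given commit by walking backwards."""
    return sorted(n for n, tip in names.items() if commit in reachable(tip))

for c in "HIJ":
    print(c, names_containing(c))
# H ['linkedListUpdate', 'main', 'origin/main']
# I ['linkedListUpdate']
# J ['main']
```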
With all this in mind, let's take a look at git merge, and also at the distinction between a true merge and the fake, non-merge-y merges that you are already aware of and asking about in your side question.
[1] There are maintenance commands that can (slowly and painfully) trawl through the entire database, but as these can take many minutes, you wouldn't want that to be the normal mode of getting work done.
[2] Whether origin/main counts as a branch depends on who you ask and what they're thinking at the moment: in particular, the name origin/main is not a branch name, but rather a remote-tracking name living in the refs/remotes/ namespace. You can easily extract this particular commit, because it has a name to find it in one step, but you cannot get "on" origin/main as a branch because it's not a branch name.
Ultimately, the word branch is badly overused in Git, and it's often a good idea to qualify exactly what you mean when you say "branch".
True merges
Given a starting setup like this:
          J   <-- main (HEAD)
         /
...--G--H
         \
          I   <-- linkedListUpdate
(the attached HEAD shows which branch you're "on", in the sense that git status would say "On branch main"), you run:
git merge linkedListUpdate
What happens? Let's start with the goal: The goal of a merge is to combine work. But now we have to think about this. What does work even mean?
Let's get a little more abstract and look at a situation where each "branch" has two or more commits since it split off from some common starting point:
          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2
Each commit holds a full snapshot, so the only way we can see what changed is to run git diff or similar, to compare two commits.
We could compare commits J and L directly, but all that tells us is what's different. It doesn't say who did which kind of work. Suppose we added some lines to some files on "our" branch br1, and they (whoever they are) added different lines to the same files on their branch. The diff from our commit J to their commit L will say to delete the lines we added and add the lines they added. That's not right!
Swapping the commits, so that we compare L and J directly, does not help: now we delete the lines they added, and add our lines. Clearly we need something fancier.
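Here is that failure reproduced with Python's standard `difflib` on hypothetical three-line files. H is the shared starting point, J adds "our" line, L adds "their" line at the same spot. The two-way diff of the tips never consults H, so it can only report "remove ours, add theirs".

```python
import difflib

# Hypothetical file contents for the br1/br2 example.
H = ["def push(x):", "    pass"]                          # shared starting point
J = ["def push(x):", "    # ours: validate x", "    pass"]  # our branch tip
L = ["def push(x):", "    # theirs: log x", "    pass"]     # their branch tip

# Comparing the tips directly: H is never consulted, which is exactly the problem.
diff = list(difflib.unified_diff(J, L, "J", "L", lineterm=""))
print("\n".join(diff))  # shows "-    # ours: validate x" and "+    # theirs: log x"
```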
We could compare H vs I to see what we changed in I, then compare I vs J to see what we changed in J. That at least gets us "the work we did". If there are many commits, we'd have a lot of individual comparisons to do here. But hang on a minute: every commit is a full snapshot of every file. What if we just compare H to J directly? We'll see "what we did", completely ignoring all the "noise" of how we got there, with intermediate commit I (and maybe a dozen more that we don't show).
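A Python sketch of why comparing the endpoints is enough, using hypothetical snapshots: intermediate commit I adds a debug line, and J removes it again while editing a comment. Step-by-step diffs show the debug line coming and going; the direct H-vs-J diff shows only the net effect. The helper `changes` is hypothetical and just filters `difflib.ndiff` output down to added and removed lines.

```python
import difflib

# Hypothetical snapshots along one branch.
H = ["# TODO: list", "class Node:", "    pass"]
I = ["# TODO: list", "print('debug')", "class Node:", "    pass"]
J = ["# linked list", "class Node:", "    pass"]

def changes(old: list[str], new: list[str]) -> list[str]:
    """Only the added/removed lines of a diff, without context or hint lines."""
    return [d for d in difflib.ndiff(old, new) if d[:1] in "+-"]

stepwise = changes(H, I) + changes(I, J)
print(stepwise)        # the debug line is added, then removed again
print(changes(H, J))   # ['- # TODO: list', '+ # linked list']: just the net effect
```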
The same trick works for comparing H to L, to see what they changed. That's the work they did. The only really hard part is coming up with the best shared commit, but in this case that's obvious: it's commit H.[3] Git calls this best shared commit the merge base.
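One way to sketch "best shared commit" in Python, for the br1/br2 drawing: collect the commits reachable from both tips, then keep only those that are not an ancestor of another shared commit. This follows the definition in the git merge-base documentation, but the quadratic helpers `ancestors` and `merge_bases` are illustrative only; Git's own search is far more efficient.

```python
# Toy version of the br1/br2 drawing (letters stand in for hash IDs).
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}

def ancestors(commit: str) -> set[str]:
    """The commit plus everything reachable by walking parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a: str, b: str) -> set[str]:
    """Best common ancestors: shared commits that no other shared commit descends from."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

print(merge_bases("J", "L"))  # {'H'}
```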
Going back to your own more concrete case, we have:
          J   <-- main (HEAD)
         /
...--G--H
         \
          I   <-- linkedListUpdate
and we have Git compare H vs J to see what you did (probably a one-line change, since you "changed a random comment"), and that's the change Git wants to take from "your side" of the operation. We have Git compare H vs I to see what your professor did. Then we have Git combine these two sets of changes.
You will get a merge conflict if:
- you and he changed the same lines, in different ways, or
- you made a change to a line, and he made a change to an adjacent line, so that your two sets of changes abut (touch at the edge).
But as long as Git is able to apply your change(s) to places the other branch doesn't touch, and vice versa, Git will be able to combine these changes on its own.
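The two conflict rules above can be sketched as a toy three-way merge in Python. This is a deliberate simplification and not Git's actual algorithm: it assumes every version has the same number of lines, so a "change" is just a line whose text differs from the merge base. The function `merge_lines` and the sample lines are hypothetical. It does show the key idea: changes are measured against the base, combined, and then applied to the base.

```python
# Toy three-way merge. Simplifying assumption: base, ours, and theirs all have
# the same number of lines (no insertions or deletions), unlike real Git.

def merge_lines(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Return merged lines, or raise ValueError describing a conflict."""
    ours_changed = {i for i, (b, o) in enumerate(zip(base, ours)) if b != o}
    theirs_changed = {i for i, (b, t) in enumerate(zip(base, theirs)) if b != t}

    # Rule 1: both sides changed the same line in different ways.
    for i in ours_changed & theirs_changed:
        if ours[i] != theirs[i]:
            raise ValueError(f"conflict: both sides changed line {i + 1} differently")
    # Rule 2: one side's change touches (is adjacent to) the other side's change.
    for i in ours_changed - theirs_changed:
        for j in theirs_changed - ours_changed:
            if abs(i - j) == 1:
                raise ValueError(f"conflict: changes to lines {i + 1} and {j + 1} touch")

    # No conflict: start from the base snapshot and apply both sides' changes.
    merged = list(base)
    for i in ours_changed:
        merged[i] = ours[i]
    for i in theirs_changed:
        merged[i] = theirs[i]
    return merged

base = ["# a random comment", "def f():", "    return 0"]
ours = ["# a better comment", "def f():", "    return 0"]    # you: edit the comment
theirs = ["# a random comment", "def f():", "    return 42"]  # professor: edit the body
print(merge_lines(base, ours, theirs))
# ['# a better comment', 'def f():', '    return 42']

try:
    merge_lines(base, ours, ["# a random comment", "def g():", "    return 0"])
except ValueError as err:
    print(err)  # conflict: changes to lines 1 and 2 touch
```

Note that neither side "wins" in the clean case: the result contains your comment edit and the professor's body edit together, which is what happened in your repository.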
Having combined the changes, Git then applies those combined changes to the snapshot in the merge base commit H. The resulting snapshot is ready to go into a new merge commit, which I like to call M for Merge:
          J
         / \
...--G--H   M   <-- main (HEAD)
         \ /
          I   <-- linkedListUpdate
The only thing special about M is that, in its metadata, it lists two previous commits instead of just one.[4] As with any commit, the act of creating the new commit tells Git to update the current branch name so that the branch now points to the new commit. So now commit M is on main. Because M points backwards to two previous commits, though, suddenly commit I is also on branch main.
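The same reachability idea, sketched in Python for the post-merge drawing, shows why I is suddenly "on" main. The graph is a toy: M lists J as its first parent (the commit main pointed to) and I as its second, and the hypothetical `reachable` walk simply follows every parent it finds.

```python
# Toy graph after the merge: M lists two parents, J first and I second.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["H"], "M": ["J", "I"]}
names = {"main": "M", "linkedListUpdate": "I"}

def reachable(start: str) -> set[str]:
    """All commits found by walking backwards from start (inclusive)."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])  # a merge commit contributes both parents
    return seen

print(sorted(reachable(names["main"])))  # ['G', 'H', 'I', 'J', 'M']
print("I" in reachable(names["main"]))   # True: I is now "on" main as well
```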
(This shows that branch, in Git, is kind of meaningless. And yet branch names are crucial since they're how we find the commit from which we work backwards. So branches are meaningless, and also very important.