I'm working on a bunch of branches, and I remember that on one of them I made some super-smart change to a file. But I can't remember which branch that was.
Is there a Git command that can show me all the latest changes to a file across all the branches I have locally for a repository?
Example
      /---D
     /
    /   /---E
   /   /
  A - B - C
   \
    \
     \--F
I'm sitting on C and I know that I've made a super-clever commit in either D, E or F, to a specific file.
I could go through them one by one, to see the contents of the file. But I was hoping for a command like this:
$ git magic-command-1 "path/to/target/file"
Commit 123456 on branch "F" at 2022-05-19 15:10
Commit 234567 on branch "E" at 2022-05-19 14:33
Commit 345678 on branch "D" at 2022-05-19 11:12
and maybe also something that shows the differences.
I tried this:
git log -p -- cypress.development.json
But I'm not sure whether it shows changes across all branches, or which branch each change is on.
I also read here about an --all flag, but the output doesn't show which branch the change was made on.
I also looked at the --source flag, but the results don't really make any sense to me.
Regardless of what I do, I feel like I'm missing a command to appropriately compare the same file across all local branches.
CodePudding user response:
TL;DR
Use git branch --contains with the hash IDs you find. But: why do you care? The hash ID is all you really need.
Long
There's a basic problem here with your diagram: it has no branch names on it. Let's put some branch names on it and then ask a key question:
      /---D   <-- br1
     /
    /   /---E   <-- br2
   /   /
  A - B - C   <-- br3
   \
    \
     \--F   <-- br4
Which branch is commit A on?
Warning: this is a trick question! The answer is below, with (I hope) enough text in between so that you can't just cheat and read it, and will instead have to think about this. The obvious answer is "it's on br3" but this isn't right. (It's not wrong, it's just not right.)
What you will want to do
"I also read here about something about an --all -flag ..."
Use this flag, then use git describe or git branch --contains with the found commit hash IDs, or:
"I also looked at the --source -flag, but the results doesn't really make any sense to me."
The --source flag does what the git log documentation says it does:
--source
Print out the ref name given on the command line by which each commit was reached.
but, as is common, the reference manual is terse and laden with jargon here. The flag gets you some of the information you need, and sometimes it will be everything you need, but git branch --contains or git describe may still be more useful.
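As a minimal sketch of that workflow, the script below builds a throwaway repository shaped like the diagram, lists every commit on any branch that touched the file, and then asks git branch --contains about each one. The file name file.txt, the commit helper, and the fixed dates are all made up for the illustration; substitute your own path in a real repository.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

# For each commit that touched the file (newest first), list the branches containing it
for hash in $(git log --all --format=%H -- file.txt); do
    printf '%s is on: ' "$(git log -1 --format='%h %cd %s' --date=iso "$hash")"
    git branch --format='%(refname:short)' --contains "$hash" | tr '\n' ' '
    echo
done
```

Note that --all also walks tags and remote-tracking names; if you only care about local branches, git log --branches is a narrower starting set.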
The answer to the trick question
Commit A is on every branch.
The trick here is that in Git, many commits are on many branches simultaneously. Some commits may be on no branch. This gets us into a separate Git question, which is: What exactly do we mean by "branch"? The word branch in Git is actually ambiguous, and overused, sometimes to the point where it nearly loses all meaning. Once you get used to the crazy multiple meanings, though, it turns out that humans usually assign the right meaning automatically: a branch is a branch name, but it's also a remote-tracking name, a particular commit that Git calls more formally a tip commit, and a set of commits ending at the tip commit. A Git branch is all of these things, and yet, when a human says "branch", they usually mean only one of these things.
To make any sense out of this, we need the concept of reachability. Reachability is actually a graph-theory thing. The diagram you drew is a commit graph, with the letters A through F standing in for actual commits. Each actual commit has some unique, big and ugly and random-looking hash ID, but those are too hard for humans, so we mostly ignore them whenever we can, or use substitutes like these letters A through F here.
Each commit links backwards to a previous or parent commit. Here, commit C links backwards to commit B, which links backwards to commit A. Commit D links backwards to A as well, and so does F; E links backwards to B, which we already noted links backwards to A.
By following the backwards-pointing links, Git finds the commits. Git finds the end commits—the branch tip commits—using the branch names, which are what humans tend to care about and use. But then Git works backwards from there.
When we start with, say, br1, Git will find commit D, then work backwards and find commit A. This means commit A is "on", or "contained in", branch br1. But we can also start with br2 and find A, and we can start with br3 and find A, and so on. Indeed, since A is our very first commit, all roads lead to A: commit A is on every branch. It will be on future branches too. [1]
It is literally impossible, in Git, to know which branch a commit was created on unless you record that as text in the commit message. That's because we can create and destroy branch names at will: each branch name simply selects (or "points to") some commit in the commit graph. We pick this commit at the time we create the branch name.
Then, when we check out (switch to) the branch and make a new commit, Git makes the new commit such that it points backwards to the commit we had checked out, and stores the new commit's hash ID into the branch name so that the new commit is now the tip commit. So, given your diagram, if we git switch br3 and make a new commit, the name br3 will point to our new commit G afterward; G will point backwards to C; and commit A remains on every branch.
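Here is a small sketch of that step in a scratch repository shaped like the diagram. The file name file.txt, the commit helper, and the fixed dates are assumptions for the illustration only.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

commit G   # HEAD is attached to br3, so this commit extends br3

git log -2 --format='%h %s' br3                          # G, then its parent C
git branch --format='%(refname:short)' --contains "$A"   # A is still on every branch
```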
If we delete branch name br1 entirely, commit D becomes un-findable, because we find the commits using branch names and working backwards. There's only one way to find D right now, and that's to use br1. So by deleting the name br1, we "lose" commit D. It becomes unreachable. [2]
So reachability means "how we get there". We get to commits from branch names. For much more on this concept, see Think Like (a) Git.
[1] It is possible, in Git, to create more than one root commit, and hence set up new branches that don't lead back to commit A. But that's not very typical and we won't cover it here.
[2] Git will eventually discard an unreachable commit. You do, however, get a grace period to get the commit back, typically a minimum of 30 days. The problem is that you must find the commit's unique hash ID, which you would do using the branch name, but now that the branch name is gone... well, that's the dilemma.
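To see the "lost" commit and the grace period in action, here is a rough sketch in a scratch repository shaped like the diagram. It saves D's hash ID before deleting br1, which is exactly the piece of information you would otherwise be missing. The file name file.txt, the commit helper, and the fixed dates are assumptions for the illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

D=$(git rev-parse br1)               # remember D's hash ID while we still can
git branch -q -D br1                 # remove the only branch name that reaches D

holders=$(git branch --format='%(refname:short)' --contains "$D")
echo "branches containing D: '$holders'"   # empty: no branch name reaches D now
git cat-file -t "$D"                 # the commit object itself still exists

git branch br1 "$D"                  # knowing the hash ID, re-create the name
git log -1 --format='%h %s' br1
```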
Reachability, git branch --contains, and git log --source
Now that you understand reachability, git branch --contains will make sense. You give git branch --contains some hash ID, e.g., the hash ID of commit B or E or A. What git branch --contains does is:
- starting from every branch name, work backwards;
- if this reaches the commit, print the branch name
so when used with the hash ID of commit B, this will print br2 and br3, as those are the two branch names that can reach B.
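A quick way to check this is in a scratch repository shaped like the diagram. This is only a sketch: the file name file.txt, the commit helper, and the fixed dates are invented for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

git branch --format='%(refname:short)' --contains "$B"   # br2 and br3 can reach B
git branch --format='%(refname:short)' --contains br2    # E (the tip of br2): only br2
```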
The --source option to git log simply prints whichever name git log was using at the time it found some commit. This is actually more complicated to explain, because git log itself is pretty complicated!
What git log does is walk the graph, printing some of the commits it encounters as it goes. That is, we give git log some number of starting points, such as one or more branch names or commit hash IDs. The git log command takes these names and resolves them to hash IDs, or takes the hash IDs (which are already hash IDs), and finds the named commits. It puts each commit into a priority queue.
If we run git log with no arguments, git log uses the special name HEAD. This name is normally attached to one branch name. Using git switch or git checkout, we control which branch name HEAD is attached to; that's the branch that gets extended when we make a new commit, so it's pretty important! That branch name is the current branch, and that's what git log shows by default: that is, running git log with no arguments means git log resolves HEAD to the current commit's hash ID, and puts that (single) hash ID in the queue.
Now that the queue has some commit or commits in it, git log takes the front entry off the queue. Since the queue is a priority queue, there's a sorting order, if there's more than one entry in it. But it's extremely common for the queue to have just the one entry! For instance, if we run git log with no arguments, the current commit is the one entry in it when we start. If we run git log br4, Git puts F's hash ID into it, and again there's just the one entry.
Anyway, having taken the front entry out of the queue, git log now decides, based on any arguments you gave like --no-merges or whatever, whether to show this commit. If it's supposed to show the commit, it does that. We call this visiting the commit, as though we're on holiday and going to certain attractions or cities or whatever.
Next, having shown or not shown the commit, git log finds the parent or parents of the commit. In your sample graph, each commit has exactly one parent, except for commit A which has no parent. (A merge commit, if there were any, would have two or more parents.) By default, git log puts all the parents into the queue, unless those parents have already been visited.
If we've just visited F, which has one parent, git log puts F's parent A into the queue. The queue was empty (F was the only thing in it at the start of all of this), so now there's again just one entry in the queue. The git log command now takes out and visits the one commit in the queue, i.e., commit A. It shows commit A, if it's supposed to do that, and then puts A's parents into the queue. There are no parents, so this puts nothing in the queue, and the queue remains empty.
Once the queue is empty like this, git log quits. So by starting at F via name br4, we visit commits F and A and stop, and that's what git log would show.
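The single-starting-point walk is easy to confirm in a scratch repository shaped like the diagram. As before, this is just a sketch; the file name file.txt, the commit helper, and the fixed dates are assumptions.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

# Start the walk at br4 only: the queue holds F, then A, then runs empty
git log --format='%h %s' br4
```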
If, on the other hand, we run git log --all, the code will put D, E, C, and F all into the queue. There are now four entries so the priority really matters. This priority causes git log to sort its output. The default sort is based on the stored committer date in each commit, with later commits being higher priority. So if commit F is the latest commit, that's the one that surfaces first.
We'll visit F, printing it out and putting its parent A into the queue: the queue now contains A, D, E, and C (held in date order). Let's say that E has the next-highest-priority date: git log will pop E out of the queue, visit it, and insert B into the queue. Then git log will take the highest priority commit out of the queue (let's continue the theme and say this is D) and visit that one. This would put A into the queue, but it's already there; it doesn't go in twice. We now visit C, which wants to put B in the queue, but it's already there; we then visit B, which wants to put A in the queue, but it's already there; and we visit A, which is the last thing in the queue and puts nothing into the queue, and so git log finally stops.
The --source flag simply annotates each output, for any given commit, with the name that first led Git to this commit. So for C, that's br3. [3] For B, that's either br2 or br3, depending on whether git log visited C or E first.
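The sketch below replays this walk in a scratch repository shaped like the diagram, with commit dates chosen so that F is newest, then E, D, C, B, and A. It uses the %S format placeholder, which prints the same name that --source reports; I believe this needs Git 2.21 or later, so treat that as an assumption about your Git version. The file name file.txt, the commit helper, and the dates are made up for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

n=0
commit() {  # hypothetical helper: append a letter to file.txt and commit it
    n=$((n + 1))
    stamp="$((1652950000 + n * 60)) +0000"   # fixed, increasing dates
    echo "$1" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" git commit -q -m "$1"
}

# Recreate the diagram: br1 -> D, br2 -> E, br3 -> C, br4 -> F
commit A; git branch -m br3; A=$(git rev-parse HEAD)
commit B; B=$(git rev-parse HEAD)
commit C
git checkout -q -b br1 "$A"; commit D
git checkout -q -b br2 "$B"; commit E
git checkout -q -b br4 "$A"; commit F
git checkout -q br3

# One line per commit: the name that first reached it, then its subject.
# With these dates F is visited first, so A is reported via br4,
# and E is visited before C, so B is reported via br2.
git log --all --source --format='%S %s'
```

Swap the creation order of C and E in the setup and B should be attributed to br3 instead, which is the unpredictability described above.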
The visiting order depends on the priority order. You can control this, to some extent at least, with options like --topo-order or --author-date-order. But in a big graph, especially one with a lot of branch-and-merge action in it, it's very difficult to know which of many names might first reach some commit. Only in small and simple graphs like yours here will you get something predictable.
[3] With git log --all you will see refs/heads/br3 rather than just br3. That's simply the full name of the branch. All branches have short names like br3, and full ones like refs/heads/br3. I like to think of the full name as what their mom (or spouse) says when she's mad at them, kind of like Stella Mudd in these ST:TOS clips.