Recently, I observed the following: I had committed locally, and the commits were visible in the log. Remotely, however, they were not visible under "Commits". When running git status, I expected something like this:
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
However, the output of git status was instead:
On branch iss_42
nothing to commit, working tree clean
Only after doing a push did I see the changes remotely.
Does anybody have an explanation for this? Thanks!
CodePudding user response:
they were not visible when looking under "Commits".
You need to check under which branch you were checking those commits.
By default, a remote hosting service (GitHub, for instance) shows you the main branch.
But if your git status
shows you that, locally, you are on branch iss_42
(and might have pushed from there), then you need to switch branch (on the remote side, through its Web GUI).
Then you should see your new commits.
CodePudding user response:
When doing git status, I'd expect sth like this:
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
[but instead I got]
On branch iss_42
nothing to commit, working tree clean
There are a bunch of complications here, but the TL;DR is: iss_42 does not have an upstream set (at this time). It's fairly likely that iss_42 cannot have the "right" upstream set yet; not, that is, until you run git push.
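One way to check whether a branch has an upstream is to ask Git to resolve `@{upstream}` for it. The sketch below is only an illustration: it builds a throwaway repository in a temporary directory, with a placeholder identity, and creates iss_42 as a purely local branch.

```shell
#!/bin/sh
# Sketch: a purely local branch has no upstream, so `git status`
# has nothing to compare it against. Runs in a throwaway directory.
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/iss_42    # name the initial branch iss_42
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first commit"

# Resolving @{upstream} fails when no upstream is configured.
if git rev-parse --abbrev-ref 'iss_42@{upstream}' >/dev/null 2>&1; then
    has_upstream=yes
else
    has_upstream=no
fi
echo "upstream set: $has_upstream"

# With no upstream, status names the branch but prints no ahead/behind line.
git status
```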
Long: what's going on
The first thing to know about Git is that it is, at its heart, about commits, and that a Git repository consists mainly of two databases. One of these two databases holds the commits and other supporting Git objects. The other database holds names, which mainly exist because humans are bad at what Git calls hash IDs or object IDs ("OIDs").
Repositories come in two flavors—"bare", which are what we find on most servers, and for lack of an adjective, "not bare" (clothed?), which are the ones we use to get work done. I'm going to skip over this distinction entirely, along with most of the "how we get work done" part, except to touch lightly on a thing about commits. (You won't need to worry about bare repositories here, just remember vaguely that they exist.)
The big (usually much bigger) database is the one holding the Git commits and other supporting objects. All of the things in this database have a hash ID or OID. A hash ID is a big ugly string of letters and digits, such as 30cc8d0f147546d4dd77bf497f4dec51e7265bd8 (a real hash ID: it names a commit in the Git repository for Git itself). For commits in particular, this hash ID is unique. What I mean by this is that the big ugly number (it's a hexadecimal number) that starts with 30cc8d0f14 has now been "used up". It means that commit. No other Git repository anywhere in the universe can ever use that number again for some other commit. [1]
By making this uniqueness constraint, Git sets things up so that two different Git repositories can, at any particular time we choose, "meet up" and compare commit hash IDs. Having done this, the two repositories now know which commits they both share, and which ones only one repository has. Since the hash IDs are unique, the two commits are the same if and only if the hash IDs are the same. The hash IDs, in effect, are the commits, in a way.
The big database is a simple key-value store, with the hash ID being the key, and the object (commit or other supporting object) being the value. The hash ID is thus the "true name" of the object. Git reaches into its big database, using the hash ID as the key, and obtains the value: the commit itself, in the case of a commit. So as long as:
- we have the commit itself in our Git repository, and
- we have somehow memorized its hash ID
we can get that commit out of our Git repository.
This is the fundamental reality of commits in a Git repository: they are numbered, with these big ugly hash IDs, and Git needs that number to get the commit out of the big database. But humans are very bad at these numbers. (What was that number again? Do you even remember that it started with a 3? Did it start 30ccd8 or 30cd8c or 30cc8d? Git sometimes needs the whole thing. Do you remember all 40 characters?) This is why there is a second database. It, too, is a simple key-value database, but its keys are names.
The names in this second database are branch names, tag names, and all kinds of other names that Git provides for us. For each such name, the value that Git keeps is a single hash ID. That single hash ID is enough, and that's where things start to get a little complicated.
[1] Due to the pigeonhole principle, this trick of Git's is doomed to fail someday. By keeping unrelated Git repositories from meeting each other, and by the sheer size of the hash ID space, we can at least hope that the day of failure will be so many billions of years in the future that not only will we all be dead, so will the sun and perhaps the entire universe.
Note that hash collisions in separate repositories never cause a problem; it's only when they collide within a repository that there's an issue. By keeping unrelated repositories separate, we reduce the birthday problem effect here.
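To make the two databases a bit more concrete, here is a small sketch (throwaway repository, placeholder identity) that lists the names database with git for-each-ref and then uses one hash ID as the key into the objects database with git cat-file.

```shell
#!/bin/sh
# Sketch: names map to hash IDs; hash IDs are keys into the object store.
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first commit"

# Names database: every name holds exactly one hash ID.
git for-each-ref --format='%(refname) -> %(objectname)'

# Objects database: use that hash ID as the key and ask what is stored there.
oid=$(git rev-parse main)
kind=$(git cat-file -t "$oid")
echo "$oid is a $kind"
```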
Commits and names
A Git commit is, as we've seen, a numbered entity, which Git looks up by its hash ID. The value stored in the big database holds two things:
- Each commit has a full snapshot of some set of source files. This snapshot-of-all-files is the set of files we'll see, and have available to work on / with, if we check out that commit.
- Each commit has some metadata, or information about the commit itself. This includes the name and email address of the person who made the commit. It includes a log message as well, and a smattering of other useful information. One thing in particular (needed by Git itself, and put in there by Git at the time you, or whoever, make the commit) is a list of previous commit hash IDs.
The previous-commit-hash-ID-list in each commit is usually exactly one entry long. This (single) hash ID gives Git the hash-ID of the parent commit of this particular commit. We say that the commit points to its parent.
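You can look at both parts of a commit with git cat-file -p. In this sketch (throwaway repository, placeholder identity and file name) the "tree" line is the snapshot and the "parent" line is the stored hash ID of the previous commit.

```shell
#!/bin/sh
# Sketch: print a raw commit and pick out its snapshot and parent fields.
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git add file.txt
git commit -q -m "second"

# The raw commit: tree (snapshot), parent, author, committer, message.
git cat-file -p HEAD

tree=$(git cat-file -p HEAD | sed -n 's/^tree //p')
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
actual_parent=$(git rev-parse 'HEAD^')
echo "parent stored in the commit: $recorded_parent"
```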
A branch name, in Git, holds one hash ID as well (just like every name in the names database). We say that this name points to the last commit in the branch. That is, if the last commit in branch main
has some hash ID that we'll call H
, we can draw that like this:
<-H <--main
The longer arrow coming out of main
is the hash ID of commit H
, so that main
points to H
; the shorter arrow coming out of H
is the (presumably single) hash ID stored in H
's metadata.
That arrow-from-H
, of course, points to the previous commit. Let's call it G
and draw it in:
<-G <-H <--main
But G
has another arrow sticking out of it, so let's draw in yet another commit:
... <-F <-G <-H <--main
The end result of this is that by using the name main
to find the last commit in the branch, Git can now work backwards, from last commit to parent, parent to grandparent, and so on. The set of commits that Git finds, doing this backwards pointer-following process, is the set of commits that is on the branch.
When we have main
being "ahead of" origin/main
by one commit, that just means that we have this picture:
...--F--G--H   <-- origin/main
            \
             I   <-- main
That is, main
—which probably used to point to H
—now points to I
instead. From I
, Git will work backwards to H
, then G
, then F
, and so on, all the way back to the very first commit ever.
This working-backwards process produces the history, so the history in the repository is nothing more or less than the commits in the repository, as found by starting from every name and working backwards. There is of course some overlap: when we start from origin/main
, that's commit H
here, which overlaps with what we get when we start from main
at I
and step back once.
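The "ahead by one" count is just the number of commits reachable from main but not from origin/main. A minimal sketch, assuming a local directory stands in for the hosting site's repository (the paths and identity are placeholders):

```shell
#!/bin/sh
# Sketch: count how far main is ahead of origin/main.
set -eu
tmp=$(mktemp -d)

# Stand-in for the hosted repository, with one commit (H).
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

# Clone it, then add one local commit (I) that is never pushed.
git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "I"

# Commits reachable from main, excluding those reachable from origin/main.
ahead=$(git rev-list --count origin/main..main)
echo "main is ahead of origin/main by $ahead commit(s)"
```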
Some other things to know before we add more repositories
There are many more things to know about commits: for instance, the numbering system that Git uses requires that nothing in the objects database ever change. We can add new objects, but we cannot change existing ones. This means that if you have made a bad commit:
...--G--H--I <-- main
(where I
is bad), we can't actually get rid of commit I
at all. But what we can do is make a new commit, which we can call J
—or maybe better, I'
, meaning "new and improved version of I
"—and set things up so that I'
points directly to H
, skipping over I
, and make main
point to I'
, like this:
          I   ???
         /
...--G--H--I'  <-- main
Because we (humans) find commits by starting with the names and then having Git work backwards from there, we won't find commit I
any more. If we haven't memorized the hash IDs (and humans just don't do that, it's too hard and useless), it will seem as though commit I
changed into commit I'
. It didn't really, and the fact that it didn't sometimes eventually "leaks out". This can become a problem later, though it won't in this answer.
Note that both the snapshot and the backwards-pointing "arrow" embedded inside the commit are part of the commit; to "change" either one, we have to copy the commit to a new and improved replacement. Again, we won't really cover this here, but it's important to remember for later.
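One common way to end up with an I' is git commit --amend. The sketch below (throwaway repository, placeholder identity and file names) shows that amending produces a new hash ID with the same parent, while the old commit object is still sitting in the objects database.

```shell
#!/bin/sh
# Sketch: amending makes a new commit I' and leaves the old commit I in place.
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"
echo "oops" > bad.txt
git add bad.txt
git commit -q -m "I (bad message)"

old=$(git rev-parse HEAD)
git commit -q --amend -m "I' (new and improved)"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
# The old commit still exists; it is just no longer found via the name main.
old_kind=$(git cat-file -t "$old")
```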
Let's add another repository now
When you git clone
some existing repository (say, from GitHub) to your laptop (or whatever local computer you're using), you are copying the big database. That is, you tell your Git software to:
- create a new, empty pair of databases (so that you have no commits and no branches or other names);
- call up some other Git repository, which has its commits and its names;
- list out all of their commits—or all the ones that can be found using names, anyway; and
- for each of those commits and its supporting objects, copy them into your all-objects database.
The end result is that you have a new Git repository in which you have all of their commits. A peculiar feature of this kind of git clone
, though, is that it does not directly copy their names. Instead, your Git takes the names and hash IDs they listed, and changes their branch names.
The reason for this change is simple enough: your Git software wants you to have your own branch names. So your Git software takes each of their branch names and renames it, turning it into what I call a remote-tracking name. [2] That is, their main
becomes your origin/main
, for example.
So, your names database is now full of remote-tracking names, with one of these for each one of their branch names. Your git clone
process is done talking to their Git software now, and your Git disconnects from theirs. But there's one thing left to do:
- You have all of the commits, but no branches.
- You need a branch, because Git likes to have a "current branch name".
- So your Git software creates one branch name.
The branch name that your git clone
creates here is the one you picked out using your -b
option. You probably didn't use a -b
option though. If that's the case, your Git software already asked their Git software, during the cloning process, what name they recommend, and that's the name your Git will use here. With a typical GitHub setup, the name will be main
. So your Git (your software working in your repository) will create your main
, using your origin/main
, which remembers what their main
was at the time you ran git clone
.
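Here is a small sketch of that renaming. A local directory stands in for the GitHub repository (an assumption made so the example is self-contained), with two branches, main and develop. After cloning, only main exists as a branch of our own; develop shows up only as origin/develop.

```shell
#!/bin/sh
# Sketch: clone turns their branch names into our remote-tracking names.
set -eu
tmp=$(mktemp -d)

# Stand-in for the hosted repository, with branches main and develop.
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"
git branch develop

git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"

# Every name in our new names database.
git for-each-ref --format='%(refname)'

local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "our own branches: $local_branches"
```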
It's kind of a long way around, but you now have a branch main
that matches their main
, which your Git remembers as origin/main
. We draw that as:
...--G--H <-- main (HEAD), origin/main
(This attached HEAD, a special name in Git [3], tells Git which branch name is the current branch name. The desire for a current branch is why git clone created main in the first place.)
To create a new branch in your clone, you simply tell your Git that it should create a new branch name. This name must point to some commit. Any commit that you have in your repository will do, but you must pick some commit, somehow. Most people mostly pick the latest main
or develop
or whatever commit—which one might find with origin/develop
due to the whole remote-tracking name thing, for instance—and of course these names by definition select the latest commit, so that's easy enough. We'll skip right over the actual command involved (git branch
? git checkout -b
? git switch -c
? these all work) and get to the next part.
[2] Git calls this a remote-tracking branch name but the word branch here mostly serves to confuse. We already have too many things called "branch". Let's ditch this extra word here.
[3] This name is so special that if something bad happens on your computer (e.g., your laptop battery runs out unexpectedly, or your laptop crashes and reboots) and the magic HEAD file gets lost from the .git repository folder, Git will stop believing that the repository is a repository. This is often easy to repair. It's rare for this to happen at all, but it's nice that a simple repair usually revives the repository. Due to Murphy's law, this sort of thing always seems to happen right before a big demo or whatever.
Switching to a branch and adding commits to it
Let's imagine a slightly more complicated repository to start with:
...--G--H   <-- main (HEAD), origin/main
         \
          I--J   <-- origin/develop
We'll choose to create a new branch pointing to commit J
, to fix some issue. Exactly how we go about creating that branch will affect things, so let's say we run:
git switch --detach origin/develop
which gets us what Git calls a detached HEAD that we draw like this:
...--G--H   <-- main, origin/main
         \
          I--J   <-- HEAD, origin/develop
and then we run:
git switch -c iss_42
which creates the name iss_42
pointing to commit J
and switches to that name so that it is the current name:
...--G--H   <-- main, origin/main
         \
          I--J   <-- iss_42 (HEAD), origin/develop
We're back in "attached HEAD" mode (which is usually how we want to be, in Git), with our current branch name being iss_42, and that name selecting commit J, the latest commit for the GitHub repository's develop as reflected by our origin/develop.
This sequence of commands checks out commit J
. The act of checking out a commit tells Git to populate your working tree (where you do your work) with usable copies of the files in that commit's snapshot. So we now have the snapshot from J
as usable files.
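The two-command sequence can be tried end to end in a scratch setup. This sketch assumes Git 2.23 or later (for git switch) and uses a local directory as a stand-in for the GitHub repository; git symbolic-ref reports whether HEAD is attached and, if so, to which name.

```shell
#!/bin/sh
# Sketch: detach HEAD at origin/develop, then create and attach iss_42 there.
set -eu
tmp=$(mktemp -d)

# Stand-in for the hosted repository: main at H, develop one commit further.
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"
git switch -q -c develop
git commit -q --allow-empty -m "J"
git switch -q main

git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"

git switch -q --detach origin/develop
# symbolic-ref -q exits non-zero when HEAD is not attached to a branch name.
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
echo "detached: $detached"

git switch -q -c iss_42
current=$(git symbolic-ref --short HEAD)
echo "current branch: $current"
```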
You will do your work in the usual way, run git add
in the usual way, and run git commit
in the usual way. Without getting too deep into the details, this makes a new commit from the files in Git's index aka staging area (which is why you have to git add
all the time). The new commit's parent will be the current commit J
, so that our new commit—which we'll call K
—will point back to J
:
...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/develop
              \
               K   <-- iss_42 (HEAD)
Note how git commit has, rather sneakily, stuffed commit K's new unique hash ID into the name iss_42. Git knows which name to modify because that's the name HEAD is attached to. Git has saved all the files as a new snapshot (with Git's semi-magical file de-duplication trick making the commit not use up lots of disk space, if you only changed one file or two) that goes with commit K.
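The name update is easy to observe by reading the hash ID stored under iss_42 before and after a commit. A minimal sketch in a throwaway repository (identity and file names are placeholders):

```shell
#!/bin/sh
# Sketch: git commit moves the current branch name to the new commit.
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/iss_42
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "J"
before=$(git rev-parse iss_42)

echo "fix" > fix.txt
git add fix.txt
git commit -q -m "K"
after=$(git rev-parse iss_42)
parent=$(git rev-parse 'iss_42^')

echo "iss_42 before: $before"
echo "iss_42 after:  $after"
```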
You are now one commit ahead of... well, wait, what are you one commit ahead of?
Upstreams and what git push does (it's a little complicated)
Note that you have this commit, but nobody else has it yet. If your Git software calls up the software on GitHub, and yours lists out your hash IDs, they will see that you have this new commit K
that they don't have at all yet.
Until you run git push
to send commit K
to GitHub, and they store it in the repository over there, they just won't have it at all. They literally can't find it because they don't have it!
You're now expected to run:
git push origin iss_42
git branch --set-upstream-to=origin/iss_42
or:
git push -u origin iss_42
or there's a new feature in Git 2.37 (the push.autoSetupRemote setting) that will imply the -u option: see VonC's answer to Why do I need to do `--set-upstream` all the time? The -u option, if you use it (or use the new feature to imply it), tells Git to set an upstream on the branch you push. The git branch --set-upstream-to command does the same thing for the current branch, as a separate step.
An upstream offers some extra features, specifically including the one you asked about. That is, to get the Your branch is ahead of ...
message, you must have an upstream set. An old answer of mine (from 2016, before I dropped the word "branch" from "remote-tracking branch name") has more details, but an upstream is just a name—usually a remote-tracking name—that you tell your Git to associate with one of your branch names.
You only have to set it once. Once it's set, it stays set, though you can change it at any time, with another git push -u
or with git branch --set-upstream-to
.
You can't set the upstream of iss_42
to origin/iss_42
until origin/iss_42
exists. Here's the complication, and the reason you want to have git push
set the upstream for you.
Until you send commit K to the GitHub Git repository, there's no commit K in their repository for their iss_42 branch name to point to. So you must run git push with the right options and arguments to cause your Git to deliver commit K to them.
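The ordering constraint can be seen directly. In this sketch a local directory plays the part of the GitHub repository (an assumption to keep things self-contained, with placeholder paths and identity): setting the upstream is refused before the push, and accepted after it, because only then does origin/iss_42 exist.

```shell
#!/bin/sh
# Sketch: --set-upstream-to only works once origin/iss_42 exists.
set -eu
tmp=$(mktemp -d)

git init -q "$tmp/upstream"                 # stand-in for the hosted repository
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git switch -q -c iss_42
git commit -q --allow-empty -m "K"

# Before pushing there is no origin/iss_42, so this is refused.
if git branch --set-upstream-to=origin/iss_42 >/dev/null 2>&1; then
    early=accepted
else
    early=refused
fi
echo "before push: $early"

# Two-step form: push first, then set the upstream.
git push -q origin iss_42
git branch --set-upstream-to=origin/iss_42 >/dev/null
upstream_name=$(git rev-parse --abbrev-ref 'iss_42@{upstream}')
echo "after push: upstream is $upstream_name"
```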
The git push operation works this way:
- You select which Git repository to call up, with the origin argument for instance.
- You select which commits you'd like to send, usually using one of your branch names. You're going to send the latest commit on that branch (the one the name points to) and any other earlier commits that you have, that they lack. You select this by writing that branch name. This is where git push origin iss_42 comes from in the first place. That's you, selecting where to send and what to send.
- Your Git sends the commit(s) if/as needed. But remember, Git offers humans branch names so that the humans can find the commits, because we're so bad at hash IDs. So your Git now asks their Git to create or update some branch name. Which one? Well, that's obvious at first: it's the branch name you used here. It's optionally complicated (Git uses what Git calls a refspec here), but for this kind of git push, it really is this simple. So the obvious answer is the right one: the fact that you ran git push origin iss_42 means your Git will ask their Git to create or update their branch named iss_42.
This request, that they create-or-update their iss_42
branch, is just that: a polite request. They're allowed to refuse, and GitHub have a bunch of ways of controlling who can push, what names they can push to, and so on, but assuming all goes well, they will let you create this new branch on their end. If they say yes to your Git's polite request, that means they have a new branch named iss_42
, pointing to commit K
. That is, after all, what our Git just asked their Git to do, and it said yes.
So, our Git now knows that their Git has a new branch name iss_42
pointing to commit K
. Our Git therefore creates our remote-tracking name origin/iss_42
in our repository. We now have:
...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/develop
              \
               K   <-- iss_42 (HEAD), origin/iss_42
in our repository. So we now have the name we need for our git branch --set-upstream-to
operation.
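The one-step form, git push -u, does the push and the upstream setting together. The sketch below again uses a local directory as a stand-in for the GitHub repository (paths and identity are placeholders) and checks three things: origin/iss_42 did not exist before, it is the upstream afterwards, and the other repository now has the same commit under its own iss_42.

```shell
#!/bin/sh
# Sketch: push -u sends K, creates origin/iss_42, and records the upstream.
set -eu
tmp=$(mktemp -d)

git init -q "$tmp/upstream"                 # stand-in for the hosted repository
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git switch -q -c iss_42
git commit -q --allow-empty -m "K"

if git rev-parse --verify -q refs/remotes/origin/iss_42 >/dev/null; then
    before=exists
else
    before=missing
fi

git push -q -u origin iss_42

upstream_name=$(git rev-parse --abbrev-ref 'iss_42@{upstream}')
theirs=$(git -C "$tmp/upstream" rev-parse iss_42)
ours=$(git rev-parse iss_42)
echo "origin/iss_42 before push: $before"
echo "upstream after push: $upstream_name"
```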
Once we've done all this, creating new commits in our repository, while "on" our iss_42
branch, will produce, e.g.:
...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/develop
              \
               K   <-- origin/iss_42
                \
                 L   <-- iss_42 (HEAD)
and since iss_42
has origin/iss_42
set as its upstream, you'll see Your branch is ahead of 'origin/iss_42' by 1 commit.
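To see the ahead count appear, here is a sketch that pushes with -u, makes one more local commit, and then compares iss_42 with its upstream. The local "upstream" directory is a stand-in for GitHub, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch: with an upstream set, a new local commit shows up as "ahead 1".
set -eu
tmp=$(mktemp -d)

git init -q "$tmp/upstream"                 # stand-in for the hosted repository
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

git clone -q "$tmp/upstream" "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"
git switch -q -c iss_42
git commit -q --allow-empty -m "K"
git push -q -u origin iss_42

# One more commit that has not been pushed yet.
git commit -q --allow-empty -m "L"

ahead=$(git rev-list --count 'iss_42@{upstream}..iss_42')
echo "ahead of upstream by $ahead commit(s)"

# Status now has an upstream to compare against.
git status
```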
The near-opposite of git push is git fetch
Last, let's mention that:
git fetch origin
works by having your Git call up the other Git software that answers at the URL stored under the name origin
. Instead of sending commits to them, though, this has them list out all their branch names (and other names) and repeats the same kind of thing git clone
did:
- figure out which commits they have, that we don't;
- bring those over; and
- create or update remote-tracking names.
In fact, git clone
and git fetch
share the code that brings over the commits and creates the remote-tracking names. Cloning is just making a new empty repository, setting it up with a remote, fetching, and then creating and checking out one branch.
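A short sketch of what git fetch changes, and what it leaves alone. As before, a local directory stands in for the other repository (paths and identity are placeholders). After a new commit appears over there, fetching moves our origin/main but not our own main.

```shell
#!/bin/sh
# Sketch: fetch updates remote-tracking names, not our own branch names.
set -eu
tmp=$(mktemp -d)

git init -q "$tmp/upstream"                 # stand-in for the hosted repository
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

git clone -q "$tmp/upstream" "$tmp/work"

# Someone adds a commit on the other side.
git commit -q --allow-empty -m "I"
theirs=$(git rev-parse main)

cd "$tmp/work"
tracking_before=$(git rev-parse origin/main)
local_before=$(git rev-parse main)

git fetch -q origin

tracking_after=$(git rev-parse origin/main)
local_after=$(git rev-parse main)
echo "origin/main: $tracking_before -> $tracking_after"
echo "main:        $local_before -> $local_after"
```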
Note that the difference between git fetch
and git push
is that with git fetch
, you generally just take everything new they have and update all your remote-tracking names, using git fetch origin
. But with git push
, you pick one branch from which to send commits, and then ask them to set their branch, with git push origin iss_42
. It is possible to use git push
to make multiple create-or-update requests, but that's not very common.
It's also possible to run git fetch
with specific names, e.g., git fetch origin iss_42
. This limits what your fetch brings over and which remote-tracking names get updated. (Unless you're going to do this all the time, though, the next git fetch origin
will bring over everything, and you'll pay whatever cost there is for "get everything" sooner or later. As the price tends to be very low, it's usually best and easiest to just get everything every time: you only pick up the new stuff.)