When pushing from develop to master, a test job in the distributed-tasks stage of a GitLab pipeline fails with an error. Locally, and on develop, there was no problem with it.
I'd like to understand what this error message actually means: what is "disk" here, and what exactly does master~10 mean? (While googling, I understood it to mean "compare the currently pushed branch to all the parent commits in master, up to 10". Is that correct, and why do it at all?)
How can I solve this issue, and how can I debug it locally?
CodePudding user response:
I'd like to understand what
fatal: Path '.eslintrc.json' exists on disk, but not in 'master~10'
actually mean[s], what is "disk" here and what does this master~10 mean exactly
Being exact means we need to start with some key background.
Background
When using Git, we run:
git checkout somebranch
or:
git switch somebranch
or sometimes—CI/CD systems are particularly fond of doing this—we might use:
git checkout <hash-ID>
or:
git switch --detach <hash-ID>
These commands direct Git to extract the source tree from some particular commit, and make that particular commit the current commit. The hash ID of that commit is either one we give Git directly (a raw hash ID, as in the git switch --detach hash-ID case) or the one that some branch name implies, when we use a branch name.
(When we do use a branch name, the effect of the checkout or switch command is to make this the current branch, while making its most recent or tip commit the current commit. When we use the detached HEAD mode, we end up with no branch, but we still have the given commit as the current commit. So we always have a current commit; we have a current branch name as well, if and only if we use a branch name in the checkout / switch command.)
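As a rough illustration of the attached versus detached distinction, the sketch below builds a throwaway repository and checks HEAD both ways. The repository, identity, and commit messages are made up for the demo; only the Git commands themselves are real.

```shell
# Throwaway repository; every name here is hypothetical demo data.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Attached HEAD: there is a current branch and a current commit.
git checkout -q master
attached_branch=$(git symbolic-ref --short -q HEAD)
attached_commit=$(git rev-parse HEAD)

# Detached HEAD: same current commit, but no current branch.
git checkout -q --detach "$attached_commit"
detached_branch=$(git symbolic-ref --short -q HEAD || echo "(none)")
detached_commit=$(git rev-parse HEAD)

echo "attached: branch=$attached_branch commit=$attached_commit"
echo "detached: branch=$detached_branch commit=$detached_commit"
```

In both modes git rev-parse HEAD names the same commit; only the question "which branch am I on?" has a different answer.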
The reason Git works this way is that Git does not store files, exactly, nor does Git really work with branches. What Git stores are commits. The commits all have unique hash IDs: these are their "true names", as it were. Each commit stores files: in fact, each commit stores a full set of every file, as of the form it had at the time you (or whoever) made that commit. So this is how we get files out of Git: they're in each commit, and we pick the commit we want and we get those files.
The problem with using raw hash IDs is that they are big, ugly, random-looking things that humans are generally incapable of getting right. Humans don't like or use them. The computer can use them (computers are good at this sort of thing) but humans won't. Humans like names: branch names like master or main, develop, feature/tall, and so on. These work well for humans. So besides the snapshots-in-commits, stored as a big database of Git internal objects indexed by hash ID, Git provides a database in which Git stores these names and pairs each name up with a single hash ID.
Branch names, in particular, always mean the latest commit "on" that branch. Git strings the commits themselves together, backwards, using metadata stored in each commit.
In other words, each commit, as found by its unique (and random-looking and big and ugly and impossible-for-humans) hash ID, stores two things:
A commit has a source snapshot, rather like a tarball or WinZip archive or whatever. (Unlike some archive formats, however, the files in the snapshot are stored in a special Git-only format where they are not only compressed but also de-duplicated against all commits, including this commit itself.)
A commit has metadata, or information about the commit itself. The metadata include the name of the person who made the commit, for instance. They include a log message that the committer wrote. They include some date-and-time stamps to show when the committer made the commit.
To make Git branches work, Git includes, in the metadata for each commit, a list of previous commit hash IDs. This list is typically just one element long: that gives the hash ID of this commit's parent commit.
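To see the snapshot and the metadata for yourself, you can ask Git to print a raw commit object. The sketch below does that in a throwaway repository; the file name, identity, and messages are invented for the example.

```shell
# Throwaway repository; names and contents are hypothetical demo data.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "first commit"
echo "two" > file.txt && git add file.txt && git commit -q -m "second commit"

# Print the raw commit object: a "tree" line (the snapshot), a "parent"
# line (the previous commit), author and committer lines with time
# stamps, and then the log message.
git cat-file -p HEAD

# The parent hash ID stored in the metadata of the latest commit.
parent_from_metadata=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# The very first commit has no "parent" line at all.
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ')
echo "parent of HEAD: $parent_from_metadata"
echo "parent lines in the first commit: $root_parents"
```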
The result of these "ordinary commit" parent hash IDs is a simple, linear, but backwards-looking chain. Let's draw one, replacing the real hash IDs with simple single uppercase letters. We'll put the most recent commit, whose hash ID we will call H, on the right, here:
... <-F <-G <-H
Commit H stores a snapshot and metadata, and in the metadata for H, Git has stored another, earlier commit's hash ID. That hash ID (G in our diagram, but some big ugly random-looking thing in reality) is the true name of commit G in the big database of Git objects, so Git can use that hash ID to extract commit G.
Commit G is of course a commit, so it has both a snapshot and metadata, and in the metadata for G, there's one hash ID for earlier commit F. So Git can read G to find F's hash ID, and use that to find F itself.
Commit F is of course a commit, so it has both a snapshot and metadata, and ... well, you should see where this is going by now. By repeatedly reading one commit, including its metadata, Git can keep backing up through history, one commit at a time: from H we get to G, from there we get to F, from there we get to (presumably) E, and so on, backwards through time. Eventually we'll get to the very first commit ever, which here would be commit A: that commit has an empty list of previous commits, because there is no previous commit, and so Git can finally stop going backwards.
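The backwards walk described above can be imitated by hand. This is only a sketch of the idea (Git's own history traversal is more elaborate), using a throwaway repository with three made-up commits A, B, and C.

```shell
# Throwaway repository with a three-commit chain: A <- B <- C.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
for message in A B C; do
  git commit -q --allow-empty -m "$message"
done

# Start at the latest commit and keep following the parent link until
# we reach a commit that has no parent.
commit=$(git rev-parse HEAD)
hops=0
while :; do
  echo "visiting $(git log -1 --format='%h %s' "$commit")"
  # "<commit>^" names the first parent; --verify -q fails quietly if none.
  parent=$(git rev-parse -q --verify "$commit^") || break
  commit=$parent
  hops=$((hops + 1))
done
echo "stopped at the first commit after $hops hops"
```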
This is the history in a repository: The commits in the repository are the history. All we have to do is somehow find the raw hash ID of the last commit H, and Git can then work backwards. This is where branch names come in.
In order to find commit H, the last commit, Git stores the hash ID of the last commit in a name. When that name is a branch name, we can give it to git checkout or git switch and get "on the branch". Since the name contains the raw hash ID, we'll draw it as an arrow, pointing to the commit:
...--G--H <-- master
As the branch name here is master, the name master contains the raw hash ID of commit H, from which Git can find the earlier commits.
If and when we make a new branch, we start out the new name pointing to some existing commit. We can pick any commit, but usually we'll start with H in a case like this:
...--G--H <-- develop, master
When we use git checkout or git switch to pick one of these names, we get an attached HEAD, so that that name is the current name:
...--G--H <-- develop (HEAD), master
This means our most recent checkout/switch was to develop. We have all the files from commit H to work on / with.
The files in a Git commit are read-only. The de-duplication / sharing trick only works because all parts of every commit are completely read-only (and only Git can actually read the files stored in the commit), so Git has to copy the files out of the commit before we can work on or with them. Having copied them out, these files are now available for use. Git calls them "on disk" at this point, in your error message.
Given that we're working on/with commit H via develop, if we now modify the "on disk" copy of some files and run git add and git commit, we'll get a new commit, which we can call I, and Git will make the name develop point to new commit I:
          I <-- develop (HEAD)
         /
...--G--H <-- master
The name develop now means the latest commit on develop, which is commit I. The name master now means the latest commit on master, which is still commit H. If we make a second new commit on develop we'll get:
          I--J <-- develop (HEAD)
         /
...--G--H <-- master
That is, the name develop now points to commit J. Commit J points back to commit I, which points back to commit H, which points back to commit G, and so on.
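Here is a small sketch of the same thing in a throwaway repository: committing while on develop moves the name develop and leaves master alone. The commit messages G, H, and I are stand-ins that mirror the letters in the diagrams.

```shell
# Throwaway repository; commit messages mirror the letters in the diagrams.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

# New branch name, pointing at the same commit as master.
git checkout -q -b develop
before=$(git rev-parse develop)

# A new commit moves the current branch name (develop) and nothing else.
git commit -q --allow-empty -m "I"
after=$(git rev-parse develop)
master_tip=$(git rev-parse master)

# List each branch name with the hash ID it currently holds.
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/heads
```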
Note that:
- There are now two latest commits: H is the latest (on master) and J is the latest (on develop). By definition, whatever commit some branch name points to, is the latest commit "on" that branch. Git doesn't normally do this, but we can force the name master to point to G instead of H:
       H--I--J <-- develop (HEAD)
      /
...--G <-- master
  If we do this (there are reasons not to!) then G is now the latest on master and commits H-I-J are only on develop. Note that all the commits are on develop. If we move master forward one hop to H again, we're back to the previous setup, with commits up through H, and I-J only on develop.
- It's the commits that matter; the branch names are just there to help us (and Git) find the commits. The commits never change, but the branch names change a lot. The set of commits that is "on" a branch typically increases over time: we add new commits, and/or we have Git move a branch name "forwards".
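For the curious, moving a branch name by force can be tried safely in a throwaway repository. This sketch uses git branch -f, which is one way (not the only way) to repoint a name. The commit messages F through J are stand-ins for the letters in the diagrams.

```shell
# Throwaway repository: F--G--H on master, then I--J on develop.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo" && git config user.email "demo@example.com"
for message in F G H; do git commit -q --allow-empty -m "$message"; done
git checkout -q -b develop
for message in I J; do git commit -q --allow-empty -m "$message"; done

# Force master back one hop, to G. This works because HEAD is attached
# to develop, not to master.
git branch -f master master~1
tip_after_move_back=$(git log -1 --format=%s master)
only_on_develop=$(git rev-list --count master..develop)
echo "master now points at $tip_after_move_back; $only_on_develop commits are only on develop"

# Move master forward one hop again, to H (two hops back from J).
git branch -f master develop~2
tip_after_move_forward=$(git log -1 --format=%s master)
echo "master is back at $tip_after_move_forward"
```

No commit was created, changed, or destroyed here: only the hash ID stored under the name master changed.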
If we switch back to master, Git will erase, from our "on-disk" working area, all the files from commit J and put in, instead, the files from commit H. We can now create a second feature branch:
          I--J <-- develop
         /
...--G--H <-- feature (HEAD), master
and make two more commits:
          I--J <-- develop
         /
...--G--H <-- master
         \
          K--L <-- feature (HEAD)
We can only ever add commits to the repository. Existing commits cannot be changed. New commits point backwards to existing commits, so that there is history. History is nothing but the commits, as found by starting from some branch name and working backwards. The commits store files and metadata; the metadata lets history work; and the branch names find the latest commits, from which Git works backwards.
Back to your question
what is "disk" here and what does this master~10 mean exactly
We've already seen what "on disk" means: there is a current commit that Git has extracted, and by extracting the files from that commit, Git has created files in your working tree. You may have also created new files since the checkout or switch operation: those files are also "on disk" in this situation.
Git has found .eslintrc.json in your working tree.
The master~10 is now explainable via the graphs that we have been drawing. Suppose we have:
          I--J <-- develop
         /
...--G--H <-- master (HEAD)
         \
          K--L <-- feature
This means we are "on" master, using commit H. The name master literally means "commit H" whenever we use it in a context in which Git needs a commit, such as if we run:
git diff master feature
or:
git show master
In these cases we're directing Git to look at commits H and L for the git diff command, or commit H and its parent G for the git show command.
Adding a tilde ~ suffix, followed by an optional number, directs Git to hop backwards the given number of times. The number defaults to 1, but here we have 10, so this would mean to move back ten times. That's too many to bother with for our example, so let's consider master~3 instead:
- start at H
- move back once: we're now at G
- move back again: we're now at F, presumably (I didn't draw it though)
- move back a third time: we're now at E, presumably
so master~3 would select commit E, assuming stuff about the commits I didn't draw.
Similarly, feature~3 would mean:
- start at L
- move back once to K
- move back a second time to H
- move back a third time to G
and hence it would mean commit G. And develop~3 would start at J, move back three times, and end up at commit G again.
So these are all just ways of spelling a commit hash ID without having to spell out a commit hash ID. Moreover, they're relative to the last commit in the branch (by definition because the branch name always means the last commit in the branch). As we make new commits on the branch, the branch gets longer and longer, and if we always move back 3 hops (or 10 hops) we'll always get a commit that's 3 (or 10) hops back from the last.
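The hop counting can be checked against a real repository. The sketch below rebuilds the drawn graph in a throwaway repository, assuming the undrawn commits before G are E and F, and uses each commit's letter as its log message so that the result is easy to read.

```shell
# Throwaway repository shaped like the diagram:
#   E--F--G--H on master, I--J on develop, K--L on feature.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo" && git config user.email "demo@example.com"
for message in E F G H; do git commit -q --allow-empty -m "$message"; done
git checkout -q -b develop
for message in I J; do git commit -q --allow-empty -m "$message"; done
git checkout -q -b feature master
for message in K L; do git commit -q --allow-empty -m "$message"; done
git checkout -q master

# Resolve each name~3 and print the letter of the commit it selects.
for rev in master~3 feature~3 develop~3; do
  printf '%s -> %s\n' "$rev" "$(git log -1 --format=%s "$rev")"
done
# prints:
#   master~3 -> E
#   feature~3 -> G
#   develop~3 -> G
```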
So the error message that says .eslintrc.json is not in master~10 just means that whatever commit you've chosen via that ~10 suffix applied to master, that commit does not have .eslintrc.json in it. The files in that commit lack the .eslintrc.json file.
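To get this particular message, something in the job most likely asked Git for the file as stored in that commit, using the commit:path syntax, for instance:
git show master~10:.eslintrc.json
which means show the copy of .eslintrc.json stored in the commit found via master~10. That commit has no such file, but Git noticed a file by that name in the working tree (on disk) and said so. By contrast, a plain:
git diff master~10
compares the files stored in the commit found via master~10 to the files in the working tree (on disk), and does not complain about a file that exists on only one side.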
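To debug this locally, it may help to reproduce the message on a small scale. The sketch below is a guess at the situation, not a reconstruction of your pipeline: it assumes the failing command names the file with the commit:path syntax, and it uses master~1 rather than master~10 to keep the throwaway history short.

```shell
# Throwaway repository: the first commit has no .eslintrc.json,
# the second commit adds it.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo" && git config user.email "demo@example.com"
echo "readme" > README.md && git add README.md && git commit -q -m "no eslint config yet"
echo "{}" > .eslintrc.json && git add .eslintrc.json && git commit -q -m "add eslint config"

# The file is in the working tree and in master, but not in master~1,
# so asking for it by commit:path fails.
if ! message=$(LC_ALL=C git show master~1:.eslintrc.json 2>&1); then
  echo "git failed with: $message"
fi

# List the top-level files in each commit to confirm which one lacks it.
echo "files in master~1:"; git ls-tree --name-only master~1
echo "files in master:";   git ls-tree --name-only master
```

Running git ls-tree --name-only master~10 in your own clone should tell you the same thing about the commit your pipeline is looking at.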