I have the following folder structure:
project/
----A/
--------B/
------------1.txt
------------2.txt
------------.gitignore [ content is: (Line1) * (Line2) !1.txt ]
--------.gitignore [ content is: (Line1) /B/* ]
----.gitignore [ content is: (Line1) /A/* ]
.git/
-.gitignore [content is: (Line1) /project/*]
The above does not track 1.txt, nor does it track 2.txt.
My understanding of project/.gitignore, which contains:
/A/*
was: ignore everything under folder A/, except for exceptions you may encounter in deeper .gitignore files in subfolders, for instance due to project/A/B/.gitignore, which is:
*
!1.txt
and which forces you to track 1.txt. That was also my interpretation of project/A/.gitignore, which is:
/B/*
That is, ignore everything under folder B/, except for exceptions you may encounter in deeper .gitignore files in subfolders, for instance due to project/A/B/.gitignore.
Since in the example above neither 1.txt nor 2.txt is tracked, I am unclear what the right interpretation of /A/* and /B/* is in the context above.
Everything else being the same, the following change to project/.gitignore:
!A/
tracks 1.txt while not tracking 2.txt.
I would like to understand clearly why /A/* does not work while !A/ works in this case.
CodePudding user response:
The information you provide is not enough on its own to reproduce your setup. Running the following script:
#!/bin/bash
# Rebuild a scratch repository from nothing
rm -rf /tmp/testrepo
mkdir -p /tmp/testrepo
cd /tmp/testrepo || exit 1
git init

mkdir -p project/A/B
touch project/A/B/1.txt project/A/B/2.txt

# .gitignore files with the contents given in the question
echo "/A/*" > project/.gitignore
echo "/B/*" > project/A/.gitignore
printf '*\n!1.txt\n' > project/A/B/.gitignore

# Print which ignore rule (if any) applies to a path
check_ignore () {
    local path=$1
    echo "--- checking $path:"
    git check-ignore -v "$path"
}

echo "# with initial .gitignore files:"
check_ignore project/A
check_ignore project/A/B
check_ignore project/A/B/1.txt
check_ignore project/A/B/2.txt

echo "!A/" >> project/.gitignore
echo
echo "# after adding '!A/' in project/.gitignore:"
check_ignore project/A
check_ignore project/A/B        # that directory is still gitignored
                                # by the '/A/*' gitignore rule
check_ignore project/A/B/1.txt  # so its content is not inspected
check_ignore project/A/B/2.txt
I have directory B (in project/A/B) completely ignored, which means that neither 1.txt nor 2.txt is tracked.
If an ignore rule matches a directory, then git will not descend into that directory at all, and no inner .gitignore file can act on what is tracked within it.
So, in your case:
- the /A/* rule will not ignore directory /A/ itself: git will inspect its content, and possibly apply rules described in /A/.gitignore,
- if however no rule counters /A/* for A/B, then B/ will be completely ignored, and neither B/1.txt nor B/2.txt will be tracked.
Such a rule can be:
- a !B/ rule in project/A/.gitignore
- or a !A/B rule in project/.gitignore
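The first of these two fixes can be tried out in a throwaway repository. The sketch below is only an illustration: it assumes the three .gitignore files from the question (without any repository-level .gitignore), uses a temporary directory created by mktemp, and relies on git add followed by git ls-files to show what ends up tracked.

```shell
# Hypothetical scratch repository reproducing the layout from the question.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p project/A/B
touch project/A/B/1.txt project/A/B/2.txt
echo '/A/*' > project/.gitignore
echo '/B/*' > project/A/.gitignore
printf '*\n!1.txt\n' > project/A/B/.gitignore

# Before the fix: B/ is matched by '/A/*', so git never looks inside it.
git add .
before=$(git ls-files)

# The fix: un-ignore the directory B/ from project/A/.gitignore.
echo '!B/' >> project/A/.gitignore
git add .
after=$(git ls-files)

echo "tracked before:"; echo "$before"
echo "tracked after:";  echo "$after"
```

Once B/ is no longer ignored, git reads project/A/B/.gitignore, and its !1.txt rule can finally take effect; 2.txt stays ignored because of the * rule in the same file.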
Your sentence should be adjusted: a /A/* pattern allows you to unignore files and folders one level down (in A/.gitignore), but .gitignore files at deeper levels will not have an impact on their own.
CodePudding user response:
See LeGEC's answer for a flaw in your question as posed. I'm going to ignore the flaw and plow directly into .gitignore rules instead. But first, we need to consider something odd here. There's a sort of impedance mismatch between Git, which does not store folders (only files), and your OS, which insists that files must exist inside folders. It's this fundamental disagreement between "how Git works" and "how your OS insists that Git should work instead" that leads to this issue. Git has to bridge this difference, and to do that, Git makes certain compromises.
Background, or what you need to know before we even start
Let's look at the difference between some stored-in-Git file and some OS-stored version of that same file, assuming for the moment that we're on Windows, so that files have path names like C:\path\to\file. We'll be in C:\top\mid and create a new Git repository here, and make a commit that has in it the following two files:
.gitignore
sub/file
To Git, that second file is a file named sub/file. You can see this by running:
git ls-files --stage
which will list out both files. Technically, these two files are in Git's index or staging area at this point, but Git builds commits from the index, not from what's in your working tree. (The terms index and staging area are pretty much interchangeable. I tend to use the shorter and less meaningful one for various reasons, when talking about the technical aspects of Git.)
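As a small sketch of that flat view (assuming a scratch repository in a temporary directory, and staging the two files rather than committing them), the index lists sub/file as a single path and has no separate entry for the folder sub:

```shell
# Hypothetical scratch repository with the two files from the text.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir sub
touch .gitignore
echo 'hello' > sub/file
git add .gitignore sub/file

# Full index entries: mode, blob hash, stage number, path.
git ls-files --stage

# Just the path names, one per line.
names=$(git ls-files)
echo "$names"
```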
Your Windows machine, by contrast, does not have a file named sub/file. Instead, it has, in C:\top\mid, a folder named sub, and in that sub folder, a file named file. So the full path of that file is C:\top\mid\sub\file. Git knows that the repository itself is C:\top\mid at this point and takes that part away, and constructs the name sub/file, with forward slash, to update its index copy of the file, when you run git add as appropriate.
So Git has a sort of flat file system, with files with "folder names" embedded right in the file names, and literal forward slashes. But the computer's file system has folders-and-files. Even if we move to macOS or Linux or whatever, we still have the folder-and-file arrangement; we just now have /top/mid/sub/file instead of the silly drive-letter things and the annoying backwards slashes.
Since Git actually makes new commits by writing out, to the repository, a commit containing all the files (names and contents) as listed in the index / staging-area, our job, whenever we're doing new work, consists of updating, and maybe adding and/or removing, OS-style files in our working tree, but then we have to tell Git to update its index. We do that part (the "hey Git, I have new stuff now" step) using git add and sometimes git rm or git rm --cached. This operation tells Git to look in the working tree (the folder-and-file stuff that the OS demands we use), from which Git will assemble its internal-format, ready-to-commit "blob" objects whose hash IDs and path names Git stashes in the index / staging-area.
The base problem
When we run any en-masse git add command, like:
git add .
we're telling Git to scan, recursively, all the folders and sub-folders we have at the current working directory. That is, Git will open (using the C library opendir function) the path . to read the current directory, where it will find .gitignore and sub. Using additional OS calls if and as needed, Git will find out that .gitignore is a file, and sub is a folder, and will get lstat data about the file and folder.
Git's index (which also goes by a third name, cache) contains previously-obtained lstat data and Git can sometimes use this to very quickly determine that, e.g., the .gitignore file has not been modified, and therefore there is no need to replace the index copy of .gitignore with a new compressed and Git-ified file. But (with certain exceptions that have grown over time as the Git software has gotten more and more complicated), there's no entry in the index for a folder, so in general, Git is forced to open and read the sub folder, recursively, the same way it opened and read the . folder.
Having opened and read through sub, Git will find file, and Git will assemble the two pieces of name to get sub/file (even on Windows, where the OS wants to call it sub\file). As usual, the cached lstat data may or may not enable Git to quickly skip opening, reading, compressing, and generally Git-ify-ing the sub/file content. If not, Git opens and reads and compresses it, and checks to see if that content is already present anywhere in any commit anywhere in the repository.
All of this scanning and opening and reading is very slow. So for files that shouldn't be added, we prevent Git from bothering by listing their names in .gitignore. That's great for files, but for every folder in mid, Git has to open and read it, and for every sub-folder within that folder, Git has to open and read it, and so on recursively. Since Git is pretty well optimized, it turns out that this recursive scan of the directories is often the slowest part of git add.
To make this go much faster, Git tries to be clever. Suppose that we'll ultimately ignore everything in sub2 due to a line like sub2/** or sub2/* or sub2/. Then instead of opening and reading the folder sub2, Git can simply skip it entirely!
So, Git does that: if we tell Git that some directory aka folder should be ignored, Git skips opening and reading it entirely. This means that any files within that directory, even .gitignore files in it, are never even seen, and hence can't be obeyed.
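A quick way to see this is sketched below, under the assumption of a scratch repository in a temporary directory; the file names important.file and junk are made up for the illustration. The top-level rule sub2/ ignores the directory, so the negated rule inside sub2/.gitignore never gets a chance to apply.

```shell
# Hypothetical scratch repository: sub2/ is ignored from the top level.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir sub sub2
touch sub/file sub2/important.file sub2/junk
echo 'sub2/' > .gitignore

# This inner rule asks to keep important.file, but git never reads it,
# because the directory that contains it is skipped entirely.
echo '!important.file' > sub2/.gitignore

git add .
tracked=$(git ls-files)
echo "$tracked"
```

Nothing under sub2/ shows up in the listing, while sub/file and the top-level .gitignore do.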
That means that if you want Git to get into some directory (folder) to scan it, that part of the path, starting at . (corresponding to top/mid in our case) must not be ignored. Once it's not ignored, Git commits to opening and reading it, including any .gitignore file it contains. The rules within that .gitignore file are then temporarily added to the top level .gitignore and/or .git/info/exclude and/or core.excludesFile ignore rules (with higher priority, but forcibly limited to this sub-directory) while doing the recursive scan.
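The "higher priority, but limited to this sub-directory" part can be illustrated with a small sketch. It assumes a scratch repository in a temporary directory and made-up file names: a top-level rule ignores every *.tmp file, and a .gitignore inside sub negates that rule for sub only.

```shell
# Hypothetical scratch repository with a top-level and a nested .gitignore.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir sub other
touch a.tmp sub/b.tmp other/c.tmp

echo '*.tmp' > .gitignore        # applies everywhere
echo '!*.tmp' > sub/.gitignore   # overrides it, but only inside sub/

git add .
tracked=$(git ls-files)
echo "$tracked"
```

Only sub/b.tmp is picked up among the .tmp files; a.tmp and other/c.tmp are still covered by the top-level rule.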
More detail
Keeping the above in mind (those rules cover what Git sees, and if Git doesn't see something, it can't possibly git add it), we now get to the individual .gitignore-file rules:
- An entry can be a simple name or glob like sub2 or *.
- An entry can be prefixed with a slash, or contain a slash, such as /sub2 or sub2/path. Parts of this can use glob characters like * or ** (with ** nominally meaning match across directories / folders, vs a single * that won't cross over a slash character).
- An entry can be prefixed with !, making it negated. For ! to mean negation it must be the very first character, so if you want to prefix with both ! and / you must write !/, not /!.
- An entry can end with /. This final slash has a particular meaning and doesn't affect the "prefixed with" or "contains" slash stuff.
The stuff about slashes, excluding those final slash characters, gets a bit messy. I like to use the terms anchored and un-anchored to distinguish between these: a name like sub2 or pattern like * is un-anchored, but a name like sub2/path or /sub2 or /* is anchored. However, */ is not anchored since the slash is the last character.
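As an illustration of the difference, here is a sketch assuming a scratch repository in a temporary directory, with made-up file names todo.txt and notes.txt: the anchored rule /todo.txt only applies next to the .gitignore, while the un-anchored rule notes.txt applies at any depth.

```shell
# Hypothetical scratch repository with one anchored and one un-anchored rule.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir dir
touch todo.txt notes.txt dir/todo.txt dir/notes.txt

printf '/todo.txt\nnotes.txt\n' > .gitignore

git add .
tracked=$(git ls-files)
echo "$tracked"
```

The listing contains .gitignore and dir/todo.txt: the top-level todo.txt is ignored, the nested one is not, and notes.txt is ignored in both places.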
The final slash, if present, means "only if this is a directory". So sub2/ means "sub2, but only if sub2 is actually a directory" and */ means "everything, but only if it's a directory".
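A minimal sketch of the directory-only behavior, assuming a scratch repository in a temporary directory and a made-up name build that is a plain file in one place and a directory in another:

```shell
# Hypothetical scratch repository: "build" is a file in a/ and a directory in b/.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p a b/build
touch a/build b/build/out.o

echo 'build/' > .gitignore   # trailing slash: match directories only

git add .
tracked=$(git ls-files)
echo "$tracked"
```

The file a/build is tracked, because the rule only matches directories; the directory b/build is ignored, so b/build/out.o is never seen.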
Now we get into how Git views these ignore rules. Remember, at the point that Git is scanning through some directory (folder) like . or sub, it's already read in the appropriate .gitignore file and has converted the rules to the internal form, so that it knows:
- this rule applies only to directories, or not (had a trailing / which is now removed);
- this rule is anchored, or not (did or didn't have another /);
- is negated, or not (did or didn't start with ! which is now removed);
- in which level the .gitignore appeared (e.g., was it sub/.gitignore or sub2/.gitignore? This information can technically be compressed down to a single integer indicating how deep we are in the recursive traversal, but you can think of it as a path, if that makes it easier to think about).
Git now reads each entry in the directory, one at a time. Each entry names either a file—including a symbolic link, which Git treats "as if" it were a file whose contents are the symlink target—or is a folder/directory. (On systems like Linux that have "socket files" and "device special files" and the like, if Git encounters one, it just skips over it and pretends it's not there—Git can't deal with these.)
Having read the entry's name, Git has both the short and simple name (file or d.ext for instance) and the constructed full path (sub/file, if we're reading sub, or sub2/b/c/d.ext or whatever, if we're reading sub2/b/c for instance). Git now checks to see if the entry matches, which depends on the anchored-ness:
- If the rule is not anchored, it matches if the simple name (file or d.ext) matches this unanchored rule, provided that any "must be a directory" thing matches.
- If the rule is anchored, the full path name must match the anchored rule, excluding whatever part gets excluded based on depth. For instance if we're looking in sub2/b/c and there's a sub2/b/.gitignore that says c/d.ext, we match here if this is d.ext, but not if the rule says x/d.ext: the part we take away from the full path is sub2/b/ since that's where the .gitignore rule came from.
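The sub2/b example can be sketched as follows, assuming a scratch repository in a temporary directory. git check-ignore -v reports which file and line supplied the matching rule, which makes it visible that c/d.ext is interpreted relative to sub2/b/.

```shell
# Hypothetical scratch repository with an anchored rule in sub2/b/.gitignore.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p sub2/b/c sub2/b/x
touch sub2/b/c/d.ext sub2/b/x/d.ext

echo 'c/d.ext' > sub2/b/.gitignore

# Ask git which rule ignores sub2/b/c/d.ext (source file, line, pattern).
why=$(git check-ignore -v sub2/b/c/d.ext)
echo "$why"

git add .
tracked=$(git ls-files)
echo "$tracked"
```

sub2/b/x/d.ext is tracked because the anchored rule does not match it, even though the simple name d.ext is the same.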
[Note that ** matching gets kind of complicated here, and occasionally the (quite hairy) .gitignore code that tries to speed this up gets this wrong in test releases. The internal test suite for Git has gotten complicated to try to catch such bugs.]
If the entry doesn't match, we move on. If it does match, it gets remembered, and we move on. We do this for every .gitignore entry, and take the last match, whatever that is, or we have no match.
If we have no match, the file or directory is not ignored. We'll consider git add-ing it if it's a file, or recursively scanning it if it's a directory.
If we have a match, the file or directory is ignored unless it's a negated rule: for a negated rule we pretend we didn't have a match.
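The "last match wins" behavior means that the order of lines in a .gitignore matters. The sketch below assumes scratch repositories in temporary directories and a hypothetical helper named try_rules; it applies the same two rules in both orders and lists what git add . picks up each time.

```shell
# Hypothetical helper: build a scratch repo containing debug.log and keep.log,
# write the given rules (one per line) to .gitignore, and list what gets added.
try_rules () {
    dir=$(mktemp -d)
    (
        cd "$dir" || exit 1
        git init -q
        touch debug.log keep.log
        printf '%s\n' "$@" > .gitignore
        git add .
        git ls-files
    )
}

# Negation last: keep.log's last match is '!keep.log', so it is not ignored.
first=$(try_rules '*.log' '!keep.log')

# Negation first: keep.log's last match is '*.log', so it is ignored.
second=$(try_rules '!keep.log' '*.log')

echo "negation last:";  echo "$first"
echo "negation first:"; echo "$second"
```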
That's the whole set of rules. Note that there are no exceptions for, e.g., "there's a rule that says don't bother reading sub2 even though there's an additional negated rule that says to keep sub2/important.file". I'd argue that Git should do this automatically for you, at least for constant strings (glob matchers like * and ** might make it too hard).
Some general helpful hints
The usual problem is that Git ignores a directory we want it to search. We can, at a cost, tell Git to never ignore any directory at all with the simple rule:
!*/
That's a negated, un-anchored rule. Putting this as the last entry in each .gitignore means that Git will search all the sub-directories it finds at this level, or any lower level that didn't override this rule with its own .gitignore.
This completely defeats the (sometimes very important) optimization that lets Git not scan entire sub-trees of files.
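One common use of this rule is "ignore everything except files of one kind, at any depth". The sketch below assumes a scratch repository in a temporary directory and made-up file names; the three rules are an illustration, with !*/ placed last as described above.

```shell
# Hypothetical scratch repository: keep only *.txt files, wherever they are.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p a/b
touch top.txt top.bin a/b/c.txt a/b/c.bin

# Ignore everything, re-include *.txt, and never ignore a directory.
printf '*\n!*.txt\n!*/\n' > .gitignore

git add .
tracked=$(git ls-files)
echo "$tracked"
```

Without the !*/ line, the directory a would be matched by * and skipped, so a/b/c.txt would never be found. Note that .gitignore itself is also matched by * here and is therefore not added.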
A more targeted trick is that, if there's some path:
!keep/this/important.file
you can prefix that with:
!keep/
!keep/this/
to make sure that Git searches inside keep, and then keep/this/, assuming keep/.gitignore does not exist or does not override the keep/this/ entry.
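To see the effect of those two extra lines, here is a sketch assuming scratch repositories in temporary directories, a blanket * rule that ignores everything, and a hypothetical helper named try_rules. It compares the negated file rule on its own with the same rule preceded by the directory negations.

```shell
# Hypothetical helper: build a scratch repo with keep/this/important.file and
# some other files, write the given rules to .gitignore, list what gets added.
try_rules () {
    dir=$(mktemp -d)
    (
        cd "$dir" || exit 1
        git init -q
        mkdir -p keep/this
        touch keep/other.file keep/this/junk keep/this/important.file
        printf '%s\n' "$@" > .gitignore
        git add .
        git ls-files
    )
}

# Only the file is negated: keep/ is still ignored, so git never gets there.
without_dirs=$(try_rules '*' '!keep/this/important.file')

# Each directory on the way is negated too, so git walks down to the file.
with_dirs=$(try_rules '*' '!keep/' '!keep/this/' '!keep/this/important.file')

echo "without directory rules: [$without_dirs]"
echo "with directory rules:    [$with_dirs]"
```

In the second case only keep/this/important.file is tracked; keep/other.file and keep/this/junk are still matched by * and stay ignored.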