Nested .gitignore files -- difference between /folder/* and !folder/

Time:11-25

I have the following folder structure:

project/
       ----A/
            ----B/
                 -1.txt
                 -2.txt
                 -.gitignore [content is: (Line1) * (Line2) !1.txt]
            -.gitignore [content is: (Line1) /B/*]
       -.gitignore [content is: (Line1) /A/*]
.git/
-.gitignore [content is: (Line1) /project/*]

With this setup, neither 1.txt nor 2.txt is tracked.

My understanding of project/.gitignore, which contains:

/A/* 

was:

Ignore everything under folder A/, except for exceptions I may encounter in deeper .gitignore files in subfolders, for instance project/A/B/.gitignore, which is:

*
!1.txt

which should force Git to track 1.txt. That was also my interpretation of project/A/.gitignore, which is:

/B/*

That is, ignore everything under folder B/, except for exceptions I may encounter in deeper .gitignore files in subfolders, for instance project/A/B/.gitignore.

Since neither 1.txt nor 2.txt is tracked in the example above, I am unclear on what /A/* and /B/* actually mean in this context.

Everything else being the same, the following change to project/.gitignore:

!A/

makes Git track 1.txt while still not tracking 2.txt.

I would like to understand clearly why /A/* does not work while !A/ works in this case.

Answer:

The information you provide is not enough on its own to reproduce your setup.

Running the following script:

#!/bin/bash

rm -rf /tmp/testrepo
mkdir -p /tmp/testrepo
cd /tmp/testrepo || exit 1

git init

mkdir -p project/A/B

touch project/A/B/1.txt project/A/B/2.txt

# recreate the .gitignore files described in the question
echo '/project/*' > .gitignore
echo '/A/*' > project/.gitignore
echo '/B/*' > project/A/.gitignore
printf '%s\n' '*' '!1.txt' > project/A/B/.gitignore

check_ignore () {
        local path=$1
        echo "--- checking $path:"
        git check-ignore -v "$path"
}

echo "# with initial .gitignore files:"

check_ignore project/A
check_ignore project/A/B
check_ignore project/A/B/1.txt
check_ignore project/A/B/2.txt

# single quotes: avoids history expansion of "!" in an interactive shell
echo '!A/' >> project/.gitignore

echo
echo "# after adding '!A/' in project/.gitignore:"

check_ignore project/A
check_ignore project/A/B       # that directory is still gitignored
                               # by the '/A/*' gitignore rule
check_ignore project/A/B/1.txt # so its content is not inspected
check_ignore project/A/B/2.txt

With this, directory B (project/A/B) is completely ignored, even after adding !A/, which means that neither 1.txt nor 2.txt is tracked.


If an ignore rule matches a directory, then git will not descend into that directory at all and no inner .gitignore file can act on what is tracked within it.

So, in your case:

  • the /A/* rule will not ignore directory A/ itself: git will inspect its content, and possibly apply rules described in project/A/.gitignore,
  • however, if no rule counters /A/* for A/B, then B/ will be completely ignored, and neither B/1.txt nor B/2.txt will be tracked.

Such a rule can be:

  • a !B/ rule in project/A/.gitignore
  • or a !A/B rule in project/.gitignore (placed after the /A/* line, since the last matching rule wins)
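A minimal sketch to check the first fix, assuming bash and git are installed. It builds only the project/ part of the question's layout in a scratch directory from mktemp -d (the top-level /project/* file is left out so that only /A/* and /B/* are in play); the helper name ignore_state is hypothetical. Without -v, git check-ignore -q exits 0 when a path is ignored and 1 when it is not.

```bash
#!/bin/bash
# Scratch repository holding the project/ part of the question's layout.
cd "$(mktemp -d)" || exit 1
git init -q
mkdir -p project/A/B
touch project/A/B/1.txt project/A/B/2.txt
echo '/A/*' > project/.gitignore
echo '/B/*' > project/A/.gitignore
printf '%s\n' '*' '!1.txt' > project/A/B/.gitignore

# Hypothetical helper: report whether Git considers a path ignored.
ignore_state() {
    if git check-ignore -q "$1"; then echo "ignored"; else echo "not ignored"; fi
}

# /A/* matches A/B, so B/ is skipped and B/.gitignore is never read.
echo "before: 1.txt $(ignore_state project/A/B/1.txt), 2.txt $(ignore_state project/A/B/2.txt)"

# Counter /A/* for A/B one level down, in project/A/.gitignore.
echo '!B/' >> project/A/.gitignore
echo "after:  1.txt $(ignore_state project/A/B/1.txt), 2.txt $(ignore_state project/A/B/2.txt)"
```

Appending !A/B to project/.gitignore (after /A/*) instead of !B/ to project/A/.gitignore should have the same effect on B.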

Your sentence should be adjusted:

a /A/* pattern allows you to unignore files and folders one level down (in A/.gitignore), but .gitignore files at deeper levels will not have any impact on their own.

Answer:

See LeGEC's answer for a flaw in your question as posed. I'm going to ignore the flaw and plow directly into .gitignore rules instead. But first, we need to consider something odd: there's a sort of impedance mismatch between Git, which does not store folders (only files), and your OS, which insists that files must exist inside folders. It's this fundamental disagreement between "how Git works" and "how your OS insists that Git should work instead" that leads to this issue. Git has to bridge this difference, and to do that, Git makes certain compromises.

Background, or what you need to know before we even start

Let's look at the difference between some stored-in-Git file and some OS-stored version of that same file, assuming for the moment that we're on Windows, so that files have path names like C:\path\to\file. We'll be in C:\top\mid and create a new Git repository here, and make a commit that has in it the following two files:

.gitignore
sub/file

To Git, that second file is a file named sub/file. You can see this by running:

git ls-files --stage

which will list out both files. Technically, these two files are in Git's index or staging area at this point, but Git builds commits from the index, not from what's in your working tree. (The terms index and staging area are pretty much interchangeable. I tend to use the shorter and less meaningful one for various reasons, when talking about the technical aspects of Git.)

Your Windows machine, by contrast, does not have a file named sub/file. Instead, it has, in C:\top\mid, a folder named sub, and in that sub folder, a file named file. So the full path of that file is C:\top\mid\sub\file. Git knows that the repository itself is C:\top\mid at this point and takes that part away, and constructs the name sub/file, with forward slash, to update its index copy of the file, when you run git add as appropriate.

So Git has a sort of flat file system, with files with "folder names" embedded right in the file names, and literal forward slashes. But the computer's file system has folders-and-files. Even if we move to macOS or Linux or whatever, we still have the folder-and-file arrangement; we just now have /top/mid/sub/file instead of the silly drive-letter things and the annoying backwards slashes.
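To see the flat naming for yourself, here is a minimal sketch, assuming bash and git are installed. It uses a scratch directory from mktemp -d in place of C:\top\mid, and the *.tmp line in .gitignore is arbitrary filler.

```bash
#!/bin/bash
# Scratch repository standing in for C:\top\mid.
cd "$(mktemp -d)" || exit 1
git init -q
mkdir sub
echo '*.tmp' > .gitignore
echo 'hello' > sub/file
git add .gitignore sub/file

# Each index entry is "mode object-id stage<TAB>path": the path is the single
# string sub/file, slash included, and there is no entry for the folder sub.
git ls-files --stage
```

The listing should show exactly two paths, .gitignore and sub/file, and nothing named just sub.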

Since Git actually makes new commits by writing out, to the repository, a commit containing all the files (names and contents) as listed in the index / staging-area, our job—whenever we're doing new work—consists of updating, and maybe adding and/or removing, OS-style files in our working tree, but then we have to tell Git to update its index. We do that part—the hey Git, I have new stuff now step—using git add and sometimes git rm or git rm --cached. This operation tells Git to look in the working tree—the folder-and-file stuff that the OS demands we use—from which Git will assemble its internal-format, ready-to-commit "blob" objects whose hash IDs and path names Git stashes in the index / staging-area.

The base problem

When we run any en-masse git add command, like:

git add .

we're telling Git to scan, recursively, all the folders and sub-folders we have at the current working directory. That is, Git will open (using the C library opendir function) the path . to read the current directory, where it will find .gitignore and sub. Using additional OS calls if and as needed, Git will find out that .gitignore is a file, and sub is a folder, and will get lstat data about the file and folder.

Git's index—which has a third term, cache—contains previously-obtained lstat data and Git can sometimes use this to very quickly determine that, e.g., the .gitignore file has not been modified, and therefore there is no need to replace the index copy of .gitignore with a new compressed and Git-ified file. But (with certain exceptions that have grown over time as the Git software has gotten more and more complicated), there's no entry in the index for a folder, so in general, Git is forced to open and read the sub folder, recursively, the same way it opened and read the . folder.

Having opened and read through sub, Git will find file, and Git will assemble the two pieces of name to get sub/file (even on Windows, where the OS wants to call it sub\file). As usual, the cached lstat data may or may not enable Git to quickly skip opening, reading, compressing, and generally Git-ify-ing the sub/file content. If not, Git opens and reads and compresses it, and checks to see if that content is already present anywhere in any commit anywhere in the repository.

All of this scanning and opening and reading is very slow. So for files that shouldn't be added, we prevent Git from bothering by listing their names in .gitignore. That's great for files—but for every folder in mid, Git has to open and read it, and for every sub-folder within that folder, Git has to open and read it, and so on recursively. Since Git is pretty well optimized, it turns out that this recursive scan of the directories is often the slowest part of git add.

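To make this go much faster, Git tries to be clever. Suppose that we'll ultimately ignore everything in sub2 due to a line like sub2/ (or just sub2) that matches the folder itself. Then instead of opening and reading the folder sub2, Git can simply skip it entirely! A line like sub2/* or sub2/** is different: it matches the entries inside sub2, not sub2 itself, so Git still opens sub2 (and reads any sub2/.gitignore there); it is the sub-folders of sub2 that get skipped. That is the /A/* case in your question: A/ is opened, but A/B is skipped.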

So, Git does that: if we tell Git that some directory aka folder should be ignored, Git skips opening and reading it entirely. This means that any files within that directory—even .gitignore files in it—are never even seen, and hence can't be obeyed.
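To see the difference between a line that matches the folder itself (sub2/) and one that only matches what is inside it (sub3/*), here is a minimal sketch, assuming bash and git are installed; sub3 and keep.txt are hypothetical names.

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
git init -q
mkdir sub2 sub3
touch sub2/keep.txt sub3/keep.txt sub3/other.txt

# sub2/ matches the directory itself; sub3/* only matches entries inside sub3.
printf '%s\n' 'sub2/' 'sub3/*' > .gitignore
# Each inner .gitignore tries to re-include keep.txt.
echo '!keep.txt' > sub2/.gitignore
echo '!keep.txt' > sub3/.gitignore

for path in sub2/keep.txt sub3/keep.txt sub3/other.txt; do
    if git check-ignore -q "$path"; then echo "$path: ignored"; else echo "$path: not ignored"; fi
done
```

sub2/keep.txt should stay ignored because sub2/.gitignore is never read, while sub3/.gitignore is read and re-includes sub3/keep.txt; sub3/other.txt is still caught by sub3/*.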

That means that if you want Git to get into some directory (folder) to scan it, that part of the path, starting at . (corresponding to top/mid in our case) must not be ignored. Once it's not ignored, Git commits to opening and reading it, including any .gitignore file it contains. The rules within that .gitignore file are then temporarily added to the top level .gitignore and/or .git/info/exclude and/or core.excludesFile ignore rules (with higher priority, but forcibly limited to this sub-directory) while doing the recursive scan.
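The traversal described above can be sketched in bash. This is a hypothetical model, not Git's own code: the function name scan is made up, and it asks git check-ignore about each entry instead of evaluating the rules itself. Ignored directories are reported and never opened, so a .gitignore inside them has no effect.

```bash
#!/bin/bash
shopt -s dotglob nullglob   # include dot-files; empty directories expand to nothing
cd "$(mktemp -d)" || exit 1
git init -q
mkdir keep skip
touch keep/a.txt skip/b.txt
echo 'skip/' > .gitignore
echo '!b.txt' > skip/.gitignore   # never read: skip/ itself is ignored

# Hypothetical model of the recursive scan done by "git add .".
scan() {
    local prefix=$1 entry
    for entry in "${prefix:+$prefix/}"*; do
        [[ ${entry##*/} == .git ]] && continue   # never scan the repository itself
        if git check-ignore -q "$entry"; then
            echo "skipped $entry"               # ignored: a directory is not opened
        elif [[ -d $entry ]]; then
            scan "$entry"                       # not ignored: open it and recurse
        else
            echo "file $entry"
        fi
    done
}

scan ""
```

git ls-files --others --exclude-standard, which lists untracked files that are not ignored, should report the same two files (.gitignore and keep/a.txt).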

More detail

Keeping the above in mind—those rules cover what Git sees, and if Git doesn't see something, it can't possibly git add it—we now get to the individual .gitignore-file rules:

  • An entry can be a simple name or glob like sub2 or *.
  • An entry can be prefixed with a slash, or contain a slash, such as /sub2 or sub2/path. Parts of this can use glob characters like * or ** (with ** nominally meaning match across directories / folders, vs a single * that won't cross over a slash character).
  • An entry can be prefixed with !, making it negated. For ! to mean negation it must be the very first character, so if you want to prefix with both ! and / you must write !/, not /!.
  • An entry can end with /. This final slash has a particular meaning and doesn't affect the "prefixed with" or "contains" slash stuff.

The stuff about slashes, excluding those final slash characters, gets a bit messy. I like to use the terms anchored and un-anchored to distinguish between these: a name like sub2 or pattern like * is un-anchored, but a name like sub2/path or /sub2 or /* is anchored. However, */ is not anchored since the slash is the last character.

The final slash, if present, means "only if this is a directory". So sub2/ means "sub2, but only if sub2 is actually a directory" and */ means "everything, but only if it's a directory".

Now we get into how Git views these ignore rules. Remember, at the point that Git is scanning through some directory (folder) like . or sub, it has already read in the appropriate .gitignore file and converted the rules to an internal form, so that it knows, for each rule:

  • this rule applies only to directories, or not (had a trailing / which is now removed);
  • this rule is anchored, or not (did or didn't have another /);
  • this rule is negated, or not (did or didn't start with ! which is now removed);
  • in which level the .gitignore appeared (e.g., was it sub/.gitignore or sub2/.gitignore?—this information can technically be compressed down to a single integer indicating how deep we are in the recursive traversal, but you can think of it as a path, if that makes it easier to think about).
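A simplified sketch of that conversion in bash. classify_rule is a made-up name, and the sketch deliberately ignores escapes (\!, \#), comments, blank lines and trailing-space handling; it only models the three flags listed above.

```bash
#!/bin/bash
# Hypothetical, simplified model of how one .gitignore line becomes flags.
classify_rule() {
    local pat=$1 negated=no dir_only=no anchored=no
    # "!" counts only as the very first character.
    if [[ $pat == '!'* ]]; then
        negated=yes
        pat=${pat:1}
    fi
    # A final "/" means "directories only" and is then removed.
    if [[ $pat == */ ]]; then
        dir_only=yes
        pat=${pat%/}
    fi
    # Any slash that is left (leading or in the middle) anchors the rule
    # to the directory holding this .gitignore; a leading one is dropped.
    if [[ $pat == */* ]]; then
        anchored=yes
        pat=${pat#/}
    fi
    echo "pattern=$pat negated=$negated dir_only=$dir_only anchored=$anchored"
}

for rule in 'sub2' '/sub2' 'sub2/path' '*/' '!A/' '/A/*' '/!A'; do
    classify_rule "$rule"
done
```

Note that /!A comes out as a non-negated, anchored rule for a name that literally starts with "!", which is why the prefix has to be written !/ rather than /!.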

Git now reads each entry in the directory, one at a time. Each entry names either a file—including a symbolic link, which Git treats "as if" it were a file whose contents are the symlink target—or is a folder/directory. (On systems like Linux that have "socket files" and "device special files" and the like, if Git encounters one, it just skips over it and pretends it's not there—Git can't deal with these.)

Having read the entry's name, Git has both the short and simple name (file or d.ext for instance) and the constructed full path (sub/file, if we're reading sub, or sub2/b/c/d.ext or whatever, if we're reading sub2/b/c for instance). Git now checks whether the entry matches each rule, which depends on the rule's anchored-ness:

  • If the rule is not anchored, the entry matches if its simple name (file or d.ext) matches the rule, provided that any "must be a directory" condition is also met.

  • If the rule is anchored, the entry's full path must match the rule after taking away the leading part that corresponds to where the .gitignore came from. For instance, if we're looking in sub2/b/c and there's a sub2/b/.gitignore that says c/d.ext, the part we take away from the full path is sub2/b/, since that's where the .gitignore rule came from. So we match here if this entry is d.ext (sub2/b/c/d.ext leaves c/d.ext), but not for sub2/b/c/x/d.ext, which leaves c/x/d.ext.
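The two matching cases can be modelled with a small bash function. This is a hypothetical, simplified sketch (rule_matches is a made-up name, not part of Git): it assumes no ** and no escapes, treats * and ? as never crossing a /, and leaves negation to the caller.

```bash
#!/bin/bash
# Hypothetical matcher for ONE .gitignore rule (simplified, no "**").
#   $1 base    directory holding the .gitignore ("" for the top level)
#   $2 path    full path of the entry, relative to the top of the work tree
#   $3 is_dir  "yes" if the entry is a directory
#   $4 rule    the raw .gitignore line
# Exit status 0 means the rule matches; whether it ignores or re-includes
# (a leading "!") is left to the caller.
rule_matches() {
    local base=$1 path=$2 is_dir=$3 rule=$4
    [[ $rule == '!'* ]] && rule=${rule:1}
    if [[ $rule == */ ]]; then
        [[ $is_dir == yes ]] || return 1       # "must be a directory"
        rule=${rule%/}
    fi
    if [[ $rule != */* ]]; then
        [[ ${path##*/} == $rule ]]             # un-anchored: simple name only
        return
    fi
    rule=${rule#/}
    local rel=$path                            # anchored: drop the .gitignore's
    if [[ -n $base ]]; then                    # own directory from the path
        [[ $path == "$base"/* ]] || return 1
        rel=${path#"$base"/}
    fi
    local -a want have
    IFS=/ read -r -a want <<< "$rule"
    IFS=/ read -r -a have <<< "$rel"
    (( ${#want[@]} == ${#have[@]} )) || return 1
    local i
    for (( i = 0; i < ${#want[@]}; i++ )); do  # one path component at a time
        [[ ${have[i]} == ${want[i]} ]] || return 1
    done
}

# The question's rules, applied to the directory project/A/B:
rule_matches project project/A/B yes '/A/*' && echo "/A/* in project/.gitignore matches A/B"
rule_matches project/A project/A/B yes '/B/*' || echo "/B/* in project/A/.gitignore does not match B itself"
```

The second call is the heart of the question: /B/* can only match entries inside B, never B itself, so it cannot keep B from being skipped.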

[Note that ** matching gets kind of complicated here, and occasionally the (quite hairy) .gitignore code that tries to speed this up gets this wrong in test releases. The internal test suite for Git has gotten complicated to try to catch such bugs.]

If a rule doesn't match, we move on. If it does match, the match gets remembered, and we move on. We do this for every .gitignore rule and take the last match, whatever that is, or we end up with no match at all.

If we have no match, the file or directory is not ignored. We'll consider git add-ing it if it's a file, or recursively scanning it if it's a directory.

If we have a match, the file or directory is ignored unless it's a negated rule: for a negated rule we pretend we didn't have a match.
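Because the last match wins, rule order matters. A minimal sketch, assuming bash and git are installed, with hypothetical files app.log and keep.log; with -v, git check-ignore prints the .gitignore file, line number and pattern that decided each path, and a pattern shown with a leading ! means the path is not ignored.

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
git init -q
touch app.log keep.log

# Negation listed last: it is the last match for keep.log, so keep.log is kept.
printf '%s\n' '*.log' '!keep.log' > .gitignore
git check-ignore -v keep.log app.log

# Same two rules, other order: *.log is now the last match for keep.log.
printf '%s\n' '!keep.log' '*.log' > .gitignore
git check-ignore -v keep.log app.log
```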

That's the whole set of rules. Note that there are no exceptions for, e.g., "there's a rule that says don't bother reading sub2 even though there's an additional negated rule that says to keep sub2/important.file". I'd argue that Git should do this automatically for you, at least for constant strings (glob matchers like * and ** might make it too hard).

Some general helpful hints

The usual problem is that Git ignores a directory we want it to search. We can, at a cost, tell Git never to ignore any directory at all with the simple rule:

!*/

That's a negated, un-anchored, directory-only rule. Putting this as the last entry in each .gitignore means that Git will search all the sub-directories it finds at this level, or at any lower level that doesn't override this rule with its own .gitignore.

This completely defeats the (sometimes very important) optimization that lets Git not scan entire sub-trees of files.
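A minimal sketch of the effect, assuming bash and git are installed; the directory a/b and the files keep.txt and drop.bin are hypothetical. The idea is "ignore everything except *.txt files", which fails until !*/ is added.

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
git init -q
mkdir -p a/b
touch a/b/keep.txt a/b/drop.bin

# "*" also matches the directory a, so a/ is skipped and !*.txt never
# gets a chance to re-include a/b/keep.txt.
printf '%s\n' '*' '!*.txt' > .gitignore
if git check-ignore -q a/b/keep.txt; then echo "without !*/: a/b/keep.txt ignored"; fi

# Same rules plus "!*/" as the last entry: no directory is ignored any more.
printf '%s\n' '*' '!*.txt' '!*/' > .gitignore
if ! git check-ignore -q a/b/keep.txt; then echo "with !*/: a/b/keep.txt not ignored"; fi
if git check-ignore -q a/b/drop.bin; then echo "with !*/: a/b/drop.bin still ignored"; fi
```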

A more targeted trick: if there's a rule for some path, such as:

!keep/this/important.file

you can prefix that with:

!keep/
!keep/this/

to make sure that Git searches inside keep, and then keep/this/, assuming keep/.gitignore does not exist or does not override the keep/this/ entry.
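A minimal sketch of this trick, assuming bash and git are installed and a top-level .gitignore that starts with * (ignore everything); the files other.file and junk.file are hypothetical extras added to show what stays ignored.

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
git init -q
mkdir -p keep/this
touch keep/this/important.file keep/this/other.file keep/junk.file

# "*" ignores the directory keep itself, so the negation alone never applies;
# -v reports the rule that matched on the way down.
printf '%s\n' '*' '!keep/this/important.file' > .gitignore
git check-ignore -v keep/this/important.file

# Un-ignore each directory on the way down, then the file itself.
printf '%s\n' '*' '!keep/' '!keep/this/' '!keep/this/important.file' > .gitignore
for path in keep/this/important.file keep/this/other.file keep/junk.file; do
    if git check-ignore -q "$path"; then echo "$path: ignored"; else echo "$path: not ignored"; fi
done
```

Only keep/this/important.file should come out as not ignored; the other two files are still matched by *.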
