What are the consequences of git config core.filemode false?

Time:09-27

For context, I'm experiencing this and trying to solve it: "Git in Visual Studio Code says file is modified even when there is no change". I am using Cygwin on a Windows 10 machine, but all my coworkers are using Macs.

The highest voted answer says to run git config core.filemode false, but I can't figure out the consequences of doing that. Is it safe? Does it mean if I create a shell script, pushing it will lose the executable bit? Does it mean when I pull a new executable, it will lose the executable bit? What are the gotchas, if any?

I've checked the documentation, but it doesn't answer that question either; it just explains when you'd need to change it.

Answer:

It seems that git only cares about the executable bit, so a regular file in git can only have mode 644 or 755 (see the source code).

I've just done a test:

$ mkdir test && cd test && git init
$ touch before && chmod a+x before && git add before && git commit -m 'before' && git ls-tree HEAD

> [master (root-commit) 1cb9c41] before
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100755 before
100755 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    before

$ git config core.fileMode false

$ touch after && chmod a+x after && git add after && git commit -m 'after' && git ls-tree HEAD

> [master b4d7a48] after
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 after
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    after
100755 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    before

As you can see, before the core.fileMode change, git records the file's executable bit (0755). After the change, the newly created file loses the executable bit (0644), while the old file keeps its executable bit.

So, in summary:

With git config core.filemode false, git ignores changes to the executable bit in your local working tree. Since git only tracks the executable bit, this won't produce a 0000 file, but a 0644 one.

Does it mean if I create a shell script, pushing it will lose the executable bit?

Yes

Does it mean when I pull a new executable, it will lose the executable bit?

It depends on your file system. Some file systems, such as NTFS, give every file permissions of 0777, while others may drop the executable bit.

Answer:

TL;DR

Setting core.filemode to false makes Git ignore the executable bits in the st_mode field of the lstat() results for files in your working tree. Instead, the mode of any existing index (staging area) entry is retained, unless you use git update-index --chmod. New file index entries get mode 100644. This is sensible primarily when the lstat() emulation on your own system does not support modes correctly.

Long

It's generally wrong to change any of the core.* settings, including core.fileMode (or core.filemode: the documentation is inconsistent about capitalizing the M, but it does not matter, since Git configuration variable names are case-insensitive). There are some special cases where you can set it manually, and here your question is the right one: what, precisely, does this do?

To answer that, we have to start with what "file modes" are in the first place, and how Git determines them. A file mode, in Git, is really "+x" or "-x" on a committed or to-be-committed blob object, i.e., an ordinary file. In Git, files (or rather, file contents) are stored in commits as these "blob object" things: compressed, de-duplicated, and all-around read-only, found by hash ID.[1] But this is just the file's data, not its +x or -x state, so where does that come from?

Well, if we run git ls-files --stage and look at some files that are and are not executable, we find that the ones that aren't executable are shown as:

100644 <hash> 0       <name>

while the ones that are executable are shown as:

100755 <hash> 0       <name>

That 100644 or 100755 is the mode. It's stored in a Git tree object, which Git builds at the time we run git commit (though we can build some earlier using git write-tree). The tree object stores both the file's name and this mode, just like the index / staging area does.[2] (The index, or staging area, is what git ls-files --stage displays.)

So, the modes are 100644 = -x and 100755 = +x. That leaves us with another mystery: why are they these weird numbers? This is where the question of how Git determines them comes in.

Since Git was originally written for Linux and other Unix-like systems, Git depends heavily on the lstat system call. Some non-Unix systems don't have this as an actual system call, but most at least fake it in some sort of compatibility library. (See, e.g., What is lstat() alternative in windows?) The Unix stat family of calls fills in a struct stat in C, and this structure contains a field, st_mode. The st_mode field is composed of several groups of bits:

  • Permissions: these are the lowest three octal digits. A file that's rw-r--r-- has 644 in these bits. A file that's rwxr-xr-x has 755 in these bits.

  • Three bits that don't apply to Git (setuid, setgid, and sticky): these occupy the next higher octal digit. Since they don't apply to Git, we get a zero here, always (if the OS provides a nonzero value, Git just masks it away). That is, we'll see 0644 or 0755, for instance, once we include the bottom three octal digits.

  • "format" bits (S_IFMT), in the first few octal digits (e.g., the 10 or 04 in 10xxxx or 04xxxx): these determine whether the entity is a file, a directory, a symbolic link, and various other inapplicable cases. A directory has bits 04 in this field, and a regular file has bits 10 in this field. So a directory, after masking with these bits, winds up being mode 040xxx, for some permission bits xxx. A file winds up being mode 100xxx, for some permissions bits xxx.
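To make the three groups concrete, here is a small C sketch that splits an st_mode value with masks. The helper names are hypothetical (they are not Git's), and it assumes a POSIX system whose <sys/stat.h> defines S_IFMT; the octal values in the comments are the usual Linux/BSD ones.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFMT and friends even under strict -std= modes */
#include <sys/stat.h>

/* Hypothetical helpers: split an st_mode value into the three groups above. */
static unsigned format_bits(unsigned mode)  { return mode & S_IFMT; } /* 0100000 = file, 0040000 = dir */
static unsigned special_bits(unsigned mode) { return mode & 07000; }  /* setuid/setgid/sticky, ignored by Git */
static unsigned perm_bits(unsigned mode)    { return mode & 0777; }   /* rwxrwxrwx */
```

For instance, 0100755 splits into type bits 0100000 (regular file), special bits 0, and permission bits 0755.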

When we combine these, we see two of those modes that Git shows: 100755 for an executable file, and 100644 for a non-executable regular file. Of course a directory's st_mode will be 040755 or 040700 or some such, but Git doesn't bother with read/write/execute bits on directories, so it just masks them away: here, we see the third mode that Git shows, 040000, for a tree entry that refers to another tree object (a subdirectory).[4] This is also the source of the symlink entry mode 120000: the S_IFMT bits here are 12 on Linux and Unix. The commit or gitlink entry type, 160000, does not correspond to any Linux/Unix mode, but is the bitwise result of OR-ing together the S_IFDIR and S_IFLNK mode bits (120000|040000).
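A sketch of how those canonical modes might be recognized in C. Because Git writes only these few values, the function matches exact values; the enum and function names are invented for illustration and are not part of Git. The gitlink case shows the OR of the directory and symlink type bits.

```c
/* Hypothetical classifier for the canonical modes Git writes into trees and the index. */
enum git_entry_kind {
    GIT_FILE, GIT_EXEC_FILE, GIT_TREE, GIT_SYMLINK, GIT_GITLINK, GIT_INVALID
};

static enum git_entry_kind classify_git_mode(unsigned mode)
{
    switch (mode) {
    case 0100644: return GIT_FILE;       /* regular file, -x */
    case 0100755: return GIT_EXEC_FILE;  /* regular file, +x */
    case 0040000: return GIT_TREE;       /* subdirectory, permission bits zeroed */
    case 0120000: return GIT_SYMLINK;    /* S_IFLNK, permission bits zeroed */
    case 0160000: return GIT_GITLINK;    /* 0040000 | 0120000, i.e. S_IFDIR | S_IFLNK */
    default:      return GIT_INVALID;    /* not a mode Git creates */
    }
}
```

A mode such as 040755, which lstat would report for an ordinary directory, is rejected by this sketch: the tree entry Git stores for a subdirectory is 040000.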

So this is where all the mode entries in the index come from: they're straight out of the st_mode field of a struct stat, as filled-in by lstat, with the following changes:

  • For a tree object, permissions are irrelevant and are zeroed out. (Tree objects do not appear in the index in the first place; they're created on demand by git write-tree when a file name requires one.) The same holds for symlinks, whose permission bits are generally ignored on Unix-like systems, and for gitlinks (which are internal to Git anyway).

  • For a file, the user, group, and other read and write bits are always treated as rw-r--r--, regardless of the actual mode of the underlying file. The presence of the owner's x bit causes all three x bits to be set in the index mode.[5]
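The rules in the list above can be sketched as a single C function. This assumes core.filemode is true and a POSIX <sys/stat.h>; the function name is hypothetical, and, like the code linked from the first answer, it consults only the owner's x bit (S_IXUSR, octal 0100).

```c
#define _XOPEN_SOURCE 700   /* for S_ISLNK, S_IFLNK, etc. under strict compilers */
#include <sys/stat.h>

/* Sketch, assuming core.filemode is true: map an lstat() st_mode to the
 * mode that would appear in the index (or, for directories, in a tree). */
static unsigned canonical_git_mode(unsigned st_mode)
{
    if (S_ISDIR(st_mode))
        return S_IFDIR;   /* 0040000: tree entry, permissions zeroed */
    if (S_ISLNK(st_mode))
        return S_IFLNK;   /* 0120000: symlink, permissions zeroed */
    /* Regular file: read/write bits normalized to rw-r--r--; if the
     * owner's x bit is set, all three x bits are recorded. */
    return S_IFREG | ((st_mode & S_IXUSR) ? 0755 : 0644);
}
```

With this rule, 0100700 becomes 0100755, while 0100444 and 0100664 both become 0100644.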

This accommodates historical mistakes (see footnote 5) and is therefore somewhat messy. It would be much simpler if the storage format simply held the file type and, for files, +x or -x, but the current format does leave room for future expansion (e.g., the setuid, setgid, and sticky bits are currently always zero, so nonzero values could acquire meanings).

All of this makes sense in a Unix-like environment, where the mode bits are preserved in ordinary on-disk files. But in other systems, the lstat mode bits are literally faked. Windows is the canonical example here. There is no "executable bit", so lstat-ing a file on Windows must either show all files as executable, or no files as executable, if we're to make up an arbitrary x bit result.

Hence, when you run git init to create a new repository, Git probes the system's underlying behavior. Git creates a file with an OS "create new file" call (open(name, O_CREAT|other_open_flags, mode)) with mode 0644. It then tries using an OS chmod call to change the mode to 0755, and then uses an OS lstat call to see if the change "sticks".[6] If so, the OS must honor x bits, so Git will set core.filemode to true. If not, the OS must disregard x bits, so Git will set core.filemode to false.
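A minimal POSIX sketch of such a probe follows. The function names and the scratch-file approach are my own, not Git's; rather than creating the file with 0644 and changing it to 0755, it flips the owner's x bit and checks whether lstat sees the difference, which answers the same question.

```c
#define _XOPEN_SOURCE 700   /* lstat, mkstemp, S_IXUSR */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if flipping the owner's x bit with chmod() is visible through
 * lstat(), 0 if it is not, -1 on error. The original mode is restored. */
static int x_bit_sticks(const char *path)
{
    struct stat before, after;
    if (lstat(path, &before) != 0)
        return -1;
    if (chmod(path, (before.st_mode ^ S_IXUSR) & 07777) != 0)
        return -1;
    int sticks = lstat(path, &after) == 0 && after.st_mode != before.st_mode;
    (void)chmod(path, before.st_mode & 07777);  /* undo the experiment */
    return sticks;
}

/* Hypothetical driver: create a scratch file in dir, probe it, delete it.
 * The result is what one would record as core.filemode (1 = true). */
static int probe_filemode(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/filemode-probe-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    int result = x_bit_sticks(path);
    unlink(path);
    return result;
}
```

On a typical Linux file system, probe_filemode("/tmp") should return 1; on a volume whose driver fakes the mode bits, one would expect 0.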

Later, if core.filemode is false, Git will call lstat as usual to get stat data for each file, but will completely ignore the three x bits in the st_mode result. Instead, it reads the existing index entry for that file to decide which x bits to set in any new or updated index entry for that file. The one exception to this rule is the git update-index operation, where the user can specify an entire mode, or use the --chmod flag:

git update-index --chmod=+x path/to/file.ext

This grabs the existing index entry, checks that it's for a file (mode 100xxx), and if so, replaces the xxx part with 755: the file is now marked x. Similarly, --chmod=-x replaces the xxx part with 644 (again only for regular files; you cannot --chmod a symlink or gitlink).
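The --chmod behavior just described amounts to a small bit operation. The following is a hypothetical model of it in C, not Git's implementation; it operates on a bare mode value rather than a real index entry.

```c
/* Hypothetical model of what `git update-index --chmod=+x` / `--chmod=-x`
 * does to one index entry's mode. Returns the new mode, or -1 when the
 * entry is not a regular file (symlinks and gitlinks are refused). */
static long index_chmod(unsigned long mode, int make_executable)
{
    if ((mode & 0170000) != 0100000)   /* only 100xxx entries qualify */
        return -1;
    /* keep the type bits, replace the xxx permission part */
    return (long)((mode & ~0777UL) | (make_executable ? 0755UL : 0644UL));
}
```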

If core.filemode is true, however, any ordinary git add on a file will read and obey the working tree's x bits. If lstat has st_mode set to 100700, for instance, the index entry will become 100755. If lstat has st_mode set to 100444, the index entry becomes 100644.

That is, in C-like code that doesn't quite match the internals of Git, the new mode, for any ordinary file, is:

ce = lookup_existing_cache_entry(path);
if (core_filemode) {
    // Note: the link in banyudu's answer goes to code
    // that checks `& 0100`, not `& 0111`, i.e., Git
    // only inspects the owner's (user's) x bit.
    // Modes are octal, hence the leading 0.
    new_mode = (st.st_mode & 0100) ? 0100755 : 0100644;
} else {
    new_mode = (ce != NULL && ce->ce_mode == 0100755) ? 0100755 : 0100644;
}

Once the file is added, the cache entry (index) mode field is set to new_mode.
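For something you can actually compile and test, here is the same rule as a self-contained function. It is still a sketch rather than Git's code: the parameter names are mine, an existing_mode of 0 stands in for "no index entry", and only regular files are handled.

```c
#include <stdbool.h>

/* Sketch of the mode chosen when a regular file is added to the index.
 * st_mode:       what lstat() reported for the work-tree file
 * existing_mode: mode of the current index entry, or 0 if there is none */
static unsigned new_index_mode(unsigned st_mode, unsigned existing_mode, bool core_filemode)
{
    if (core_filemode)
        return (st_mode & 0100) ? 0100755 : 0100644;      /* trust the owner's x bit */
    return existing_mode == 0100755 ? 0100755 : 0100644;  /* keep what the index had */
}
```

Fed the first answer's experiment, it reproduces the results: new_index_mode(0100755, 0, true) gives 0100755 ("before"), new_index_mode(0100755, 0, false) gives 0100644 ("after"), and new_index_mode(0100644, 0100755, false) keeps 0100755 for an existing executable entry.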


[1] The hash ID of a blob object is determined strictly by its contents: it's a checksum of the data prefixed by the word blob, an ASCII space (0x20), the data size in bytes expressed in decimal, and an ASCII NUL (0x00) byte. The checksum function is SHA-1 by default, although newer versions of Git can also create repositories that use SHA-256. This hashing is in fact how the de-duplication works: given the same byte sequence, Git produces the same hash ID. So if the literal text hello world plus a newline (CTRL-J) byte is stored in Git as a blob object, using SHA-1, we have:

$ printf 'blob 12\0hello world\n' | shasum
3b18e512dba79e4c8300dd08aeb37f8e728b8dad  -

so we see that every file containing just the one line hello world has blob hash ID 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, in every SHA-1 Git repository everywhere. Try it:

$ echo 'hello world' > hello.txt
$ git add hello.txt
$ git ls-files --stage hello.txt
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0       hello.txt

Note the blob hash ID, 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, is just what we calculated it would be.
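To check the arithmetic without shasum, here is a self-contained C sketch. It hand-rolls SHA-1 only so that it needs nothing beyond the C standard library; real code should use a vetted implementation, and none of this is Git's code. The "blob <size>" header and NUL separator are built exactly as described above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* SHA-1 compression function for one 64-byte block (FIPS 180-4). */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of an in-memory buffer, written as 40 lowercase hex digits plus NUL. */
static void sha1_hex(const unsigned char *msg, size_t len, char out[41])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    size_t off = 0;
    for (; off + 64 <= len; off += 64)
        sha1_block(h, msg + off);
    unsigned char tail[128] = {0};          /* final one or two padded blocks */
    size_t rem = len - off;
    memcpy(tail, msg + off, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem + 9 <= 64) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)             /* big-endian message length in bits */
        tail[tail_len - 1 - j] = (unsigned char)(bits >> (8 * j));
    for (size_t j = 0; j < tail_len; j += 64)
        sha1_block(h, tail + j);
    for (int j = 0; j < 5; j++)
        snprintf(out + 8 * j, 9, "%08x", (unsigned)h[j]);
}

/* Git blob ID: SHA-1 over "blob <size>", a NUL byte, then the data. */
static int git_blob_id(const void *data, size_t len, char out[41])
{
    char header[32];
    int n = snprintf(header, sizeof header, "blob %zu", len);
    if (n < 0 || (size_t)n >= sizeof header)
        return -1;
    size_t total = (size_t)n + 1 + len;     /* +1 for the NUL separator */
    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return -1;
    memcpy(buf, header, (size_t)n + 1);     /* copies the terminating NUL too */
    memcpy(buf + n + 1, data, len);
    sha1_hex(buf, total, out);
    free(buf);
    return 0;
}
```

Calling git_blob_id("hello world\n", 12, id) should yield the 3b18e512dba79e4c8300dd08aeb37f8e728b8dad ID above, and an empty buffer should yield e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty-file blob visible in the first answer's git ls-tree output.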

[2] There are some important differences between tree entries and index entries. In particular, an index entry has the file's full name spelled out complete with forward slashes, so that, e.g., file path/to/file.ext is just that: path/to/file.ext in the index.[3] But as a set of tree objects, Git breaks this up into pseudo-directories, so that we have path, to, and file.ext. The path part is stored in the top level tree of the commit; the to part is stored as a subtree of the path tree; and the file.ext part is stored as a blob entry in the to tree. That is, the top level tree has a subtree entry named path holding the hash ID of a tree; that tree has an entry named to holding the hash ID of another tree; and that last tree has the entry file.ext holding the blob's hash ID. (Whew!) This is easier seen by working from the bottom up, recursively:

  • We build a tree at the bottom level holding 100644 file.ext and any other names under the to name. We store this tree object in the objects database, finding its internal hash ID.

  • Now we build another tree holding 40000 to and the hash ID of the tree we just built, along with any other entries needed to go under path.

  • Finally, we build a tree holding 40000 path and the hash ID of the tree we built in the middle step, plus any other entries needed to go in the top level.

This set of trees is what git write-tree builds, using whatever is in Git's index at this time. The git write-tree program then emits the hash ID of the top level tree, which is what goes into the commit object that git commit-tree builds.
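The bottom-up construction can be illustrated with a hypothetical C helper that only computes names and modes (no hashing, no object storage). It lists the levels top-down, so building proceeds from the last line up. Modes are printed with %o, so the subtree mode comes out as 40000, the leading-zero-free form stored in tree objects.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: for an index path such as "path/to/file.ext", write one
 * line per tree level ("level N: <mode> <name>") into out. Every component
 * but the last is a subtree (mode 40000); the last gets file_mode.
 * Returns the number of levels, or -1 if out is too small. */
static int tree_levels(const char *path, unsigned file_mode, char *out, size_t outsz)
{
    size_t used = 0;
    int level = 0;
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (const char *start = path;; level++) {
        const char *slash = strchr(start, '/');
        size_t len = slash ? (size_t)(slash - start) : strlen(start);
        int n = snprintf(out + used, outsz - used, "level %d: %o %.*s\n",
                         level, slash ? 040000u : file_mode, (int)len, start);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
        if (slash == NULL)
            return level + 1;   /* last component reached */
        start = slash + 1;
    }
}
```

For path/to/file.ext with mode 0100644 it produces three lines: 40000 path, 40000 to, and 100644 file.ext.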

[3] Index format version 4 uses prefix compression to avoid repeating leading path strings. See the technical documentation for details.

[4] The leading zero is stripped in the modes stored in the tree object, but re-inserted for display purposes in git ls-tree -r output, for instance.

[5] In very early versions of Git, more mode bits were preserved in the Git mode field. This turned out to be a mistake. Today, for backwards compatibility, Git accepts an existing mode of 100664 (rw-rw-r--) but never creates new ones, so that repositories dating back to that early version of Git can still be read.

[6] As far as I can tell from Git's source, the actual test runs on the repository's newly written config file: lstat the file, flip the owner's x bit (new_mode = old_mode ^ S_IXUSR), chmod, lstat again, check whether the result changed, and chmod the file back. If it changed, the x bit is obeyed. If not, it isn't.
