For context, I'm experiencing this and trying to solve it: "Git in Visual Studio Code says file is modified even when there is no change". I am using Cygwin on a Windows 10 machine, but all my coworkers are using Macs.
The highest-voted answer says to run git config core.filemode false, but I can't figure out the consequences of doing that. Is it safe? Does it mean if I create a shell script, pushing it will lose the executable bit? Does it mean when I pull a new executable, it will lose the executable bit? What are the gotchas, if any?
I've checked the documentation, but it doesn't answer that question either; it just explains when you'd need to change it.
CodePudding user response:
It seems that Git only cares about the executable bit, so a file in Git can only be 644 or 755 (see the Git source code).
I've just done a test:
$ mkdir test && cd test && git init
$ touch before && chmod a+x before && git add before && git commit -m 'before' && git ls-tree HEAD
> [master (root-commit) 1cb9c41] before
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100755 before
100755 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 before
$ git config core.fileMode false
$ touch after && chmod a+x after && git add after && git commit -m 'after' && git ls-tree HEAD
> [master b4d7a48] after
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 after
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 after
100755 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 before
As you can see, before the core.fileMode change, Git records the file's executable bit (0755). After the change, newly added files lose the executable bit (0644), while files that were already committed keep their recorded executable bit.
So, in summary:
With git config core.filemode false, Git ignores executable-bit changes in the local repository. Since Git only cares about the executable bit, this won't lead to a 0000 file, but to a 0644 file.
Does it mean if I create a shell script, pushing it will lose the executable bit?
Yes
Does it mean when I pull a new executable, it will lose the executable bit?
It depends on your file system. Some file systems, such as NTFS, may show every file's permissions as 0777, while others may lose the executable bit.
CodePudding user response:
TL;DR
Setting core.filemode to false makes Git ignore the executable bits of the st_mode field in lstat() results on files in your working tree. Instead, the mode of any existing index (staging area) entry is retained, unless you use git update-index --chmod. New file index entries get mode 100644. This is sensible primarily when the lstat() emulation on your own system does not support modes correctly.
Long
It's generally wrong to change any of the core.* settings, including core.fileMode (or core.filemode; the documentation is not consistent about whether to give it an uppercase M, but it does not matter, since Git treats configuration variable names as case-insensitive). There are some special cases where you can set it manually, and here your question is the right one: What, precisely, does this do?
To answer that, we have to start with what "file modes" are in the first place, and how Git determines them. A file mode, in Git, is really "+x" or "-x" on a committed or to-be-committed blob object, i.e., ordinary file. In Git, files (or rather, file contents) are stored in commits as these "blob object" things: compressed, de-duplicated, and all-around-read-only, found by hash ID.[1] But this is just the file's data, not its +x or -x state, so where does that come from?
Well, if we run git ls-files --stage and look at some files that are and are not executable, we find that the ones that aren't executable are shown as:
100644 <hash> 0 <name>
while the ones that are executable are shown as:
100755 <hash> 0 <name>
That 100644 or 100755 is the mode. It's stored in a Git tree object, which Git builds at the time we run git commit (though we can build one earlier using git write-tree). The tree object stores both the file's name and this mode, just like the index / staging area does.[2] (The index, or staging area, is what git ls-files --stage displays.)
So, the modes are 100644 = -x and 100755 = +x. That leaves us with another mystery: why are they these weird numbers? This is where the "how does Git determine these" question comes in.
Since Git was originally written for Linux and other Unix-like systems, Git depends heavily on the lstat system call. Some non-Unix systems don't have this as an actual system call, but most at least fake it in some sort of compatibility library. (See, e.g., "What is lstat() alternative in windows?") The Unix stat family of calls fills in a struct stat in C, and this structure contains a field, st_mode. The st_mode field is made up of several groups of bits:
- Permissions: these are the lowest three octal digits. A file that's rw-r--r-- has 644 in these bits. A file that's rwxr-xr-x has 755 in these bits.
- Three bits that don't apply to Git: these occupy the next higher octal digit. Since they don't apply to Git, we get a zero here, always (if the OS provides a nonzero value, Git just masks it away). That is, we'll see 0644 or 0755, for instance, once we include the bottom three octal digits.
- "Format" bits (S_IFMT), in the first few octal digits (e.g., the 10 or 04 in 10xxxx or 04xxxx): these determine whether the entity is a file, a directory, a symbolic link, and various other inapplicable cases. A directory has bits 04 in this field, and a regular file has bits 10 in this field. So a directory, after masking with these bits, winds up being mode 040xxx, for some permission bits xxx. A file winds up being mode 100xxx, for some permission bits xxx.
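As a rough illustration of how these three groups of bits sit inside one st_mode value, here is a small Python sketch. It uses only the standard stat module; the helper name split_mode is made up for this example and is not something Git provides.

```python
import stat

def split_mode(st_mode):
    """Split an st_mode value into (format bits, special bits, permission bits)."""
    fmt = stat.S_IFMT(st_mode)   # file-type bits, e.g. 0o100000 for a regular file
    special = st_mode & 0o7000   # setuid/setgid/sticky: the digit Git masks away
    perms = st_mode & 0o777      # rwx for user, group, and other
    return fmt, special, perms

for mode in (0o100644, 0o100755, 0o040755):
    fmt, special, perms = split_mode(mode)
    print(f"{mode:06o}: format={fmt:06o} special={special:04o} perms={perms:03o}")
```

Running it on a value returned by os.lstat(path).st_mode shows the same three fields for a real file on your own system.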
When we combine these, we see two of the modes that Git shows: 100755 for an executable file, and 100644 for a non-executable regular file. Of course a directory's st_mode will be 040755 or 040700 or some such, but Git doesn't bother with read/write/execute bits on directories, so it just masks them away: here, we see the third mode that Git shows, 040000 for tree entries that point to another tree object.[4] This is also the source for the symlink entry mode of 120000: the S_IFMT bits here are 12 on Linux and Unix. The commit or gitlink entry type, 160000, does not correspond to any Linux/Unix mode, but is the bitwise result of OR-ing together the S_IFDIR and S_IFLNK mode bits (120000|040000).
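The arithmetic behind these type values can be checked with a few lines of Python. This is only a sketch: the constants come from Python's standard stat module, and S_IFGITLINK is defined here by hand for illustration because Python has no such constant.

```python
import stat

# Defined locally for this example: the OR of the directory and symlink type bits.
S_IFGITLINK = stat.S_IFDIR | stat.S_IFLNK

entry_types = {
    "regular file": stat.S_IFREG,
    "tree (directory)": stat.S_IFDIR,
    "symlink": stat.S_IFLNK,
    "gitlink": S_IFGITLINK,
}
for name, type_bits in entry_types.items():
    print(f"{name:18} {type_bits:06o}")
```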
So this is where all the mode entries in the index come from: they're straight out of the st_mode field of a struct stat, as filled in by lstat, with the following changes:
- For a tree object, permissions are irrelevant and are zeroed out. (Tree objects do not appear in the index in the first place; they're created on demand by git write-tree when a file name requires one.) The same holds for symlinks (where on Unix-like systems the permission bits are generally ignored) and for gitlinks (which are internal to Git anyway).
- For a file, the user, group, and other read and write bits are treated as rw-r--r-- always, regardless of the actual mode of the underlying file. The presence of an x bit causes all three x bits to be set in the index mode.[5]
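The two rules above can be written down as a short Python sketch. It follows the prose here rather than Git's actual source, and the function name canonical_mode is an assumption made for this example.

```python
import stat

def canonical_mode(st_mode):
    """Reduce an lstat()-style mode to the value an index or tree entry would hold."""
    if stat.S_ISREG(st_mode):
        # Any x bit gives 755, otherwise 644. This follows the description above;
        # Git itself may only look at the user's x bit.
        perms = 0o755 if st_mode & 0o111 else 0o644
        return stat.S_IFREG | perms
    if stat.S_ISLNK(st_mode):
        return stat.S_IFLNK   # permission bits dropped
    if stat.S_ISDIR(st_mode):
        return stat.S_IFDIR   # permission bits dropped
    raise ValueError(f"mode {st_mode:06o} is not a file, symlink, or directory")

print(f"{canonical_mode(0o100700):06o}")
print(f"{canonical_mode(0o040750):06o}")
```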
This accommodates historical mistakes (see footnote 5) and is therefore somewhat messy. It would be much simpler if the storage format simply held the file type and, for files, +x or -x, for instance, but it does also leave room for future expansion (e.g., the entire setuid/setgid/sticky set of 3 bits is currently always zero, so nonzero values could acquire meanings).
All of this makes sense in a Unix-like environment, where the mode bits are preserved in ordinary on-disk files. But in other systems, the lstat mode bits are literally faked. Windows is the canonical example here. There is no "executable bit", so lstat-ing a file on Windows must either show all files as executable, or no files as executable, if we're to make up an arbitrary x bit result.
Hence, when you run git init to create a new repository, Git probes the system's underlying behavior. Git creates a file with an OS "create new file" call (open(name, O_CREAT|other_open_flags, mode)) with mode 0644. It then tries using an OS chmod call to change the mode to 0755, and then uses an OS lstat call to see if the change "sticks".[6] If so, the OS must honor x bits, so Git will set core.filemode to true. If not, the OS must disregard x bits, so Git will set core.filemode to false.
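To see what such a probe might look like, here is a minimal Python sketch of the same idea: create a scratch file, flip the user's x bit, and check whether lstat notices. It is an approximation of the technique, not a copy of what git init does, and probe_filemode is a hypothetical name.

```python
import os
import stat
import tempfile

def probe_filemode(directory):
    """Return True if flipping the user x bit on a scratch file is visible via lstat."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        before = os.lstat(path).st_mode
        # Flip only the user's execute bit, leaving the other permission bits alone.
        os.chmod(path, stat.S_IMODE(before) ^ stat.S_IXUSR)
        after = os.lstat(path).st_mode
        return before != after
    finally:
        os.remove(path)

with tempfile.TemporaryDirectory() as scratch:
    print("x bits honored:", probe_filemode(scratch))
```

On a system where the change does not stick, the two lstat results compare equal and the function returns False, which corresponds to the case where core.filemode ends up false.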
Later, if core.filemode is false, Git will call lstat as usual to get stat data for each file, but will completely ignore the three x bits in the st_mode result. It will read the existing index entry for that file to get the x bits to set in any new updated index entry for that file. The one exception to this rule is the git update-index operation, where the user can specify an entire mode, or use the --chmod flag:
git update-index --chmod=+x path/to/file.ext
This grabs the existing index entry, checks that it's for a file (mode 100xxx), and if so, replaces the xxx part with 755: the file is now marked +x. Similarly, --chmod=-x replaces the xxx part with 644 (again only for regular files; you cannot --chmod a symlink or gitlink).
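The effect of --chmod on a stored mode, as described above, can be modeled in a few lines of Python. This is a sketch of the rule only; index_chmod is a made-up helper and does not touch a real index.

```python
def index_chmod(mode, flag):
    """Apply '+x' or '-x' to an index mode; reject entries that are not regular files."""
    if (mode & 0o170000) != 0o100000:
        raise ValueError(f"cannot chmod {flag} mode {mode:06o}: not a regular file")
    perms = {"+x": 0o755, "-x": 0o644}[flag]
    return 0o100000 | perms

print(f"{index_chmod(0o100644, '+x'):06o}")
print(f"{index_chmod(0o100755, '-x'):06o}")
```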
If core.filemode is true, however, any ordinary git add on a file will read and obey the working tree's x bits. If lstat has st_mode set to 100700, for instance, the index entry will become 100755. If lstat has st_mode set to 100444, the index entry becomes 100644.
That is, in C-like code that doesn't quite match the internals of Git, the new mode, for any ordinary file, is:
ce = lookup_existing_cache_entry(path);
if (core_filemode) {
    // Note: the link in banyudu's answer goes to code
    // that checks `& 0100`, not `& 0111`. Perhaps Git
    // only inspects the user's bit.
    new_mode = (st.st_mode & 0111) ? 0100755 : 0100644;
} else {
    // Modes are octal, hence the leading 0 on each literal.
    new_mode = (ce != NULL && ce->ce_mode == 0100755) ? 0100755 : 0100644;
}
Once the file is added, the cache entry (index) mode field is set to new_mode.
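For readers who want to experiment with the decision above, here is the same logic as a runnable Python sketch. Like the C-like version, it is a simplification and not Git's real code; new_index_mode and its parameters are names invented for this example.

```python
def new_index_mode(st_mode, existing_mode, core_filemode):
    """Pick the index mode for a regular file being added.

    st_mode: mode reported by lstat for the working-tree file
    existing_mode: mode of the current index entry, or None for a new file
    core_filemode: the core.filemode setting as a bool
    """
    if core_filemode:
        # Trust the working tree: any x bit marks the file executable.
        return 0o100755 if st_mode & 0o111 else 0o100644
    # Ignore the working tree's x bits and keep whatever the index already says.
    if existing_mode == 0o100755:
        return 0o100755
    return 0o100644

# A new, executable file added with core.filemode=false is stored as 100644,
# matching the "after" file in the test shown in the first answer.
print(f"{new_index_mode(0o100755, None, False):06o}")
```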
[1] The hash ID of the blob object is determined strictly by the contents: it's a checksum of the data prefixed by the word blob, an ASCII space (0x20), the data size in bytes expressed in decimal, and an ASCII NUL (0x00) byte. The checksum function is currently SHA-1, although an upcoming Git change will start using SHA-256. This hashing is in fact how the de-duplication works: given the same byte sequence, Git produces the same hash ID. So if the literal text hello world plus a newline (CTRL-J) byte is stored in Git as a blob object, using SHA-1, we have:
$ printf 'blob 12\0hello world\n' | shasum
3b18e512dba79e4c8300dd08aeb37f8e728b8dad -
so we see that every file containing just the one line hello world has blob hash ID 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, in every Git repository everywhere. Try it:
$ echo 'hello world' > hello.txt
$ git add hello.txt
$ git ls-files --stage hello.txt
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0 hello.txt
Note the blob hash ID, 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, is just what we calculated it would be.
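The same calculation can be done without Git or shasum. The following Python sketch uses only hashlib and assumes SHA-1 object names; blob_hash is a helper name chosen for this example.

```python
import hashlib

def blob_hash(data: bytes) -> str:
    """Compute the SHA-1 blob ID for the given file contents."""
    header = b"blob %d\0" % len(data)   # "blob", space, decimal size, NUL
    return hashlib.sha1(header + data).hexdigest()

print(blob_hash(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Feeding it an empty byte string gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the ID shown for the empty before and after files in the first answer.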
[2] There are some important differences between tree entries and index entries. In particular, an index entry has the file's full name spelled out complete with forward slashes, so that, e.g., file path/to/file.ext is just that: path/to/file.ext in the index.[3] But as a set of tree objects, Git breaks this up into pseudo-directories, so that we have path, to, and file.ext. The path part is stored in the top level tree of the commit; the to part is stored as a subtree of the path tree; and the file.ext part is stored as a blob entry in the to tree. The top level tree has a subtree entry named path that holds the hash ID of the subtree that holds the name to, which in turn holds the hash ID of the subtree that holds the name file.ext. (Whew!) This is easier seen by working from the bottom up, recursively:
1. We build a tree at the bottom level holding 100644 file.ext and any other names under the to name. We store this tree object in the objects database, finding its internal hash ID.
2. Now we build another tree holding 40000 to and the hash ID of the tree we just built, along with any other entries needed to go under path.
3. Finally, we build a tree holding 40000 path and the hash ID of the tree we built in the middle step, plus any other entries needed to go in the top level.
This set of trees is what git write-tree builds, using whatever is in Git's index at this time. The git write-tree program then emits the hash ID of the top level tree, which is what goes into the commit object that git commit-tree builds.
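The bottom-up construction can be sketched in Python as a recursive function over a flat, index-like mapping of paths. This is a simplified model that assumes SHA-1 and only handles blobs and subtrees; hash_object and write_tree here are illustrative helpers, not Git's implementation, and nothing is written to an object database.

```python
import hashlib

def hash_object(kind: bytes, body: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 ID of an object of the given kind."""
    return hashlib.sha1(kind + b" %d\0" % len(body) + body).digest()

def write_tree(entries):
    """entries maps 'path/to/file' -> (mode string, 20-byte blob ID).

    Returns the 20-byte ID of the top-level tree."""
    files, subdirs = {}, {}
    for path, (mode, oid) in entries.items():
        head, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(head, {})[rest] = (mode, oid)
        else:
            files[head] = (mode, oid)
    items = [(name, mode, oid) for name, (mode, oid) in files.items()]
    for name, sub in subdirs.items():
        # Build the subtree first, then record its ID under mode 40000.
        items.append((name, "40000", write_tree(sub)))
    # Tree entries are sorted by name, with a subtree compared as if it ended in "/".
    items.sort(key=lambda e: (e[0] + "/" if e[1] == "40000" else e[0]).encode())
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + oid
        for name, mode, oid in items
    )
    return hash_object(b"tree", body)

blob = bytes.fromhex("3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
print(write_tree({"path/to/file.ext": ("100644", blob)}).hex())
```

Each recursive call corresponds to one of the three steps above: the innermost call builds the tree holding file.ext, and each enclosing call wraps the previous result in a 40000 entry.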
[3] The current index format uses compression tricks to avoid repeating leading strings. See the technical documentation for details.
[4] The leading zero is stripped in the modes stored in the tree object, but re-inserted for display purposes in git ls-tree -r output, for instance.
[5] In very early versions of Git, more mode bits were preserved into the Git mode field. This turned out to be a mistake. Today, for backwards compatibility, Git allows an existing mode of 100664 (rw-rw-r--), but will never create any new ones, so that existing Git repositories that date back to this early version of Git can be read.
[6] If I remember right, the actual test consists of: stat the file, flip all the X bits (new_mode = old_mode ^ 0111), chmod, stat again, and see if the result changed. If so, at least one X bit is obeyed. If not, no X bit is obeyed.