Editing .gitattributes to normalize line endings doesn't work?

Time:09-03

I am trying to normalize line endings in my Git repo; specifically, I want all .sh files to use LF (Unix-style) endings. From what I found while researching this, I should add the following to my .gitattributes file:

*.sh eol=lf

That doesn't work: the .sh files are not converted to LF endings. Could anyone tell me what I should do? Thank you!

I also tried * text=auto eol=lf and git config --global core.autocrlf false

CodePudding user response:

TL;DR

Use git add --renormalize (e.g., git add --renormalize .).

Long

The directives in a .gitattributes file just tell Git what to do. They don't actually do it. As a result, simply changing an existing .gitattributes file, or adding a new one, has no effect yet.

Each attribute you can specify has some particular meaning. Unfortunately, the meanings of the line-ending (CRLF) attributes are blurry:

  • there are contradictory settings;
  • there is a lot of history: Git did certain things in the past that turned out to be terrible, but that behavior can't be changed for compatibility reasons, so Git still does those things and now does other things too; and
  • it's very hard to observe these things actually happen.

The result of the middle item is that what to set depends on which Git version(s) you're using. There are some general rules that always apply though, and you should keep these in mind as you experiment. Also, keep in mind that as initially written, Git never messed with line endings at all, and that's usually the default setting in most initial installations.

Three copies of files

First: Git has two kinds of transformation. To explain this properly, we must start a bit further back, with the difference between committed files, files in Git's index or staging area, and files in your working tree.

When you're working with files in Git, there's always[1] a current commit. That's the commit you selected when you ran git checkout or git switch, for instance, to pick the branch main or develop or whatever. That branch name has a distinguished latest commit,[2] and that's the commit you are now using. Every commit contains a full snapshot of every file,[3] as a sort of permanent archive. Most version control systems do the same thing, though each VCS has its own way of doing this: one key point of version control is, after all, to be able to get back the old versions.

So, one copy of every file, for the current commit, is stored, permanently and unchangeably, in that particular commit.[4] This copy can have either kind of line ending, especially if you have not told Git to mess with line endings. These copies will have whatever line endings they have, and they cannot be changed. They are immutable!

In order to use a version control system that stores an immutable copy of every file—such as Git—that version control system has to provide you with a mutable copy too. So Git does this too: when you check out some commit, making that commit the current commit, Git copies all the files out of that commit into your working tree.

Now, this is a deliberate overstatement: Git will sometimes short-cut things by doing nothing, whenever Git (thinks it) can get away with it, because doing nothing is so much faster and more efficient than doing something. (Exercise: which is a faster way to get home, when you're already home: driving to the airport, flying to Paris and back, and driving home, or just staying home?) The details on when Git thinks it can get away with it get fairly complicated, and aren't completely relevant to understanding the line-ending issues. But we'll come back to this in a bit. For now, this is where the third copy of each file comes from: Git wrote it into your working tree when you checked out the commit.

The weird thing about Git—compared to many / most other version control systems at least—is that Git keeps an in-between copy of each file too. This in-between copy is in the internal format, ready to go into a new commit. But unlike the committed copy of a file, which is entirely immutable, Git lets you replace the internal copy, using git add.


[1] Well, almost always. In a new, totally empty repository, for instance, there are no commits yet. That means there's no current commit. You can re-create this oddball situation using git checkout --orphan or git switch --orphan as well. Whenever necessary, Git handles this internally by using its semi-secret empty tree as if it were the current commit, or more precisely, the current commit's tree.

[2] The act of adding a new commit simply adds a new latest commit, updating the branch name appropriately. The commit that was latest becomes the second-latest (except that when using git commit --amend, the commit that was latest is kicked off the end of the branch instead).

[3] These snapshots are stored in a special, read-only, Git-only, compressed and, importantly, de-duplicated format, so that the repository doesn't bloat up even though (or perhaps I should say because) most commits mostly re-use earlier file contents.

[4] Technically, the storage is indirect, via a tree object, hence the fake empty tree.


Two kinds of transformation

Start with the "three copies" model, where for each file, there's one in the current commit, one in Git's index / staging-area, and one in your working tree:

HEAD (commit)   staging area   working tree
-------------   ------------   ------------
README.md       README.md      README.md
Makefile        Makefile       Makefile
main.py         main.py        main.py

Remember that git checkout or git switch has to copy the committed files. Those copies go into both the index / staging-area and the working tree.
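To make the three copies a bit more concrete, here is a minimal sketch that compares them by hash ID. The throwaway repository, the placeholder identity, and the Makefile contents are all assumptions made up for the demo; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch: compare the committed, index, and working tree copies of one file.
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example"                    # placeholder identity
git config user.email "example@example.invalid"
git config core.autocrlf false         # keep line-ending conversion out of this demo

printf 'all:\n' > Makefile
git add Makefile
git commit -q -m "initial"

head_id=$(git rev-parse HEAD:Makefile)   # committed copy
index_id=$(git rev-parse :Makefile)      # index / staging-area copy
work_id=$(git hash-object Makefile)      # hash ID the working tree copy would get
# Right after a commit, all three hash IDs are the same.

printf 'all:\n\techo hi\n' > Makefile    # change the working tree copy only
work_id2=$(git hash-object Makefile)     # no longer matches the index copy
git add Makefile                         # replace the index copy
index_id2=$(git rev-parse :Makefile)     # matches work_id2; HEAD:Makefile is unchanged
echo "HEAD=$head_id index=$index_id2 worktree=$work_id2"
```

After the second git add, the index and working tree copies agree with each other, while the committed copy stays frozen until you make a new commit.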

When Git is copying a file "out" like this, that's a great time for Git to do some transformations.

The git add command takes a working tree file—presumably, one you've updated—and compresses it and checks for duplicate contents. Then it updates Git's index/staging-area with the new copy. This compression requires reading the working tree copy, and writing to the index copy. That's also a great time for Git to do some transformations.

When you run git commit to make a new commit, Git simply packages up the prepared index contents as they are. No transformation of any file's content happens at this step.

All the end-of-line transformations, then, happen in the copying from index to working tree or from working tree to index. In other words, they happen while checking out (or git restore-ing) a file, and while git add-ing a file. That's the natural place to put them, so that's where Git puts them.

Next, Git chooses (I emphasize that this is a choice, because it is) that if it's going to mess with line endings, it will always store the frozen archival committed copies with LF-only, Unix-style line endings. If you want Git to store, in the committed copies, CRLF / Windows-style endings, you can't have Git do this for you. That's just the way this is, and unlike some of Git's other earlier mess-with-line-endings decisions, nobody seems to have serious objections to this, so I don't see it changing in the future.[5]

The two kinds of transformation that Git is willing to do, then, are:

  • while copying to working tree (checkout et al), Git can turn LF-only to CRLF;
  • while copying from working tree (git add, mostly), Git can turn CRLF to LF-only.
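Both directions listed above can be observed in a scratch repository. This is only a sketch under some assumptions: the file name notes.txt and the `*.txt text eol=crlf` rule are invented for the demo, and counting carriage returns with tr and wc is just one convenient way to look.

```shell
#!/bin/sh
# Sketch: watch CRLF -> LF on "git add" and LF -> CRLF on checkout.
set -e
repo=$(mktemp -d)                       # hypothetical scratch repository
cd "$repo"
git init -q
git config core.autocrlf false          # rely only on .gitattributes here

printf '*.txt text eol=crlf\n' > .gitattributes
printf 'one\r\ntwo\r\n' > notes.txt     # working tree copy uses CRLF

git add .gitattributes notes.txt        # add side: CRLF becomes LF in the index
index_cr=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')

rm notes.txt
git checkout -- notes.txt               # checkout side: LF becomes CRLF again
work_cr=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')

echo "carriage returns: index=$index_cr working-tree=$work_cr"
```

The index copy should contain no carriage returns at all, while the freshly checked-out working tree copy has one per line.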

All of the settings are about when to have Git do this. The text setting tells Git that it should treat a file as text (as opposed to binary, which should never be mangled this way), and the eol= setting tells Git what kind of line endings should appear in your working tree (with implied LF-only in the archival copies).

You can enable just the git add side CRLF-to-LF operation by setting core.autocrlf to input, or you can enable both sides (checkout and git add) with the other settings. And that's it! Well, almost.
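If you are not sure which of these settings Git thinks applies to a given path, git check-attr will tell you. The sketch below assumes a scratch repository and a made-up `*.sh text eol=lf` rule; the paths build.sh and README.md are hypothetical and do not even need to exist.

```shell
#!/bin/sh
# Sketch: ask Git which line-ending attributes apply to some paths.
set -e
repo=$(mktemp -d)                       # hypothetical scratch repository
cd "$repo"
git init -q

printf '*.sh text eol=lf\n' > .gitattributes

# One "path: attribute: value" line is printed per path and attribute.
attrs=$(git check-attr text eol -- build.sh README.md)
printf '%s\n' "$attrs"
```

Here build.sh should report text as set and eol as lf, while README.md reports both as unspecified, since no pattern matches it.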


[5] On the other hand, it is difficult to make predictions, especially about the future.


Git tries to be lazy

Remember how I mentioned that Git will do nothing if it thinks it can get away with it? Unfortunately, when you change the .gitattributes setting or other line-ending settings, Git often still thinks (incorrectly this time!) that it can get away with doing nothing.

When you run git add path/to/file, or even git add path\to\file on Windows, Git looks at cache information it stores in its index / staging-area entry for the file path/to/file (that's the file's name, complete with the forward slashes). This information hints to Git as to whether you've modified the file since Git extracted it earlier. If Git decides, based on this, that you haven't modified the file, Git does not bother to re-add the file.

This has historically been a big problem, so git add gained the --renormalize option in Git version 2.16. This option acts as a sort of big hammer, defeating the optimization entirely.
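Here is a rough end-to-end sketch of the whole fix for the situation in the question, assuming Git 2.16 or later. The scratch repository, the placeholder identity, and the file name run.sh are made up for the demo. The final rm-and-checkout step is my own addition for refreshing the working tree copy, which --renormalize by itself leaves alone.

```shell
#!/bin/sh
# Sketch: normalize an already-committed CRLF script to LF.
set -e
repo=$(mktemp -d)                       # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example"                    # placeholder identity
git config user.email "example@example.invalid"
git config core.autocrlf false

printf '#!/bin/sh\r\necho hi\r\n' > run.sh        # script with CRLF endings
git add run.sh
git commit -q -m "script with CRLF endings"
before=$(git cat-file -p HEAD:run.sh | tr -cd '\r' | wc -c | tr -d ' ')

printf '*.sh text eol=lf\n' > .gitattributes
git add .gitattributes                  # --renormalize only touches tracked files
git add --renormalize .                 # re-add tracked files under the new rules
git commit -q -m "Normalize line endings"
after=$(git cat-file -p HEAD:run.sh | tr -cd '\r' | wc -c | tr -d ' ')

# Optional: re-extract the working tree copy so it uses LF as well.
rm run.sh
git checkout -- run.sh
work=$(tr -cd '\r' < run.sh | wc -c | tr -d ' ')

echo "carriage returns: before=$before after=$after working-tree=$work"
```

A plain git add run.sh in place of the --renormalize step may or may not re-add the file, depending on what the cached information in the index says, which is exactly why the option exists.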

Seeing what's really in the repository

Whenever you look at a file by viewing it in an editor or file-browser, you're seeing the working tree copy. This isn't the copy that's in any commit! In particular, it's been de-compressed and un-de-duplicated, so that you can see it in the first place. In the process, it may have its line endings messed-with.

To view what's really in the repository, it's helpful to know that git cat-file -p will extract a blob object's data without doing any text conversions (unless, that is, you add --textconv and appropriate extra information). So if you have a hex editor, you can use git cat-file -p to read out the raw committed-file contents, and view those in your hex editor, to see what's actually in the repository. You can do the same with blob-ized contents in Git's index / staging-area. Remember that git rev-parse can tell you hash IDs:

git rev-parse HEAD:Makefile
git rev-parse :Makefile

(and of course git ls-files --stage dumps out the entire index contents, including the hash IDs). Most Git commands, including git cat-file, accept revision specifiers that include a path, so git cat-file -p HEAD:Makefile also works. (This also provides the path information for --textconv.)
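If you have no hex editor handy, piping through od -c does a similar job. The sketch below assumes a scratch repository with a made-up Makefile committed with CRLF endings; git ls-files --eol (available since Git 2.8, as far as I know) is an extra convenience that the text above does not mention.

```shell
#!/bin/sh
# Sketch: look at the raw bytes Git stores, without any conversion.
set -e
repo=$(mktemp -d)                       # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example"                    # placeholder identity
git config user.email "example@example.invalid"
git config core.autocrlf false

printf 'all:\r\n\ttrue\r\n' > Makefile  # CRLF endings, stored as-is
git add Makefile
git commit -q -m "Makefile with CRLF endings"

dump=$(git cat-file -p HEAD:Makefile | od -c)     # committed copy, byte by byte
printf '%s\n' "$dump"                   # carriage returns appear as \r
git cat-file -p :Makefile | od -c       # same view of the index copy
git ls-files --stage                    # mode, hash ID, stage number, path
eol_info=$(git ls-files --eol Makefile) # endings in index (i/) and working tree (w/)
printf '%s\n' "$eol_info"
```

Any \r in the od output is a carriage return that really is stored in the repository, as opposed to one added on the way out to your working tree.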

(You'll always want something like a hex editor, so that your file viewer doesn't hide otherwise-invisible characters such as carriage returns.)
