Size of .git folder order of magnitude greater than committed files

I have a folder, project, containing a large number of files and subfolders. I turned this folder into a repository with git init, which gives the folder structure below.

project
    --- .git/
    --- large number of files and folders .gitignore'd
    --- very few text and related files not under .gitignore
    --- .gitignore

The very first (and thus far only) commit in the repository only contained a few text and related files not .gitignore'd.

The raw size of the committed files on my disk (working tree) is just a few kilobytes.

More specifically, the committed files are:

3 .tex files of total size 9 KB
4 .lyx files of total size 32 KB
1 .gitignore file of size 1 KB
3 other .txt files of total size 4 KB

Yet, at this stage, the raw size of the .git folder is 84 MB. The size of the project folder itself is around 5 GB, most of which is .gitignore'd.

Is there a way I can try to figure out what is causing this large gap between the actual committed files and the size of the .git folder?

User response:

If you had made an earlier commit with many more files and then rewrote it, the old commit would still be in the repo until it is garbage collected. But I'll take your word for it that you didn't do that:

"The very first (and thus far only) commit in the repository only contained a few text and related files not .gitignore'd."

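Therefore, the simplest explanation is that you staged a large number of files before getting your .gitignore file set up properly, and then unstaged them. Staging a file writes its contents into the repository's object store right away, so even files that were staged but never committed take up space, at least temporarily. Once unstaged, those objects are unreachable but still on disk. You can easily test whether this is the cause with the prune command: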

git prune -n # dry run, show what would be removed
# and to actually do it
git prune

Then check your repo size again.
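
To see the effect in isolation, here is a minimal sketch that reproduces the "staged, then unstaged" situation in a throwaway repository and counts loose objects before and after pruning. The temporary directory, the file names, and the demo identity passed to git commit are made up for illustration; nothing here touches your real project.

```shell
#!/bin/sh
# Sketch: show that a staged-then-unstaged file leaves an object behind,
# and that git prune removes it. Runs entirely in a temporary directory.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# One small committed file, similar to the situation in the question.
echo "small text file" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Stage 1 MB of random data, then unstage it again (hypothetical file name).
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git rm -q --cached big.bin

# The blob written by "git add" is still stored as a loose object.
before=$(git count-objects | cut -d' ' -f1)

git prune -n   # dry run: lists the unreachable blob
git prune      # actually delete it

after=$(git count-objects | cut -d' ' -f1)
echo "loose objects before prune: $before, after prune: $after"
```

On the real repository, running git count-objects -vH (or du -sh .git) before and after the prune gives the same comparison in human-readable sizes.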

Side Note: under normal circumstances you don't need to run the prune command yourself, because it runs automatically as part of garbage collection. However, by default gc only prunes unreachable objects that are older than 2 weeks. So if you wish to use gc to force a full pruning right away, you could use:

git gc --prune=now
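
The difference between the default expiry and --prune=now can be checked with a small sketch like the following. It again uses a throwaway repository with made-up file names and a demo identity, and it tests for the object with git cat-file -e rather than counting loose objects, since newer Git versions may move unreachable objects into a pack during a plain gc.

```shell
#!/bin/sh
# Sketch: a plain "git gc" keeps a recently created unreachable object,
# while "git gc --prune=now" drops it. Runs in a temporary directory.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

echo "small text file" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Stage and unstage a file; remember the ID of the blob this creates.
echo "scratch data" > scratch.txt
git add scratch.txt
blob=$(git rev-parse :scratch.txt)   # blob ID as recorded in the index
git rm -q --cached scratch.txt

git gc -q
if git cat-file -e "$blob" 2>/dev/null; then after_default=present; else after_default=gone; fi

git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_now=present; else after_now=gone; fi

echo "after plain gc: $after_default; after gc --prune=now: $after_now"
```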

Side Side Note: I always advise people to commit early and often, because if they ever really mess something up, they can traverse their reflog to find old unreachable commits and recover lost (but previously committed) work. Since by default even unreachable loose objects sit around for 2 weeks, you could potentially also recover files that were staged within the last 2 weeks but never committed.
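
As a rough illustration of that last point, the sketch below stages a file, unstages and deletes it, and then gets the content back by looking for dangling blobs with git fsck. The repository, file names, and demo identity are hypothetical, and in a real repository fsck may list many dangling blobs that you would have to inspect one by one.

```shell
#!/bin/sh
# Sketch: recover a file that was staged but never committed.
# Runs in a temporary directory with made-up file names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

echo "small text file" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Stage a draft, then lose it: unstage it and delete the working copy.
echo "important draft" > draft.txt
git add draft.txt
git rm -q --cached draft.txt
rm draft.txt

# fsck reports the orphaned content as a "dangling blob <id>".
lost=$(git fsck 2>/dev/null | awk '/^dangling blob/ {print $3}')

# Write the blob's content back out to a file.
git cat-file -p "$lost" > recovered.txt
cat recovered.txt
```

This only works as long as no prune or gc --prune=now has removed the object in the meantime.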