I have a folder, project, with a large number of files and subfolders. I created a repository in this folder via git init, which gives the folder structure below.
project
--- .git/
--- large number of files and folders .gitignore'd
--- very few text and related files not under .gitignore
--- .gitignore
The very first (and thus far only) commit in the repository only contained a few text and related files not .gitignore'd.
The raw size of the committed files on my disk (working tree) is just a few kilobytes.
More specifically, the committed files are:
3 .tex files of total size 9 KB
4 .lyx files of total size 32 KB
1 .gitignore file of size 1 KB
3 other .txt files of total size 4 KB
Yet, at this stage, the raw size of the .git folder is 84 MB. The size of the project folder itself is around 5 GB, most of which is .gitignore'd.
Is there a way I can try to figure out what is causing this large gap between the actual committed files and the size of the .git folder?
CodePudding user response:
If you made a prior commit with many more files, and then re-wrote it, the commit is still in the repo until it is garbage collected. But I'll take your word for it that you didn't do that:
The very first (and thus far only) commit in the repository only contained a few text and related files not .gitignore'd.
Therefore, the simplest explanation for this is that you staged a large number of files before getting your .gitignore file set up properly. Even staging files without committing them writes their contents into the repository as objects, which takes up space at least temporarily. You can easily prove this is the cause with the prune command:
git prune -n # dry run, show what would be removed
# and to actually do it
git prune
Then check your repo size again.
Side Note: under normal circumstances you don't need to run the prune command, because pruning happens automatically during garbage collection. However, by default gc only prunes unreachable objects that are more than 2 weeks old. So if you wish to use gc to force a full pruning, then you could use:
git gc --prune=now
Side Side Note: I always advise people to commit early and often, because if they ever really mess something up, they can traverse their reflog to find old unreachable commits to recover lost (but previously committed) work. Since by default even unreachable loose (unpacked) objects sit around for 2 weeks, you could potentially recover files that were only staged in the last 2 weeks but never committed.