I know that the files I back up to GitLab are Python scripts and Jupyter notebooks. However, GitLab says my repo is currently using 9.8GB (shocking!).
I never intended to commit large files (e.g. data files) to the repo. Visual inspection doesn't reveal any large files that I could remove; all I see are the Python scripts.
How do I clean those large files out of my repo?
CodePudding user response:
The large files are still stored in the repository's commit history on GitLab, even though you deleted the files themselves. You can list them with the following script, taken from another answer:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
First, create a script file and make it executable:
vim history.sh # paste the above script into the file
chmod +x history.sh # make the file executable
./history.sh # run the script
This lists every file (blob) in the commit history together with its size, smallest first:
....
192e100aaf93 2.8MiB SMF/Checking/models/Model_0.h5
1b808a1a25ba 2.8MiB SMF/Checking/models/Model_2.h5
80168dc7ffb54 1.3GiB SMF/data/segments_instances_final.csv
775b60418498 1.5GiB Revised_KerasData_NoSmoothing.pickle
2341792d8c9b 4.2GiB geolife.sql
......
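If the full listing is too long to scan, the same pipeline can be narrowed to blobs above a size threshold. The sketch below is one possible way to do that; the function name large_blobs and the 100 MiB threshold are assumptions chosen for illustration and are not part of the original script.

```shell
# Hypothetical helper (not part of the original script): keep only blobs of
# at least $1 bytes. Expects lines of the form
# "<objecttype> <objectname> <objectsize> <path>" on stdin.
large_blobs() {
    awk -v min="$1" '
        $1 == "blob" && $3 + 0 >= min + 0 {
            size = $3
            # Drop the first three fields so paths containing spaces survive.
            sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
            print size, $0
        }'
}

# Usage (run inside the repository); 104857600 bytes = 100 MiB:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     large_blobs 104857600
```

The output is the raw size in bytes followed by the path, which makes it easy to pick a sensible value for the size limit used in the cleaning step.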
To delete the large files
Use the BFG Repo-Cleaner to remove those files.
Note: assuming you already have Java installed, download the bfg.jar file from the BFG Repo-Cleaner project and copy it to your current directory.
- Clone your git repository (and make a backup of it):
$ git clone --mirror git://example.com/my-large-repo.git
- Run the BFG to clean your repository up (e.g. to remove files larger than 100MB):
$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-large-repo.git
....
Before After
-------------------------------------------
First modified commit | fc7cf2f9 | a772ae4a
Last dirty commit | d4a1a3d4 | 9b345832
Deleted files
-------------
Filename Git id
-------------------------------------------------------------------------------------------------------------------------
3Class_Instances.pkl | ceebb395 (558.1 MB)
Beijing_KerasData.pkl | 8681a270 (133.4 MB)
Filtered_Trajectory.pkl | bfe06d09 (137.8 MB)
....
- Strip out the unwanted dirty data
$ cd my-large-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 1306, done.
Counting objects: 100% (1306/1306), done.
Delta compression using up to 8 threads
Compressing objects: 78% (973/1238)
...
- Finally push back your clean repo:
$ git push
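To check that the cleanup actually shrank the repository, the pack size reported by git count-objects -v can be compared before and after the gc step. A minimal sketch, assuming a helper named pack_size_kib (hypothetical) that extracts the size-pack value, which Git reports in KiB:

```shell
# Hypothetical helper: read the output of `git count-objects -v` on stdin
# and print the value of the "size-pack" line (packed size in KiB).
pack_size_kib() {
    awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Usage, run inside my-large-repo.git before and after the gc step:
#   git count-objects -v | pack_size_kib
```

For a quick human-readable check without the helper, git count-objects -vH prints the same fields with units.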