Shell script command to find specific file extensions

Time:05-25

I am looking to locate large files within my GH commit history so that I may find these files and remove them to reduce my repo size.

From this Stack Overflow thread I have found a shell script that successfully lists the files in my repo's history from smallest to largest:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

Simply pasting this into my command line outputs the following:

etc.
etc.
3886fa03848b  9.8MiB Python_scripts/cache/3948da8721027dc20b065e90c40573feff0bd651.json
ecc305ad772f  9.8MiB Python_scripts/cache/8c0751d1550c66250a435e83117deb36dcfd77ba.json
d25525e0a60c  9.8MiB Python_scripts/cache/6becf6f5e0c1547f43ef1ff7356d486e5358cbde.json
bd1cdcf0c45f  9.8MiB Python_scripts/cache/b3d00f1524a2edfe9397f60b8400fb5ac62037e7.json
df01689f9074  9.8MiB Python_scripts/cache/a44395ec06f9451db5f03f042141458ae977c261.json
217a805355fb  9.8MiB Python_scripts/cache/9ad253e8419bcc49278bc8da8f81d3e1ecdadaf6.json
72fa31033b72  9.8MiB Python_scripts/cache/800c9f1fea258738c3d992495a8f2f2b15ecc576.json
ea86a352aaf2  9.8MiB Python_scripts/cache/4a34d6bd3b25243bbac28c50304181555be1d6a9.json
806729ee0224  9.9MiB Python_scripts/cache/d0ab10701a112ad55e3131d765decbc01a10dc88.json
7ded9e2268c8  9.9MiB Python_scripts/cache/f357efd0808e655071e19f7d4e4671f1adfaf407.json
6db94f66e641  9.9MiB Python_scripts/cache/4f2e5392ee1018b63ee8982dbcae36edfcbfa9bb.json
f67da0d97ff6  9.9MiB Python_scripts/cache/282d8e0660282f045e846c52dbb7fddd3a3b5670.json
cac7d279b112   10MiB Python_scripts/cache/fab764e344ae8680dc445f11512cf065d0a2ad9c.json
af8c4882734f   10MiB Python_scripts/cache/46395651237d0c497f2772595dc2c9e91702b49b.json
78ae7b236719   11MiB articles/openinfra.html

I would like this script to list only files with the .json extension, so that I can find all of their hashes and purge them from the repo.


The thread's top comment suggests that further filtering is possible (too low rep to post image):


So far I have tried approaches similar to find . -type f -name \*.rb (using .json instead of .rb) and find . | grep *.json. However, when I add these to the script, it fails to run on the command line.

So really I am just looking for a line that will output only files with the .json extension, if at all possible (apologies for my lack of experience with shell scripts!).

CodePudding user response:

Change the sed line to

sed -n '/\.json$/ s/^blob //p'
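
For context, here is a rough sketch of how that line slots into the pipeline from the question. The function names filter_json_blobs and largest_json_blobs are made up for illustration; the commands themselves come from the question's script. Note the space after "blob" in the substitution: it keeps the hash at the start of the line, so the character offsets used by cut further down still line up.

```shell
# Hypothetical helper: reads "<type> <hash> <size> <path>" lines on stdin,
# keeps only blobs whose path ends in .json, and strips the "blob " prefix.
filter_json_blobs() {
  sed -n '/\.json$/ s/^blob //p'
}

# Hypothetical wrapper: the pipeline from the question with the filter swapped in.
# It has to be called from inside a git repository.
largest_json_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    filter_json_blobs |
    sort --numeric-sort --key=2 |
    cut -c 1-12,41- |
    $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
}
```

Calling largest_json_blobs from the repo root should give the same three-column listing as before, restricted to paths ending in .json.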

Read up on sed and awk, they're fundamental text-processing tools for any task that falls comfortably into metastasized-oneliner range.
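
Since awk is mentioned too, here is one possible awk version of the same filter, offered as a sketch and not as part of the original answer. It takes the extension as an argument and compares it as a plain string suffix, so the dot needs no escaping. The name filter_blobs_by_ext is hypothetical.

```shell
# Hypothetical helper: reads "<type> <hash> <size> <path>" lines on stdin and
# prints "<hash> <size> <path>" for blobs whose path ends with the given suffix.
filter_blobs_by_ext() {
  awk -v ext="$1" '
    $1 == "blob" && length($0) >= length(ext) &&
    substr($0, length($0) - length(ext) + 1) == ext {
      sub(/^blob /, "")   # drop the object type, as the sed line does
      print
    }'
}
```

It would replace the sed stage of the pipeline, for example as filter_blobs_by_ext .json between the git cat-file and sort steps.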
