I wrote a cleanup Script to delete some certain files. The files are stored in Subfolders. I use find
to get those files into a Array and its recursive because of find. So an Array entry could look like this:
(path to File)
./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr
As you can see the filenames are Timestamps. Find sorts by the Folder name only (./2021_11_08_17_28_45_1733556)
but I need to sort all Files which can be in different Folders by the timestamp only of the files and not of the folders (they can be completely ignored), so I can delete the oldest files first. Here you can find my Script at the not properly working state, I need to add some sorting to fix my problems.
Any Ideas?
#!/bin/bash
# handle -h (help)
if [[ "$1" == "-h" || "$1" == "" ]]; then
echo -e '-p [Pfad zum Zielordner] \n-f [Anzahl der Files welche noch im Ordner vorhanden sein sollen] \n-d [false um dryRun zu deaktivieren]'
exit 0
fi
# handle parameters
while getopts p:f:d: flag
do
case "${flag}" in
p) pathToFolder=${OPTARG};;
f) maxFiles=${OPTARG};;
d) dryRun=${OPTARG};;
*) echo -e '-p [Pfad zum Zielordner] \n-f [Anzahl der Files welche noch im Ordner vorhanden sein sollen] \n-d [false um dryRun zu deaktivieren]'
esac
done
if [[ -z $dryRun ]]; then
dryRun=true
fi
# fill array specified by .jfr files an sorted that the oldest files get deleted first
fillarray() {
files=($(find -name "*.jfr" -type f))
totalFiles=${#files[@]}
}
# Return size of file
getfilesize() {
filesize=$(du -k "$1" | cut -f1)
}
count=0
checkfiles() {
# Check if File matches the maxFiles parameter
if [[ ${#files[@]} -gt $maxFiles ]]; then
# Check if dryRun is enabled
if [[ $dryRun == "false" ]]; then
echo "msg=\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
((count ))
rm $1
else
((count ))
echo msg="\"Removal result\", result=true, file=$(realpath $1 ) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
fi
# Remove the file from the files array
files=(${files[@]/$1})
else
echo msg="\"Removal result\", result=false, file=$( realpath $1), reason=\"within max file boundary\""
fi
}
# Scan for empty files
scanfornullfiles() {
for file in "${files[@]}"
do
filesize=$(! getfilesize $file)
if [[ $filesize == 0 ]]; then
files=(${files[@]/$file})
echo msg="\"Removal result\", result=false, file=$(realpath $file), reason=\"empty file\""
fi
done
}
echo msg="jfrcleanup.sh started", maxFiles=$maxFiles, dryRun=$dryRun, directory=$pathToFolder
{
cd $pathToFolder > /dev/null 2>&1
} || {
echo msg="no permission in directory"
echo msg="jfrcleanup.sh stopped"
exit 0
}
fillarray #> /dev/null 2>&1
scanfornullfiles
for file in "${files[@]}"
do
checkfiles $file
done
echo msg="\"jfrcleanup.sh finished\", totalFileCount=$totalFiles filesRemoved=$count"
CodePudding user response:
Assuming the file paths do not contain newline characters, would tou please try
the following Schwartzian transform
method:
#!/bin/bash
pat="/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$"
while IFS= read -r -d "" path; do
if [[ $path =~ $pat ]]; then
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$path"
fi
done < <(find . -type f -name "*.jfr" -print0) | sort -k1,1 | head -n 1 | cut -f2- | tr "\n" "\0" | xargs -0 echo rm
- The string
pat
is a regex pattern to extract the timestamp from the filename such as2021_11_12_04_15_51
. - Then the timestamp is prepended to the filename delimited by a tab character.
- The output lines are sorted by the timestamp in ascending order (oldest first).
head -n 1
picks the oldest line. If you want to change the number of files to remove, modify the number to the-n
option.cut -f2-
drops the timestamp to retrieve the filename.tr "\n" "\0"
protects the filenames which contain whitespaces or tab characters.xargs -0 echo rm
just outputs the command lines as a dry run. If the output looks good, dropecho
.
CodePudding user response:
If you have GNU find
, and pathnames don't contain new-line ('\n'
) and tab ('\t'
) characters, the output of this command will be ordered by basenames:
find path/to/dir -type f -printf '%f\t%p\n' | sort | cut -f2-
CodePudding user response:
TL;DR but Since you're using find
and if it supports the -printf
flag/option something like.
find . -type f -name '*.jfr' -printf '%f/%h/%f\n' | sort -k1 -n | cut -d '/' -f2-
Otherwise a while read
loop with another -printf
option.
#!/usr/bin/env bash
while IFS='/' read -rd '' time file; do
printf '%s\n' "$file"
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
That is -printf
from find
and the -z
flag from sort
is a GNU extension.
Saving the file names you could change
printf '%s\n' "$file"
To something like, which is an array named files
files =("$file")
Then "${files[@]}"
has the file names as elements.
- The last code with a
while read
loop does not depend on the file names but the time stamp from GNUfind
.
CodePudding user response:
I solved the problem! I sort the array with the following so the oldest files will be deleted first:
files=($(printf '%s\n' "${files[@]}" | sort -t/ -k3))