--Newbie to UNIX--
I am looking for a UNIX command to remove some of the files in the directory based on date logic. Sample file structure:
```
Aug 30 01:30 Test20210830.ctl
Aug 30 01:30 Test20210830.txt
Aug  3 01:30 Test20210803.ctl
Aug  3 01:30 Test20210803.txt
Aug  2 01:30 Test20210802.ctl
Aug  2 01:30 Test20210802.txt
Aug  1 01:30 Test20210801.ctl
Aug  1 01:30 Test20210801.txt
Jul  1 01:30 Test20210701.ctl
Jul  1 01:30 Test20210701.txt
Jun 16 01:30 Test20210616.ctl
Jun 16 01:30 Test20210616.txt
Jun 15 01:30 Test20210615.ctl
Jun 15 01:30 Test20210615.txt
```
```shell
ls -ltr Test* | grep "$(date +%b)" | head -2
```
This command gives the top 2 files that I want to keep (the first 2 files of each month).
I want to remove the rest of the files from the same month (so the two July files and two of the June files still have to be there). There is a job that runs at the end of each month, so...
What is the best approach to remove the files and keep the files I want?
Answer:
If you wanted to keep only the first two lexicographically sorted filenames from each month, you could use a simple loop and shell filename expansion:
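```shell
for year in 2021
do
  for month in {01..12}            # bash 4+: zero-padded 01, 02, ..., 12
  do
    set -- Test${year}${month}*    # expand the glob into "$@"
    if [ "$#" -gt 2 ]              # more than two files this month?
    then
      shift 2                      # drop the two files to keep
      echo would rm -- "$@"        # dry run: print what would be removed
    fi
  done
done
```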
Note that this would remove `Test20210616.ctl` and `Test20210616.txt`, because it specifically shifts off the first two filenames of each month (here, the `Test20210615` pair).
The core of this script is the `set` and `shift 2` pair. The `set -- Test${year}${month}*` line expands the year and month variables and appends the `*` wildcard, so the pattern matches every filename that starts with "Test" followed by the loop's current year and month. Once the pattern has been expanded (or left unexpanded, if nothing matched), the results are available in the special `$@` parameter (the positional parameters), and `$#` holds their count. Note that if there are no matches, bash by default leaves the pattern in place, so `$@` holds a single value: the literal, unexpanded pattern (for example, `Test202102*`). That special case would matter if we needed to detect exactly one real match, but here we only care whether there are more than two matches. When the pattern does match, the filenames are placed in `$@` in sorted (lexicographic) order.
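To see the no-match special case for yourself, here is a small bash sketch. The function `show_expansion` and the prefix `Test209901` are made up for this demo (the prefix is chosen so that nothing matches in an empty scratch directory). It also shows bash's `nullglob` option, which makes an unmatched pattern expand to nothing instead of to itself.
```shell
#!/usr/bin/env bash
# Sketch: what `set -- prefix*` leaves in "$@" when no file matches.
# show_expansion and the prefix Test209901 are hypothetical, for this demo only.
show_expansion() {
  set -- "$1"*                 # expand prefix* into the positional parameters
  echo "$# argument(s): $*"
}

cd "$(mktemp -d)" || exit 1    # empty scratch directory, so nothing can match

show_expansion Test209901      # prints: 1 argument(s): Test209901*

shopt -s nullglob              # unmatched patterns now expand to nothing
show_expansion Test209901      # prints: 0 argument(s):
shopt -u nullglob              # back to the default behavior
```
With the default settings the loop above sees `$#` equal to 1 for an empty month, which is why it tests for "more than two" rather than trusting a count of one.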
Once the filenames are in `$@`, `shift 2` simply pops the first two elements off the front of the list: here, the first two filenames in sorted order. What remains is one or more filenames that are candidates for removal.
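Because `shift 2` always drops the first two names in sorted order, the files kept are the two with the earliest dates in each month. If you would rather keep the two newest files (so that `Test20210616.ctl` and `Test20210616.txt` survive), a bash-array variant can slice off everything except the last two entries. This is a sketch assuming bash 4 or later and the `TestYYYYMMDD.ext` naming from the question; the function name `keep_newest_two` is hypothetical.
```shell
#!/usr/bin/env bash
# Sketch: keep the two newest (lexicographically last) files of each month and
# print an rm command for the rest. Needs bash 4+ for the zero-padded {01..12}.
# keep_newest_two is a hypothetical name; files are assumed to be TestYYYYMMDD.ext.
keep_newest_two() {
  local year=$1 month count
  local -a files
  shopt -s nullglob                    # a month with no files gives an empty array
  for month in {01..12}
  do
    files=( Test"${year}${month}"* )   # glob results come back sorted
    count=${#files[@]}
    if (( count > 2 ))
    then
      # slice off everything except the last two entries
      echo would rm -- "${files[@]:0:count-2}"
    fi
  done
  shopt -u nullglob                    # turn nullglob back off
}

keep_newest_two 2021                   # dry run for 2021 in the current directory
```
With the sample June files, this prints `would rm -- Test20210615.ctl Test20210615.txt`.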
If the results look correct, remove the `echo would` prefix so that the `rm` actually runs.
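Since the question mentions a job that runs at the end of each month, the loop could also be narrowed to just the month in which the job runs. A minimal sketch, assuming the job runs before midnight on the last day (so `date +%Y%m` still names the month being cleaned up); `prune_current_month` is a hypothetical name:
```shell
#!/usr/bin/env bash
# Sketch for a month-end job: only the current month's files are considered.
# prune_current_month is hypothetical; names are assumed to be TestYYYYMMDD.ext.
prune_current_month() {
  local ym
  ym=$(date +%Y%m)            # e.g. 202108 when run in August 2021
  set -- Test"$ym"*
  if [ "$#" -gt 2 ]
  then
    shift 2                   # keep the first two names in sorted order
    echo would rm -- "$@"     # remove "echo would" once the output looks right
  fi
}

prune_current_month
```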
Answer:
I have put your provided input in a file called `input`.
This code is safe to run because it doesn't actually remove any files, so you can see the output for yourself. For each file it shows what was extracted from the name and ends with a self-explanatory line saying whether that file would be removed or would stay.
It simply counts how many times it has seen the current month so far: if the count is more than two, it prints `result: removing file`; otherwise it prints `result: file stays!`. Because the filenames are sorted in reverse (`sort -r`), the two files kept for each month are the two newest ones.
The output is shown below.
```shell
lastmon=""
counter=0
# replace "cat input" with your "ls -l" or similar
cat input | grep -oE '[^[:space:]]+$' | sort -r | # extract filenames
for filename in $(cat)
do
  # extract pieces from filename
  echo "filename: $filename"
  echo -n "date: "
  echo "$filename" | grep -oE '[0-9]+'
  echo -n "month: "
  echo "$filename" | grep -oE '[0-9]+' | cut -b5-6
  # compare month to last month
  mon="$(echo "$filename" | grep -oE '[0-9]+' | cut -b5-6)"
  if [ "x$mon" != "x$lastmon" ]
  then counter=0
  fi
  lastmon="$mon"
  # keep counting
  counter=$((counter + 1))
  echo "counter: $counter"
  # if more than two in a row, decide to remove the file
  if [ "$counter" -gt 2 ]
  then echo 'result: removing file'
  else echo 'result: file stays!'
  fi
  echo # print empty line
done
```
Output:
```
filename: Test20210830.txt
date: 20210830
month: 08
counter: 1
result: file stays!

filename: Test20210830.ctl
date: 20210830
month: 08
counter: 2
result: file stays!

filename: Test20210803.txt
date: 20210803
month: 08
counter: 3
result: removing file

filename: Test20210803.ctl
date: 20210803
month: 08
counter: 4
result: removing file

filename: Test20210802.txt
date: 20210802
month: 08
counter: 5
result: removing file

filename: Test20210802.ctl
date: 20210802
month: 08
counter: 6
result: removing file

filename: Test20210801.txt
date: 20210801
month: 08
counter: 7
result: removing file

filename: Test20210801.ctl
date: 20210801
month: 08
counter: 8
result: removing file

filename: Test20210701.txt
date: 20210701
month: 07
counter: 1
result: file stays!

filename: Test20210701.ctl
date: 20210701
month: 07
counter: 2
result: file stays!

filename: Test20210616.txt
date: 20210616
month: 06
counter: 1
result: file stays!

filename: Test20210616.ctl
date: 20210616
month: 06
counter: 2
result: file stays!

filename: Test20210615.txt
date: 20210615
month: 06
counter: 3
result: removing file

filename: Test20210615.ctl
date: 20210615
month: 06
counter: 4
result: removing file
```
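The same per-month counting can be done in a single `awk` pass. This is a sketch, not the answer's code: `list_removals` is a hypothetical name, it reads names from a `Test*` glob rather than from `input`, and it groups by year and month (characters 5 to 10 of each name) rather than by month alone, so that, say, August 2020 and August 2021 are never counted together. Like the loop above, it only prints names.
```shell
#!/usr/bin/env bash
# Sketch: print every file after the first two (in reverse-sorted order)
# of each year+month group. Assumes names like TestYYYYMMDD.ext.
# list_removals is a hypothetical name.
list_removals() {
  printf '%s\n' Test* |
    LC_ALL=C sort -r |
    awk '
      { ym = substr($0, 5, 6) }          # YYYYMM, e.g. 202108
      ym != last { n = 0; last = ym }    # a new month starts: reset the count
      ++n > 2                            # 3rd and later file of the month: print it
    '
}

list_removals   # dry run: only prints names
```
With the sample files, this prints the same four names the loop marks as `removing file` for June and for `Test20210803.*`, plus the older August files. Once the list looks right, it could be piped to `xargs rm --`; that is safe here only because these filenames contain no spaces.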