`path/mydir` contains a list of directories. The names of these directories tell me which database they relate to. Inside each directory is a bunch of files, but the filenames tell me nothing of importance.
I'm trying to write a command in linux bash that accomplishes the following:
- For each directory in `path/mydir`, find the timestamp of the most recently modified file within that directory
- Print that file's timestamp next to the parent directory's name
- Exclude any timestamps less than 30 days old
- Exclude specific directory names using regex
- Order by oldest timestamp
Given this directory structure in `path/mydir`:
    database_1
        table_1.file (last modified 2021-11-01)
        table_2.file (last modified 2021-11-01)
        table_3.file (last modified 2021-11-05)
    database_2
        table_1.file (last modified 2021-05-01)
        table_2.file (last modified 2021-05-01)
        table_3.file (last modified 2021-08-01)
    database_3
        table_1.file (last modified 2020-01-01)
        table_2.file (last modified 2020-01-01)
        table_3.file (last modified 2020-06-01)
I would want to output:
database_3 2020-06-01
database_2 2021-08-01
This half works, but looks at the modified date of the parent directory instead of the max timestamp of files under the directory:
    find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'
I'm very much a novice with bash, so any help and guidance is appreciated!
Answer:
Would you please try the following:

    #!/bin/bash
    cd "path/mydir/" || exit
    for d in */; do
        dirname=${d%/}
        mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
        [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
    done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'
Output with the provided example:
database_3 2020-06-01
database_2 2021-08-01
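To try this out without touching real data, the example tree can be recreated in a scratch directory. The following is only a sketch: the variable `root` and the temporary location are assumptions made for illustration, and `touch -d` and `find -printf` assume GNU coreutils and GNU findutils. Because the dates in the question for `database_1` are no longer "recent", its files are given the current time here to stand in for a recently modified database.

```shell
# Throwaway directory standing in for path/mydir (hypothetical location).
root=$(mktemp -d)
mkdir "$root/database_1" "$root/database_2" "$root/database_3"

# touch -d (GNU coreutils) sets the modification time explicitly.
touch -d '2021-05-01 12:00' "$root/database_2/table_1.file" "$root/database_2/table_2.file"
touch -d '2021-08-01 12:00' "$root/database_2/table_3.file"
touch -d '2020-01-01 12:00' "$root/database_3/table_1.file" "$root/database_3/table_2.file"
touch -d '2020-06-01 12:00' "$root/database_3/table_3.file"

# database_1 gets the current time so that it counts as "recent" today.
touch "$root/database_1/table_1.file" "$root/database_1/table_2.file" "$root/database_1/table_3.file"

# List every file with its modification date to confirm the setup.
find "$root" -type f -printf '%TY-%Tm-%Td\t%P\n' | sort -k2
```

Running the script against `$root` instead of `path/mydir/` should then list only `database_3` and `database_2`. The scratch directory can be removed afterwards with `rm -rf "$root"`.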
- `for d in */; do` loops over the subdirectories in `path/mydir/`.
- `dirname=${d%/}` removes the trailing slash, just for printing purposes.
- `-printf "%TY-%Tm-%Td\t%TT\t%p\n"` prepends the modification date and time to the filename, delimited by tab characters. The result will look like:

    2021-08-01 12:34:56 database_2/table_3.file

- `sort -rk1,2` sorts the output by the date and time fields in descending order.
- `head -n 1` picks the line with the latest timestamp.
- `cut -f1` extracts the first field, which holds the modification date.
- `[[ -n $mdate ]]` skips the directory if `mdate` is empty.
- `sort -k1,1` just after `done` performs the global sorting across the outputs of the subdirectories.
- `sed -E ...` swaps the timestamp and the dirname. It is only there to handle the case where the dirname may contain a tab character. If that cannot happen, you can omit the `sed` command by switching the order of timestamp and dirname in the `echo` command and changing the `sort` command to `sort -k2,2`.
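The `sort | head | cut` part of the pipeline can be checked in isolation. This is a small sketch that feeds it three hand-written lines in the same tab-separated layout; the sample values are made up for illustration and are not real `find` output.

```shell
# Three tab-separated lines in the layout produced by -printf above:
# date, time, path. The values are hypothetical.
sample=$(printf '%s\t%s\t%s\n' \
    2021-05-01 09:00:00 database_2/table_1.file \
    2021-08-01 12:34:56 database_2/table_3.file \
    2021-05-01 10:00:00 database_2/table_2.file)

# Newest line first, keep only the top line, then keep only the date field.
latest=$(printf '%s\n' "$sample" | sort -rk1,2 | head -n 1 | cut -f1)
echo "$latest"   # prints 2021-08-01
```

Sorting works here because dates written as year-month-day order the same way as plain text and as calendar dates.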
As for the requirement "Exclude specific directory names using regex", add your own logic to the `find` command or elsewhere in the loop.
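One possible way to do the exclusion is a small helper built on `grep -E`, the same tool used in the question. This is a sketch, not part of the original answer: the function name `keep_dir` and the pattern in `exclude_re` are hypothetical, and the loop runs over a fixed list of names so the example is repeatable.

```shell
# Hypothetical pattern: replace name1|name2 with the directory names to skip.
exclude_re='^(name1|name2)$'

keep_dir() {
    # grep -q prints nothing; its exit status says whether the name matched.
    # The leading "!" inverts it, so the function succeeds for names to keep.
    ! printf '%s\n' "$1" | grep -qE "$exclude_re"
}

# Fixed list instead of */ so the example does not depend on the current directory.
for dirname in database_1 name1 database_2 name2; do
    keep_dir "$dirname" || continue
    echo "$dirname"
done
```

In the loops shown in this answer, the `keep_dir "$dirname" || continue` line would go right after `dirname=${d%/}`. The anchors `^` and `$` make the pattern match whole names only; drop them to exclude any name that merely contains `name1` or `name2`.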
[Edit]
In order to print the directory name if the last modified file in the subdirectories is older than the specified date, please try instead:
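    #!/bin/bash
    cd "path/mydir/" || exit
    now=$(date +%s)
    for d in */; do
        dirname=${d%/}
        read -r secs mdate < <(find "$d" -type f -printf "%T@\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1)
        secs=${secs%.*}             # GNU find prints %T@ with a fractional part; (( )) needs an integer
        [[ -n $secs ]] || continue  # skip directories that contain no files
        if (( secs < now - 3600 * 24 * 30 )); then
            echo -e "$secs\t$dirname $mdate"
        fi
    done | sort -nk1,1 | cut -f2-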
- `now=$(date +%s)` sets the variable `now` to the current time in seconds since the epoch.
- `for d in */; do` loops over the subdirectories in `path/mydir/`.
- `dirname=${d%/}` removes the trailing slash, just for printing purposes.
- `-printf "%T@\t%TY-%Tm-%Td\n"` prints the modification time as seconds since the epoch and the modification date, delimited by a tab character. The result will look like this (GNU find also appends a fractional part to the seconds, omitted here):

    1627743600 2021-08-01

- `sort -nrk1,1` sorts the output by the modification time in descending order.
- `head -n 1` picks the line with the latest timestamp.
- `read -r secs mdate < <( stuff )` assigns the two fields of the command's output to `secs` and `mdate`, in order.
- The condition `(( secs < now - 3600 * 24 * 30 ))` is met if `secs` is more than 30 days older than `now`.
- `echo -e "$secs\t$dirname $mdate"` prints `dirname` and `mdate`, prepending `secs` for sorting purposes.
- `sort -nk1,1` just after `done` performs the global sorting across the outputs of the subdirectories.
- `cut -f2-` removes the `secs` portion.
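The 30-day comparison can be tried on its own with fixed numbers. The sketch below is written for illustration only: the helper name `is_stale` and the fixed value of `now` are assumptions, and the fractional part is stripped because GNU find's `%T@` includes one while shell integer comparison does not accept it.

```shell
# is_stale SECS NOW: succeeds when SECS is more than 30 days before NOW.
# Hypothetical helper, not part of the script above.
is_stale() {
    secs=${1%.*}                        # drop the fractional part printed by %T@
    cutoff=$(( $2 - 3600 * 24 * 30 ))   # 30 days = 2592000 seconds
    [ "$secs" -lt "$cutoff" ]
}

# Fixed "current time" (2022-01-01 00:00:00 UTC) so the example is repeatable.
now=1640995200

# 1627743600 is the sample value shown above; 1640000000 is about 11.5 days before now.
is_stale 1627743600.0000000000 "$now" && echo "old file: directory is listed"
is_stale 1640000000.0000000000 "$now" || echo "recent file: directory is skipped"
```

Both messages are printed: the first timestamp lies before the cutoff of 1638403200, the second after it.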