Linux - Finding the max modified date of each set of files in each directory

Time:11-17

path/mydir contains a list of directories. The names of these directories tell me which database they relate to.

Inside each directory is a bunch of files, but the filenames tell me nothing of importance.

I'm trying to write a command in linux bash that accomplishes the following:

  • For each directory in path/mydir, find the max timestamp of the last modified file within that directory
  • Print the last modified file's timestamp next to the parent directory's name
  • Exclude any timestamps less than 30 days old
  • Exclude specific directory names using regex
  • Order by oldest timestamp

Given this directory structure in path/mydir:

database_1
   table_1.file (last modified 2021-11-01)
   table_2.file (last modified 2021-11-01)
   table_3.file (last modified 2021-11-05)
database_2
   table_1.file (last modified 2021-05-01)
   table_2.file (last modified 2021-05-01)
   table_3.file (last modified 2021-08-01)
database_3
   table_1.file (last modified 2020-01-01)
   table_2.file (last modified 2020-01-01)
   table_3.file (last modified 2020-06-01)

I would want to output:

database_3 2020-06-01
database_2 2021-08-01

This half works, but it looks at the modification date of the parent directory itself instead of the newest timestamp among the files inside it: find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'

I'm very much a novice with bash, so any help and guidance is appreciated!

CodePudding user response:

Would you please try the following:

#!/bin/bash

cd "path/mydir/" || exit
for d in */; do
    dirname=${d%/}
    # newest file among those last modified more than 30 days ago
    mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
    [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'

Output with the provided example:

database_3 2020-06-01
database_2 2021-08-01
  • for d in */; do loops over the subdirectories in path/mydir/.
  • dirname=${d%/} removes the trailing slash just for the printing purpose.
  • -printf "%TY-%Tm-%Td\t%TT\t%p\n" prints the modification date and time in front of the filename, separated by tab characters. The result will look like:
2021-08-01      12:34:56        database_2/table_3.file
  • sort -rk1,2 sorts the output by the date and time fields in descending order.
  • head -n 1 picks the line with the latest timestamp.
  • cut -f1 extracts the first field with the modification date.
  • [[ -n $mdate ]] skips directories for which find returned nothing, leaving mdate empty.
  • sort -k1,1 just after done performs the global sorting across the outputs of the subdirectories.
  • sed -E ... swaps the timestamp and the dirname. The timestamp is printed first so that the sort still works when a dirname contains whitespace (a space or a tab). If your names never do, you can omit the sed command by switching the order of timestamp and dirname in the echo command and changing the sort command to sort -k2,2.
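
The selection step described above can be tried on its own without touching the filesystem. This is only a small sketch: the function name newest_date and the sample lines are made up for illustration, and the lines mimic the "date, time, path" format that -printf produces.

```shell
#!/bin/bash

# Pick the newest date from "date<TAB>time<TAB>path" lines on stdin,
# using the same sort | head | cut chain as the script above.
newest_date() {
    sort -rk1,2 | head -n 1 | cut -f1
}

# Made-up sample lines in the format produced by -printf.
# printf reuses its format for every group of three arguments.
printf '%s\t%s\t%s\n' \
    2021-05-01 09:00:00.0000000000 database_2/table_1.file \
    2021-08-01 12:34:56.0000000000 database_2/table_3.file \
    2021-05-01 10:30:00.0000000000 database_2/table_2.file |
    newest_date
```

Because the date is written as year-month-day, a plain text sort already orders the lines chronologically, which is why no date parsing is needed here.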

As for excluding specific directory names using a regex, add your own test inside the loop (for example, skip any dirname that matches the pattern) or add a condition to the find command.
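
One possible way to do the exclusion is bash's own regex match, [[ ... =~ ... ]], applied to each directory name before calling find. This is a sketch under assumptions: the function name is_excluded is hypothetical, and the pattern reuses the placeholder names name1 and name2 from the question's grep.

```shell
#!/bin/bash

# Placeholder pattern; anchored so that only whole names match.
exclude_re='^(name1|name2)$'

# Succeed if the given directory name matches the exclusion regex.
is_excluded() {
    [[ $1 =~ $exclude_re ]]
}

# Intended use inside the loop:
#   for d in */; do
#       dirname=${d%/}
#       is_excluded "$dirname" && continue
#       ...
#   done
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ is what makes bash treat it as a regex rather than a literal string.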

[Edit]
To print a directory only when its most recently modified file (searched recursively through its subdirectories) is older than the specified cutoff, please try this instead:

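#!/bin/bash

cd "path/mydir/" || exit
now=$(date +%s)
for d in */; do
    dirname=${d%/}
    read -r secs mdate < <(find "$d" -type f -printf "%T@\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1)
    secs=${secs%.*}               # %T@ includes a fractional part; (( )) needs an integer
    [[ -n $secs ]] || continue    # no files under this directory
    if (( secs < now - 3600 * 24 * 30 )); then
        echo -e "$secs\t$dirname $mdate"
    fi
done | sort -nk1,1 | cut -f2-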
  • now=$(date +%s) sets the variable now to the current time in seconds since the epoch.
  • for d in */; do loops over the subdirectories in path/mydir/.
  • dirname=${d%/} removes the trailing slash just for the printing purpose.
  • -printf "%T@\t%TY-%Tm-%Td\n" prints the modification time as seconds since the epoch (with a fractional part) and the modification date, separated by a tab character. The result will look like:
1627743600.0000000000      2021-08-01
  • sort -nrk1,1 sorts the output by the modification time in descending order.
  • head -n 1 picks the line with the latest timestamp.
  • read -r secs mdate < <( stuff ) assigns the two fields of the command's output to secs and mdate, in that order.
  • The condition (( secs < now - 3600 * 24 * 30 )) is true if secs is more than 30 days before now.
  • echo -e "$secs\t$dirname $mdate" prints dirname and mdate prepending the secs for the sorting purpose.
  • sort -nk1,1 just after done performs the global sorting across the outputs of the subdirectories.
  • cut -f2- removes secs portion.
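
Putting the pieces together, the steps above could be wrapped in a function that also takes the age limit and an exclusion regex as arguments. This is a sketch, not a drop-in replacement: the name stale_dirs and its three parameters are assumptions made for illustration, and it relies on GNU find for -printf.

```shell
#!/bin/bash

# stale_dirs BASE DAYS [EXCLUDE_RE]   (hypothetical helper)
# Print "dirname date" for each subdirectory of BASE whose newest file,
# searched recursively, is more than DAYS days old, oldest first.
# Subdirectories whose name matches EXCLUDE_RE are skipped.
stale_dirs() {
    local base=$1 days=$2 exclude_re=${3:-}
    local now cutoff d dirname secs mdate
    now=$(date +%s)
    cutoff=$(( now - days * 24 * 3600 ))
    (
        cd "$base" || exit 1
        for d in */; do
            [[ -d $d ]] || continue        # the glob matched nothing
            dirname=${d%/}
            if [[ -n $exclude_re && $dirname =~ $exclude_re ]]; then
                continue
            fi
            # read fails on empty input; that case is handled just below
            read -r secs mdate < <(find "$d" -type f -printf '%T@\t%TY-%Tm-%Td\n' |
                                   sort -nrk1,1 | head -n 1) || true
            secs=${secs%.*}                # drop the fractional part of %T@
            [[ -n $secs ]] || continue     # no files under this directory
            if (( secs < cutoff )); then
                printf '%s\t%s %s\n' "$secs" "$dirname" "$mdate"
            fi
        done | sort -nk1,1 | cut -f2-
    )
}
```

It could then be called as stale_dirs path/mydir 30 '^(name1|name2)$'. The body runs in a subshell so that the cd does not change the caller's working directory.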