How to use bash to get unique dates from list of file names


I have a large number of file names. I need to create a bash script that gets all of the unique dates from the file names.

Example:

input:

opencomposition_dxxx_20201123.csv.gz     
opencomposition_dxxv_20201123.csv.gz
opencomposition_dxxu_20201123.csv.gz     
opencomposition_sxxv_20201123.csv.gz
opencomposition_sxxe_20211223.csv.gz 
opencomposition_sxxe_20211224.csv.gz  
opencomposition_sxxe_20211227.csv.gz  
opencomposition_sxxesgp_20230106.csv.gz

output:

20201123 20211223 20211224 20211227 20230106

Code:

for asof_dt in `find -H ./ -maxdepth 1 -nowarn -type f -name *open*.gz |
        sort -r | cut -f3 -d "_" | cut -f1 -d"." | uniq`; do
    echo $asof_dt
done

Error:

line 20: /bin/find: Argument list too long

CodePudding user response:

You need to quote the glob: '*open*.gz'. Otherwise the shell expands the wildcard * against the files and directories in the current directory before find ever runs, which is what produces the 'Argument list too long' error.

With that fixed, you can extract the dates like this (GNU grep):

find -H ./ -maxdepth 1 -nowarn -type f -name '*open*.gz' |
    grep -oP '_\K\d{8}(?=\.csv)' |
    sort -u

Output

20201123
20211223
20211224
20211227
20230106

The regular expression matches as follows:

Node    Explanation
_       matches the character '_' literally
\K      resets the start of the match (what is Kept), a shorter alternative to a look-behind assertion (see perlmonks "look arounds" and "Support of \K in regex")
\d{8}   digits (0-9), exactly 8 times
(?=     look-ahead to check that what follows is:
  \.      the character '.'
  csv     the literal 'csv'
)       end of look-ahead
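
If GNU grep's -P (PCRE) option is not available on your system, a plain sed substitution gives the same result. This is only a minimal sketch, assuming the date is always the 8-digit run immediately before ".csv":

# -n together with /p prints only the lines where the substitution matched
find -H ./ -maxdepth 1 -nowarn -type f -name '*open*.gz' |
    sed -n 's/.*_\([0-9]\{8\}\)\.csv.*/\1/p' |
    sort -u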

CodePudding user response:

Using tr to delete every character except the digits ('/' and '.' are included in the deletion set because find prefixes each name with ./):

find -H ./ -maxdepth 1 -nowarn -type f -name '*open*.gz' | tr -d 'a-z_./' | sort -u
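
A quick sanity check of the tr step on a single name, written the way find would print it:

echo './opencomposition_dxxx_20201123.csv.gz' | tr -d 'a-z_./'

which prints 20201123, because everything except the digits is deleted.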

CodePudding user response:

If filenames don't contain newline characters, a quick-and-dirty method, similar to your attempt, might be

printf '%s\n' open*.gz | cut -d_ -f3 | cut -d. -f1 | sort -u

Note that printf is a bash builtin, and the 'Argument list too long' limit does not apply to builtins, so the glob can be expanded safely here.
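
If you still want the loop from your original script, here is a minimal sketch assuming bash 4+ for mapfile (the asof_dates array name is just an example):

# read the unique dates into an array, then loop over them
mapfile -t asof_dates < <(printf '%s\n' open*.gz | cut -d_ -f3 | cut -d. -f1 | sort -u)
for asof_dt in "${asof_dates[@]}"; do
    echo "$asof_dt"
done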

Tags: bash