Home > other >  bash generate a regexp to match any formated date between two dates
bash generate a regexp to match any formated date between two dates

Time:01-27

Using bash I would generate a regexp to match any formatted date between two dates. (I will later use it in restricted prod so I would use only bash as far as possible)

Random dates of course even crossing years for example the regexp might match any date between

  • 2022-12-27 and
  • 2023-02-05

so all of the dates

  • 2022-12-27
  • 2022-12-28
  • 2022-12-29
  • 2022-12-30
  • 2022-12-31
  • 2023-01-01
  • (…)
  • 2023-02-05

Time range comes from two parameters given as input to the future bash script.

Finaly will use that regexp to both find & manage filenames and to grep some datas.

Filename form-pattern are random but always contain a YYY-MM-DD time format, whatever the filename is from dd_YYYY-MM-DD.xx to aaaa__bbbb_dd_YYYY-MM-DD.xxx_ccc_zzz.log or anything else .

I tried to manage that by separating each year/month/day like

fromDate="$1"
toDate="$2"

# break the dates into array so that we can easily access the day, month and year
#
fdy=$( sed 's/.*\([0-9]\{4\}\).*/\1/' <<< $fromDate )
fdm=$( sed 's/.*-\([0-9]\{2\}\)-.*/\1/' <<< $fromDate )
fdd=$( sed 's/.*-.*-\([0-9]\{2\}\).*/\1/' <<< $fromDate )
#
edy=$( sed 's/.*\([0-9]\{4\}\).*/\1/' <<< $toDate )
edm=$( sed 's/.*-\([0-9]\{2\}\)-.*/\1/' <<< $toDate )
edd=$( sed 's/.*-.*-\([0-9]\{2\}\).*/\1/' <<< $toDate )

then to loop over that with some sort of

#[...]
printf -v date "%d-d-d" "${from[2]}" "${from[1]}" "${from[0]}"
    pattern="$pattern|$date"
    ((from[0]  ))
    # reset days and increment month if days exceed 31
    if [[ "${from[0]}" -gt 31 ]]
    then
        from[0]=1
        ((from[1]  ))
#[...]

but didn't find a way to work around and output a correct regexp matching any date inside the date range.

CodePudding user response:

This is not a regex solution but may be helpful.

When the date format is yyyy-mm-dd you can compare dates lexicographically.

The string "2023-01-05" is greater than "2022-08-05" because the first character they don't have in common is greater in the first string.

In bash you can therefore just do the comparison [[ "$date" > "$from" ]] && [[ "$to" > "$date" ]] to see if a date in yyyy-mm-dd string format is within the range $to-$from

CodePudding user response:

There is a nifty feature in bash / readline, that you can use called brace patterns and expansion of these. It's not exactly regex, but converting from those to regex is easy.

First, since it's parctical in this example to generate the files for those dates let's start with:

$ k=0 b=2022-12-27 a=2022-12-27; 
     until [[ $b > 2023-01-05  ]]; do
        b=$(date -d "$a   $k days"  %F); echo $b; k=$((k 1)); done \
           | xargs -t touch
touch 2022-12-27 2022-12-28 2022-12-29 2022-12-30 (...)

$ ls
2022-12-27  2022-12-29  2022-12-31  2023-01-02  2023-01-04  2023-01-06
2022-12-28  2022-12-30  2023-01-01  2023-01-03  2023-01-05

Then check if you have this brace thing enabled.

$ bind -p |any brace
"\e{": complete-into-braces

And if not, add it to the inputrc file:

$ grep braces ~/.inputrc 
"\e{":      complete-into-braces

Now, in the empty folder where those files were created, pressing escape { generates this string:

$ 202{2-12-{2{7,8,9},3{0,1}},3-01-0{1,2,3,4,5,6}} 

converting that to a regex should be a matter of replacing the commans and braces:

$ echo '202{2-12-{2{7,8,9},3{0,1}},3-01-0{1,2,3,4,5,6}}'  | tr '{},' '()|'
202(2-12-(2(7|8|9)|3(0|1))|3-01-0(1|2|3|4|5|6))

Let's test it:

$ (days 2022; days;) | grep -Ee '202(2-12-(2(7|8|9)|3(0|1))|3-01-0(1|2|3|4|5|6))'
361 2022-12-27 December 52 Tuesday 
362 2022-12-28 December 52 Wednesday 
363 2022-12-29 December 52 Thursday 
364 2022-12-30 December 52 Friday 
365 2022-12-31 December 52 Saturday 
001 2023-01-01 January 52 Sunday 
002 2023-01-02 January 01 Monday 
003 2023-01-03 January 01 Tuesday 
004 2023-01-04 January 01 Wednesday 
005 2023-01-05 January 01 Thursday 
006 2023-01-06 January 01 Friday 

The days script is from my local-bin repo on github.

So it works well enough, but the initial while loop was wrong. We just need to remove the last 6 from the regex.

$ (days 2022; days;) | grep -Ee '202(2-12-(2(7|8|9)|3(0|1))|3-01-0(1|2|3|4|5))' | sed -n '1p; $p'
361 2022-12-27 December 52 Tuesday 
005 2023-01-05 January 01 Thursday 

This takes care of the YYYY-MM-DD portion.


I think this answers the question "how to generate a regex for the dates". If you want to expand the question, and need more hints, leave a comment.

  • Related