Perl Regex Query - filter the contents inside the file which are older than 18 months

Time:02-16

I want to filter a file so that only the entries older than 18 months are printed.

File looks like this:

 > cat trans_file.txt

trans-02-2018
trans-03-2019
trans-04-2021
trans-01-2022

Output needed:

trans-02-2018
trans-03-2019

I am using the method below:

export DT=`date +%m-%Y`
export DTLastYear=`date +%m-%Y -d '18 months ago'`
perl -ne 'print if grep {$_<$ENV{$DTLastYear}}  /(\d{2}-\d{4})/g' trans_file.txt

But it's not working. Can anyone help here?

CodePudding user response:

I'd do it in pure Perl, using the core time-processing modules instead of date(1) to compute the date 18 months ago (and to parse the date from each line instead of using regular expressions):

As a one-liner:

$ perl -MTime::Piece -MTime::Seconds -lne '
  BEGIN { $when = localtime() - (ONE_MONTH * 18) }
  my $t = eval { Time::Piece->strptime($_, "trans-%m-%Y") };  # strptime dies on lines it cannot parse
  print if defined $t && $t < $when;' trans_file.txt
trans-02-2018
trans-03-2019

Or as a separate script that takes the input file name(s) as command line arguments, or reads from standard input if there are none:

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use v5.22.0; # For <<>>; use the less-secure <> on older perls
use Time::Piece;
use Time::Seconds;

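# ONE_MONTH is an average month length in seconds, so the cutoff is approximate.
my $when = localtime() - (ONE_MONTH * 18);

while (my $line = <<>>) {
    chomp $line;
    # strptime dies on input it cannot parse, so trap that and skip the line.
    my $t = eval { Time::Piece->strptime($line, "trans-%m-%Y") };
    say $line if defined $t && $t < $when;
}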

Or using the non-core but often useful DateTime module mentioned by zdim:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use v5.22.0; # For <<>>; use the less-secure <> on older perls
use DateTime; # Install through your OS package manager or favorite CPAN client

my $when = DateTime->now->truncate(to => 'month')->subtract(months => 18);

while (my $line = <<>>) {
    chomp $line;
    if ($line =~ /(\d\d)-(\d{4})$/) {
        my $t = DateTime->new(month => $1, year => $2);
        say $line if $t < $when;
    }
}

Where you went wrong

All of the above convert the dates to objects that can be compared, instead of comparing strings as your attempt does (and in Perl, string comparison needs lt, not <). A string comparison can work too, but only with a date format that sorts in calendar order. You're using an 'MM-YYYY' format, which doesn't: 01-2020 sorts before 12-2019, for example, because 0 comes before 1. If you switch to a 'YYYY-MM' format, string comparison works.
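To see the ordering problem concretely, here is a minimal bash sketch. The helper names str_before and to_sortable are made up for this illustration and do not come from the question; the sketch assumes keys made only of digits and hyphens.

```shell
#!/usr/bin/env bash

# Compare two date keys as plain strings (lexicographically).
str_before() {
    [[ "$1" < "$2" ]]
}

# Hypothetical helper: turn MM-YYYY into the sortable YYYY-MM form.
to_sortable() {
    local mm=${1%-*} yyyy=${1#*-}
    printf '%s-%s\n' "$yyyy" "$mm"
}

# MM-YYYY: January 2020 wrongly compares as earlier than December 2019.
str_before "01-2020" "12-2019" &&
    echo "MM-YYYY: 01-2020 sorts before 12-2019 (wrong)"

# YYYY-MM: the same two months compare in calendar order.
str_before "$(to_sortable 12-2019)" "$(to_sortable 01-2020)" &&
    echo "YYYY-MM: 2019-12 sorts before 2020-01 (right)"
```

Putting the most significant field first is what makes character-by-character comparison agree with calendar order.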

bash example:

dt_last_year=$(date +%Y-%m -d '18 months ago')
while read -r line; do
    if [[ $line =~ ([0-9][0-9])-([0-9]{4})$ ]]; then
        # date in YYYY-MM format
        t="${BASH_REMATCH[2]}-${BASH_REMATCH[1]}"
        if [[ $t < $dt_last_year ]]; then
            printf "%s\n" "$line"
        fi
    fi
done < trans_file.txt

CodePudding user response:

With bash:

x=$(date +%Y-%m -d '18 months ago')

while IFS='-' read -r prefix month year; do
  [[ "$year-$month" < "$x" ]] && echo "$prefix-$month-$year";
done < trans_file.txt

Output:

trans-02-2018
trans-03-2019
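The relative date string passed to -d ('18 months ago') is a GNU date feature. Where it is not available, the cutoff could instead be computed with integer arithmetic, as in the sketch below. The helper names months_before and older_than are hypothetical, and the sketch assumes every input line has the form prefix-MM-YYYY.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the YYYY-MM that lies N months before YEAR MONTH.
months_before() {
    local year=$1 month=$2 n=$3
    # Months since year 0 (January = 0). The 10# prefix forces base 10,
    # so "08" and "09" are not rejected as invalid octal numbers.
    local total=$(( 10#$year * 12 + 10#$month - 1 - n ))
    printf '%04d-%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# Hypothetical helper: print stdin lines of the form prefix-MM-YYYY
# that are strictly older than the YYYY-MM cutoff given in $1.
older_than() {
    local cutoff=$1 prefix month year
    while IFS='-' read -r prefix month year; do
        if [[ "$year-$month" < "$cutoff" ]]; then
            printf '%s-%s-%s\n' "$prefix" "$month" "$year"
        fi
    done
}

# Cutoff relative to today, using only the +FORMAT feature of date.
cutoff=$(months_before "$(date +%Y)" "$(date +%m)" 18)

older_than "$cutoff" <<'EOF'
trans-02-2018
trans-03-2019
trans-04-2021
trans-01-2022
EOF
```

What this prints depends on the current date. To filter the real file, redirect it into the function: older_than "$cutoff" < trans_file.txt.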

CodePudding user response:

< is for numeric comparisons in Perl. You need to use lt for string comparisons.
You need to put the year before the month for a string comparison to order the dates correctly.
The %ENV key is the string DTLastYear, not the value of the Perl variable $DTLastYear, so write $ENV{DTLastYear}.

Fixed:

export DTLastYear=`date +%Y-%m -d '18 months ago'`
perl -ne'/(\d{2})-(\d{4})/ or next; print if "$2-$1" lt $ENV{DTLastYear}' trans_file.txt

Simplified:

export DTLastYear=`date +%Y-%m -d '18 months ago'`
perl -F- -lane'print if "$F[2]-$F[1]" lt $ENV{DTLastYear}' trans_file.txt
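The same idea also works with a numeric comparison if each date is first turned into a single integer of the form YYYYMM. The bash sketch below illustrates that alternative; it is not part of the answer above. The helper names yyyymm and older_than_num are hypothetical, the cutoff is hard-coded to August 2020 so the result does not depend on today's date, and well-formed prefix-MM-YYYY lines are assumed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn MM and YYYY into the integer YYYYMM.
# The 10# prefix forces base 10 so "08" and "09" are not parsed as octal.
yyyymm() {
    local month=$1 year=$2
    echo $(( 10#$year * 100 + 10#$month ))
}

# Hypothetical helper: print stdin lines of the form prefix-MM-YYYY whose
# date is numerically below the YYYYMM cutoff given in $1.
older_than_num() {
    local cutoff=$1 prefix month year
    while IFS='-' read -r prefix month year; do
        if (( $(yyyymm "$month" "$year") < cutoff )); then
            printf '%s-%s-%s\n' "$prefix" "$month" "$year"
        fi
    done
}

# Fixed cutoff: August 2020.
older_than_num 202008 <<'EOF'
trans-02-2018
trans-03-2019
trans-04-2021
trans-01-2022
EOF
```

With this fixed cutoff the sketch prints trans-02-2018 and trans-03-2019. A live cutoff could come from date +%Y%m -d '18 months ago' on systems with GNU date.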