Home > Back-end >  Subtract fixed number of days from date column using awk and add it to new column
Subtract fixed number of days from date column using awk and add it to new column

Time:01-10

Let's assume that we have a file with the values as seen bellow:

% head test.csv
20220601,A,B,1
20220530,A,B,1

And we want to add two new columns, one with the date minus 1 day and one with minus 7 days, resulting the following:

% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1

The awk that was used to produce the above is:

% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" '%Y%m%d'"; a | getline st ; close(a) ;b="date -d \"$(date -d \""$1"\") -1 days\" '%Y%m%d'"; b | getline cb ; close(b) ;print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv

But after applying the above in a large file with more than 100K lines it runs for 20 minutes, is there any way to optimize the awk?

CodePudding user response:

One GNU awk approach:

awk '
BEGIN { FS=OFS=","
        secs_in_day = 60 * 60 * 24
      }
      { dt = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
        dt1 = strftime("%Y%m%d",dt -  secs_in_day      )
        dt7 = strftime("%Y%m%d",dt - (secs_in_day * 7) )
        print $1,$2,$3,dt7,dt1,$4
      }
' test.csv

This generates:

20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1

NOTES:

  • requires GNU awk for the mktime() and strftime() functions; see GNU awk time functions for more details
  • other flavors of awk may have similar functions, ymmv
  • Related