Let's assume that we have a file with the values seen below:
% head test.csv
20220601,A,B,1
20220530,A,B,1
And we want to add two new columns, one with the date minus 7 days and one with the date minus 1 day, resulting in the following:
% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
The awk command that was used to produce the above is:
% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" +'%Y%m%d'"; a | getline st ; close(a) ;b="date -d \"$(date -d \""$1"\") -1 days\" +'%Y%m%d'"; b | getline cb ; close(b) ;print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv
But after applying the above to a large file with more than 100K lines, it runs for 20 minutes. Is there any way to optimize the awk?
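For reference, each cmd | getline builds and runs a shell command for every input line; on the first record the 7-day pipeline expands to roughly the following, so every line forks two shells and four date processes:
% date -d "$(date -d "20220601") -7 days" +%Y%m%d
20220525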
CodePudding user response:
One GNU awk approach:
awk '
BEGIN { FS = OFS = ","
        secs_in_day = 60 * 60 * 24
      }
      { # rebuild YYYYMMDD as "YYYY MM DD HH MM SS" for mktime(); noon sidesteps DST edge cases
        dt  = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
        dt1 = strftime("%Y%m%d", dt -  secs_in_day      )    # date minus 1 day
        dt7 = strftime("%Y%m%d", dt - (secs_in_day * 7) )    # date minus 7 days
        print $1, $2, $3, dt7, dt1, $4
      }
' test.csv
This generates:
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
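To produce new_test.csv as in the question, redirect the output; for example, assuming the awk program (the part between the single quotes) is saved as add_dates.awk (a filename chosen here purely for illustration):
% awk -f add_dates.awk test.csv > new_test.csv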
NOTES:
- requires GNU awk for the mktime() and strftime() functions; see the GNU awk time functions documentation for more details
- other flavors of awk may have similar functions, ymmv
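A quick way to check whether your awk provides these functions is a one-liner like the one below; on a non-GNU awk it will typically fail with an error about calling an undefined function:
% awk 'BEGIN { print strftime("%Y%m%d", mktime("2022 06 01 12 0 0") - 86400) }'
20220531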