I have tsv file with two column contains dates in YYYY-DD-MM HH:MM:SS format.
I want to count the day difference between these dates (as Truncated Integer)
for example:
input:
column1 column2
2013-04-26 12:38:33 2013-04-27 12:38:33.757
2019-06-07 17:46:02.543 2019-06-27 22:46:02.543
2013-02-06 12:38:33 2013-02-18 22:50:50
Output :
1
20
12
My dates in column 1 and 2 I tried
gawk -F "\t" '
{print$0 "\t"
split($1,a,"-" || ":" );
split($2,b,"-" || ":" );
t1 = mktime(sprintf("%d %d %d 0 0 0 0",a[1],a[2],a[3]));
t2 = mktime(sprintf("%d %d %d 0 0 0 0",b[1],b[2],b[3]));
print (t2-t1)/86400
}'
But I got strange number which not reflecting the accurate day difference.
CodePudding user response:
When you need to do it multiple times, it's cleaner/easier to define your own function for converting your date format to epoch time.
printf '%s\t%s\n' \
'2013-04-26 12:38:33' '2013-04-27 12:38:33.757' \
'2019-06-07 17:46:02.543' '2019-06-27 22:46:02.543' \
'2013-02-06 12:38:33' '2013-02-18 22:50:50' |
awk -F '\t' '
function epoch(str, i) {
gsub(/[-:]/," ",str)
return mktime(str) ((i = index(str,".")) ? substr(str,i) : "")
}
{ print int((epoch($2) - epoch($1)) / 86400) }
1
20
12
notes:
mktime
doesn't support floating point precision so I re-append the sub-zero part.print int(float)
truncates the sub-zero part offloat
; if your prefer a rounded output then useprintf "%d%s", float, ORS
Merry Christmas!
CodePudding user response:
I'm guessing you mean a .csv file (comma-separated value)? I would use R to do this. If you install R (it is a free as in beer download), you should get the Rscript command with it, then you can can use the as.Date()
method and do this from the command line or a script
Rscript -e 'as.Date("2013-04-27 12:38:33.757") - as.Date("2013-04-26 12:38:33")'
Time difference of 1 days
Dealing with dates and time differences is devilishly complicated.