I have a data file with alphanumeric timestamps at 10-minute intervals, e.g.:
00 hour 00 minute (00:00H)
00 hour 10 minute (00:10H)
00 hour 20 minute (00:20H)
and so on
$ cat ifile.txt
00:00H01JUN2021 1.900
00:10H01JUN2021 2.400
00:20H01JUN2021 2.100
00:30H01JUN2021 2.300
00:40H01JUN2021 2.00
00:50H01JUN2021 2.300
01:00H01JUN2021 2.300
01:10H01JUN2021 0.000
01:20H01JUN2021 2.200
01:30H01JUN2021 0.100
To understand the data:
The 1st column is the timestamp; the 2nd column is the value at that time.
The first 6 characters, HH:MMH, give the hour (HH) and the minute (MM), as explained at the beginning; the rest is the day, the 3-letter month and the year (e.g. 01JUN2021).
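To make the fixed-width layout concrete, here is a small sketch that cuts one timestamp into its parts with awk's substr(), which counts characters from 1: hour at 1-2, minute at 4-5, day at 7-8, month at 9-11, year at 12-15 (the same offsets the question's script uses). The shell function name show_fields is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the pieces of one HH:MMHDDMONYYYY timestamp.
# substr(s, start, length) uses 1-based character positions.
show_fields() {
    printf '%s\n' "$1" | awk '{
        s = $1
        printf "hour=%s minute=%s day=%s month=%s year=%s\n",
            substr(s, 1, 2), substr(s, 4, 2), substr(s, 7, 2),
            substr(s, 9, 3), substr(s, 12, 4)
    }'
}

show_fields '00:10H01JUN2021'
# prints: hour=00 minute=10 day=01 month=JUN year=2021
```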
I would like to convert it into a CSV file with numeric dates. The desired output file is:
$ cat ofile.txt
yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.000
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100
My script is:
#!/bin/sh
gawk '
BEGIN {
month["Jan"] = "01"; month["Feb"] = "02"; month["Mar"] = "03";
month["Apr"] = "04"; month["May"] = "05"; month["Jun"] = "06";
month["Jul"] = "07"; month["Aug"] = "08"; month["Sep"] = "09";
month["Oct"] = "10"; month["Nov"] = "11"; month["Dec"] = "12";
}
function timestamp_to_numeric(s) {
# 00:00H01JUN2021 => 2021-06-01 00:00:00
return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"
}
NR==1 {next}
END {
printf "%s",timestamp_to_numeric($1),$2
printf "\n"
}
' ifile.txt
This script is not printing my desired output.
Answer:
Change
return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"
to
return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2)" "substr(s,1,2)":"substr(s,4,2)":""00"
# ..........................................................^^^
so that you actually have a space between the date and the time.
Perhaps more readable would be:
return sprintf("%4d-%02d-%02d %02d:%02d:00", substr(s,12,4), month[substr(s,9,3)], substr(s,7,2), substr(s,1,2), substr(s,4,2))
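The missing space is not the only problem in the question's script. The month keys are Jan, Feb, ... but the data contains JUN, so month[substr(s,9,3)] is empty; NR==1 {next} skips the first data line although ifile.txt has no header; all output happens in the END block, so at most one line (built from the last record) is printed; printf "%s",timestamp_to_numeric($1),$2 uses only its first argument, so $2 is never printed; and nothing sets OFS or prints the header line. A corrected version might look like this sketch (wrapping it in a shell function named convert is a hypothetical choice, not part of the question):

```shell
#!/bin/sh
# Sketch of the question's script with the problems above fixed.
# Plain POSIX awk is enough; no gawk extensions are used.
convert() {
    awk '
    BEGIN {
        OFS = ","
        # Upper-case keys, to match months such as JUN in the data.
        month["JAN"] = "01"; month["FEB"] = "02"; month["MAR"] = "03"
        month["APR"] = "04"; month["MAY"] = "05"; month["JUN"] = "06"
        month["JUL"] = "07"; month["AUG"] = "08"; month["SEP"] = "09"
        month["OCT"] = "10"; month["NOV"] = "11"; month["DEC"] = "12"
        print "yyyy-mm-dd hh-mn-sc", "val"      # header line
    }
    function timestamp_to_numeric(s) {
        # 00:00H01JUN2021 => 2021-06-01 00:00:00
        return substr(s,12,4) "-" month[toupper(substr(s,9,3))] "-" \
               substr(s,7,2) " " substr(s,1,2) ":" substr(s,4,2) ":00"
    }
    # A main rule (no NR==1 skip, no END) so every line is converted.
    { print timestamp_to_numeric($1), $2 }
    ' "$@"
}
```

Run it as convert ifile.txt > ofile.txt; with no file argument it reads standard input.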
Answer:
Using GNU awk (since you're already using it) for the 4th arg to split():
$ cat tst.awk
function timestamp_to_numeric(s, mthNr,t,m) {
# 00:00H01JUN2021 => 2021-06-01 00:00:00
split(s,t,/[[:alpha:]]+/,m)
# t[1]="00:00", t[2]="01", t[3]="2021"; m[1]="H", m[2]="JUN"
mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",m[2])+2) / 3
return sprintf("%04d-%02d-%02d %s:00", t[3], mthNr, t[2], t[1])
}
BEGIN {
OFS=","
print "yyyy-mm-dd hh-mn-sc","val"
}
{ print timestamp_to_numeric($1), $2 }
$ awk -f tst.awk ifile.txt
yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.00
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100
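The 4-argument split() is a gawk extension. If only a POSIX awk is available, a similar sketch can take the pieces by fixed position instead and compute the month number with the common (index(...)+2)/3 idiom. The shell function name to_csv is hypothetical.

```shell
#!/bin/sh
# POSIX-awk sketch (no gawk extensions): fixed-width substr() plus an
# index() lookup into a string of 3-letter month abbreviations.
to_csv() {
    awk '
    function timestamp_to_numeric(s,    mthNr) {
        # JAN starts at position 1, FEB at 4, ..., DEC at 34,
        # so (position + 2) / 3 gives 1..12.
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",
                       toupper(substr(s, 9, 3))) + 2) / 3
        return sprintf("%s-%02d-%s %s:%s:00", substr(s, 12, 4), mthNr,
                       substr(s, 7, 2), substr(s, 1, 2), substr(s, 4, 2))
    }
    BEGIN { OFS = ","; print "yyyy-mm-dd hh-mn-sc", "val" }
    { print timestamp_to_numeric($1), $2 }
    ' "$@"
}
```

Usage is the same as above: to_csv ifile.txt > ofile.txt.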
Answer:
To map English month names (full or abbreviated) of any case to the month number, this extremely odd-looking lookup string suffices:
- it pre-segregates the input by whether the 2nd letter is A|a (i.e. Jan / March / May)
- then looks up the position of the 3rd letter in the reference string
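function month_name_to_num(__,_) {
# 5^0 = 1 when the 2nd letter is A/a, else 5^1 = 5; _ is left at 3, so
# substr(__,_--,--_) is substr(__,3,1) (assuming left-to-right evaluation)
return \
index(substr("n_r_yb_r_nlgptvc",
((_ = 3) + 2)^(__!~"^.[Aa]")),
tolower(substr(__,_--,--_) ) )
}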
Sample results (input name, then the returned month number):
OCT 10
AUGUST 8
March 3
May 5
October 10
November 11
February 2
JUNE 6
NOV 11
JUL 7
December 12
OCTOBER 10
FEBRUARY 2
JANUARY 1
MARCH 3
APRIL 4
June 6
April 4
September 9
NOVEMBER 11
January 1
FEB 2
MAY 5
DEC 12
MAY 5
JAN 1
JULY 7
SEP 9
August 8
SEPTEMBER 9
July 7
DECEMBER 12
MAR 3
APR 4
JUN 6
AUG 8
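For a quick check, the sketch below restates the same lookup without the golfing (start the search at position 1 of the reference string when the 2nd letter is A/a, otherwise at position 5, then find the 3rd letter) and wraps it in a hypothetical shell function, month_num, that prints each name with its number in the same format as the list above.

```shell
#!/bin/sh
# Hypothetical wrapper: print "NAME NUMBER" for each month name given.
month_num() {
    printf '%s\n' "$@" | awk '
    function month_name_to_num(name) {
        # 2nd letter A/a (Jan, Mar, May): search "n_r_y..." from position 1;
        # otherwise search from position 5, so Feb=2 ... Dec=12.
        return index(substr("n_r_yb_r_nlgptvc", (name ~ /^.[Aa]/) ? 1 : 5),
                     tolower(substr(name, 3, 1)))
    }
    { print $1, month_name_to_num($1) }'
}

month_num OCT AUGUST March May
# prints:
# OCT 10
# AUGUST 8
# March 3
# May 5
```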
If you don't want to use a regex, this variant avoids allocating extra temp variables by repurposing the input parameter, something especially convenient in weakly-typed languages like awk:
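function monthname2num(_) {
# assumes left-to-right evaluation: _ "" copies the name first, then
# _+=_^=_ makes _ 2 (a name is numerically 0, 0^0 = 1, 1 + 1 = 2),
# so this is index(..., letters 2-3 of the name) / 2
return \
index("=anebarprayunulugepctovec",
tolower(substr(_ "",_+=_^=_,_)))/_
}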
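The variant above depends on awk evaluating the substr() arguments left to right and on the order in which _ is reassigned inside them. A plainer sketch of the same 2-letter lookup that does not depend on evaluation order: letters 2-3 of each month (an, eb, ar, ...) start at the even positions 2, 4, ..., 24 of the reference string, so half the position is the month number. The wrapper name month_num2 is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around an order-independent version of monthname2num.
month_num2() {
    printf '%s\n' "$@" | awk '
    function monthname2num(name) {
        # Letters 2-3 of each month sit at even positions 2, 4, ..., 24
        # of the reference string, so position / 2 = month (0 if unknown).
        return index("=anebarprayunulugepctovec", tolower(substr(name, 2, 2))) / 2
    }
    { print $1, monthname2num($1) }'
}

month_num2 Jan june NOVEMBER
# prints:
# Jan 1
# june 6
# NOVEMBER 11
```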