Alphanumeric date to numeric in shell

Time:01-19

I have a data file with one record every 10 minutes, and the dates are alphanumeric, e.g.

00 hour 00 minute (00:00H)
00 hour 10 minute (00:10H)
00 hour 20 minute (00:20H)
and so on

$ cat ifile.txt

00:00H01JUN2021 1.900
00:10H01JUN2021 2.400
00:20H01JUN2021 2.100
00:30H01JUN2021 2.300
00:40H01JUN2021 2.00
00:50H01JUN2021 2.300
01:00H01JUN2021 2.300
01:10H01JUN2021 0.000
01:20H01JUN2021 2.200
01:30H01JUN2021 0.100

To understand the data:

1st column is date; second column is the value at that time

The first 6 characters, YY:XXH, encode the time: YY -> hour; XX -> minute (as explained at the beginning)

I would like to convert it into a CSV file with numeric dates. The desired output file is

$ cat ofile.txt

yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.000
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100

My script is:

#!/bin/sh
gawk '
    BEGIN {
        month["Jan"] = "01"; month["Feb"] = "02"; month["Mar"] = "03";
        month["Apr"] = "04"; month["May"] = "05"; month["Jun"] = "06";
        month["Jul"] = "07"; month["Aug"] = "08"; month["Sep"] = "09";
        month["Oct"] = "10"; month["Nov"] = "11"; month["Dec"] = "12";
    }
    function timestamp_to_numeric(s) {
        # 00:00H01JUN2021 => 2021-06-01 00:00:00
        return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"
    }
    NR==1 {next}
    END {
            printf "%s",timestamp_to_numeric($1),$2
            printf "\n"
        }
   ' ifile.txt

This script is not printing my desired output.

CodePudding user response:

Change

return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"

to

return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2)" "substr(s,1,2)":"substr(s,4,2)":""00"
# .................................,........................^^^

so that you actually have a space between the date and the time.

Perhaps more readable would be:

return sprintf("%4d-%02d-%02d %02d:%02d:00", substr(s,12,4), month[substr(s,9,3)], substr(s,7,2), substr(s,1,2), substr(s,4,2))
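
The missing space is not the only thing that stops the posted script from producing the desired file. The month[] keys are mixed-case ("Jun") while the data uses "JUN", NR==1 {next} skips the first record even though ifile.txt has no header line, the conversion sits in an END block and so runs only once, and the printf format has a single %s for two arguments. A minimal corrected sketch follows, assuming POSIX awk is enough; the wrapper name convert_dates is made up here for illustration, and %.3f is used on the assumption that values such as 2.00 should come out as 2.000, as in the desired output.

```shell
# convert_dates: hypothetical wrapper name; reads records such as
# "00:00H01JUN2021 1.900" on stdin and writes CSV on stdout.
convert_dates() {
    awk '
    BEGIN {
        # Upper-case keys, to match the month spelling used in the data.
        month["JAN"] = "01"; month["FEB"] = "02"; month["MAR"] = "03"
        month["APR"] = "04"; month["MAY"] = "05"; month["JUN"] = "06"
        month["JUL"] = "07"; month["AUG"] = "08"; month["SEP"] = "09"
        month["OCT"] = "10"; month["NOV"] = "11"; month["DEC"] = "12"
        print "yyyy-mm-dd hh-mn-sc,val"
    }
    function timestamp_to_numeric(s) {
        # 00:00H01JUN2021 => 2021-06-01 00:00:00
        return substr(s,12,4) "-" month[toupper(substr(s,9,3))] "-" substr(s,7,2) " " substr(s,1,2) ":" substr(s,4,2) ":00"
    }
    # Main rule: runs once for every non-empty input record.
    NF { printf "%s,%.3f\n", timestamp_to_numeric($1), $2 }
    '
}
```

It would be called as convert_dates < ifile.txt > ofile.txt.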

CodePudding user response:

Using GNU awk (since you're already using it) for the 4th arg to split():

$ cat tst.awk
function timestamp_to_numeric(s,        mthNr,t,m) {
    # 00:00H01JUN2021 => 2021-06-01 00:00:00
    split(s,t,/[[:alpha:]]+/,m)
    mthNr = index("  JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",m[2]) / 3
    return sprintf("%04d-%02d-%02d %s:00", t[3], mthNr, t[2], t[1])
}

BEGIN {
    OFS=","
    print "yyyy-mm-dd hh-mn-sc","val"
}
{ print timestamp_to_numeric($1), $2 }

$ awk -f tst.awk ifile.txt
yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.00
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100
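
If gawk is not available, a similar result seems reachable in POSIX awk by cutting the fixed-width fields out with substr() and keeping the index() lookup for the month. The sketch below assumes every timestamp has exactly the layout HH:MMHddMONyyyy; the wrapper name to_csv is invented for illustration.

```shell
# to_csv: hypothetical wrapper name; reads "HH:MMHddMONyyyy value" records on stdin.
to_csv() {
    awk '
    function timestamp_to_numeric(s,    mthNr) {
        # Month names start at offsets 1, 4, 7, ... so (offset + 2) / 3 gives 1..12.
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper(substr(s,9,3))) + 2) / 3
        # substr(s,1,5) is the HH:MM part; seconds are always 00.
        return sprintf("%04d-%02d-%02d %s:00", substr(s,12,4), mthNr, substr(s,7,2), substr(s,1,5))
    }
    BEGIN { OFS = ","; print "yyyy-mm-dd hh-mn-sc", "val" }
    { print timestamp_to_numeric($1), $2 }
    '
}
```

Unlike the split() version, this one depends on column positions, so it would need adjusting if the day or hour could be a single digit.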

CodePudding user response:

To map English month names (full or abbreviated) of any casing to the month number, this extremely odd-looking lookup string suffices:

  • it pre-segregates the input by whether 2nd letter is A|a - i.e. Jan / March / May
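  • then performs reference string position lookup of the 3rd letter

function month_name_to_num(__,_) {
    # start offset into the lookup string: 5^1 = 5 for most months,
    # 5^0 = 1 for Jan/Mar/May; _ is left at 3 for the substr() below
    return \
    index(substr("n_r_yb_r_nlgptvc",
    ((_+=++_)-+-++_)^(__!~"^.[Aa]")),
       tolower(substr(__,_--,--_) ) )
}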

OCT 10
AUGUST 8
March 3
May 5
October 10
November 11
February 2
JUNE 6
NOV 11
JUL 7
December 12
OCTOBER 10
FEBRUARY 2
JANUARY 1
MARCH 3
APRIL 4
June 6
April 4
September 9
NOVEMBER 11
January 1
FEB 2
MAY 5
DEC 12
MAY 5
JAN 1
JULY 7
SEP 9
August 8
SEPTEMBER 9
July 7
DECEMBER 12
MAR 3
APR 4
JUN 6
AUG 8

If you don't want to use a regex, this variant of the function avoids allocating extra temporary variables by reusing the input parameter, something that is uniquely convenient in weakly typed languages like awk:

function monthname2num(_) {
    return \
    index("=anebarprayunulugepctovec",
    tolower(substr(_ "",_+=_^=_,_)))/_
}
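
Written without the variable reuse, the regex-free variant looks up letters 2 and 3 of the name in a string of two-letter pairs. The leading "=" pads the string so that each pair starts at an even offset, and dividing that offset by 2 gives the month number. A readable sketch, with the names month_num2 and name invented here:

```shell
# month_num2: hypothetical helper; $1 is an English month name,
# full or abbreviated, in any casing. Prints the month number.
month_num2() {
    awk -v name="$1" '
    BEGIN {
        # Pairs an, eb, ar, ... start at offsets 2, 4, 6, ... so offset / 2 gives 1..12.
        print index("=anebarprayunulugepctovec", tolower(substr(name, 2, 2))) / 2
    }'
}
```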