bash more efficient way to convert odd date format to be recognized by linux date

Time:04-06

Here is the date format used in a bunch of files I have:

$ cat ./file.log
20220405T130001 message1
20220405T130002 message2
20220405T130003 message3
20220405T130004 message4
20220405T130005 message5

I am able to convert it to a usable date format by doing this:

$ cat ./file.log | sed 's/^\(.\{4\}\)/\1-/' | sed 's/^\(.\{7\}\)/\1-/' | sed 's/\(.\{10\}\)./\1 /' | sed 's/^\(.\{13\}\)/\1:/' | sed 's/^\(.\{16\}\)/\1:/'
2022-04-05 13:00:01 message1
2022-04-05 13:00:02 message2
2022-04-05 13:00:03 message3
2022-04-05 13:00:04 message4
2022-04-05 13:00:05 message5
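The same five substitutions can also run inside a single sed process by giving each rule its own -e expression, which avoids starting five sed processes connected by four pipes. A minimal sketch: the expressions are copied unchanged from the pipeline above, and fix_dates is a hypothetical helper name.

```bash
# Hypothetical helper: applies the five substitutions from the question, in
# order, inside ONE sed process (one -e per rule) instead of five piped seds.
# Pass file names as arguments, or pipe text in on stdin.
fix_dates() {
  sed -e 's/^\(.\{4\}\)/\1-/' \
      -e 's/^\(.\{7\}\)/\1-/' \
      -e 's/\(.\{10\}\)./\1 /' \
      -e 's/^\(.\{13\}\)/\1:/' \
      -e 's/^\(.\{16\}\)/\1:/' \
      "$@"
}
```

Usage: `fix_dates ./file.log` (the `cat` is not needed, since sed reads files directly).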

This seems very inefficient. Is there an easier / better way to accomplish this in bash?

The rules for the change are:

  • insert - after the first 4 characters (year)
  • insert - after the next 2 characters (month)
  • replace the T that follows the next 2 characters (day) with a space
  • insert : after the next 2 characters (hour)
  • insert : after the next 2 characters (minute)
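These rules map directly onto bash substring expansion, ${var:offset:length}. A minimal sketch, assuming bash (not plain POSIX sh) and that every line starts with the 15-character YYYYMMDDTHHMMSS stamp followed by the message; fix_dates_bash is a hypothetical helper name.

```bash
# Hypothetical helper (bash only): slices the stamp with ${var:offset:length}.
# Assumes each line starts with a 15-char stamp such as 20220405T130001.
fix_dates_bash() {
  local line ts rest
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    ts=${line:0:15}   # e.g. 20220405T130001
    rest=${line:15}   # e.g. " message1" (leading space kept)
    # offset 8 is the T, so it is skipped and replaced by a space
    printf '%s-%s-%s %s:%s:%s%s\n' \
      "${ts:0:4}" "${ts:4:2}" "${ts:6:2}" \
      "${ts:9:2}" "${ts:11:2}" "${ts:13:2}" "$rest"
  done
}
```

Usage: `fix_dates_bash < ./file.log`. A bash read loop is usually slower than a single sed or awk call on large files, so this mainly fits when the lines are already being handled in a bash loop.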

Answer:

Assuming the input files always use the same format, a single sed command with multiple capture groups can handle this:

sed -E 's/^(.{4})(..)(..)T(..)(..)/\1-\2-\3 \4:\5:/' file

2022-04-05 13:00:01 message1
2022-04-05 13:00:02 message2
2022-04-05 13:00:03 message3
2022-04-05 13:00:04 message4
2022-04-05 13:00:05 message5
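If some files may contain lines that do not start with a timestamp, a stricter variant of the same capture-group idea only rewrites lines whose groups are all digits and leaves everything else untouched. This is an editorial sketch rather than the answerer's code; it assumes a sed that supports -E (GNU and BSD sed both do), and fix_dates_strict is a hypothetical helper name.

```bash
# Hypothetical helper: stricter version of the single-sed answer.
# Every group must be digits, so lines without a leading
# YYYYMMDDTHHMMSS stamp pass through unchanged.
fix_dates_strict() {
  sed -E 's/^([0-9]{4})([0-9]{2})([0-9]{2})T([0-9]{2})([0-9]{2})([0-9]{2})/\1-\2-\3 \4:\5:\6/' "$@"
}
```

Usage: `fix_dates_strict ./file.log`.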

Answer:

With your shown samples, please try the following awk code. It sets the field separator to either T or a space. The main block then uses awk's substr function to cut the 1st field (the date) and the 2nd field (the time) into their parts, and prints them in the required format followed by the 3rd field (the message).

awk -F'T| ' '
{
  print substr($1,1,4)"-"substr($1,5,2)"-"substr($1,7,2),substr($2,1,2)":"substr($2,3,2)":"substr($2,5,2),$3
}
' Input_file
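Because the field separator above splits on every space and every capital T, a message such as "Test message two" would come out as just its first piece. A minimal variant that slices the whole line ($0) instead keeps the rest of the line intact; it assumes every line starts with the 15-character stamp, and fix_dates_awk is a hypothetical helper name.

```bash
# Hypothetical helper: slices $0 by position instead of splitting into
# fields, so messages containing spaces or a capital T survive unchanged.
fix_dates_awk() {
  awk '{
    ymd = substr($0, 1, 4) "-" substr($0, 5, 2) "-" substr($0, 7, 2)
    hm  = substr($0, 10, 2) ":" substr($0, 12, 2)
    # substr($0, 14) is the seconds plus everything after them
    print ymd " " hm ":" substr($0, 14)
  }' "$@"
}
```

Usage: `fix_dates_awk ./file.log`.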