I'm not prof in bash and Linux but I need to preprocess some finance data (OHLC data) in JSON syntax like below:
$ data='
[
{ "t": "2022-09-01T00:00:00", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": "2022-09-01T00:00:15", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": "2022-09-01T00:00:45", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": "2022-09-01T00:02:45", "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": "2022-09-01T00:03:00", "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": "2022-09-01T00:03:45", "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": "2022-09-01T00:04:45", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": "2022-09-01T00:05:00", "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": "2022-09-01T00:05:15", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": "2022-09-01T00:05:45", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]'
I want to use DATE command to replace "time" field from current format into timestamp format as bellow:
new_data=
[
{ "t": 1661974200, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": 1661974215, "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": 1661974245, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": 1661974365, "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": 1661974380, "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": 1661974435, "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": 1661974495, "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": 1661974510, "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": 1661974525, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": 1661974555, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]
With help from google I tried to run this command
$ echo "$data" | sed "s/\"([0-9] -[0-9] -[0-9] T[0-9] :[0-9] :[0-9] (\.[0-9]*Z)?)\"/$(date --date=\1 '%s')/g"
but output result has same timestamp for all records!
output=
[
{ "t": 1666733400, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": 1666733400, "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": 1666733400, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": 1666733400, "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": 1666733400, "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": 1666733400, "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": 1666733400, "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": 1666733400, "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": 1666733400, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": 1666733400, "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]
finally after much and much failed tries I found that the replace part of SED doesn't pass the matched sub-string and just pass string "\1" instead.
please guide me anyone how to fix the problem. tanks guys
CodePudding user response:
I would suggest to process the file line by line. Also, keep in mind that it's not advised to parse or process JSON in the shell. A simple script such as the following can do the trick for you:
#!/bin/bash
while read -r line; do
if [[ $line == {* ]]; then
timestamp=$(echo "$line" | awk '/t/ {t=$3; gsub(/[,"]/, "", t); print t }')
epoch=$(date --date="$timestamp" '%s')
echo "$line" | sed "s/$timestamp/$epoch/"
else
echo "$line"
fi
done < data
It is a basic (and inelegant) solution but you'd avoid having to deal with regexes. Note that in the example above data
is a text file and not a env var.
So, for instance, if the data
file contains:
[
{ "t": "2022-09-01T00:00:00", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": "2022-09-01T00:00:15", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": "2022-09-01T00:00:45", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": "2022-09-01T00:02:45", "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": "2022-09-01T00:03:00", "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": "2022-09-01T00:03:45", "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": "2022-09-01T00:04:45", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": "2022-09-01T00:05:00", "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": "2022-09-01T00:05:15", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": "2022-09-01T00:05:45", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]
The output when running the script will be:
[
{ "t": "1661986800", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": "1661986815", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": "1661986845", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": "1661986965", "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": "1661986980", "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": "1661987025", "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": "1661987085", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": "1661987100", "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": "1661987115", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": "1661987145", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]
CodePudding user response:
Using GNU sed
and GNU date
$ new_data=$(sed -E 's/"/#/g;s/([^:]*[^#]*#)([^#]*)(.*)/echo "\1$(date -d \2 '%s')\3"/e;s/#/"/g' <<< "$data")
$ echo "$new_data"
[
{ "t": "1661986800", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 46.900, "n": 1 },
{ "t": "1661986815", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 299.100, "n": 1 },
{ "t": "1661986845", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 2.900, "n": 1 },
{ "t": "1661986965", "o": 1.3700, "c": 1.3735, "h": 1.3735, "l": 1.3700, "v": 450.443, "n": 7 },
{ "t": "1661986980", "o": 1.3743, "c": 1.3744, "h": 1.3744, "l": 1.3743, "v": 15.128, "n": 2 },
{ "t": "1661987025", "o": 1.3773, "c": 1.3776, "h": 1.3776, "l": 1.3773, "v": 32.078, "n": 3 },
{ "t": "1661987085", "o": 1.3700, "c": 1.3700, "h": 1.3700, "l": 1.3700, "v": 380.000, "n": 1 },
{ "t": "1661987100", "o": 1.3783, "c": 1.3783, "h": 1.3783, "l": 1.3783, "v": 8.191, "n": 1 },
{ "t": "1661987115", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 5654.400, "n": 14 },
{ "t": "1661987145", "o": 1.3800, "c": 1.3800, "h": 1.3800, "l": 1.3800, "v": 427.100, "n": 2 },
...
]