Probably asked (many times) before but cannot find anything helpful.
I have a csv file with timeseries. Dates are local without indicating timezone or DST. I use bash and date utility to convert all dates in the file to UTC. However, during the DST end in October there are two identical timestamps with different meaning. How to handle this, e.g. is there any argument to date command to indicate that the original local date is before/after DST end?
Here is the command I use to convert (in the loop, date is stored in variable):
date -u -Iseconds --date='TZ="Europe/Rome" 2021-10-31T01:55:00'
2021-10-31T01:55:00,values # converts to 2021-10-30T23:55:00 00:00, OK
2021-10-31T01:56:00,values # converts to 2021-10-30T23:56:00 00:00, OK
2021-10-31T01:57:00,values # converts to 2021-10-30T23:57:00 00:00, OK
2021-10-31T01:58:00,values # converts to 2021-10-30T23:58:00 00:00, OK
2021-10-31T01:59:00,values # converts to 2021-10-30T23:59:00 00:00, OK
2021-10-31T02:00:00,values # converts to 2021-10-31T01:00:00 00:00, wrong
2021-10-31T02:01:00,values # converts to 2021-10-31T01:01:00 00:00, wrong
2021-10-31T02:02:00,values # converts to 2021-10-31T01:02:00 00:00, wrong
2021-10-31T02:03:00,values # converts to 2021-10-31T01:03:00 00:00, wrong
2021-10-31T02:04:00,values # converts to 2021-10-31T01:04:00 00:00, wrong
2021-10-31T02:05:00,values # converts to 2021-10-31T01:05:00 00:00, wrong
...
2021-10-31T02:55:00,values # converts to 2021-10-31T01:55:00 00:00, wrong
2021-10-31T02:56:00,values # converts to 2021-10-31T01:56:00 00:00, wrong
2021-10-31T02:57:00,values # converts to 2021-10-31T01:57:00 00:00, wrong
2021-10-31T02:58:00,values # converts to 2021-10-31T01:58:00 00:00, wrong
2021-10-31T02:59:00,values # converts to 2021-10-31T01:59:00 00:00, wrong
2021-10-31T02:00:00,values # converts to 2021-10-31T01:00:00 00:00, OK
2021-10-31T02:01:00,values # converts to 2021-10-31T01:01:00 00:00, OK
2021-10-31T02:02:00,values # converts to 2021-10-31T01:02:00 00:00, OK
2021-10-31T02:03:00,values # converts to 2021-10-31T01:03:00 00:00, OK
2021-10-31T02:04:00,values # converts to 2021-10-31T01:04:00 00:00, OK
2021-10-31T02:05:00,values # converts to 2021-10-31T01:05:00 00:00, OK
...
EDITED (as requested)
Input file:
2021-10-31T01:45:00,value1,value2
2021-10-31T02:00:00,value1,value2
2021-10-31T02:15:00,value1,value2
2021-10-31T02:30:00,value1,value2
2021-10-31T02:45:00,value1,value2
2021-10-31T02:00:00,value1,value2
2021-10-31T02:15:00,value1,value2
2021-10-31T02:30:00,value1,value2
2021-10-31T02:45:00,value1,value2
2021-10-31T03:00:00,value1,value2
Output file:
2021-10-30T23:45:00 00:00,value1,value2
2021-10-31T00:00:00 00:00,value1,value2
2021-10-31T00:15:00 00:00,value1,value2
2021-10-31T00:30:00 00:00,value1,value2
2021-10-31T00:45:00 00:00,value1,value2
2021-10-31T01:00:00 00:00,value1,value2
2021-10-31T01:15:00 00:00,value1,value2
2021-10-31T01:30:00 00:00,value1,value2
2021-10-31T01:45:00 00:00,value1,value2
2021-10-31T02:00:00 00:00,value1,value2
How the script will know if 2021-10-31T02:15:00 in input file is CET or CEST. Obviously by position in the file, first occurence is CEST and the second one is CET. Ha, this is easy to explain but complex to implement in bash, and all this because someone used local time without timezone in the csv file.
CodePudding user response:
The question is:
is there any argument to date command to indicate that the original local date is before/after DST end?
Yes, you add the timezone. In this case: CET
from Central European Time or CEST
from Central Eastern European Time.
$ date -u -Iseconds --date 'TZ="Europe/Rome" 2021-10-31T02:59:00 CET'
2021-10-31T01:59:00 00:00
$ date -u -Iseconds --date 'TZ="Europe/Rome" 2021-10-31T02:59:00 CEST'
2021-10-31T00:59:00 00:00
CodePudding user response:
Here's how I'd do it in shell with date
zone='Europe/Paris'
while IFS=, read -r localDate rest; do
epoch=$(TZ="$zone" date -d "$localDate" ' %s')
utcDate=$(date --utc -d "@$epoch" -Iseconds)
printf '%s,%s\n' "$utcDate" "$rest"
done < input.file
But I'd rather do it in a language that can convert a timestamp between time zones a little more elegantly
Here's perl (keep your "elegantly" comments to yourself...)
perl -MDateTime -MDateTime::Format::ISO8601 -F, -ane '
$F[0] = DateTime::Format::ISO8601->parse_datetime($F[0])
->set_time_zone("Europe/Paris")
->set_time_zone("UTC");
print join ",", @F
' input.file
or, not as a one-liner:
#!perl
use DateTime;
use DateTime::Format::ISO8601;
use autodie;
my $parser = DateTime::Format::ISO8601->new;
my $file = shift @ARGV;
open my $fh, '<', $file;
while (<$fh>) {
my ($timestamp, $rest) = split /,/, $_, 2;
my $utc_date = $parser->parse_datetime($timestamp)
->set_time_zone('Europe/Paris')
->set_time_zone('UTC');
printf '%s,%s', $parser->format_datetime($utc_date), $rest;
}
close $fh;
perl convert.pl input.file
2021-10-30T23:45:00Z,value1,value2
2021-10-31T01:00:00Z,value1,value2
2021-10-31T01:15:00Z,value1,value2
...