I have a column named Date_and_Time containing thousands of rows in a CSV file, as shown below:
Input:
Date_and_Time
2021-06-02T13:00:00.000+05:30
2021-06-02T15:00:00.000+05:30
2021-06-02T19:00:00.000+05:30
2021-06-02T21:00:00.000+05:30
2021-06-02T22:00:00.000+05:30
I want to turn this into the 6 columns shown below in a CSV file. 'New Time' is obtained by adding 5 hours 30 minutes to the old time, and because of this 'New Date' can change as well. There is a 'Time Gap' column too. Please help me do this in Python. Thanks.
Output:
Date and Time                  Old Date    Old Time  New Date    New Time  Time Gap
2021-06-02T13:00:00.000+05:30  2021-06-02  13:00:00  2021-06-02  18:30:00  NA
2021-06-02T15:00:00.000+05:30  2021-06-02  15:00:00  2021-06-02  20:30:00  2
2021-06-02T19:00:00.000+05:30  2021-06-02  19:00:00  2021-06-03  0:30:00   4
2021-06-02T21:00:00.000+05:30  2021-06-02  21:00:00  2021-06-03  2:30:00   2
2021-06-02T22:00:00.000+05:30  2021-06-02  22:00:00  2021-06-03  3:30:00   1
CodePudding user response:
It appears that you want to convert an ISO formatted date that already contains a correct timezone indication to the equivalent wall clock time in IST.
The first thing to do is to get a datetime from the string:
>>> import datetime
>>> a = datetime.datetime.fromisoformat("2021-06-02T13:00:00.000+05:30")
>>> a
datetime.datetime(2021, 6, 2, 13, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800)))
Now you want to add the UTC offset (5h30m) to the already correct IST time to give IST wall clock time, that is, you want to move the UTC offset out of the timezone and into the time. This is a rather strange thing to want to do, but you can do
>>> new_time = a + datetime.timedelta(hours=5, minutes=30)
or equivalently
>>> new_time = a + a.utcoffset()
and either will give you
>>> new_time
datetime.datetime(2021, 6, 2, 18, 30, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800)))
But now you have double counting of the UTC offset, so you need to take the UTC offset out of the timezone:
>>> new_time = new_time.replace(tzinfo=None)
>>> new_time
datetime.datetime(2021, 6, 2, 18, 30)
And you can format that as a string to get the wall clock time you expect:
>>> new_time.strftime("%Y-%m-%d %H:%M:%S")
'2021-06-02 18:30:00'
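The steps above can be collected into one small function. This is only a sketch: the name shift_offset_into_time is made up for illustration, and it assumes the strings carry their offset as "+05:30" so that fromisoformat can parse them.

```python
import datetime


def shift_offset_into_time(iso_string):
    """Add the string's own UTC offset to its clock time and drop the timezone.

    Hypothetical helper that mirrors the interactive steps above.
    """
    aware = datetime.datetime.fromisoformat(iso_string)
    offset = aware.utcoffset()  # None when the string has no timezone part
    if offset is None:
        raise ValueError("input has no UTC offset: " + iso_string)
    # Move the offset into the time, then remove it from the timezone
    return (aware + offset).replace(tzinfo=None)


shifted = shift_offset_into_time("2021-06-02T19:00:00.000+05:30")
print(shifted.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-06-03 00:30:00
```

Using utcoffset() rather than a hard-coded 5h30m means the function follows whatever offset each row carries, which may or may not be what you want if the file ever mixes offsets.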
Though I do think that storing a time in a .csv as wall-clock time without a timezone, rather than the way it already is, amounts to forcing the on-screen representation of a date into your underlying data. That is about as sensible as storing years as two digits because that is how you want to display them. Many of today's programmers are too young to remember how much trouble that caused.
The normal approach is to store data at the file level in a locale- and convention-independent form, and convert it on input and output.
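As a rough illustration of that approach, the sketch below keeps the offset-bearing ISO string as the stored value and only builds display strings at output time. The variable names are illustrative, and the stored value is assumed to use "+05:30" for its offset.

```python
import datetime

# The value as it stays in the file: unambiguous, offset included
stored = "2021-06-02T19:00:00.000+05:30"
moment = datetime.datetime.fromisoformat(stored)

# Convert only when presenting the value to someone
local_display = moment.strftime("%Y-%m-%d %H:%M:%S")
utc_display = moment.astimezone(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Writing back out keeps the same independent format
round_trip = moment.isoformat(timespec="milliseconds")

print(local_display)  # 2021-06-02 19:00:00
print(utc_display)    # 2021-06-02 13:30:00
print(round_trip)     # 2021-06-02T19:00:00.000+05:30
```

Because the stored value never loses its offset, the same file can be displayed in IST, UTC, or any other zone without guessing what the numbers meant.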
Your question explains nothing about what Time Gap is supposed to mean, other than that it exists. So I haven't attempted to answer that.
CodePudding user response:
If you are trying to create the output format from the input column, the following approach might help:
import pandas as pd
# Amount to add to every timestamp (5 hours 30 minutes)
add_time = pd.Timedelta(hours=5, minutes=30)
df = pd.read_csv('input.csv')
# Parse the ISO strings into datetimes
df['OldDateTime'] = pd.to_datetime(df['Date_and_Time'])
df['Old Date'] = df['OldDateTime'].dt.date
df['Old Time'] = df['OldDateTime'].dt.time
df['New Date'] = (df['OldDateTime'] + add_time).dt.date
df['New Time'] = (df['OldDateTime'] + add_time).dt.time
# Hours elapsed since the previous row (NaN for the first row)
df['Time Gap'] = (df['OldDateTime'] - df['OldDateTime'].shift(1)).dt.total_seconds() / 3600
# Drop the helper column before printing
del df['OldDateTime']
print(df)
Giving you:
   Date_and_Time                  Old Date    Old Time  New Date    New Time  Time Gap
0  2021-06-02T13:00:00.000+05:30  2021-06-02  13:00:00  2021-06-02  18:30:00  NaN
1  2021-06-02T15:00:00.000+05:30  2021-06-02  15:00:00  2021-06-02  20:30:00  2.0
2  2021-06-02T19:00:00.000+05:30  2021-06-02  19:00:00  2021-06-03  00:30:00  4.0
3  2021-06-02T21:00:00.000+05:30  2021-06-02  21:00:00  2021-06-03  02:30:00  2.0
4  2021-06-02T22:00:00.000+05:30  2021-06-02  22:00:00  2021-06-03  03:30:00  1.0
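If pandas is not available, roughly the same columns can be produced with the standard library and written back out as CSV. This is a sketch under a few assumptions: an in-memory string stands in for input.csv, the offsets are written as "+05:30", and 'Time Gap' is taken to mean hours since the previous row, with "NA" for the first row as in the question's expected output.

```python
import csv
import datetime
import io

# Stand-in for input.csv (assumed content, three rows from the question)
raw = io.StringIO(
    "Date_and_Time\n"
    "2021-06-02T13:00:00.000+05:30\n"
    "2021-06-02T15:00:00.000+05:30\n"
    "2021-06-02T19:00:00.000+05:30\n"
)
add_time = datetime.timedelta(hours=5, minutes=30)

rows = []
previous = None
for record in csv.DictReader(raw):
    old = datetime.datetime.fromisoformat(record["Date_and_Time"])
    new = old + add_time
    # Hours since the previous row; the first row has nothing to compare with
    gap = "NA" if previous is None else (old - previous).total_seconds() / 3600
    rows.append({
        "Date_and_Time": record["Date_and_Time"],
        "Old Date": old.date().isoformat(),
        "Old Time": old.time().isoformat(),
        "New Date": new.date().isoformat(),
        "New Time": new.time().isoformat(),
        "Time Gap": gap,
    })
    previous = old

# Write the six columns as CSV (to a string here; a real file works the same way)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

To work on a real file, replace the two StringIO objects with open('input.csv', newline='') and open('output.csv', 'w', newline=''); the file name output.csv is only a placeholder.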