Home > Net >  Timeseries split datetime data type to seperate date and time columns
Timeseries split datetime data type to seperate date and time columns

Time:12-04

I am importing a CSV file that contains a nonformatted dataset, the date and time are separated but the data type is an object, and the times' time zone is incorrect.

The timezone of this original dataset is EET which is currently 7-hour difference from eastern standard time, (sometimes it is 6 hours during daylight savings time)

I am trying to convert the objects to date time format and convert the timezone to Eastern standard time, having the date and time in separate columns.

a snippet of my code is as follows:

Transform the date column to a datetime format

df['date'] = pd.to_datetime(df['date'])

the above code successfully converts the date column to datetime data type, great.

My trouble is when I work with the time column.

I am unable to separate the date and time using .split(),

the timezone is not accurate so I have compensated by using a different time zone to account for the -7hrs that I am looking for (us/mountain seems to produce the -7hr that I need)

Transform the time column to a datetime data type

df["time"] = pd.to_datetime(
df["time"],
infer_datetime_format = True
)

Convert the time column to the US/Eastern timezone

df["time"] = df["time"].dt.tz_localize("US/Mountain")

As it turns out, the output is: 2022-12-03 23:55:00-07:00

I am looking for an output of 16:55 or even 16:55:00 is fine as well.


My question is how can I separate the time from date while in datetime format and subtract the -7:00 so the output is 16:55 (or 16:55:00)

I have tried using:

df['time'].to_datetime().strftime('%h-%m') 

and receive the following error:

AttributeError: 'Series' object has no attribute 'to_datetime'
df['time'].apply(lambda x:time.strptime(x, "%H:%M"))

gives the following output

64999    (1900, 1, 1, 23, 55, 0, 0, 1, -1)
df['Time'] = pd.to_datetime(df['time']).dt.time

gives me the original time '23:55:00'

To be clear, I am looking for an output of just the converted time, for example 16:55. I do not want 23:55:00-07:00

I am looking for an output of date time split into separate columns in the correct timezone

example:

date |                time
------               ------
2022-12-02 |          16:55

CodePudding user response:

sample:

data = {
    "date": ["2022-12-02"],
    "time": ["23:55:00"]
}
df = pd.DataFrame(data)

code:

df["datetime"] = (
    pd.to_datetime(df["date"].str.cat(df["time"], sep=" "))
    .dt.tz_localize("UTC")
    .dt.tz_convert("US/Mountain")
    .dt.tz_localize(None)
)
df["date"], df["time"] = zip(*[(x.date(), x.time()) for x in df.pop("datetime")])
print(df)

output:

         date      time
0  2022-12-02  16:55:00
  • Related