I have a dataset of 100,000 rows and 15 columns in a 10 MB CSV file.
The column I am working on is a Date/Time column stored as strings.
Source code:
import pandas as pd
import datetime as dt
trupl = pd.DataFrame({'Time/Date' : ['12/2/2021 2:09','22/4/2021 21:09','22/6/2021 9:09']})
trupl['Time/Date'] = pd.to_datetime(trupl['Time/Date'])
print(trupl)
Output:
            Time/Date
0 2021-12-02 02:09:00
1 2021-04-22 21:09:00
2 2021-06-22 09:09:00
What I need to do is a bit confusing, but I'll try to keep it simple:
If the time of day is between 12 am (midnight) and 8 am, subtract one day from the Time/Date value and put the new timestamp in a new column.
If not, keep the timestamp as it is.
Expected output:
         Time/Date    Date_adjusted
0   12/2/2021 2:09   12/1/2021 2:09
1  22/4/2021 21:09  22/4/2021 21:09
2   22/6/2021 9:09   22/6/2021 9:09
I tried the code below:
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x:x- dt.timedelta(days=1) if x >= dt.time(0,0,0) and x < dt.time(8,0,0) else x)
I get a TypeError: '>=' not supported between 'Timestamp' and 'datetime.time'.
When I apply dt.time to x, I get the error: 'Timestamp' object has no attribute 'dt'.
How can I convert x to a time in order to compare it? Or is there a better workaround?
I searched a lot for a fix, but I couldn't find a similar case.
CodePudding user response:
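Each element passed to the lambda is a single Timestamp, which exposes its hour directly as x.hour, so compare the hour instead of comparing the Timestamp to a datetime.time object. Try: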
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x: x - dt.timedelta(days=1) if (x.hour >= 0 and x.hour < 8) else x)
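With 100,000 rows, a vectorized version may be worth considering as an alternative to the row-by-row lambda. The sketch below is not the method from the answer above; it swaps in the Series-level .dt.hour accessor and Series.where. It assumes the column has already been converted to datetime, and it builds the sample frame from ISO-formatted strings (an assumption made here so that parsing is unambiguous) that match the parsed output shown in the question.

```python
import pandas as pd

# Sample data written as ISO strings (an assumption for this sketch) so the
# day/month order cannot be misread; values match the question's parsed output.
trupl = pd.DataFrame({'Time/Date': pd.to_datetime(
    ['2021-12-02 02:09', '2021-04-22 21:09', '2021-06-22 09:09'])})

# Boolean mask: True where the time of day falls in [00:00, 08:00)
early = trupl['Time/Date'].dt.hour < 8

# where() keeps the original value when the condition is True (not early)
# and takes the shifted value otherwise (early rows go back one day).
trupl['Date_adjusted'] = trupl['Time/Date'].where(
    ~early, trupl['Time/Date'] - pd.Timedelta(days=1))

print(trupl)
```

Note that .dt works on the whole Series, not on the individual Timestamp objects seen inside map, which is why calling it on x in the lambda raised the attribute error.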