I have a DataFrame that looks like this:
   trip_id          start_date start_station_id            end_date end_station_id subscription_type journey_duration weekday
0   913460 2019-08-31 23:26:00               50 2019-08-31 23:39:00             70        Subscriber  0 days 00:13:00     Sat
1   913459 2019-08-31 23:11:00               31 2019-08-31 23:28:00             27        Subscriber  0 days 00:17:00     Sat
2   913455 2019-08-31 23:13:00               47 2019-08-31 23:18:00             64        Subscriber  0 days 00:05:00     Sat
3   913454 2019-08-31 23:10:00               10 2019-08-31 23:17:00              8        Subscriber  0 days 00:07:00     Sat
4   913453 2019-08-31 23:09:00               51 2019-08-31 23:22:00             60          Customer  0 days 00:13:00     Sat
I computed the journey duration with
trip_data['journey_duration'] = trip_data['end_date'] - trip_data['start_date']
and now I want to remove rows where the journey duration exceeds, say, 36 hours.
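For anyone reproducing this, here is a minimal sketch of that duration step. It assumes the date columns start out as strings (as they would straight from pd.read_csv without parse_dates) and uses only the first two sample rows; subtracting the columns yields durations only once both have been converted to datetimes.

```python
import pandas as pd

# First two rows of the sample above; the dates are plain strings here,
# as they would be after pd.read_csv without parse_dates (an assumption).
trip_data = pd.DataFrame({
    "trip_id": [913460, 913459],
    "start_date": ["2019-08-31 23:26:00", "2019-08-31 23:11:00"],
    "end_date": ["2019-08-31 23:39:00", "2019-08-31 23:28:00"],
})

# Convert both columns to datetimes so that subtraction gives durations.
trip_data["start_date"] = pd.to_datetime(trip_data["start_date"])
trip_data["end_date"] = pd.to_datetime(trip_data["end_date"])

# The difference is a timedelta column, displayed like "0 days 00:13:00".
trip_data["journey_duration"] = trip_data["end_date"] - trip_data["start_date"]
print(trip_data)
```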
I tried this, without success:
trip_data2 = trip_data[(trip_data['journey_duration'] < 1days 12:00:00) ].copy()
Any suggestions would be greatly appreciated.
Thanks!
Answer:
Try:
import pandas as pd

# convert the date columns to datetime (needed if they were read in as strings):
trip_data["start_date"] = pd.to_datetime(trip_data["start_date"])
trip_data["end_date"] = pd.to_datetime(trip_data["end_date"])

# keep only rows where the trip lasted less than 36 * 60 * 60 seconds (36 hours):
trip_data2 = trip_data[
    (trip_data["end_date"] - trip_data["start_date"]).dt.total_seconds() < 36 * 60 * 60
]
print(trip_data2)
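An alternative sketch: since journey_duration is already a timedelta column, it can be compared directly against a pandas Timedelta. The literal 1days 12:00:00 in the question is not valid Python, but pd.Timedelta("1 days 12:00:00") or pd.Timedelta(hours=36) is. The frame below is a small stand-in for trip_data, and its third row (trip_id 999999, lasting 2 days 3 hours) is hypothetical, added only so the filter has something to drop.

```python
import pandas as pd

# Stand-in for trip_data: two durations from the sample plus one
# hypothetical long trip (trip_id 999999) that should be removed.
trip_data = pd.DataFrame({
    "trip_id": [913460, 913459, 999999],
    "journey_duration": pd.to_timedelta(
        ["0 days 00:13:00", "0 days 00:17:00", "2 days 03:00:00"]
    ),
})

# Build the threshold as a Timedelta; equivalent to pd.Timedelta("1 days 12:00:00").
max_duration = pd.Timedelta(hours=36)

# Keep trips of at most 36 hours, i.e. drop those that exceed it.
trip_data2 = trip_data[trip_data["journey_duration"] <= max_duration].copy()
print(trip_data2)
```

Comparing Timedelta values avoids converting to seconds by hand, and the threshold reads the same way the durations are displayed.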