My data frame dft2022 is:
Start_Time End_Time
9:55:00 10:55:00
5:41:00 14:55:00
9:01:00 12:55:00
9:02:00 7:55:00
8:55:00 N/A
11:55:00 N/A
I want to add a duration column to this dataframe, where duration = End_Time - Start_Time. I used the following:
dft2022["duration"] = pd.to_datetime(dft2022["End_Time"]) - pd.to_datetime(dft2022["Start_Time"])
However, I only want to keep the "duration" value where End_Time - Start_Time is positive and End_Time is not N/A.
CodePudding user response:
The output that you expect is unclear, but you could use clip to set the negative deltas to 0:
# Assign back to the same frame; clip negative durations up to zero.
dft2022['duration'] = (pd.to_datetime(dft2022["End_Time"])
                       .sub(pd.to_datetime(dft2022["Start_Time"]))
                       .clip(lower=pd.Timedelta(0))
                       )
Output:
Start_Time End_Time duration
0 9:55:00 10:55:00 0 days 01:00:00
1 5:41:00 14:55:00 0 days 09:14:00
2 9:01:00 12:55:00 0 days 03:54:00
3 9:02:00 7:55:00 0 days 00:00:00
4 8:55:00 NaN NaT
5 11:55:00 NaN NaT
To filter the data and keep only the rows with a positive duration, you can use:
# Rows where End_Time is missing compare as False and are dropped too.
dft2022[pd.to_datetime(dft2022["End_Time"])
        .sub(pd.to_datetime(dft2022["Start_Time"]))
        .gt(pd.Timedelta(0))]
Output:
Start_Time End_Time
0 9:55:00 10:55:00
1 5:41:00 14:55:00
2 9:01:00 12:55:00
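As a self-contained sketch of the two approaches above: the sample frame is rebuilt by hand here, with None standing in for N/A, and the explicit format="%H:%M:%S" and errors="coerce" arguments are assumptions added for this example, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; None stands in for N/A.
dft2022 = pd.DataFrame({
    "Start_Time": ["9:55:00", "5:41:00", "9:01:00", "9:02:00", "8:55:00", "11:55:00"],
    "End_Time": ["10:55:00", "14:55:00", "12:55:00", "7:55:00", None, None],
})

# Parse the clock times; anything that cannot be parsed becomes NaT.
start = pd.to_datetime(dft2022["Start_Time"], format="%H:%M:%S", errors="coerce")
end = pd.to_datetime(dft2022["End_Time"], format="%H:%M:%S", errors="coerce")
delta = end - start

# Option 1: keep every row, but raise negative durations to zero.
clipped = delta.clip(lower=pd.Timedelta(0))

# Option 2: keep only the rows whose duration is strictly positive.
# Comparisons against NaT are False, so rows without End_Time are dropped.
kept = dft2022[delta.gt(pd.Timedelta(0))]

print(clipped)
print(kept)
```

Which option fits depends on whether you want to keep the rows with a missing or earlier End_Time in the frame at all.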
CodePudding user response:
You can try something like this; it needs NumPy.
import numpy as np
import pandas as pd
data = {
    'Start_Date': ['9:55:00', '5:41:00', '9:01:00', '9:02:00', '8:55:00', '11:55:00'],
    'End_Date': ['10:55:00', '14:55:00', '12:55:00', '7:55:00', 'N/A', 'N/A'],
}
df = pd.DataFrame(data)
df['_end_time'] = pd.to_datetime(df.End_Date, errors='coerce')
df['_start_time'] = pd.to_datetime(df.Start_Date, errors='coerce')
# errors='coerce' sets NaT where a value cannot be parsed as a datetime.
df["duration"] = np.where(
    df._end_time >= df._start_time,
    df._end_time - df._start_time,
    pd.NaT
)
df.drop(columns=['_start_time', '_end_time'], inplace=True)
print(df)
Output:
Start_Date End_Date duration
0 9:55:00 10:55:00 3600000000000
1 5:41:00 14:55:00 33240000000000
2 9:01:00 12:55:00 14040000000000
3 9:02:00 7:55:00 NaT
4 8:55:00 N/A NaT
5 11:55:00 N/A NaT
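Note that in the output above the durations come back from np.where as raw integer nanoseconds. If you would rather keep a proper timedelta column, one possible alternative is Series.where. This is only a sketch and not the method used in the answer; the explicit format="%H:%M:%S" is an assumption about the input strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Start_Date": ["9:55:00", "5:41:00", "9:01:00", "9:02:00", "8:55:00", "11:55:00"],
    "End_Date": ["10:55:00", "14:55:00", "12:55:00", "7:55:00", "N/A", "N/A"],
})

# 'N/A' cannot be parsed, so errors="coerce" turns it into NaT.
start = pd.to_datetime(df["Start_Date"], format="%H:%M:%S", errors="coerce")
end = pd.to_datetime(df["End_Date"], format="%H:%M:%S", errors="coerce")

# Series.where keeps values where the condition is True and fills the rest
# with NaT, so the column stays a timedelta instead of becoming integers.
df["duration"] = (end - start).where(end >= start)

print(df)
```

This uses the same >= comparison as the answer, so a zero-length duration is kept; switch to > if only strictly positive durations should survive.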