Keep the conditional calculation result in a dataframe

Time:04-09

My data frame dft2022 is:

Start_Time     End_Time
9:55:00         10:55:00
5:41:00         14:55:00
9:01:00         12:55:00
9:02:00         7:55:00
8:55:00          N/A
11:55:00         N/A

I want to add a duration column to this dataframe, where duration = End_Time - Start_Time. I used the following:

dft2022["duration"] = pd.to_datetime(dft2022["End_Time"]) - pd.to_datetime(dft2022["Start_Time"])

However, I only want to keep the "duration" value where End_Time - Start_Time is positive and End_Time is not N/A.

CodePudding user response:

The output that you expect is unclear, but you could use clip to set the negative deltas to 0:

# Subtract the parsed times, then raise any negative delta to zero.
dft2022['duration'] = (pd.to_datetime(dft2022["End_Time"])
 .sub(pd.to_datetime(dft2022["Start_Time"]))
 .clip(lower=pd.Timedelta(0))
 )

Output:

  Start_Time  End_Time        duration
0    9:55:00  10:55:00 0 days 01:00:00
1    5:41:00  14:55:00 0 days 09:14:00
2    9:01:00  12:55:00 0 days 03:54:00
3    9:02:00   7:55:00 0 days 00:00:00
4    8:55:00       NaN             NaT
5   11:55:00       NaN             NaT

To filter the rows instead and keep only those with a positive duration, you can use:

# Rows with a missing End_Time compare as False and are dropped as well.
dft2022[pd.to_datetime(dft2022["End_Time"])
     .sub(pd.to_datetime(dft2022["Start_Time"]))
     .gt(pd.Timedelta(0))]

Output:

  Start_Time  End_Time
0    9:55:00  10:55:00
1    5:41:00  14:55:00
2    9:01:00  12:55:00
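
If the goal is to keep every row but leave the duration empty where it is not positive or where End_Time is missing (rather than clipping to 0 or dropping rows), masking with Series.where is one option. This is only a minimal sketch: it assumes the times are plain H:MM:SS strings and that missing values are the literal string "N/A". The sample frame is rebuilt here for illustration, and pd.to_timedelta is swapped in for pd.to_datetime so that no calendar date gets attached to the values.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: "N/A" is a literal string).
dft2022 = pd.DataFrame({
    "Start_Time": ["9:55:00", "5:41:00", "9:01:00", "9:02:00", "8:55:00", "11:55:00"],
    "End_Time": ["10:55:00", "14:55:00", "12:55:00", "7:55:00", "N/A", "N/A"],
})

# Parse the clock times as offsets from midnight; unparseable values become NaT.
start = pd.to_timedelta(dft2022["Start_Time"], errors="coerce")
end = pd.to_timedelta(dft2022["End_Time"], errors="coerce")

# Keep the difference only where it is strictly positive; all other rows become NaT.
delta = end - start
dft2022["duration"] = delta.where(delta > pd.Timedelta(0))
print(dft2022)
```

Comparisons against NaT evaluate to False, so the rows with a missing End_Time fall out of the mask without any special handling.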

CodePudding user response:

You can try something like this; it requires NumPy.

import numpy as np
import pandas as pd

data = {'Start_Date': ['9:55:00',  '5:41:00', '9:01:00', '9:02:00', '8:55:00', '11:55:00'], 'End_Date': ['10:55:00', '14:55:00', '12:55:00', '7:55:00', 'N/A', 'N/A']}

df = pd.DataFrame(data)

df['_end_time'] = pd.to_datetime(df.End_Date, errors='coerce')
df['_start_time'] = pd.to_datetime(df.Start_Date, errors='coerce')
# Coerce sets NaT if unable to parse as datetime.

df["duration"] = np.where(
    df._end_time >= df._start_time,
    df._end_time - df._start_time,
    np.timedelta64("NaT")  # a typed NaT keeps the column as timedelta instead of raw nanosecond integers
)
df.drop(columns=['_start_time', '_end_time'], inplace=True)
print(df)

Output:

  Start_Date  End_Date        duration
0    9:55:00  10:55:00 0 days 01:00:00
1    5:41:00  14:55:00 0 days 09:14:00
2    9:01:00  12:55:00 0 days 03:54:00
3    9:02:00   7:55:00             NaT
4    8:55:00       N/A             NaT
5   11:55:00       N/A             NaT
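
If a plain number (say, minutes) is more convenient than a timedelta, a timedelta column can be converted with dt.total_seconds(). A minimal sketch on a hypothetical standalone Series, not on the frame above:

```python
import pandas as pd

# Hypothetical timedelta values, with NaT where no valid duration exists.
duration = pd.Series(pd.to_timedelta(["01:00:00", "09:14:00", "03:54:00", None]))

# total_seconds() returns floats; NaT turns into NaN.
minutes = duration.dt.total_seconds() / 60
print(minutes)
```

The missing entries stay missing (NaN) after the conversion, so later aggregations such as mean() skip them by default.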
