Let's say i have a dataframe like this:
date_1 date_2
0 2022-08-01 2022-08-05
1 2022-08-20 NaN
2 NaN NaN
I want to have another column which tells the difference in business days and have a daframe like this (in case date_2 is empty, it will be compared to today's date (2022-08-28
)):
date_1 date_2 diff
0 2022-08-01 2022-08-05 4
1 2022-08-20 NaN 5
2 NaN NaN Empty
I tried to use this one:
df["diff"] = df.apply(
lambda x: np.busday_count(x.date_1, x.date_2) if (x.date_1 != '' and x.date_2 != '') else (np.busday_count(x.date_1, np.datetime64('today')) if (x.date_1 != '' and x.date_2 == '') else ''), axis=1)
but im getting this error:
Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
Any idea how to get the desired dataframe?
CodePudding user response:
I think you just need to coerce the types:
def busday_diff(x):
if pd.isna(x.date_1) and pd.isna(x.date_2):
return ''
date2_to_use = pd.Timestamp("today") if pd.isna(x.date_2) else x.date_2
return np.busday_count(np.datetime64(x.date_1, "D"), np.datetime64(date2_to_use, "D"))
df["diff"] = df.apply( lambda x: busday_diff(x) , axis=1)
I opened up the lambda for readability also
CodePudding user response:
I want to have another column which tells the difference in days
If you just want days:
df.assign(
diff=lambda df: (
pd.to_datetime(df["date_2"]).fillna(pd.Timestamp.now())
- pd.to_datetime(df["date_1"])
).dt.days
)
which outputs
date_1 date_2 diff
0 2022-08-01 2022-08-05 4.0
1 2022-08-20 NaN 8.0
2 NaN NaN NaN
EDIT: it was later clarified that they want business days, so please refer to the other answer instead