I have two dataframes; available_df and delete_df.
<available_df>
Person start_day end_day available
1 2012-07-13 2012-07-27 0
1 2012-07-20 2012-08-03 0
1 2012-07-27 2012-08-10 0
2 2012-05-06 2012-05-20 0
2 2012-05-13 2012-05-27 0
2 2012-06-20 2012-07-03 0
2 2012-06-27 2012-07-10 0
2 2012-07-04 2012-07-11 0
<delete_df>
Person start_day end_day
1 2012-05-18 2012-05-24
1 2012-07-13 2012-07-20
2 2012-05-18 2012-06-23
<wanted_results>
Person start_day end_day available
1 2012-07-27 2012-08-10 0
2 2012-06-27 2012-07-10 0
2 2012-07-04 2012-07-11 0
What I want to do is, among available_df, I want to groupby person's records and remain rows which are not included in delete_df periods. If a person's start_day and end_day from delete_df is included in available_df, then delete.
I tried to use 'enumerate' methods but I failed it.. Is there anyone who can help me?
Thank you.
CodePudding user response:
I would use pandas.merge_asof
here:
out= (pd
.merge_asof(available_df.sort_values('end_day'),
delete_df.sort_values('start_day'),
by='Person', left_on='end_day', right_on='start_day',
suffixes=(None, '_'),
)
.loc[lambda d: d['start_day'].gt(d['end_day_'])]
.drop(columns=['start_day_', 'end_day_'])
)
output:
Person start_day end_day available
3 2 2012-06-27 2012-07-10 0
4 2 2012-07-04 2012-07-11 0
7 1 2012-07-27 2012-08-10 0