dataframe
id beg end
guest1 2021-10-21 17:00:00 2021-10-21 18:00:00
guest2 2021-10-21 10:00:00 2021-10-22 10:00:00
I want to test
if the client appears exactly one time and beg and end are not the same day then do like (guest1). col1 gets 2021-10-21 00:00:00 and col2 gets beg
if the client appears exactly one time and beg and end are the same day then do like (guest2).
id beg end col1 col2
guest1 2021-10-21 17:00:00 2021-10-21 18:00:00 2021-10-21 00:00:00 2021-10-21 17:00:00
guest1 2021-10-21 17:00:00 2021-10-21 18:00:00 2021-10-21 18:00:00 2021-10-21 23:59:59
guest2 2021-10-21 10:00:00 2021-10-22 11:00:00 2021-10-21 00:00:00 2021-10-21 10:00:00
CodePudding user response:
You can add another column to see the difference in days and then filter out what you need.
data = [('guest1', '2021-10-21 17:00:00', '2021-10-21 18:00:00' ), ('guest2', '2021-10-21 17:00:00', '2021-10-22 18:00:00' )]
cols = ['id', 'beg', 'end' ]
pdf = pd.DataFrame(data, columns=cols)
pdf[['beg', 'end']] = pdf[['beg', 'end']].apply(pd.to_datetime)
pdf['n_days'] = (pdf['end'] - pdf['beg']).dt.days
id beg end n_days
0 guest1 2021-10-21 17:00:00 2021-10-21 18:00:00 0
1 guest2 2021-10-21 17:00:00 2021-10-22 18:00:00 1
CodePudding user response:
You can use value_counts, callable indexing and isin to filter your data to only include ones with a single row and use another callable index to filter your data according to your greater than 1 day condition.
df[df['id'].isin(df['id'].value_counts().loc[lambda x: x==1].index)].loc[lambda x: x['end'] - x['beg'] > pd.Timedelta('1 days')]