Below is a table with four columns: Place (object), dates (datetime), id (object), and event_datetime (datetime).
Place | dates | id | event_datetime |
---|---|---|---|
aaa | 2022-01-03 | 1234 | 2022-01-03 14:42:56 |
bbb | 2022-03-16 | 4567 | 2022-03-16 15:31:00 |
bbb | 2022-03-16 | 4567 | 2022-03-16 16:13:28 |
bbb | 2022-03-16 | 7890 | 2022-03-16 15:31:00 |
bbb | 2022-03-16 | 7890 | 2022-03-16 16:13:28 |
ccc | 2021-09-12 | 2345 | 2021-09-12 08:01:48 |
ccc | 2021-09-12 | 5486 | 2021-09-12 08:02:38 |
My task is to group by Place and dates, and if a group contains more than one unique id, check whether all ids share the same event_datetimes. The comparison can be done at hour-and-minute precision and does not need to include seconds. The result should go in a new column, Check, with values "Yes" and "No".
For example, in the group for place 'bbb' and date '2022-03-16' there are two ids, '4567' and '7890'. Both ids have the same event_datetimes, '2022-03-16 15:31:00' and '2022-03-16 16:13:28', so the Check column should say yes.
Whereas for the group 'ccc' and date '2021-09-12', the event_datetimes of the two ids differ, so the Check column should say no.
Final output:
Place | dates | id | event_datetime | Check |
---|---|---|---|---|
aaa | 2022-01-03 | 1234 | 2022-01-03 14:42:56 | no |
bbb | 2022-03-16 | 4567 | 2022-03-16 15:31:00 | yes |
bbb | 2022-03-16 | 4567 | 2022-03-16 16:13:28 | yes |
bbb | 2022-03-16 | 7890 | 2022-03-16 15:31:00 | yes |
bbb | 2022-03-16 | 7890 | 2022-03-16 16:13:28 | yes |
ccc | 2021-09-12 | 2345 | 2021-09-12 08:01:48 | no |
ccc | 2021-09-12 | 5486 | 2021-09-12 08:02:38 | no |
I appreciate any help or advice.
import pandas as pd

d = {'Place': ['aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
     'dates': ['2022-01-03', '2022-03-16', '2022-03-16',
               '2022-03-16', '2022-03-16', '2021-09-12',
               '2021-09-12'],
     'id': ['1234', '4567', '4567',
            '7890', '7890', '2345',
            '5486'],
     'event_datetime': ['2022-01-03 14:42:56', '2022-03-16 15:31:00', '2022-03-16 16:13:28',
                        '2022-03-16 15:31:00', '2022-03-16 16:13:28', '2021-09-12 08:01:48',
                        '2021-09-12 08:02:38']
     }
sample_df = pd.DataFrame(d)
sample_df['event_datetime'] = pd.to_datetime(sample_df['event_datetime'])
sample_df['dates'] = pd.to_datetime(sample_df['dates'])
sample_df
CodePudding user response:
One way, using pandas.DataFrame.groupby with duplicated:
cols = ["Place", "id"]
# For each (Place, id), collect the set of event times (frozenset, since
# duplicated needs hashable values), then mark ids whose set of times is
# shared by at least one other id.
m = sample_df.groupby(cols)["event_datetime"].agg(frozenset).duplicated(keep=False)
new_df = sample_df.merge(m.rename("Check"), left_on=cols, right_index=True)
print(new_df)
Output:
Place dates id event_datetime Check
0 aaa 2022-01-03 1234 2022-01-03 14:42:56 False
1 bbb 2022-03-16 4567 2022-03-16 15:31:00 True
2 bbb 2022-03-16 4567 2022-03-16 16:13:28 True
3 bbb 2022-03-16 7890 2022-03-16 15:31:00 True
4 bbb 2022-03-16 7890 2022-03-16 16:13:28 True
5 ccc 2021-09-12 2345 2021-09-12 08:01:48 False
6 ccc 2021-09-12 5486 2021-09-12 08:02:38 False
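The above compares full timestamps and leaves booleans in Check. If you also want the minute-level comparison and the literal "yes"/"no" labels from the question, one possible variant (variable names here are illustrative, not from the original post) is to truncate to minutes, collect each id's set of times per (Place, dates) group, and accept a group only when it has more than one id and a single shared set:

```python
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({
    'Place': ['aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
    'dates': ['2022-01-03', '2022-03-16', '2022-03-16',
              '2022-03-16', '2022-03-16', '2021-09-12', '2021-09-12'],
    'id': ['1234', '4567', '4567', '7890', '7890', '2345', '5486'],
    'event_datetime': pd.to_datetime(
        ['2022-01-03 14:42:56', '2022-03-16 15:31:00', '2022-03-16 16:13:28',
         '2022-03-16 15:31:00', '2022-03-16 16:13:28', '2021-09-12 08:01:48',
         '2021-09-12 08:02:38']),
})

key = ['Place', 'dates']

# Truncate to minute precision, then collect each id's set of event times.
per_id = (df.assign(minute=df['event_datetime'].dt.floor('min'))
            .groupby(key + ['id'])['minute']
            .agg(frozenset))

# Within each (Place, dates) group: True only if there is more than one id
# and every id has the same set of times.
check = per_id.groupby(key).transform(lambda s: len(s) > 1 and s.nunique() == 1)

# Broadcast back to the original rows and label as "yes"/"no".
out = df.merge(check.map({True: 'yes', False: 'no'}).rename('Check'),
               left_on=key + ['id'], right_index=True)
print(out)
```

Note that this only compares ids within the same (Place, dates) group, whereas the answer above groups by (Place, id) globally; both give the same result on this sample.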