Home > Mobile >  check if datetime is the same across ID's after grouping by other columns in python
check if datetime is the same across ID's after grouping by other columns in python

Time:09-22

Below is a table with 4 columns: Place(object type) Dates (datetime) id (object type) event_datetime(datetime)

Place dates id event_datetime
aaa 2022-01-03 1234 2022-01-03 14:42:56
bbb 2022-03-16 4567 2022-03-16 15:31:00
bbb 2022-03-16 4567 2022-03-16 16:13:28
bbb 2022-03-16 7890 2022-03-16 15:31:00
bbb 2022-03-16 7890 2022-03-16 16:13:28
ccc 2021-09-12 2345 2021-09-12 08:01:48
ccc 2021-09-12 5486 2021-09-12 08:02:38

My task is to 1) group by place and dates and if there is more than 1 unique ID, then check if all the event_datetimes are the same. This could be done at the hour and min level and doesn't need to include seconds. The result could be in a new column Check that could have values "Yes" and "No".

For example, group place 'bbb' and date '2022-03-16', there are two ID's, '4567', '7890'. We can see that these two IDs have same event_datetimes '2022-03-16 15:31:00', '2022-03-16 16:13:28'. Therefore, the final Check column should say yes.

Whereas, for group 'ccc' and date '2021-09-12', the event_datetimes for the ID's are not the same therefore, the check column should say no.

Final output:

Place dates id event_datetime Check
aaa 2022-01-03 1234 2022-01-03 14:42:56 no
bbb 2022-03-16 4567 2022-03-16 15:31:00 yes
bbb 2022-03-16 4567 2022-03-16 16:13:28 yes
bbb 2022-03-16 7890 2022-03-16 15:31:00 yes
bbb 2022-03-16 7890 2022-03-16 16:13:28 yes
ccc 2021-09-12 2345 2021-09-12 08:01:48 no
ccc 2021-09-12 5486 2021-09-12 08:02:38 no

I appreciate any help or advice.

d = {'Place': ['aaa', 'bbb', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
        'dates': ['2022-01-03', '2022-03-16', '2022-03-16', 
                  '2022-03-16', '2022-03-16', '2021-09-12',
                 '2021-09-12'],
       'id': ['1234', '4567', '4567',
                 '7890', '7890', '2345', 
                 '2345'],
     'event_date_time': ['2022-01-03 14:42:56', '2022-03-16 15:31:00', '2022-03-16 16:13:28', 
                  '2022-03-16 15:31:00', '2022-03-16 16:13:28', '2021-09-12 08:01:48',
                 '2021-09-12 08:01:48']
    }
sample_df = pd.DataFrame(d)
sample_df['event_date_time'] = pd.to_datetime(sample_df['event_date_time'])
sample_df['dates'] = pd.to_datetime(sample_df['dates'])
sample_df

CodePudding user response:

One way using pandas.DataFrame.groupby with duplicated:

cols = ["Place", "id"]
m = df.groupby(cols)["event_datetime"].agg(set).duplicated(keep=False)
new_df = df.merge(m.rename("Check"), left_on=cols, right_index=True)
print(new_df)

Output:

  Place       dates    id       event_datetime  Check
0   aaa  2022-01-03  1234  2022-01-03 14:42:56  False
1   bbb  2022-03-16  4567  2022-03-16 15:31:00   True
2   bbb  2022-03-16  4567  2022-03-16 16:13:28   True
3   bbb  2022-03-16  7890  2022-03-16 15:31:00   True
4   bbb  2022-03-16  7890  2022-03-16 16:13:28   True
5   ccc  2021-09-12  2345  2021-09-12 08:01:48  False
6   ccc  2021-09-12  5486  2021-09-12 08:02:38  False
  • Related