I have a pandas dataframe
ID | api_timestamp | event_level | |
---|---|---|---|
0 | A9 | 2022-02-28 13:20:19.305 | NaN |
1 | A9 | 2022-02-28 13:20:21.703 | L1_1 |
2 | A9 | 2022-02-28 11:54:55.897 | L1_2 |
3 | A9 | 2022-02-28 12:18:00.882 | L1_3 |
4 | B1 | 2022-02-28 12:18:04.882 | L1_1 |
5 | B1 | 2022-02-28 12:18:05.882 | L1_6 |
6 | A9 | 2022-02-28 12:18:07.882 | L2_1 |
7 | A9 | 2022-02-28 12:18:08.882 | L1_8 |
Here I have events at two levels L1 eg: L1_1,L1_2 and L2
How to calculate number of L2 events between L1 event pairs for each ID
ie Here for ID A9 , there is one L2_1 between L1_3 and L1_8
CodePudding user response:
Idea is to sort the api_timestamp
column and find the L2
event between L1
event by comparing the index of L2
event and the max and min index of L1
event.
df['api_timestamp'] = pd.to_datetime(df['api_timestamp'])
for name, g in df.dropna(subset='event_level').sort_values('api_timestamp', ignore_index=True).groupby('ID'):
l1_indices = g[g['event_level'].str[:2].eq('L1')].index.tolist()
l2_indices = g[g['event_level'].str.startswith('L2')].index.tolist()
l2_between = [l2 for l2 in l2_indices if l2 > min(l1_indices) and l2 < max(l1_indices)]
print(name, len(l2_between))
A9 1
B1 0