Lets say we have a sample dataframe with ID and dates. We also have a date_list = ['2021-12-16', 2021-11-20'] (yyyy-mm-dd)
sample_df: id (str) = unique id dates (datetime) period (str) = 24h is divided into four 6h period. Therefore, period 2 refers to time between 6am - 12 noon for example.
id | dates | period |
---|---|---|
aaaa | 2021-11-16 | 2 |
aaaa | 2021-12-16 | 1 |
aaaa | 2022-09-05 | 2 |
bbbb | 2021-11-20 | 1 |
bbbb | 2022-01-22 | 3 |
bbbb | 2021-09-12 | 1 |
bbbb | 2022-03-23 | 3 |
bbbb | 2021-11-21 | 2 |
bbbb | 2021-11-21 | 1 |
For group ID, bbbb
I would like to check:
if the dates in the dataframe are three previous consecutive dates from date_list[1] = '2021-11-20'(i.e., Nov 19, 2021, Nov 18, 2021, Nov 17, 2021).
If there are consecutive dates in the 3 day window, check if these consecutive dates have at least 2 periods per day, i.e., two rows with same dates but different periods.
If so, then calculate the length of rows and add to the table as a column. So in this example for bbbb, there should be two rows that fall within 3-day period of the given date and has 2 periods.
Similarly, for aaaa
ID, we need to check 1) if the dates are 3 previous consecutive dates from date_list[0] = '2021-12-16'. Repeat same steps as above.
Any help is appreciated!
d = {'id': ['aaaa', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'bbbb', 'bbbb', 'bbbb', 'bbbb'],
'dates': ['2021-11-16', '2021-12-16', '2022-09-05',
'2021-11-20', '2022-01-22', '2021-09-12',
'2022-03-23', '2021-11-21', '2021-11-21'],
'period': ['2', '1', '2',
'1', '3', '1',
'3', '2', '1']}
sample_df = pd.DataFrame(d)
sample_df
CodePudding user response:
df.dates = pd.to_datetime(df.dates)
df.loc[df.id.eq('aaaa'), 'match'] = date_list[0]
df.loc[df.id.eq('bbbb'), 'match'] = date_list[1]
df.match = pd.to_datetime(df.match)
mask = (df.dates.sub(df.match).abs().between('1d', '3d') &
df.groupby(['id', 'dates'])['period'].transform('count').ge(2))
print(df[mask])
# Output - Rows that match your conditions:
id dates period match
7 bbbb 2021-11-21 2 2021-11-20
8 bbbb 2021-11-21 1 2021-11-20