I have two dataframes. The first contains a series of ID numbers with two dates, and the second contains a series of ID numbers with a date for an event.
import pandas as pd
df1 = pd.DataFrame({'id': ['19403', '59208', '45290'], 'date_1': ['2021-01-04', '2021-01-05', '2021-05-08'], 'date_2': ['2021-03-07', '2021-01-12', '2021-06-15']})
df2 = pd.DataFrame({'id': ['19403', '19403', '19403', '59208', '59208', '45290', '45290', '45290', '45290', '45290', '45290', '45290'], 'date': ['2021-01-05', '2021-02-12', '2021-03-17', '2021-01-06', '2021-01-08', '2021-03-14', '2021-03-17', '2021-05-09', '2021-05-12', '2021-05-23', '2021-06-01', '2021-08-14' ]})
I'm trying to merge the two dataframes so that for every row in the first dataframe, it counts the number of events in the second where the date is between the two dates in the first and the ID number matches.
output = pd.DataFrame({'id': ['19403', '59208', '45290'], 'date_1': ['2021-01-04', '2021-01-05', '2021-05-08'], 'date_2': ['2021-03-07', '2021-01-12', '2021-06-15'], 'count': [2, 2, 4]})
I've been able to mock it up one row at a time (below), but I'm struggling to apply it to the whole dataframe.
count = df2[(df2['id'] == '19403') & (df2['date'].between('2021-01-04', '2021-03-07'))]['date'].count()
Any suggestions? I'm probably missing something obvious here.
CodePudding user response:
(df1.merge(df2, on='id', how='outer')
.query("(date_1 <= date <= date_2)")
.groupby(['id', 'date_1', 'date_2'], as_index=False)
.size())
Output:
id date_1 date_2 size
0 19403 2021-01-04 2021-03-07 2
1 45290 2021-05-08 2021-06-15 4
2 59208 2021-01-05 2021-01-12 2