I would like to left join table_B using table_A on column ID and groupby the number of records table A has when the date in table_A is smaller than the one in talbe_B. For example,
table_A =
| ID | Date |
| ----| -------------|
| 0 | 2022-03-01 |
| 1 | 2022-03-02 |
| 0 | 2022-05-02 |
| 1 | 2022-04-02 |
table_B =
| ID | Date |
| -- | ------------ |
| 0 | 2022-05-04 |
| 1 | 2022-03-20 |
| 1 | 2022-05-02 |
| 2 | 2022-02-22 |
The resulting table should be:
| ID | Date | number_of_records |
| ---| -----------| ---------------- |
| 0 | 2022-05-04 | 2 |
| 1 | 2022-03-20 | 1 |
| 1 | 2022-05-02 | 2 |
| 2 | 2022-02-22 | 0 |
import datetime
table_A = pd.DataFrame({'ID': [0,1,0,1],
'Date':[datetime.date(2022,3,1), datetime.date(2022,3,2), datetime.date(2022,5,2), datetime.date(2022,4,2)]})
table_B = pd.DataFrame({'ID': [0,1,1,2],
'Date':[datetime.date(2022,5,4), datetime.date(2022,3,20), datetime.date(2022,5,2), datetime.date(2022,2,22)]})
# The desired output
table_C = pd.DataFrame({'ID': [0,1,1,2],
'Date':[datetime.date(2022,5,4), datetime.date(2022,3,20), datetime.date(2022,5,2), datetime.date(2022,2,22)],
'number_of_records': [2,1,2,0]})
I tried to find the unique ID in table B and use groupby to count in table_A but I am not sure how to apply the condition "smaller than the one in talbe_B".
CodePudding user response:
table_A.Date = pd.to_datetime(table_A.Date)
table_B.Date = pd.to_datetime(table_B.Date)
table_A.merge(table_B, on='ID', how='right', suffixes=['_count', None])[lambda x: x.Date_count.lt(x.Date) | x.Date_count.isna()].groupby(['ID', 'Date'], as_index=False).count()
Output:
ID Date Date_count
0 0 2022-05-04 2
1 1 2022-03-20 1
2 1 2022-05-02 2
3 2 2022-02-22 0