Hi I will try to explain the issue I am facing.
I have one dataframe (df) with the following:
ID | Date (dd-mm-yyyy) |
---|---|
AAA | 01-09-2020 |
AAA | 01-11-2020 |
AAA | 18-03-2021 |
AAA | 10-10-2022 |
BBB | 01-01-2019 |
BBB | 01-03-2019 |
CCC | 01-05-2020 |
CCC | 01-07-2020 |
CCC | 01-08-2020 |
CCC | 01-10-2021 |
I have created another dataframe (df2) with the first date (t) registered per ID and t 3months:
ID | T (First Date Occurred) | T 3 |
---|---|---|
AAA | 01-09-2020 | 01-12-2020 |
BBB | 01-01-2019 | 01-03-2020 |
CCC | 01-05-2020 | 01-08-2020 |
The desired output where I am struggling is to filter the df based on the two date filters defined in df2("T" & "T 3):
e.g.AAA = AAA > T & AAA < T 3
ID | Date (dd-mm-yyyy) |
---|---|
AAA | 01-11-2020 |
BBB | 01-03-2019 |
CCC | 01-07-2020 |
CCC | 01-08-2020 |
What is the best way to approach this? Any help is appreciated!
CodePudding user response:
IIUC, you can use pandas.merge_asof
with allow_exact_matches=False
:
(pd.merge_asof(df1.sort_values(by='Date'), df2.sort_values(by='T'),
allow_exact_matches=False,
by='ID', left_on='Date', right_on='T')
.loc[lambda d: d['Date'] <= d['T 3']]
)
NB. the exact condition on the T 3 is unclear as you describe "< T 3" but the shown output has "<= T 3", just chose what you want (< or <=) in the loc
output:
ID Date T T 3
1 BBB 2019-03-01 2019-01-01 2020-03-01
3 CCC 2020-07-01 2020-05-01 2020-08-01
4 CCC 2020-08-01 2020-05-01 2020-08-01
6 AAA 2020-11-01 2020-09-01 2020-12-01