Home > front end >  Filter dataframe per ID based on conditional timerange
Filter dataframe per ID based on conditional timerange

Time:04-21

Hi I will try to explain the issue I am facing.

I have one dataframe (df) with the following:

ID Date (dd-mm-yyyy)
AAA 01-09-2020
AAA 01-11-2020
AAA 18-03-2021
AAA 10-10-2022
BBB 01-01-2019
BBB 01-03-2019
CCC 01-05-2020
CCC 01-07-2020
CCC 01-08-2020
CCC 01-10-2021

I have created another dataframe (df2) with the first date (t) registered per ID and t 3months:

ID T (First Date Occurred) T 3
AAA 01-09-2020 01-12-2020
BBB 01-01-2019 01-03-2020
CCC 01-05-2020 01-08-2020

The desired output where I am struggling is to filter the df based on the two date filters defined in df2("T" & "T 3):

e.g.AAA = AAA > T & AAA < T 3

ID Date (dd-mm-yyyy)
AAA 01-11-2020
BBB 01-03-2019
CCC 01-07-2020
CCC 01-08-2020

What is the best way to approach this? Any help is appreciated!

CodePudding user response:

IIUC, you can use pandas.merge_asof with allow_exact_matches=False:

(pd.merge_asof(df1.sort_values(by='Date'), df2.sort_values(by='T'),
               allow_exact_matches=False,
               by='ID', left_on='Date', right_on='T')
   .loc[lambda d: d['Date'] <= d['T 3']]
)

NB. the exact condition on the T 3 is unclear as you describe "< T 3" but the shown output has "<= T 3", just chose what you want (< or <=) in the loc

output:

    ID       Date          T        T 3
1  BBB 2019-03-01 2019-01-01 2020-03-01
3  CCC 2020-07-01 2020-05-01 2020-08-01
4  CCC 2020-08-01 2020-05-01 2020-08-01
6  AAA 2020-11-01 2020-09-01 2020-12-01
  • Related