Home > Software design >  Iterate over a df and delete all rows with date before a given date point
Iterate over a df and delete all rows with date before a given date point

Time:05-13

I have a pd dataframe that looks like the following:

id projet_id date_cod date year month p50 p90
1 DCLT 30-03-2022 01-01-2022 2022 1 5313.79 4571.03
2 DLCT 01-02-2022 2022 2 2350.25 1880.70
3 DLCT 01-03-2022 2022 3 2450.25 1763.90
4 DLCT 01-01-2023 2023 1 2180.25 1280.70
5 DLCT 01-02-2023 2023 2 4871.03 5224.03
6 MADD 01-01-2023 01-01-2022 2022 1 4575.03 1280.70
7 MADD 01-02-2022 2022 2 4331.03 5718.03
8 MADD 01-03-2022 2022 3 4331.03 1235.75
9 MADD 01-04-2023 2023 4 1224.00 1280.70
10 MADD 01-05-2023 2023 5 1480.70 1330.70
11 PEYRS 01-03-2024 01-01-2024 2024 1 1280.70 1280.70
12 PEYRS 01-05-2024 2024 5 1200.70 1235.75

Based on the date_cod,

For each projet_id (DCLT, MADD, PEYRS), I want to delete the p50 and p90 records that have their date value before the date_cod.

The output df should look like this.

id projet_id date_cod date year month p50 p90
1 DCLT 30-03-2022 01-01-2022 2022 1
2 DLCT 01-02-2022 2022 2
3 DLCT 01-03-2022 2022 3
4 DLCT 01-01-2023 2023 1 2180.25 1280.70
5 DLCT 01-02-2023 2023 2 4871.03 5224.03
6 MADD 01-01-2023 01-01-2022 2022 1
7 MADD 01-02-2022 2022 2
8 MADD 01-03-2022 2022 3
9 MADD 01-04-2023 2023 4 1224.00 1280.70
10 MADD 01-05-2023 2023 5 1480.70 1330.70
11 PEYRS 01-03-2024 01-01-2024 2024 1
12 PEYRS 01-05-2024 2024 5 1200.70 1235.75

CodePudding user response:

Something like this should work:

#Make sure dates are in to_datetime
df['date_cod']=pd.to_datetime(df['date_cod'])
df['date']=pd.to_datetime(df['date'])

#Condition:date column is less (in total seconds) than first date for each projet_id's first date_cod value
cond=((df['date']-df.groupby('projet_id')['date_cod'].transform('first')).dt.total_seconds())<0

import numpy as np

#replace by '' where condition defined above is true
df['p50']=np.where(cond,'',df['p50'])
df['p90']=np.where(cond,'',df['p90'])
df
  • Related