Iterate over a df and delete all rows with date before a given date point-CodePudding

I have a pd dataframe that looks like the following:

id	projet_id	date_cod	date	year	month	p50	p90
1	DCLT	30-03-2022	01-01-2022	2022	1	5313.79	4571.03
2	DLCT		01-02-2022	2022	2	2350.25	1880.70
3	DLCT		01-03-2022	2022	3	2450.25	1763.90
4	DLCT		01-01-2023	2023	1	2180.25	1280.70
5	DLCT		01-02-2023	2023	2	4871.03	5224.03
6	MADD	01-01-2023	01-01-2022	2022	1	4575.03	1280.70
7	MADD		01-02-2022	2022	2	4331.03	5718.03
8	MADD		01-03-2022	2022	3	4331.03	1235.75
9	MADD		01-04-2023	2023	4	1224.00	1280.70
10	MADD		01-05-2023	2023	5	1480.70	1330.70
11	PEYRS	01-03-2024	01-01-2024	2024	1	1280.70	1280.70
12	PEYRS		01-05-2024	2024	5	1200.70	1235.75

Based on the date_cod,

For each projet_id (DCLT, MADD, PEYRS), I want to delete the p50 and p90 records that have their date value before the date_cod.

The output df should look like this.

id	projet_id	date_cod	date	year	month	p50	p90
1	DCLT	30-03-2022	01-01-2022	2022	1
2	DLCT		01-02-2022	2022	2
3	DLCT		01-03-2022	2022	3
4	DLCT		01-01-2023	2023	1	2180.25	1280.70
5	DLCT		01-02-2023	2023	2	4871.03	5224.03
6	MADD	01-01-2023	01-01-2022	2022	1
7	MADD		01-02-2022	2022	2
8	MADD		01-03-2022	2022	3
9	MADD		01-04-2023	2023	4	1224.00	1280.70
10	MADD		01-05-2023	2023	5	1480.70	1330.70
11	PEYRS	01-03-2024	01-01-2024	2024	1
12	PEYRS		01-05-2024	2024	5	1200.70	1235.75

CodePudding user response：

Something like this should work:

#Make sure dates are in to_datetime
df['date_cod']=pd.to_datetime(df['date_cod'])
df['date']=pd.to_datetime(df['date'])

#Condition:date column is less (in total seconds) than first date for each projet_id's first date_cod value
cond=((df['date']-df.groupby('projet_id')['date_cod'].transform('first')).dt.total_seconds())<0

import numpy as np

#replace by '' where condition defined above is true
df['p50']=np.where(cond,'',df['p50'])
df['p90']=np.where(cond,'',df['p90'])
df