I have a pd dataframe that looks like the following:
id | projet_id | date_cod | date | year | month | p50 | p90 |
---|---|---|---|---|---|---|---|
1 | DCLT | 30-03-2022 | 01-01-2022 | 2022 | 1 | 5313.79 | 4571.03 |
2 | DLCT | 01-02-2022 | 2022 | 2 | 2350.25 | 1880.70 | |
3 | DLCT | 01-03-2022 | 2022 | 3 | 2450.25 | 1763.90 | |
4 | DLCT | 01-01-2023 | 2023 | 1 | 2180.25 | 1280.70 | |
5 | DLCT | 01-02-2023 | 2023 | 2 | 4871.03 | 5224.03 | |
6 | MADD | 01-01-2023 | 01-01-2022 | 2022 | 1 | 4575.03 | 1280.70 |
7 | MADD | 01-02-2022 | 2022 | 2 | 4331.03 | 5718.03 | |
8 | MADD | 01-03-2022 | 2022 | 3 | 4331.03 | 1235.75 | |
9 | MADD | 01-04-2023 | 2023 | 4 | 1224.00 | 1280.70 | |
10 | MADD | 01-05-2023 | 2023 | 5 | 1480.70 | 1330.70 | |
11 | PEYRS | 01-03-2024 | 01-01-2024 | 2024 | 1 | 1280.70 | 1280.70 |
12 | PEYRS | 01-05-2024 | 2024 | 5 | 1200.70 | 1235.75 |
Based on the date_cod
,
For each projet_id
(DCLT, MADD, PEYRS), I want to delete the p50
and p90
records that have their date
value before the date_cod
.
The output df should look like this.
id | projet_id | date_cod | date | year | month | p50 | p90 |
---|---|---|---|---|---|---|---|
1 | DCLT | 30-03-2022 | 01-01-2022 | 2022 | 1 | ||
2 | DLCT | 01-02-2022 | 2022 | 2 | |||
3 | DLCT | 01-03-2022 | 2022 | 3 | |||
4 | DLCT | 01-01-2023 | 2023 | 1 | 2180.25 | 1280.70 | |
5 | DLCT | 01-02-2023 | 2023 | 2 | 4871.03 | 5224.03 | |
6 | MADD | 01-01-2023 | 01-01-2022 | 2022 | 1 | ||
7 | MADD | 01-02-2022 | 2022 | 2 | |||
8 | MADD | 01-03-2022 | 2022 | 3 | |||
9 | MADD | 01-04-2023 | 2023 | 4 | 1224.00 | 1280.70 | |
10 | MADD | 01-05-2023 | 2023 | 5 | 1480.70 | 1330.70 | |
11 | PEYRS | 01-03-2024 | 01-01-2024 | 2024 | 1 | ||
12 | PEYRS | 01-05-2024 | 2024 | 5 | 1200.70 | 1235.75 |
CodePudding user response:
Something like this should work:
#Make sure dates are in to_datetime
df['date_cod']=pd.to_datetime(df['date_cod'])
df['date']=pd.to_datetime(df['date'])
#Condition:date column is less (in total seconds) than first date for each projet_id's first date_cod value
cond=((df['date']-df.groupby('projet_id')['date_cod'].transform('first')).dt.total_seconds())<0
import numpy as np
#replace by '' where condition defined above is true
df['p50']=np.where(cond,'',df['p50'])
df['p90']=np.where(cond,'',df['p90'])
df