Home > Software engineering >  How to backfill an interval from first value to last once answering a condition
How to backfill an interval from first value to last once answering a condition

Time:12-22

I want to identify incline/decile if a column. So, in order to achieve this, I create a data frame with some values and conditions. The goal is to get the 2 res_e1,res_e2 following this logic:

if condition1 is met, the row is relevant, otherwise the value is NaN if relevant - take the first row value and the last value and decrease the last from the first. the results is either <0 or >0 for the entire interval. In the example below, e1 relevant values are idx 2 and idx 8, the corresponding values are 4,3 and therefore in res_e1 all True relevant rows are marked with 3-4 < 0 => "-" The same for e2, the values are 3,8 for the same indexes, the entire interval is " " Of course this should apply to all "True" intervals in the df

import pandas as pd 

df = pd.DataFrame({
    'e1': [1,2,4,4,5,6,5,4,3,2,0],
    'e2': [1,2,3,4,3,2,-5,-10,8,10,11],
    'condition1': ['A','A','B','B','B','B','B','B','B','A','A']
})
mask_c = df.condition1 == 'B'
df['relevant'] = mask_c

df['res_e1'] = ['NaN','NaN','-','-','-','-','-','-','-','NaN','NaN']
df['res_e2'] = ['NaN','NaN',' ',' ',' ',' ',' ',' ',' ','NaN','NaN']

print(df)

After print:

    e1  e2 condition1  relevant res_e1 res_e2
0    1   1          A     False    NaN    NaN
1    2   2          A     False    NaN    NaN
2    4   3          B      True      -       
3    4   4          B      True      -       
4    5   3          B      True      -       
5    6   2          B      True      -       
6    5  -5          B      True      -       
7    4 -10          B      True      -       
8    3   8          B      True      -       
9    2  10          A     False    NaN    NaN
10   0  11          A     False    NaN    NaN

I have tried to add a mask in order to detect the relevant frames, and some aggregation methods but I have no pretty way to get the first and last relevant values only and "color" the entire interval with their values. I added 2 examples e1,e2 and results res_e1, res_e2 just to simplify the question.

CodePudding user response:

You can use boolean indexing and groupby.transform:

df = df.join(
 df[df['relevant']]
 .groupby(df['condition1'])[['e1', 'e2']]
 .transform(lambda s: ' ' if s.iloc[-1]-s.iloc[0]>0 else '-')
 .add_prefix('res_')
)

Output:

    e1  e2 condition1  relevant res_e1 res_e2
0    1   1          A     False    NaN    NaN
1    2   2          A     False    NaN    NaN
2    4   3          B      True      -       
3    4   4          B      True      -       
4    5   3          B      True      -       
5    6   2          B      True      -       
6    5  -5          B      True      -       
7    4 -10          B      True      -       
8    3   8          B      True      -       
9    2  10          A     False    NaN    NaN
10   0  11          A     False    NaN    NaN
  • Related