Home > Net >  Filter Dataframe Based on Differnce Between Multiple Columns
Filter Dataframe Based on Differnce Between Multiple Columns

Time:12-14

I am working on the following dataframe, df:

name         val_1       val_2      val_3

AAA           20         25          30       
BBB           15         20          35
CCC           25         40          45
DDD           20         20          25

I need to keep only that name where any val column increase by more than 10 from the previous val column. If one column increases by less than 10 from the previous column or doesn't even increase, we need to drop that name.

Desired output:

name

BBB #val3 increases by 15  
CCC #val2 increases by 15 

What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!

CodePudding user response:

subset = df[df[['val_1', 'val_2', 'val_3']].diff().ge(10).any(axis=1)]

Output (assuming the AAA of val_3 is 20 instead of 30):

>>> subset
  name  val_1  val_2  val_3
1  BBB     15     20     35
2  CCC     25     40     45

CodePudding user response:

The way I understand it, you want to keep the two rows when they have a difference of at least 10 on all columns.

For this, you need to build a mask with diff ge all and combine the mask with its shift:

m = df.filter(like='val_').diff().ge(10).all(1)
out = df[m|m.shift(-1)]

output:

  name  val_1  val_2  val_3
1  BBB     15     20     35
2  CCC     25     40     45
  • Related