I am working on the following dataframe, df
:
name val_1 val_2 val_3
AAA 20 25 30
BBB 15 20 35
CCC 25 40 45
DDD 20 20 25
I need to keep only that name where any val
column increase by more than 10 from the previous val
column. If one column increases by less than 10 from the previous column or doesn't even increase, we need to drop that name.
Desired output:
name
BBB #val3 increases by 15
CCC #val2 increases by 15
What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!
CodePudding user response:
subset = df[df[['val_1', 'val_2', 'val_3']].diff().ge(10).any(axis=1)]
Output (assuming the AAA
of val_3
is 20 instead of 30):
>>> subset
name val_1 val_2 val_3
1 BBB 15 20 35
2 CCC 25 40 45
CodePudding user response:
The way I understand it, you want to keep the two rows when they have a difference of at least 10 on all columns.
For this, you need to build a mask with diff
ge
all
and combine the mask with its shift
:
m = df.filter(like='val_').diff().ge(10).all(1)
out = df[m|m.shift(-1)]
output:
name val_1 val_2 val_3
1 BBB 15 20 35
2 CCC 25 40 45