A function in pandas or numpy to filter column by list of values that follows it

Time:07-16

Given a dataframe df:

import pandas as pd
df = pd.DataFrame({"changes":['increase', 'constant', 'constant', 'constant', 'decline', 'constant', 'constant', 'increase', 'constant', 'constant', 'constant','decline', 'constant', 'constant', 'constant']})

output:

changes
increase
constant
constant
constant
decline
constant
constant
increase
constant
constant
constant
decline
constant
constant
constant

The task is to delete every decline row together with the constant rows that follow it. The increase rows, and the constant rows that follow them, should be kept.

The expected output in this case should look like:

changes
increase
constant
constant
constant
increase
constant
constant
constant

CodePudding user response:

df = pd.DataFrame({"changes":['increase', 'constant', 'constant', 'constant', 'decline', 'constant', 'constant', 'increase', 'constant', 'constant', 'constant','decline', 'constant', 'constant', 'constant']})

### Build group
df['group'] = df['changes'].ne(df['changes'].shift()).cumsum()
df
###
     changes  group
0   increase      1
1   constant      2
2   constant      2
3   constant      2
4    decline      3
5   constant      4
6   constant      4
7   increase      5
8   constant      6
9   constant      6
10  constant      6
11   decline      7
12  constant      8
13  constant      8
14  constant      8
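
The grouping line can be read in three steps. Below is a minimal sketch on a shortened column; the names `s`, `changed` and `run_id` are only illustrative and do not appear in the answer.

```python
import pandas as pd

s = pd.Series(['increase', 'constant', 'constant', 'decline', 'constant'])

# True wherever a value differs from the one directly above it
# (the first row is compared against NaN, so it is always True).
changed = s.ne(s.shift())

# A running total of the True values gives each consecutive run its own id.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 2, 3, 4]
```

Rows that share a `run_id` belong to the same unbroken run of identical values, which is what lets the later step drop a whole run of constant rows at once.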

Create masks to filter out unwanted data

mask_1 = df['changes'].eq('decline') & df['changes'].shift(-1).eq('constant')
mask_2 = df['changes'].eq('constant') & df['changes'].shift().eq('decline')
groups = df.loc[mask_1 | mask_2, 'group']
groups

###
4     3
5     4
11    7
12    8
Name: group, dtype: int64

This indicates that groups 3, 4, 7 and 8 should be excluded.

Assign filtered data to result

result = df[~df['group'].isin(groups)].drop(columns=['group']).reset_index(drop=True)
result
###
    changes
0  increase
1  constant
2  constant
3  constant
4  increase
5  constant
6  constant
7  constant
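
A possibly shorter alternative, not taken from the answer above, is to carry the last non-constant label forward and filter on it. This is a sketch that assumes 'constant' is the only filler value and that every other label ('increase', 'decline') starts a new block; the name `state` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"changes": ['increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant',
                               'increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant', 'constant']})

# Blank out the constant rows, then forward-fill so that every row carries
# the most recent non-constant label seen at or above it.
state = df['changes'].where(df['changes'].ne('constant')).ffill()

# Keep only the rows whose carried label is not 'decline'.
result = df[state.ne('decline')].reset_index(drop=True)

print(result['changes'].tolist())
# ['increase', 'constant', 'constant', 'constant',
#  'increase', 'constant', 'constant', 'constant']
```

Constant rows that appear before any increase or decline get NaN as their carried label and are therefore kept.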

CodePudding user response:

You could use shift(), but in my opinion that won't give a proper result. For consistent and robust output, you can do:

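# Index labels of the decline rows
decline_idx = df.query("changes == 'decline'").index
# Of the rows directly after each decline, keep those that are constant (assumes a default integer index)
constant_idx = df.loc[decline_idx + 1].query("changes == 'constant'").index
# Drop both sets, but only if at least one decline is followed by a constant
df = df.drop(decline_idx.union(constant_idx)) if not constant_idx.empty else df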

Or, if you want to drop the decline rows regardless, you can drop them without checking whether constant_idx is empty:

df.drop(decline_idx.union(constant_idx), inplace=True)

print(df):

     changes
0   increase
1   constant
2   constant
3   constant
6   constant
7   increase
8   constant
9   constant
10  constant
13  constant
14  constant
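
Rows 6, 13 and 14 are still present in this output because only the single row directly after each decline is checked. If the whole run of constant rows should go, as in the question's expected output, one option is the NumPy sketch below. It is not part of the answer above; it assumes 'constant' is the only filler value, and the names `vals`, `pos`, `last_marker` and `keep` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"changes": ['increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant',
                               'increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant', 'constant']})

vals = df['changes'].to_numpy()
pos = np.arange(len(vals))

# For every row, the position of the most recent non-constant row at or
# above it (0 if there is none yet).
last_marker = np.maximum.accumulate(np.where(vals != 'constant', pos, 0))

# Keep a row only if that most recent non-constant label is not 'decline'.
keep = vals[last_marker] != 'decline'
result = df[keep]

print(result.index.tolist())  # [0, 1, 2, 3, 7, 8, 9, 10]
```

The original index labels are kept here; append `.reset_index(drop=True)` if a fresh 0-based index is wanted.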