Given a dataframe df:
import pandas as pd
df = pd.DataFrame({"changes": ['increase', 'constant', 'constant', 'constant', 'decline', 'constant', 'constant', 'increase', 'constant', 'constant', 'constant', 'decline', 'constant', 'constant', 'constant']})
output:
changes |
---|
increase |
constant |
constant |
constant |
decline |
constant |
constant |
increase |
constant |
constant |
constant |
decline |
constant |
constant |
constant |
The task is to delete the rows with 'decline' and the 'constant' rows that come after it. I do not want to remove 'increase' or the 'constant' rows coming after it.
The expected output in this case should look like:
changes |
---|
increase |
constant |
constant |
constant |
increase |
constant |
constant |
constant |
CodePudding user response:
df = pd.DataFrame({"changes":['increase', 'constant', 'constant', 'constant', 'decline', 'constant', 'constant', 'increase', 'constant', 'constant', 'constant','decline', 'constant', 'constant', 'constant']})
### Build group
df['group'] = df['changes'].ne(df['changes'].shift()).cumsum()
df
###
changes group
0 increase 1
1 constant 2
2 constant 2
3 constant 2
4 decline 3
5 constant 4
6 constant 4
7 increase 5
8 constant 6
9 constant 6
10 constant 6
11 decline 7
12 constant 8
13 constant 8
14 constant 8
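To see why that one-liner labels each run of equal values, it may help to split it into steps. The sketch below uses a shortened, made-up series rather than the data from the question, and the intermediate names (prev, starts_run, run_id) are only illustrative.

```python
import pandas as pd

# Shortened, made-up sample (not the question's data)
s = pd.Series(['increase', 'constant', 'constant', 'decline', 'constant'])

prev = s.shift()              # value of the previous row; NaN for the first row
starts_run = s.ne(prev)       # True wherever the value differs from the previous row
run_id = starts_run.cumsum()  # running count of run starts, i.e. one label per run

print(run_id.tolist())  # [1, 2, 2, 3, 4]
```

Consecutive equal values share a label, so a whole run can later be selected or dropped by its label.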
Create masks to filter out unwanted data
mask_1 = df['changes'].eq('decline') & df['changes'].shift(-1).eq('constant')
mask_2 = df['changes'].eq('constant') & df['changes'].shift().eq('decline')
groups = df.loc[mask_1 | mask_2, 'group']
groups
This indicates that groups 3, 4, 7, and 8 should be excluded.
Assign filtered data to result
result = df[~df['group'].isin(groups)].drop(columns=['group']).reset_index(drop=True)
result
###
changes
0 increase
1 constant
2 constant
3 constant
4 increase
5 constant
6 constant
7 constant
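If you need this in more than one place, the steps above could be wrapped in a small helper. This is only a sketch: the function name drop_decline_runs and its col parameter are made up for illustration, and it keeps the run labels in a local Series instead of adding a temporary column.

```python
import pandas as pd

def drop_decline_runs(df, col='changes'):
    """Drop each 'decline' row plus the run of 'constant' rows right after it."""
    values = df[col]
    # Label runs of consecutive equal values
    group = values.ne(values.shift()).cumsum()
    # 'decline' rows followed by 'constant', and 'constant' rows preceded by 'decline'
    decline_then_constant = values.eq('decline') & values.shift(-1).eq('constant')
    constant_after_decline = values.eq('constant') & values.shift().eq('decline')
    bad_groups = group[decline_then_constant | constant_after_decline]
    # Remove every row whose run label was flagged
    return df[~group.isin(bad_groups)].reset_index(drop=True)

df = pd.DataFrame({"changes": ['increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant',
                               'increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant', 'constant']})
result = drop_decline_runs(df)
print(result['changes'].tolist())
```

As with the masks above, a 'decline' that is not followed by a 'constant' (for example in the last row) is left in place.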
CodePudding user response:
You could use shift(), but IMO that won't give the proper result. For consistent and robust output, you can do:
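# Index labels of every 'decline' row (assumes the default integer index)
decline_idx = df.query("changes == 'decline'").index
# Of the rows directly after each 'decline', keep those that are 'constant'
constant_idx = df.loc[decline_idx + 1].query("changes == 'constant'").index
df = df.drop(decline_idx.union(constant_idx)) if not constant_idx.empty else df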
Or, if you want to drop the 'decline' rows regardless, you can drop without checking whether any 'constant' rows were found:
df.drop(decline_idx.union(constant_idx), inplace=True)
print(df)
changes
0 increase
1 constant
2 constant
3 constant
6 constant
7 increase
8 constant
9 constant
10 constant
13 constant
14 constant
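Note that the output above still contains the 'constant' rows at 6, 13 and 14, because only the single row directly after each 'decline' is dropped. If the whole run of 'constant' rows should go, as in the expected output in the question, one possible sketch is to forward-fill the last non-'constant' value and filter on it. The name last_event is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"changes": ['increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant',
                               'increase', 'constant', 'constant', 'constant',
                               'decline', 'constant', 'constant', 'constant']})

# Blank out 'constant' so only 'increase'/'decline' remain, then carry each one
# forward so every row knows which event it follows
last_event = df['changes'].where(df['changes'].ne('constant')).ffill()

# Keep only rows that do not belong to a 'decline' stretch
result = df[last_event.ne('decline')].reset_index(drop=True)
print(result['changes'].tolist())
```

Unlike the drop-based version, this also removes a 'decline' that has no 'constant' after it, so check that this is the behaviour you want.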