I want to know whether there is a way to select rows from a dataframe starting when one condition is met, and to keep selecting rows until another condition is met. I have the following dataframe:
    column_1 column_2
0          1        a
1          1        a
2          1        b
3          4        b
4          4        c
5          4        c
6          0        d
7          0        d
8          0        e
9          4        e
10         4        f
11         4        f
12         1        g
13         1        g
I want to select only the rows where column_1 changes from 1 to 4 and stays at 4 until it changes to another value, as follows:
   column_1 column_2
3         4        b
4         4        c
5         4        c
Is there a way to do this in Pandas without converting the columns to lists?
CodePudding user response:
You can first create a helper column, 'new', that numbers each run of consecutive duplicate values in column_1. Then find the rows where the previous (shifted) value is 1 and the current value is 4, and take their 'new' values. Finally, keep every row whose 'new' value is among those, which selects the whole run of consecutive 4 rows:
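# number each run of consecutive equal values in column_1
df['new'] = df['column_1'].ne(df['column_1'].shift()).cumsum()
# run numbers of the rows where column_1 changes from 1 to 4
s = df.loc[df['column_1'].shift().eq(1) & df['column_1'].eq(4), 'new']
# keep all rows that belong to those runs
df = df[df['new'].isin(s)]
print(df)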
   column_1 column_2  new
3         4        b    2
4         4        c    2
5         4        c    2
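For reference, the same steps can be run end to end without adding a helper column to df. This is a minimal sketch that rebuilds the question's dataframe; the names run_id, starts, and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_1": [1, 1, 1, 4, 4, 4, 0, 0, 0, 4, 4, 4, 1, 1],
    "column_2": list("aabbccddeeffgg"),
})

# Give each run of consecutive equal values its own id (illustrative name).
run_id = df["column_1"].ne(df["column_1"].shift()).cumsum()

# True on the rows where the previous value is 1 and the current value is 4.
starts = df["column_1"].shift().eq(1) & df["column_1"].eq(4)

# Keep every row whose run id matches the run id of one of those start rows.
result = df[run_id.isin(run_id[starts])]
print(result)
```

Keeping run_id as a separate Series leaves df unchanged, so there is no extra column to drop afterwards.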
CodePudding user response:
Another option is to find the cut-off points using shift and eq, then use groupby.cummax to create a boolean filter:
df[(df['column_1'].shift().eq(1) & df['column_1'].eq(4)).groupby(df['column_1'].diff().ne(0).cumsum()).cummax()]
Output:
   column_1 column_2
3         4        b
4         4        c
5         4        c
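The one-liner can be unpacked into named steps, which may make it easier to follow. This is a minimal sketch that rebuilds the question's dataframe; col, starts, run_id, and mask are illustrative names, and the trailing astype(bool) is a defensive addition (an assumption, not part of the original answer) in case cummax does not return a boolean dtype in a given pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_1": [1, 1, 1, 4, 4, 4, 0, 0, 0, 4, 4, 4, 1, 1],
    "column_2": list("aabbccddeeffgg"),
})

col = df["column_1"]

# Cut-off points: previous value is 1 and current value is 4.
starts = col.shift().eq(1) & col.eq(4)

# Run ids: a new id begins whenever the value differs from the previous row.
run_id = col.diff().ne(0).cumsum()

# Within each run, carry a True from the first row forward to the rest.
# astype(bool) is a defensive cast so the result is always a boolean mask.
mask = starts.groupby(run_id).cummax().astype(bool)

result = df[mask]
print(result)
```

Because a run of 4s is flagged only on its first row, cummax within each run carries that True forward to the rest of the run. The later run of 4s (rows 9 to 11) follows a 0, so it stays False and is excluded.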