Is there a way to get the data after a specific condition in Pandas?

Time:05-03

I want to know if there is a way to take rows from a dataframe starting where a specific condition is met, and to keep taking rows until another condition is met. I have the following dataframe:

    column_1 column_2
0          1        a
1          1        a
2          1        b
3          4        b
4          4        c
5          4        c
6          0        d
7          0        d
8          0        e
9          4        e
10         4        f
11         4        f
12         1        g
13         1        g

I want to select only the rows where column_1 changes from 1 to 4 and then stays at 4, up to the point where it changes to another value, as follows:

    column_1 column_2
3          4        b
4          4        c
5          4        c

Is there a way to do this in Pandas without converting the columns to lists?

CodePudding user response:

You can first create a helper column that labels each run of consecutive equal values. Then find the rows where the previous value is 1 and the current value is 4, and take their labels from the helper column. Finally, keep every row whose label is among them, which selects the whole run of 4s that follows a 1:

df['new'] = df['column_1'].ne(df['column_1'].shift()).cumsum()
s = df.loc[df['column_1'].shift().eq(1) & df['column_1'].eq(4), 'new']

df = df[df['new'].isin(s)]
print (df)
   column_1 column_2  new
3         4        b    2
4         4        c    2
5         4        c    2
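
The same steps can be traced on the question's data with a self-contained sketch. The names run_id, is_start and result are illustrative, and this variant keeps the run labels in a separate Series instead of adding a column to the dataframe:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "column_1": [1, 1, 1, 4, 4, 4, 0, 0, 0, 4, 4, 4, 1, 1],
    "column_2": list("aabbccddeeffgg"),
})

# Label each run of consecutive equal values: 1,1,1,2,2,2,3,3,3,4,4,4,5,5
run_id = df["column_1"].ne(df["column_1"].shift()).cumsum()

# True only on rows where the value is 4 and the previous row's value was 1
is_start = df["column_1"].shift().eq(1) & df["column_1"].eq(4)

# Keep every row whose run contains such a starting row
result = df[run_id.isin(run_id[is_start])]
print(result)
```

The second run of 4s (rows 9 to 11) is left out because it follows a 0, not a 1.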

CodePudding user response:

Another option is to mark the cut-off points using shift and eq, then use groupby.cummax within each run of equal values to extend that mark into a boolean filter for the whole run:

df[(df['column_1'].shift().eq(1) & df['column_1'].eq(4)).groupby(df['column_1'].diff().ne(0).cumsum()).cummax()]

Output:

   column_1 column_2
3         4        b
4         4        c
5         4        c
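
If the one-liner is hard to read, it can be split into steps. The sketch below wraps them in a hypothetical helper, rows_after_change, whose name and parameters are not from the answer. It assumes a numeric column (diff needs one) and casts the flags to integers before cummax, then back to booleans, so the result is a boolean mask regardless of how the grouped operation handles dtypes:

```python
import pandas as pd

def rows_after_change(df, col, before, after):
    """Hypothetical helper: rows of each run of `after` that directly follows `before`."""
    s = df[col]
    # True only on the first row of a run of `after` preceded by `before`
    is_start = s.shift().eq(before) & s.eq(after)
    # Run labels: the counter increases whenever the value changes
    run_id = s.diff().ne(0).cumsum()
    # Within each run, carry the start flag forward to the rest of the run
    mask = is_start.astype(int).groupby(run_id).cummax().astype(bool)
    return df[mask]

df = pd.DataFrame({
    "column_1": [1, 1, 1, 4, 4, 4, 0, 0, 0, 4, 4, 4, 1, 1],
    "column_2": list("aabbccddeeffgg"),
})

print(rows_after_change(df, "column_1", before=1, after=4))
```

Calling the helper with before=0 instead selects rows 9 to 11, the run of 4s that follows the 0s.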