Drop All Rows After First Occurrence of Column Value-CodePudding

My problem is the same as this one, with an additional constraint I can't figure how to solve :

Python PANDAS: Drop All Rows After First Occurrence of Column Value

In this post, the issue is to remove all lines once Open occurred for the 1st time :

Start :

Rank Status
1    Closed
5    Closed
6    Open
9    Closed
10   Open

Result :

Rank Status
 1    Closed
 5    Closed
 6    Open

Here's the best answer to the problem :

df = df.sort('Rank').reset_index()
df.loc[: df[(df['Status'] == 'Open')].index[0], :]

I have the same problem, but I have multiple Shops in the same dataframe, and want it computed for all of them :

Start :

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
A    9     Closed
A    10    Open
A    1     Closed
B    3     Closed
B    8     Closed
B    12    Open
B    15    Closed
...

The result I want :

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
B    3     Closed
B    8     Closed
B    12    Open
...

How shall I modify the past answer to adapt it to all my Shops at the same time?

Thanks in advance

CodePudding user response：

Idea is processing column per groups by compare Open and Series.cummax, because need also first Open is necessary shift:

mask = (df['Status'].eq('Open')
                    .groupby(df['Shop'])
                    .transform(lambda x: x.shift(fill_value=False).cummax()))

df = df[~mask]
print (df)
  Shop  Rank  Status
0    A     1  Closed
1    A     5  Closed
2    A     6    Open
6    B     3  Closed
7    B     8  Closed
8    B    12    Open

If always Open per groups is possible use:

def f(x):
    
    return x.loc[: x[(x['Status'] == 'Open')].index[0], :]

df = df.groupby('Shop', group_keys=False).apply(f)

def f(x):
    
    return x.loc[: (x['Status'] == 'Open').idxmax()]

df = df.groupby('Shop', group_keys=False).apply(f)

EDIT:

Test with general data:

print (df)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
3     A     9  Closed
4     A    10    Open
5     A     1  Closed
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
9     B    15  Closed
10    C     8  Closed <- in C group no Open
11    C    12  Closed
12    D    15    Open <- in D group first Open
13    D    15  Closed

This solution failed, because no Open in C group:

def f(x):
    
    return x.loc[: x[(x['Status'] == 'Open')].index[0], :]

df = df.groupby('Shop', group_keys=False).apply(f)

IndexError: index 0 is out of bounds for axis 0 with size 0

def f(x):
    
    return x.loc[: (x['Status'] == 'Open').idxmax()]

df = df.groupby('Shop', group_keys=False).apply(f)

print (df)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
10    C     8  Closed <- incorrect return first False row
12    D    15    Open

First solution working well:

mask = (df['Status'].eq('Open')
                    .groupby(df['Shop'])
                    .transform(lambda x: x.shift(fill_value=False).cummax()))

df1 = df[~mask]
print (df1)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
10    C     8  Closed
11    C    12  Closed
12    D    15    Open