Home > Net >  Drop All Rows After First Occurrence of Column Value
Drop All Rows After First Occurrence of Column Value

Time:12-09

My problem is the same as this one, with an additional constraint I can't figure how to solve :

Python PANDAS: Drop All Rows After First Occurrence of Column Value

In this post, the issue is to remove all lines once Open occurred for the 1st time :

Start :

Rank Status
1    Closed
5    Closed
6    Open
9    Closed
10   Open

Result :

Rank Status
 1    Closed
 5    Closed
 6    Open

Here's the best answer to the problem :

df = df.sort('Rank').reset_index()
df.loc[: df[(df['Status'] == 'Open')].index[0], :]

I have the same problem, but I have multiple Shops in the same dataframe, and want it computed for all of them :

Start :

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
A    9     Closed
A    10    Open
A    1     Closed
B    3     Closed
B    8     Closed
B    12    Open
B    15    Closed
...

The result I want :

Shop Rank  Status
A    1     Closed
A    5     Closed
A    6     Open
B    3     Closed
B    8     Closed
B    12    Open
...

How shall I modify the past answer to adapt it to all my Shops at the same time?

Thanks in advance

CodePudding user response:

Idea is processing column per groups by compare Open and Series.cummax, because need also first Open is necessary shift:

mask = (df['Status'].eq('Open')
                    .groupby(df['Shop'])
                    .transform(lambda x: x.shift(fill_value=False).cummax()))

df = df[~mask]
print (df)
  Shop  Rank  Status
0    A     1  Closed
1    A     5  Closed
2    A     6    Open
6    B     3  Closed
7    B     8  Closed
8    B    12    Open

If always Open per groups is possible use:

def f(x):
    
    return x.loc[: x[(x['Status'] == 'Open')].index[0], :]

df = df.groupby('Shop', group_keys=False).apply(f)

def f(x):
    
    return x.loc[: (x['Status'] == 'Open').idxmax()]

df = df.groupby('Shop', group_keys=False).apply(f)

EDIT:

Test with general data:

print (df)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
3     A     9  Closed
4     A    10    Open
5     A     1  Closed
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
9     B    15  Closed
10    C     8  Closed <- in C group no Open
11    C    12  Closed
12    D    15    Open <- in D group first Open
13    D    15  Closed

This solution failed, because no Open in C group:

def f(x):
    
    return x.loc[: x[(x['Status'] == 'Open')].index[0], :]

df = df.groupby('Shop', group_keys=False).apply(f)

IndexError: index 0 is out of bounds for axis 0 with size 0


def f(x):
    
    return x.loc[: (x['Status'] == 'Open').idxmax()]

df = df.groupby('Shop', group_keys=False).apply(f)

print (df)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
10    C     8  Closed <- incorrect return first False row
12    D    15    Open

First solution working well:

mask = (df['Status'].eq('Open')
                    .groupby(df['Shop'])
                    .transform(lambda x: x.shift(fill_value=False).cummax()))

df1 = df[~mask]
print (df1)
   Shop  Rank  Status
0     A     1  Closed
1     A     5  Closed
2     A     6    Open
6     B     3  Closed
7     B     8  Closed
8     B    12    Open
10    C     8  Closed
11    C    12  Closed
12    D    15    Open
  • Related