My problem is the same as this one, with an additional constraint I can't figure how to solve :
Python PANDAS: Drop All Rows After First Occurrence of Column Value
In this post, the issue is to remove all lines once Open occurred for the 1st time :
Start :
Rank Status
1 Closed
5 Closed
6 Open
9 Closed
10 Open
Result :
Rank Status
1 Closed
5 Closed
6 Open
Here's the best answer to the problem :
df = df.sort('Rank').reset_index()
df.loc[: df[(df['Status'] == 'Open')].index[0], :]
I have the same problem, but I have multiple Shops in the same dataframe, and want it computed for all of them :
Start :
Shop Rank Status
A 1 Closed
A 5 Closed
A 6 Open
A 9 Closed
A 10 Open
A 1 Closed
B 3 Closed
B 8 Closed
B 12 Open
B 15 Closed
...
The result I want :
Shop Rank Status
A 1 Closed
A 5 Closed
A 6 Open
B 3 Closed
B 8 Closed
B 12 Open
...
How shall I modify the past answer to adapt it to all my Shops at the same time?
Thanks in advance
CodePudding user response:
Idea is processing column per groups by compare Open
and Series.cummax
, because need also first Open
is necessary shift:
mask = (df['Status'].eq('Open')
.groupby(df['Shop'])
.transform(lambda x: x.shift(fill_value=False).cummax()))
df = df[~mask]
print (df)
Shop Rank Status
0 A 1 Closed
1 A 5 Closed
2 A 6 Open
6 B 3 Closed
7 B 8 Closed
8 B 12 Open
If always Open
per groups is possible use:
def f(x):
return x.loc[: x[(x['Status'] == 'Open')].index[0], :]
df = df.groupby('Shop', group_keys=False).apply(f)
def f(x):
return x.loc[: (x['Status'] == 'Open').idxmax()]
df = df.groupby('Shop', group_keys=False).apply(f)
EDIT:
Test with general data:
print (df)
Shop Rank Status
0 A 1 Closed
1 A 5 Closed
2 A 6 Open
3 A 9 Closed
4 A 10 Open
5 A 1 Closed
6 B 3 Closed
7 B 8 Closed
8 B 12 Open
9 B 15 Closed
10 C 8 Closed <- in C group no Open
11 C 12 Closed
12 D 15 Open <- in D group first Open
13 D 15 Closed
This solution failed, because no Open
in C
group:
def f(x):
return x.loc[: x[(x['Status'] == 'Open')].index[0], :]
df = df.groupby('Shop', group_keys=False).apply(f)
IndexError: index 0 is out of bounds for axis 0 with size 0
def f(x):
return x.loc[: (x['Status'] == 'Open').idxmax()]
df = df.groupby('Shop', group_keys=False).apply(f)
print (df)
Shop Rank Status
0 A 1 Closed
1 A 5 Closed
2 A 6 Open
6 B 3 Closed
7 B 8 Closed
8 B 12 Open
10 C 8 Closed <- incorrect return first False row
12 D 15 Open
First solution working well:
mask = (df['Status'].eq('Open')
.groupby(df['Shop'])
.transform(lambda x: x.shift(fill_value=False).cummax()))
df1 = df[~mask]
print (df1)
Shop Rank Status
0 A 1 Closed
1 A 5 Closed
2 A 6 Open
6 B 3 Closed
7 B 8 Closed
8 B 12 Open
10 C 8 Closed
11 C 12 Closed
12 D 15 Open