How to filter rows by group in a pandas dataframe-CodePudding

Suppose now I have some group data like

GroupID	ID	Rank	target
A	1	1	0
A	2	3	0
A	3	2	1
B	1	1	0
B	2	4	0
B	3	3	1
B	4	2	0
C	1	1	1
C	2	4	0
C	3	3	1
C	4	2	0
D	1	1	0
D	2	4	0
D	3	3	0
D	4	2	0

For each group,

I want to filter the group which has no rows which target=1.
Then I want to keep the row which target==1 and the rows which rank is higher than it. Some group may have many rows which target==1, and we choose the one which rank is lower as our target. For example for group C, the ID=1 and ID=3 all have target==1, we will keep the rows which the rank<=3. So we will get

GroupID	ID	Rank	target
A	1	1	0
A	3	2	1
B	1	1	0
B	3	3	1
B	4	2	0
C	1	1	1
C	3	3	1
C	4	2	0

CodePudding user response：

Replace Rank in Series.where if target is not 1 and then use GroupBy.transform for maximal Rank per group, so possible compare Rank column in boolean indexing by Series.le for less or equal:

s = df['Rank'].where(df['target'].eq(1)).groupby(df['GroupID']).transform('max')
df = df[df['Rank'].le(s)]
print (df)
   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0

Details:

print (df['Rank'].where(df['target'].eq(1)))
0     NaN
1     NaN
2     2.0
3     NaN
4     NaN
5     3.0
6     NaN
7     1.0
8     NaN
9     3.0
10    NaN
Name: Rank, dtype: float64

print (s)
0     2.0
1     2.0
2     2.0
3     3.0
4     3.0
5     3.0
6     3.0
7     3.0
8     3.0
9     3.0
10    3.0
Name: Rank, dtype: float64

CodePudding user response：

IIUC, make a first pass to slice the rows with target == 1 (using eq), then get the max rank per group using GroupBy.max and select the rows with this maximum rank per group with classical boolean indexing using le:

thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()

out = df[df['Rank'].le(df['GroupID'].map(thresh))]

output:

   GroupID  ID  Rank  target
0        A   1     1       0
2        A   3     2       1
3        B   1     1       0
5        B   3     3       1
6        B   4     2       0
7        C   1     1       1
9        C   3     3       1
10       C   4     2       0

thresholds:

>>> thresh
GroupID
A    2
B    3
C    3

GroupID	ID	Rank	target
A	1	1	0
A	2	3	0
A	3	2	1
B	1	1	0
B	2	4	0
B	3	3	1
B	4	2	0
C	1	1	1
C	2	4	0
C	3	3	1
C	4	2	0
D	1	1	0
D	2	4	0
D	3	3	0
D	4	2	0

GroupID	ID	Rank	target
A	1	1	0
A	2	3	0
A	3	2	1
B	1	1	0
B	2	4	0
B	3	3	1
B	4	2	0
C	1	1	1
C	2	4	0
C	3	3	1
C	4	2	0
D	1	1	0
D	2	4	0
D	3	3	0
D	4	2	0

GroupID	ID	Rank	target
A	1	1	0
A	2	3	0
A	3	2	1
B	1	1	0
B	2	4	0
B	3	3	1
B	4	2	0
C	1	1	1
C	2	4	0
C	3	3	1
C	4	2	0
D	1	1	0
D	2	4	0
D	3	3	0
D	4	2	0