Suppose now I have some group data like
GroupID | ID | Rank | target |
---|---|---|---|
A | 1 | 1 | 0 |
A | 2 | 3 | 0 |
A | 3 | 2 | 1 |
B | 1 | 1 | 0 |
B | 2 | 4 | 0 |
B | 3 | 3 | 1 |
B | 4 | 2 | 0 |
C | 1 | 1 | 1 |
C | 2 | 4 | 0 |
C | 3 | 3 | 1 |
C | 4 | 2 | 0 |
D | 1 | 1 | 0 |
D | 2 | 4 | 0 |
D | 3 | 3 | 0 |
D | 4 | 2 | 0 |
For each group,
I want to filter the group which has no rows which target=1.
Then I want to keep the row which target==1 and the rows which rank is higher than it. Some group may have many rows which target==1, and we choose the one which rank is lower as our target. For example for group C, the ID=1 and ID=3 all have target==1, we will keep the rows which the rank<=3. So we will get
GroupID | ID | Rank | target |
---|---|---|---|
A | 1 | 1 | 0 |
A | 3 | 2 | 1 |
B | 1 | 1 | 0 |
B | 3 | 3 | 1 |
B | 4 | 2 | 0 |
C | 1 | 1 | 1 |
C | 3 | 3 | 1 |
C | 4 | 2 | 0 |
CodePudding user response:
Replace Rank
in Series.where
if target is not 1
and then use GroupBy.transform
for maximal Rank
per group, so possible compare Rank
column in boolean indexing
by Series.le
for less or equal:
s = df['Rank'].where(df['target'].eq(1)).groupby(df['GroupID']).transform('max')
df = df[df['Rank'].le(s)]
print (df)
GroupID ID Rank target
0 A 1 1 0
2 A 3 2 1
3 B 1 1 0
5 B 3 3 1
6 B 4 2 0
7 C 1 1 1
9 C 3 3 1
10 C 4 2 0
Details:
print (df['Rank'].where(df['target'].eq(1)))
0 NaN
1 NaN
2 2.0
3 NaN
4 NaN
5 3.0
6 NaN
7 1.0
8 NaN
9 3.0
10 NaN
Name: Rank, dtype: float64
print (s)
0 2.0
1 2.0
2 2.0
3 3.0
4 3.0
5 3.0
6 3.0
7 3.0
8 3.0
9 3.0
10 3.0
Name: Rank, dtype: float64
CodePudding user response:
IIUC, make a first pass to slice the rows with target == 1 (using eq
), then get the max rank per group using GroupBy.max
and select the rows with this maximum rank per group with classical boolean indexing using le
:
thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()
out = df[df['Rank'].le(df['GroupID'].map(thresh))]
output:
GroupID ID Rank target
0 A 1 1 0
2 A 3 2 1
3 B 1 1 0
5 B 3 3 1
6 B 4 2 0
7 C 1 1 1
9 C 3 3 1
10 C 4 2 0
thresholds:
>>> thresh
GroupID
A 2
B 3
C 3