My df:
id1 id2 uid . . .
1 100 0
1 101 1000
1 101 1000
2 102 0
2 103 0
3 104 1002
3 104 1002
3 104 1002
3 104 0
3 105 0
3 106 0
4 107 0
4 107 0
4 108 0
4 108 0
I would like to group by id1 and filter out the id1s where all the uids are zero.
I tried the following:
df = df.groupby(by = 'id1').filter(lambda x: x['uid'].sum() > 0).reset_index(drop = True)
But the issue is that it sums up the non-zero uids and creates random uids by doing so.
Desired result:
id1 id2 uid . . .
1 100 0
1 101 1000
1 101 1000
3 104 1002
3 104 1002
3 104 1002
3 104 0
3 105 0
3 106 0
CodePudding user response:
You could flag the "uid"s that are not equal to 0, then group that boolean mask by "id1" and transform it with max. This assigns True to every row of an "id1" whose set of "uid"s contains at least one non-zero value (so, for example, "id1"=4 is dropped, since all of its "uid"s are 0):
out = df[df['uid'].ne(0).groupby(df['id1']).transform('max')]
Output:
id1 id2 uid
0 1 100 0
1 1 101 1000
2 1 101 1000
5 3 104 1002
6 3 104 1002
7 3 104 1002
8 3 104 0
9 3 105 0
10 3 106 0
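
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the frame has only the three columns shown in the question (the ". . ." columns are left out), and the names mask and out_filter are just illustrative. It also shows a filter-based variant using any() instead of sum(), which should select the same rows. In both cases rows are only selected, so the "uid" values themselves are not changed.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumption: only these three columns).
df = pd.DataFrame({
    "id1": [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],
    "id2": [100, 101, 101, 102, 103, 104, 104, 104, 104, 105, 106, 107, 107, 108, 108],
    "uid": [0, 1000, 1000, 0, 0, 1002, 1002, 1002, 0, 0, 0, 0, 0, 0, 0],
})

# Row-level mask: True on every row whose "id1" group has at least one non-zero "uid".
mask = df["uid"].ne(0).groupby(df["id1"]).transform("max")
out = df[mask]

# Filter-based variant: keep whole groups that contain any non-zero "uid".
out_filter = df.groupby("id1").filter(lambda g: g["uid"].ne(0).any())

print(out)
```

Both results keep the original index labels; append .reset_index(drop=True) if a fresh 0..n-1 index is wanted, as in the original attempt.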