Home > Net >  Drop duplicate if one duplicate cell contains one value, and the other duplicate contains another
Drop duplicate if one duplicate cell contains one value, and the other duplicate contains another

Time:07-11

So I have this type of dataset while working in python with pandas:

    id   pos    result
0   1     1       AB
1   1     1       --
2   1     1       BC
3   1     1       AB
4   1     2       CA
5   2     1       CA
6   2     2       --
7   2     2       BA
8   3     1       --
9   3     1       --

Desired result:

    id   pos    result
0   1     1       AB

2   1     1       BC
3   1     1       AB
4   1     2       CA
5   2     1       CA

7   2     2       BA
8   3     1       --
9   3     1       --

I would like to drop duplicate rows on [id] AND [pos] where the result are '--', but only if there already exist a result on the same [id] AND [pos] that has a A, B or C combination.

If no A, B or C combination exist on the same [id] AND [pos], then I want to keep the result '--'.

I know how to drop duplicate rows, but this problem is really beyond my knowledge and I am hoping for your help.

CodePudding user response:

You can use masks and boolean indexing:

# is the result not a "--"?
m = df['result'].ne('--')

# is there at least a non "--" in the group?
m2 = (m
  .groupby([df['id'], df['pos']])
  .transform('max')
 )

# keep if both conditions are equal
out = df[m==m2]

Alternative for the last step:

# keep if not "--" or not a non "--" in group
out = df[m|~m2]

Output:

   id  pos result
0   1    1     AB
2   1    1     BC
3   1    1     AB
4   1    2     CA
5   2    1     CA
7   2    2     BA
8   3    1     --
9   3    1     --
  • Related