Home > database >  How to create new column with nuniqe values per ID in other column and do not count defined value in
How to create new column with nuniqe values per ID in other column and do not count defined value in

Time:09-12

I have DataFrame in Python Pandas like below (ID and COL1 is "object" data type):

ID COL1
123 XX
123 XX
123 L
456 AP
456 AP
77 C
88 LACK
66 LACK
66 G

And I need to create new column "COL1_bis" where will be infromation how many unique values in "COL1" has each ID. But if some ID has value "LACK" in "COL1" then we do not count that

So as a result I need something like below:

ID COL1 COL1_bis
123 XX 2
123 XX 2
123 L 2
456 AP 1
456 AP 1
77 C 1
88 LACK 0
66 LACK 1
66 GG 1

Because:

  • ID = 123 has 2 in COL1_bis, because this ID has 2 different (unique) values in COL1: "XX" and "L"
  • ID = 456 has 1 in COL1_bis, because this ID has 1 unique value in COL1: "AP"
  • ID = 77 has 1 in COL1_bis, because this ID has 1 unique value in COL1: "C"
  • ID = 88 has 0 in COL1_bis, because this ID has value "LACK" in COL1, so we do not count this value
  • ID = 66 has 1 in COL1_bis, because this ID has "LACK" in COL1 and "GG" so we count only "GG"

How can I do that in Python Pandas?

CodePudding user response:

Use Series.mask for replace LACK to missing values and then use GroupBy.transform with DataFrameGroupBy.nunique:

df['COL1_bis'] = (df['COL1'].mask(df['COL1'].eq('LACK'))
                            .groupby(df['ID'])
                            .transform('nunique'))
print (df)
    ID  COL1  COL1_bis
0  123    XX         2
1  123    XX         2
2  123     L         2
3  456    AP         1
4  456    AP         1
5   77     C         1
6   88  LACK         0
7   66  LACK         1
8   66     G         1
  • Related