How to create new column with nuniqe values per ID in other column and do not count defined value in-CodePudding

I have DataFrame in Python Pandas like below (ID and COL1 is "object" data type):

ID	COL1
123	XX
123	XX
123	L
456	AP
456	AP
77	C
88	LACK
66	LACK
66	G

And I need to create new column "COL1_bis" where will be infromation how many unique values in "COL1" has each ID. But if some ID has value "LACK" in "COL1" then we do not count that

So as a result I need something like below:

ID	COL1	COL1_bis
123	XX	2
123	XX	2
123	L	2
456	AP	1
456	AP	1
77	C	1
88	LACK	0
66	LACK	1
66	GG	1

Because:

ID = 123 has 2 in COL1_bis, because this ID has 2 different (unique) values in COL1: "XX" and "L"
ID = 456 has 1 in COL1_bis, because this ID has 1 unique value in COL1: "AP"
ID = 77 has 1 in COL1_bis, because this ID has 1 unique value in COL1: "C"
ID = 88 has 0 in COL1_bis, because this ID has value "LACK" in COL1, so we do not count this value
ID = 66 has 1 in COL1_bis, because this ID has "LACK" in COL1 and "GG" so we count only "GG"

How can I do that in Python Pandas?

CodePudding user response：

Use Series.mask for replace LACK to missing values and then use GroupBy.transform with DataFrameGroupBy.nunique:

df['COL1_bis'] = (df['COL1'].mask(df['COL1'].eq('LACK'))
                            .groupby(df['ID'])
                            .transform('nunique'))
print (df)
    ID  COL1  COL1_bis
0  123    XX         2
1  123    XX         2
2  123     L         2
3  456    AP         1
4  456    AP         1
5   77     C         1
6   88  LACK         0
7   66  LACK         1
8   66     G         1