Pandas number of unique/distinct characters in a string

How do we calculate the number of unique characters in each string of a pandas DataFrame column? I have data in a column like this:

Number	phone (type string)
1	100012
2	121111
3	121127
4	465222

I want to add another column that holds the number of unique characters in each string.

Expected output:

Number	phone (type string)	unique_characters
1	100012	3
2	121111	2
3	121127	3
4	465222	4

So far, I have tried:

df['unique_characters'] = len(set(df['phone']))

However, the above code gives me this result:

Number	phone (type string)	unique_characters
1	100012	159378
2	121111	159378
3	121127	159378
4	465222	159378

Please help.

CodePudding user response:

Without a lambda, apply set to each string:

>>> df['phone'].apply(set)
0       {0, 1, 2}
1          {1, 2}
2       {7, 1, 2}
3    {5, 6, 4, 2}
Name: phone, dtype: object

and then take the length of each set:

>>> df['phone'].apply(set).apply(len)
0    3
1    2
2    3
3    4
Name: phone, dtype: int64
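
For completeness, here is a minimal, self-contained sketch that rebuilds the sample data from the question and assigns the result to a new column. The DataFrame construction is an assumption based on the sample shown, and the name column_wide is only illustrative. It also shows why the attempt in the question gives the same number for every row: set(df['phone']) iterates over the column's values, so len(...) counts the distinct phone strings in the whole column, and that single scalar is then broadcast to all rows.

```python
import pandas as pd

# Sample data from the question; phone is kept as strings (assumed construction).
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# The attempt from the question: this counts distinct values in the whole
# column, giving one scalar rather than one value per row.
column_wide = len(set(df["phone"]))

# Per-row count: build a set of characters for each string, then take its length.
df["unique_characters"] = df["phone"].apply(set).apply(len)

print(column_wide)
print(df)
```

If the column holds integers rather than strings, converting first with df['phone'].astype(str) should make the same approach work.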

CodePudding user response:

You can use len together with np.unique:

import numpy as np
df['unique_characters'] = df['phone'].apply(lambda x: len(np.unique([*x])))

Output:

   Number   phone  unique_characters
0       1  100012                  3
1       2  121111                  2
2       3  121127                  3
3       4  465222                  4
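
A self-contained sketch of this variant, assuming the sample data from the question (the DataFrame construction and the name via_set are illustrative, not from the original post). Here [*x] unpacks each string into a list of single characters, and np.unique returns the distinct ones, so len gives the per-row count. A plain set-based version is included for comparison, since it should give the same numbers without needing NumPy.

```python
import numpy as np
import pandas as pd

# Sample data from the question; phone is kept as strings (assumed construction).
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# [*x] splits the string into characters; np.unique keeps the distinct ones.
df["unique_characters"] = df["phone"].apply(lambda x: len(np.unique([*x])))

# Plain-Python equivalent for comparison: a set of the string's characters.
via_set = df["phone"].map(lambda x: len(set(x)))

print(df)
```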