Pandas number of unique/distinct characters in a string

Time:12-24

How do I count the number of unique characters in each string of a pandas DataFrame column? I have data in a column like this:

Number phone (type string)
1 100012
2 121111
3 121127
4 465222

I want to add another column that holds the number of unique characters in each string.

Expected output:

Number phone (type string) unique_characters
1 100012 3
2 121111 2
3 121127 3
4 465222 4

So far, I have tried:

df['unique_characters'] = len(set(df['phone']))

However, the above code gives me this result:

Number phone (type string) unique_characters
1 100012 159378
2 121111 159378
3 121127 159378
4 465222 159378

Please help.

CodePudding user response:

Without a lambda, apply set to each string to get its distinct characters:

>>> df['phone'].apply(set)
0       {0, 1, 2}
1          {1, 2}
2       {7, 1, 2}
3    {5, 6, 4, 2}
Name: phone, dtype: object

and then apply len to each set to count them:

>>> df['phone'].apply(set).apply(len)
0    3
1    2
2    3
3    4
Name: phone, dtype: int64
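
For context, the attempt in the question builds one set from the whole column, so it counts the distinct phone values in the column and assigns that single number to every row. A minimal sketch of the difference, assuming the four sample rows from the question are the entire DataFrame (the name column_wide is only illustrative):

```python
import pandas as pd

# Sample data from the question; phone is stored as strings.
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# The original attempt: set() over the whole column yields the distinct
# phone values, so len() is one scalar that gets broadcast to every row.
column_wide = len(set(df["phone"]))

# Per-row count: turn each string into a set of its characters,
# then take the size of that set.
df["unique_characters"] = df["phone"].apply(set).apply(len)

print(column_wide)
print(df)
```

On the asker's full data the column-wide count was 159378, which is why that value appeared in every row.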

CodePudding user response:

You can combine len with np.unique (this assumes numpy is imported as np):

df['unique_characters'] = df['phone'].apply(lambda x: len(np.unique([*x])))

Output:

   Number   phone  unique_characters
0       1  100012                  3
1       2  121111                  2
2       3  121127                  3
3       4  465222                  4
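
As a quick check, the NumPy version and a plain set give the same counts on the sample data. This is only a sketch; the names via_numpy and via_set are illustrative and not from the answers above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"phone": ["100012", "121111", "121127", "465222"]})

# np.unique on the list of characters returns the sorted distinct characters.
via_numpy = df["phone"].apply(lambda s: len(np.unique(list(s))))

# Plain-Python equivalent: a set already holds only distinct characters.
via_set = df["phone"].map(lambda s: len(set(s)))

print(via_numpy.tolist())
print(via_set.tolist())
```

Since a set needs no array conversion, the set-based version may be the simpler choice when NumPy is not otherwise required.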