How do I calculate the number of unique characters in each string of a pandas DataFrame column? I have data in a column like this:
Number | phone (type string) |
---|---|
1 | 100012 |
2 | 121111 |
3 | 121127 |
4 | 465222 |
I want to add another column that holds the number of unique characters in each string.
Expected output:
Number | phone (type string) | unique_characters |
---|---|---|
1 | 100012 | 3 |
2 | 121111 | 2 |
3 | 121127 | 3 |
4 | 465222 | 4 |
So far, I have tried:
```python
df['unique_characters'] = len(set(df['phone']))
```
However, the above code gives me this result:
Number | phone (type string) | unique_characters |
---|---|---|
1 | 100012 | 159378 |
2 | 121111 | 159378 |
3 | 121127 | 159378 |
4 | 465222 | 159378 |
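Why every row gets the same value: `set(df['phone'])` iterates over the column's values, not over the characters of each string, so it builds one set of distinct phone strings for the whole column. `len` of that set is a single integer, and assigning a scalar to a column broadcasts it to every row. A minimal sketch using only the four sample rows (so the count here is 4; 159378 presumably reflects the number of distinct phone values in the full dataset):

```python
import pandas as pd

# Sample data from the question; phone numbers are stored as strings
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# set() over a Series iterates its values, so this is the set of distinct
# phone strings in the whole column, not the characters of one string
distinct_phones = set(df["phone"])
print(len(distinct_phones))  # 4

# Assigning that single integer broadcasts it to every row
df["unique_characters"] = len(distinct_phones)
print(df["unique_characters"].tolist())  # [4, 4, 4, 4]
```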
Please help.
Answer:
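Without a lambda, apply the built-in `set` to each string to get the set of its characters:

```python
>>> df['phone'].apply(set)
0 {0, 1, 2}
1 {1, 2}
2 {7, 1, 2}
3 {5, 6, 4, 2}
Name: phone, dtype: object
```

Then apply `len` to count the characters in each set:

```python
>>> df['phone'].apply(set).apply(len)
0 3
1 2
2 3
3 4
Name: phone, dtype: int64
```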
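A minimal sketch that puts the two steps together on the question's sample rows and assigns the result as the new column. The single-pass `lambda` variant at the end is an alternative added here for comparison; it skips the intermediate Series of sets and gives the same counts:

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# apply(set) turns each string into the set of its characters;
# apply(len) then counts the elements of each set
df["unique_characters"] = df["phone"].apply(set).apply(len)
print(df["unique_characters"].tolist())  # [3, 2, 3, 4]

# Equivalent single pass without the intermediate Series of sets
alt = df["phone"].apply(lambda s: len(set(s)))
print(alt.tolist())  # [3, 2, 3, 4]
```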
Answer:
You can use `len` with `np.unique`:
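```python
import numpy as np

# [*x] splits the string into a list of characters; np.unique keeps one of each
df['unique_characters'] = df['phone'].apply(lambda x: len(np.unique([*x])))
```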
Output:
```
   Number   phone  unique_characters
0       1  100012                  3
1       2  121111                  2
2       3  121127                  3
3       4  465222                  4
```
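The `[*x]` unpacking matters in this answer: `np.unique` converts its argument to an array and flattens it, so a bare string becomes a one-element array holding the whole string. Unpacking it into a list of single characters first lets `np.unique` see each character. A small check on the same sample data that the NumPy version agrees with the `set`-based one from the first answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "phone": ["100012", "121111", "121127", "465222"],
})

# A bare string is treated as a single element, so this counts 1
print(len(np.unique("100012")))  # 1

# [*x] splits the string into characters; np.unique returns them sorted
print(np.unique([*"100012"]))  # ['0' '1' '2']

numpy_counts = df["phone"].apply(lambda x: len(np.unique([*x])))
set_counts = df["phone"].apply(lambda x: len(set(x)))

# Both approaches give the same per-row counts
print(numpy_counts.equals(set_counts))  # True
```

Since `np.unique` also sorts the characters, `len(set(x))` does less work when only the count is needed.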