How to find the number of unique values in comma-separated strings stored in a pandas data frame column

Time:06-08

x              Unique_in_x
5,5,6,7,8,6,8  4
5,9,8,0        4
5,9,8,0        4
3,2            2
5,5,6,7,8,6,8  4

Unique_in_x is my expected column. Sometimes the x column might also contain strings.

CodePudding user response:

You can use a list comprehension with a set:

df['Unique_in_x'] = [len(set(x.split(','))) for x in df['x']]
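
A minimal, self-contained sketch of the set-based approach, run on the sample data from the question. The helper name `count_unique` is hypothetical, and stripping whitespace, ignoring empty tokens, and returning 0 for a missing cell are assumptions that go beyond the one-liner above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"x": ["5,5,6,7,8,6,8", "5,9,8,0", "5,9,8,0", "3,2", "5,5,6,7,8,6,8"]}
)


def count_unique(value):
    """Count distinct comma-separated tokens in one cell (hypothetical helper)."""
    if pd.isna(value):
        return 0  # assumption: a missing cell contributes no values
    # str() lets the helper accept non-string cells, e.g. a bare integer
    tokens = [token.strip() for token in str(value).split(",")]
    # A set keeps each distinct, non-empty token once
    return len({token for token in tokens if token})


df["Unique_in_x"] = [count_unique(value) for value in df["x"]]
print(df)
```

Because the tokens are compared as text, this works the same way whether the values look like numbers or are arbitrary strings.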

Or using a split and nunique:

df['Unique_in_x'] = df['x'].str.split(',', expand=True).nunique(1)

Output:

               x  Unique_in_x
0  5,5,6,7,8,6,8            4
1        5,9,8,0            4
2        5,9,8,0            4
3            3,2            2
4  5,5,6,7,8,6,8            4
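
The split-and-nunique variant can be sketched step by step as follows, assuming the same sample data. The intermediate name `parts` and the `explode`-based alternative `alt` are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"x": ["5,5,6,7,8,6,8", "5,9,8,0", "5,9,8,0", "3,2", "5,5,6,7,8,6,8"]}
)

# expand=True returns one column per token; shorter rows are padded
# with missing values
parts = df["x"].str.split(",", expand=True)

# nunique(axis=1) counts distinct values per row; its default dropna=True
# means the padding is not counted
df["Unique_in_x"] = parts.nunique(axis=1)

# Alternative sketch: one token per row, then count distinct tokens
# per original row label
alt = df["x"].str.split(",").explode().groupby(level=0).nunique()

print(df)
```

The `expand=True` form builds a frame as wide as the longest row, so the `explode` form may be preferable when a few rows contain many more values than the rest.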

CodePudding user response:

You can find the unique values in each split list with np.unique() and then take the length of the result:

import pandas as pd
import numpy as np

# The column in the question is named 'x' (lowercase)
df['Unique_in_x'] = df['x'].apply(lambda x: len(np.unique(x.split(','))))
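
A self-contained sketch of the np.unique approach, assuming the sample data from the question, where the column is named `x`. The `return_counts=True` call is an extra illustration that the original answer does not use. It shows how often each value occurs in the first row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"x": ["5,5,6,7,8,6,8", "5,9,8,0", "5,9,8,0", "3,2", "5,5,6,7,8,6,8"]}
)

# np.unique returns the sorted distinct tokens; their number is the answer
df["Unique_in_x"] = df["x"].apply(lambda s: len(np.unique(s.split(","))))

# Optional: also get how many times each distinct token occurs in one cell
values, counts = np.unique(df["x"].iloc[0].split(","), return_counts=True)

print(df)
print(dict(zip(values.tolist(), counts.tolist())))
```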