Home > Back-end >  Count number of unique items in a pandas dataframe column
Count number of unique items in a pandas dataframe column

Time:05-22

Assume I have a data frame such as

import pandas as pd

df = pd.DataFrame({'ProductList':[["ABC", "ABC", "CDE", "CDF"],
                                  ["CDE", "XYZ", "XYZ"],
                                  ["UGH", "ABC", "CDE", "ABC"]],
                   'Customer': ['John', 'Alex', 'Mary']})

df
        ProductList           Customer
    0   [ABC, ABC, CDE, CDF]    John
    1   [CDE, XYZ, XYZ]         Alex
    2   [UGH, ABC, CDE, ABC]    Mary

My question is how to count # of unique items in the ProductList column. Based on this toy data, there should be 5 unique items, which are 'ABC', 'CDE', 'CDF', 'XYZ', & 'UGH'.

I am thinking to use 'set' to get # of unique items into one set then count the length of this big set. But not sure how to do it. Any suggestions? Thanks

CodePudding user response:

A quick option would be: explode nunique:

df.ProductList.explode().nunique()
# 5

CodePudding user response:

>>> import pandas as pd
>>> df = pd.DataFrame({'ProductList':[["ABC", "ABC", "CDE", "CDF"],
...                                   ["CDE", "XYZ", "XYZ"],
...                                   ["UGH", "ABC", "CDE", "ABC"]],
...                    'Customer': ['John', 'Alex', 'Mary']})
>>> print(df["ProductList"].values.tolist())
[['ABC', 'ABC', 'CDE', 'CDF'], ['CDE', 'XYZ', 'XYZ'], ['UGH', 'ABC', 'CDE', 'ABC']]
>>> import itertools
>>> a = df["ProductList"].values.tolist()
>>> print(list(itertools.chain.from_iterable(a)))
['ABC', 'ABC', 'CDE', 'CDF', 'CDE', 'XYZ', 'XYZ', 'UGH', 'ABC', 'CDE', 'ABC']
>>> print(set(list(itertools.chain.from_iterable(a))))
{'XYZ', 'CDF', 'UGH', 'CDE', 'ABC'}
  • Related