Assume I have a data frame such as
import pandas as pd
df = pd.DataFrame({'ProductList':[["ABC", "ABC", "CDE", "CDF"],
["CDE", "XYZ", "XYZ"],
["UGH", "ABC", "CDE", "ABC"]],
'Customer': ['John', 'Alex', 'Mary']})
df
ProductList Customer
0 [ABC, ABC, CDE, CDF] John
1 [CDE, XYZ, XYZ] Alex
2 [UGH, ABC, CDE, ABC] Mary
My question is how to count # of unique items in the ProductList column. Based on this toy data, there should be 5 unique items, which are 'ABC', 'CDE', 'CDF', 'XYZ', & 'UGH'.
I am thinking to use 'set' to get # of unique items into one set then count the length of this big set. But not sure how to do it. Any suggestions? Thanks
CodePudding user response:
A quick option would be: explode
nunique
:
df.ProductList.explode().nunique()
# 5
CodePudding user response:
>>> import pandas as pd
>>> df = pd.DataFrame({'ProductList':[["ABC", "ABC", "CDE", "CDF"],
... ["CDE", "XYZ", "XYZ"],
... ["UGH", "ABC", "CDE", "ABC"]],
... 'Customer': ['John', 'Alex', 'Mary']})
>>> print(df["ProductList"].values.tolist())
[['ABC', 'ABC', 'CDE', 'CDF'], ['CDE', 'XYZ', 'XYZ'], ['UGH', 'ABC', 'CDE', 'ABC']]
>>> import itertools
>>> a = df["ProductList"].values.tolist()
>>> print(list(itertools.chain.from_iterable(a)))
['ABC', 'ABC', 'CDE', 'CDF', 'CDE', 'XYZ', 'XYZ', 'UGH', 'ABC', 'CDE', 'ABC']
>>> print(set(list(itertools.chain.from_iterable(a))))
{'XYZ', 'CDF', 'UGH', 'CDE', 'ABC'}