I have a column that contains lists of varying length, drawn from a small set of possible items.
print(df['channels'].value_counts(), '\n')
Output:
[web, email, mobile, social] 77733
[web, email, mobile] 43730
[email, mobile, social] 32367
[web, email] 13751
So I want the total number of times that web, email, mobile and social each occur.
These should be:
web = 77733 + 43730 + 13751 = 135,214
email = 77733 + 43730 + 13751 + 32367 = 167,581
mobile = 77733 + 43730 + 32367 = 153,830
social = 77733 + 32367 = 110,100
I have tried the following two methods:
sum_channels_items = pd.Series([x for item in df['channels'] for x in item]).value_counts()
print(sum_channels_items)
from itertools import chain
test = pd.Series(list(chain.from_iterable(df['channels']))).value_counts()
print(test)
Both fail with the same error (only the second traceback is shown):
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 416, in <module>
    test = pd.Series(list(chain.from_iterable(df['channels']))).value_counts()
TypeError: 'float' object is not iterable
CodePudding user response:
One option is to explode the column, then count the values:
out = df['channels'].explode().value_counts()
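To see why explode copes with the missing values, here is a minimal sketch on made-up data. The small df below is hypothetical and only stands in for the real one; the NaN row imitates whatever is causing the TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row stands in for a missing value.
df = pd.DataFrame({
    "channels": [
        ["web", "email", "mobile", "social"],
        ["web", "email", "mobile"],
        ["email", "mobile", "social"],
        ["web", "email"],
        np.nan,
    ]
})

# explode() gives each list element its own row; the NaN row stays NaN.
exploded = df["channels"].explode()

# value_counts() skips NaN by default (dropna=True), so no error is raised.
out = exploded.value_counts()
print(out)
```

Pass dropna=False to value_counts if you also want to see how many rows are missing.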
Another option is collections.Counter. Note that your error suggests the column has missing values (NaN, which is a float), so drop them first:
from itertools import chain
from collections import Counter
out = pd.Series(Counter(chain.from_iterable(df['channels'].dropna())))
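As a rough check of the Counter approach, the sketch below again uses a small hypothetical df with one NaN row rather than the real data. It first counts the missing rows, then builds the totals; the sort_values call is an optional extra so the result is ordered like value_counts output.

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row stands in for a missing value.
df = pd.DataFrame({
    "channels": [
        ["web", "email", "mobile", "social"],
        ["web", "email", "mobile"],
        ["email", "mobile", "social"],
        ["web", "email"],
        np.nan,
    ]
})

# How many rows are missing? NaN is a float, hence "'float' object is not iterable".
n_missing = df["channels"].isna().sum()
print("missing rows:", n_missing)

# Drop the missing rows, flatten the remaining lists, and count each item.
counts = pd.Series(Counter(chain.from_iterable(df["channels"].dropna())))

# Counter keeps first-seen order; sort to mimic value_counts().
counts = counts.sort_values(ascending=False)
print(counts)
```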