I have a DataFrame column in which each row is a list of strings. I want to count the individual elements across the whole column, not the rows themselves, which is what value_counts() in pandas gives. I want to apply Counter() from the collections module, but that only works on a flat list. My column in the DataFrame looks like this:
[['FollowFriday', 'Awesome'],
['Covid_19', 'corona', 'Notagain'],
['Awesome'],
['FollowFriday', 'Awesome'],
[],
['corona', 'Notagain'],
....]
I want to get the counts, such as
[('FollowFriday', 2),
('Awesome', 3),
('corona', 2),
('Covid_19', 1),
('Notagain', 2),
.....]
The basic command that I am using is:
from collections import Counter
Counter(df['column'])
OR
from collections import Counter
Counter(" ".join(df['column']).split()).most_common()
Any help would be greatly appreciated!
CodePudding user response:
IIUC, your comparison to pandas was only to explain your goal and you want to work with lists?
You can use:
l = [['FollowFriday', 'Awesome'],
['Covid_19', 'corona', 'Notagain'],
['Awesome'],
['FollowFriday', 'Awesome'],
[],
['corona', 'Notagain'],
]
from collections import Counter
from itertools import chain
out = Counter(chain.from_iterable(l))
Or, if you have a Series of lists, use explode:
out = Counter(df['column'].explode().dropna())  # explode() turns empty lists into NaN, so drop those first
# OR
out = df['column'].explode().value_counts()
output:
Counter({'FollowFriday': 2,
'Awesome': 3,
'Covid_19': 1,
'corona': 2,
'Notagain': 2})
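For reference, here is a minimal end-to-end sketch of both approaches on a DataFrame. The frame `df` and its column name `column` are assumptions that mirror the question's data, not the asker's real frame. It also shows `most_common()`, which returns the list of `(tag, count)` tuples the question asks for, and drops the NaN that `explode()` produces for an empty list before counting.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical frame mirroring the data in the question.
df = pd.DataFrame({"column": [
    ["FollowFriday", "Awesome"],
    ["Covid_19", "corona", "Notagain"],
    ["Awesome"],
    ["FollowFriday", "Awesome"],
    [],
    ["corona", "Notagain"],
]})

# Flatten the lists lazily; an empty list simply contributes nothing.
counts = Counter(chain.from_iterable(df["column"]))

# List of (tag, count) tuples, most frequent first.
pairs = counts.most_common()

# explode() emits NaN for an empty list, so drop it before counting.
exploded_counts = Counter(df["column"].explode().dropna())

print(pairs)
```

If you would rather stay in pandas, `df["column"].explode().value_counts()` skips NaN by default, so no `dropna()` is needed on that path.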