I want to count the unique words in a DataFrame column, but my code only counts the words in the first sentence.
text
0 hello is a unique sentences
1 hello this is a test
2 does this works
import pandas as pd
d = {
    "text": ["hello is a unique sentences",
             "hello this is a test",
             "does this works"],
}
df = pd.DataFrame(data=d)
from collections import Counter
# Count unique words
def counter_word(text_col):
    print(len(text_col.values))
    count = Counter()
    for i, text in enumerate(text_col.values):
        print(i)
        for word in text.split():
            count[word] = 1
        return count
counter = counter_word(df['text'])
len(counter)
CodePudding user response:
I think it is simpler to join the values with spaces, then split the result into words and count them:
counter = Counter((' '.join(df['text'])).split())
print (counter)
Counter({'hello': 2, 'is': 2, 'a': 2, 'this': 2, 'unique': 1, 'sentences': 1, 'test': 1, 'does': 1, 'works': 1})
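For comparison, the loop from the question can also be repaired directly. This is a minimal sketch, assuming the cause is a `return` that runs inside the `for` loop (the indentation in the post is lost, so that is a guess). It also increments the count instead of assigning 1, and it uses a plain list named `sentences` as a stand-in for `df['text']`.

```python
from collections import Counter

def counter_word(texts):
    """Count how often each word occurs across all strings in `texts`."""
    count = Counter()
    for text in texts:
        for word in text.split():
            count[word] += 1  # increment instead of overwriting with 1
    return count  # returned only after every row has been processed

# Plain list standing in for df['text']; a pandas Series iterates the same way.
sentences = ["hello is a unique sentences",
             "hello this is a test",
             "does this works"]
counter = counter_word(sentences)
print(len(counter))  # number of distinct words
```

Passing `df['text']` instead of the list should behave the same, since iterating a Series yields its values.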
CodePudding user response:
You can use itertools.chain to build a generator to feed to Counter:
from itertools import chain
counter = Counter(chain.from_iterable(map(str.split, df['text'])))
output:
Counter({'hello': 2,
'is': 2,
'a': 2,
'unique': 1,
'sentences': 1,
'this': 2,
'test': 1,
'does': 1,
'works': 1})
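If only the number of distinct words matters and not the per-word counts, the same chained iterator can feed a `set` instead. A small sketch, with a plain list named `sentences` assumed in place of `df['text']`:

```python
from itertools import chain

# Plain list standing in for df['text'] (an assumption for this sketch).
sentences = ["hello is a unique sentences",
             "hello this is a test",
             "does this works"]

# Split every sentence, flatten the word lists, and keep distinct words only.
unique_words = set(chain.from_iterable(map(str.split, sentences)))
print(len(unique_words))
```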
CodePudding user response:
It may be easier and more efficient to stack the words into a single column, then use pandas value_counts to count them instead of Counter:
df["text"].str.split(expand=True).stack().value_counts()
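A possible variant of the same idea, sketched here with `Series.explode` (available in pandas 0.25 and later) rather than `expand=True` plus `stack`. The names `word_counts` and `n_unique` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["hello is a unique sentences",
             "hello this is a test",
             "does this works"],
})

# Split each row into a list of words, put one word per row, then count.
word_counts = df["text"].str.split().explode().value_counts()

# Each index entry of the result is one distinct word.
n_unique = len(word_counts)
print(word_counts)
print(n_unique)
```

The length of the resulting Series answers the original question (how many unique words), while the values give the per-word frequencies.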