I am trying to iterate over a column to get the count of each word in its sentences.
I have a column:
words |
---|
"one two three four four six" |
"seven eight nine ten eleven" |
"twelve thirteen fourteen" |
"..." |
I have used this code for a single row:
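```python
# take the sentence from the first row of the 'words' column
text = df['words'][0]
# split the sentence on whitespace
wordss = text.split()
# count how often each word occurs; the dict collapses duplicate words
wfreq = [wordss.count(w) for w in wordss]
ini_dict = dict(zip(wordss, wfreq))
keys, values = zip(*ini_dict.items())
print("keys : ", str(keys))
print("values : ", str(values))
```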
The output I receive:
```
keys :  ('one', 'two', 'three', 'four', 'six')
values :  (1, 1, 1, 2, 1)
```
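For the single-row case, `collections.Counter` from the standard library builds the same word-to-count mapping directly, without calling `list.count` once per word (each call rescans the whole list). A minimal sketch, assuming the first sample sentence is hard-coded as a string instead of being read from `df`:

```python
from collections import Counter

# first sentence from the sample column, hard-coded for illustration
text = "one two three four four six"

# Counter maps each distinct word to its number of occurrences,
# keeping the order in which words are first seen
ini_dict = dict(Counter(text.split()))

keys, values = zip(*ini_dict.items())
print("keys : ", str(keys))
print("values : ", str(values))
```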
My objective is to iterate over the whole column and then create a dataframe.
For the single row, I used this code at the end to build the desired dataframe:
```python
import pandas as pd

# build a two-column dataframe from the (word, count) pairs
df = pd.DataFrame(list(ini_dict.items()), columns=['Words', 'n'])
df
```
Words | n |
---|---|
one | 1 |
two | 1 |
three | 1 |
four | 2 |
six | 1 |
I would like to iterate over the whole 'words' column to build a single dictionary, and finally get a dataframe that contains the keys and values for the entire column. Does anyone have a solution?
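The approach described above (loop over every row, accumulate into one dictionary, then convert to a dataframe) can be sketched as follows. It assumes the column is named `words` and rebuilds only the three sample rows (without the `"..."` placeholder) so that it runs on its own:

```python
import pandas as pd

# sample data: the three example sentences from the question
df = pd.DataFrame({'words': ["one two three four four six",
                             "seven eight nine ten eleven",
                             "twelve thirteen fourteen"]})

# accumulate the counts of every row into a single dictionary
ini_dict = {}
for text in df['words']:
    for w in text.split():
        ini_dict[w] = ini_dict.get(w, 0) + 1

# turn the (word, count) pairs into a two-column dataframe
result = pd.DataFrame(list(ini_dict.items()), columns=['Words', 'n'])
print(result)
```

This keeps the words in the order they first appear; the answers below use `Counter` or pandas methods to do the same accumulation with less code.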
Answer:
```python
from collections import Counter

import pandas as pd

# get a list of the sentences (one string per row)
sentences = df['words'].values.tolist()
# split the sentences into words and flatten into a single list
words = [i for j in sentences for i in j.split()]
# get (word, count) pairs, most frequent first
counts = Counter(words).most_common()
# make dataframe
result = pd.DataFrame(counts, columns=['Words', 'n'])
```
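To try this on the sample data, a self-contained version might look like the sketch below; the three sentences are copied from the question and the `"..."` row is left out. `most_common()` with no argument returns every `(word, count)` pair sorted by count, and words with equal counts keep the order in which they were first encountered:

```python
from collections import Counter

import pandas as pd

# sample data: the three example sentences from the question
df = pd.DataFrame({'words': ["one two three four four six",
                             "seven eight nine ten eleven",
                             "twelve thirteen fourteen"]})

# flatten all sentences into one list of words
words = [w for sentence in df['words'] for w in sentence.split()]

# (word, count) pairs sorted by count, highest first
result = pd.DataFrame(Counter(words).most_common(), columns=['Words', 'n'])
print(result.head(3))
```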
Answer:
You can split the column into lists of words, then use `explode`
to put each list element on its own row. Finally, use `value_counts`
to count the frequency of each word in the column:
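```python
import pandas as pd

# split each sentence into a list, put every word on its own row, then count;
# rename_axis/reset_index(name=...) name the columns independently of the
# default names value_counts produces (those changed in pandas 2.0)
out = (df['words'].str.split().explode().value_counts()
       .rename_axis('Words').reset_index(name='n'))
print(out)
```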
```
       Words  n
0       four  2
1        one  1
2        two  1
3      three  1
4        six  1
5      seven  1
6      eight  1
7       nine  1
8        ten  1
9     eleven  1
10    twelve  1
11  thirteen  1
12  fourteen  1
```
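To see what each step of the split/explode/count chain does, the sketch below prints the intermediate results for the first two sample rows. It names the output columns with `rename_axis` and `reset_index(name=...)`, which does not rely on the default column names `value_counts` produces (in pandas 2.0 the counts series was renamed to `count`):

```python
import pandas as pd

# first two sample sentences from the question
df = pd.DataFrame({'words': ["one two three four four six",
                             "seven eight nine ten eleven"]})

# step 1: each sentence becomes a list of words (one list per row)
split = df['words'].str.split()
print(split)

# step 2: explode gives every list element its own row; the original
# row label is repeated for each word that came from that row
exploded = split.explode()
print(exploded)

# step 3: count each distinct word, then name the index and value columns
out = exploded.value_counts().rename_axis('Words').reset_index(name='n')
print(out)
```

Because `explode` keeps the original index, you can still tell which sentence each word came from before the counting step discards it.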