Iterate a column to create a dictionary and create a data frame

I am trying to iterate over a column to count how often each word occurs in its sentences.

I have a column:

words
"one two three four four six"
"seven eight nine ten eleven"
"twelve thirteen fourteen"
"..."

I have used this code for a single row:

text = df['words'][0]
wordss = text.split()
# count how often each word occurs in the sentence
wfreq = [wordss.count(w) for w in wordss]
# repeated words collapse into a single dictionary key
ini_dict = dict(zip(wordss, wfreq))

keys, values = zip(*ini_dict.items())

print("keys : ", str(keys))
print("values : ", str(values))

The output I receive:

keys :  ('one', 'two', 'three', 'four', 'six')
values :  (1, 1, 1, 2, 1)

My objective is to iterate over the whole column and then create a dataframe from the result.

I have used this code at the end to achieve the desired dataframe.

# build the frame from (word, count) pairs; a new name keeps the original df intact
result = pd.DataFrame(list(ini_dict.items()), columns=['Words', 'n'])
result

Words	n
one	1
two	1
three	1
four	2
six	1

I would like to first iterate over the whole 'words' column to create a dictionary, and then build a dataframe that contains all the keys and values from that column. Does anyone have a solution?

CodePudding user response：

from collections import Counter
import pandas as pd

# get a list of sentences (one string per row)
sentences = df['words'].values.tolist()
# split the sentences into words and flatten into a single list
words = [i for j in sentences for i in j.split()]
# get (word, count) pairs, most frequent first
counts = Counter(words).most_common()
# make dataframe
result = pd.DataFrame(counts, columns=['Words', 'n'])
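
If an explicit dictionary built row by row is what you are after, the same counting can be sketched in plain Python. This is only a minimal sketch under some assumptions: the rows list stands in for df['words'].tolist(), and count_words is a hypothetical helper name, not something from pandas.

```python
from collections import Counter

def count_words(rows):
    """Accumulate word counts across an iterable of sentences."""
    totals = Counter()
    for sentence in rows:
        # split() with no arguments splits on any run of whitespace
        totals.update(sentence.split())
    return dict(totals)

# Stand-in for df['words'].tolist() (sample rows from the question)
rows = [
    "one two three four four six",
    "seven eight nine ten eleven",
    "twelve thirteen fourteen",
]
word_counts = count_words(rows)
print(word_counts["four"], len(word_counts))
```

The resulting dictionary could then be turned into a frame with pd.DataFrame(list(word_counts.items()), columns=['Words', 'n']).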

CodePudding user response：

You can split each sentence into a list of words, then explode the lists so that every word gets its own row. Finally, use value_counts to count how often each word occurs in the column.

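# rename_axis/reset_index(name=...) avoids relying on the default column names,
# which changed in pandas 2.0 ('index'/'words' became 'words'/'count')
out = (df['words'].str.split().explode().value_counts()
       .rename_axis('Words').reset_index(name='n'))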

print(out)

       Words  n
0       four  2
1        one  1
2        two  1
3      three  1
4        six  1
5      seven  1
6      eight  1
7       nine  1
8        ten  1
9     eleven  1
10    twelve  1
11  thirteen  1
12  fourteen  1