I am trying to create a wordcloud using the frequencies of words from a Pandas column. I have a dataframe like so:
PageNumber Top_words_only
1 people trees like instagram ...
2 people yellow like flickrioapp people level water...
...
78 teatree instagram water leith circuits...
I have calculated the word frequencies from the Top_words_only column and put them into a tuple so that wordcloud can turn the data into a visualisation, like so:
tuples = tuple([tuple(x) for x in df.Top_words_only.str.split(expand=True).stack().value_counts().reset_index().values])
print(tuples)
Output:
(('instagram', 3), ('plant', 3), ('shadow', 3), ('rise', 3), .... ('hibs', 1), ('bud', 1), ('insect', 1),
('warriston', 1), ('garage', 1))
wordcloud = WordCloud()
wordcloud.generate_from_frequencies(tuples)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
However, this raises an AttributeError:
AttributeError: 'tuple' object has no attribute 'items'
Does anyone know what is wrong with the code I have?
Answer:
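generate_from_frequencies expects a dictionary mapping each word to its frequency, not a tuple of pairs, so use a dictionary: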
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Build a {word: count} dictionary from the column
d = dict([tuple(x) for x in df.Top_words_only.str.split(expand=True).stack().value_counts().reset_index().values])

wordcloud = WordCloud()
wordcloud.generate_from_frequencies(d)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Output: [word cloud image]
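Going by the error message, the frequencies argument has `.items()` called on it, which a dict provides and a tuple does not. A minimal sketch of the difference, using only the standard library and a few of the pairs from the question's printed output (the variable names `pairs` and `freqs` are just for illustration):

```python
# A few (word, count) pairs copied from the question's printed output.
pairs = (("instagram", 3), ("plant", 3), ("hibs", 1), ("garage", 1))

# A tuple has no .items() method, which is what the AttributeError reports.
print(hasattr(pairs, "items"))  # False

# dict() accepts any iterable of 2-item sequences, so the conversion is direct.
freqs = dict(pairs)
print(hasattr(freqs, "items"))  # True
print(freqs["instagram"])       # 3
```

So the existing tuple does not need to be rebuilt; passing `dict(tuples)` should be enough.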
An alternative way to generate the dictionary, using collections.Counter (a dict subclass):
from collections import Counter
d = Counter(w for x in df['Top_words_only'] for w in x.split())
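A rough, pandas-free sketch of what the Counter version computes. The list `rows` is a hypothetical stand-in for `df['Top_words_only']`, filled with only the visible parts of the sample rows from the question (the elided parts are left out), so the counts here are illustrative and not the real data:

```python
from collections import Counter

# Hypothetical stand-in for df['Top_words_only'], built from the visible
# fragments of the question's sample rows.
rows = [
    "people trees like instagram",
    "people yellow like flickrioapp people level water",
    "teatree instagram water leith circuits",
]

# Same generator expression as above, iterating over a list instead of the column.
d = Counter(w for x in rows for w in x.split())

print(isinstance(d, dict))  # True: Counter subclasses dict, so .items() exists
print(d["people"])          # 3
print(d.most_common(1))     # [('people', 3)]
```

Because Counter is a dict subclass, it can be passed to generate_from_frequencies in the same way as the dictionary `d` in the first approach.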