Wordcloud with text from a column with list of strings

Time:10-03

My dataset has 10 columns, one of which has texts as lists of strings.

Dataset:

Col1 Col2 Col3 Text
...   ...  ... ['I','have', 'a','dream']
...   ...  ... ['My', 'mom', 'is','Spanish']

The code

wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=100, background_color="white").generate(' '.join(df['Text']))
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

raises the following error:

TypeError: sequence item 0: expected str instance, list found

It clearly expects strings, not lists. How can I convert the lists in the Text column into strings?
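
For reference, the error seems to come from str.join itself rather than from WordCloud. A minimal sketch with a plain list of lists standing in for df['Text'] (the name rows is only illustrative):

```python
# Stand-in for iterating over df['Text']: each element is a list, not a string.
rows = [['I', 'have', 'a', 'dream'], ['My', 'mom', 'is', 'Spanish']]

try:
    # str.join requires every item to be a str, so the first list triggers the error.
    ' '.join(rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```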

CodePudding user response:

You can first concatenate the lists in the column df['Text'] with .sum(), then join the result:

combined_text = ' '.join(df['Text'].sum())

wordcloud = (
    WordCloud(stopwords=stopwords, 
              max_font_size=50, 
              max_words=100,       
              background_color="white")
    .generate(combined_text)
)
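
Note that .sum() concatenates the lists one row at a time, which can get slow on a large column. As a possible alternative (not part of the approach above), the lists can be flattened with itertools.chain. This is a minimal sketch: join_token_lists is a hypothetical helper, and rows stands in for df['Text'], which should be usable in its place since iterating a Series yields its values.

```python
from itertools import chain


def join_token_lists(rows):
    """Flatten an iterable of token lists and join the tokens with spaces."""
    # chain.from_iterable walks every inner list without building intermediate lists.
    return ' '.join(chain.from_iterable(rows))


# Illustrative stand-in for df['Text'].
rows = [['I', 'have', 'a', 'dream'], ['My', 'mom', 'is', 'Spanish']]

combined_text = join_token_lists(rows)
print(combined_text)
```

The resulting string can then be passed to .generate(combined_text) as before.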

CodePudding user response:

Since the values in that column are lists, try exploding them first:

wordcloud = (WordCloud(stopwords=stopwords, 
                       max_font_size=50, 
                       max_words=100, 
                       background_color="white")
                       .generate(' '.join(df['Text'].explode())))

Or join each list into a string first:

wordcloud = (WordCloud(stopwords=stopwords, 
                       max_font_size=50, 
                       max_words=100, 
                       background_color="white")
                       .generate(' '.join(df['Text'].agg(' '.join))))
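
One caveat with explode(): an empty list becomes NaN, and joining a NaN would raise a similar TypeError. A small sketch of guarding against that with dropna(), using made-up data where the middle row is empty (an assumption for illustration, not taken from the question):

```python
import pandas as pd

# Illustrative data: the middle row is an empty list.
df = pd.DataFrame({
    'Text': [['I', 'have', 'a', 'dream'], [], ['My', 'mom', 'is', 'Spanish']]
})

# explode() yields one token per row; the empty list becomes NaN, so drop it before joining.
tokens = df['Text'].explode().dropna()
combined_text = ' '.join(tokens)
print(combined_text)
```

The string in combined_text can then be handed to .generate() in place of the inline join.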