How to plot wordcloud based on multiple columns?

Time:07-15

How can I make a word cloud plot based on the values of two columns? I have a data frame as follows:

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina", "Vikram", "Ramesh", "Luther")
Age <- c(23, 41, 32, 58, 26, 41, 32, 58)
Pval <- c(0.01, 0.06, 0.001, 0.002, 0.025, 0.05, 0.01, 0.0002)
df <- data.frame(Name, Age, Pval)

I want to make a word cloud of df$Name based on the values in df$Age and df$Pval. I used the following code:

library("tm")
library("SnowballC")
library("wordcloud")
library("wordcloud2")
library("RColorBrewer")
set.seed(1234)
wordcloud(words = df$Name, freq = df$Age, min.freq = 1,
          max.words=10, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

[Image: the resulting word cloud]

Here Luther and Ben are the same size because they have the same Age, but I need Luther to be slightly bigger than Ben because he has the lower Pval.

Answer:

A quick workaround:

library("dplyr")
library("scales")
library("wordcloud")
library("RColorBrewer")

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina", "Vikram", "Ramesh", "Luther")
Age <- c(23, 41, 32, 58, 26, 41, 32, 58)
Pval <- c(0.01, 0.06, 0.001, 0.002, 0.025, 0.05, 0.01, 0.0002)
df <- data.frame(Name, Age, Pval)

df <- df %>%
  group_by(Age) %>%
  # rank the p-values within each age group (smallest p-value gets rank 1)
  mutate(rank = rank(Pval)) %>%
  # rescale to [0, 1] so the tie-break never adds more than 1 to Age
  mutate(weight = scales::rescale(rank / max(rank), to = c(0, 1))) %>%
  # invert, because a lower rank (smaller p-value) should give a bigger word
  mutate(weight = Age + (1 - weight)) %>%
  ungroup()
# The last step adds 0.5 if nobody else has the same age, and 1 if someone
# else does but you have the smallest p-value (it should also work if more
# than two people share the same age).

set.seed(1234)
wordcloud(words = df$Name, freq = df$weight, min.freq = 1,
      max.words=10, random.order=FALSE, rot.per=0.35, 
      colors=brewer.pal(8, "Dark2"))
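
If you want to check the weights before plotting, here is a minimal base-R sketch of the same tie-break idea, without dplyr or scales. The helper name tie_bonus is made up for illustration and is not part of any package. It assumes that a group with only one member, or with all p-values equal, should get a bonus of 0.5.

```r
# Same example data as in the question.
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina", "Vikram", "Ramesh", "Luther")
Age  <- c(23, 41, 32, 58, 26, 41, 32, 58)
Pval <- c(0.01, 0.06, 0.001, 0.002, 0.025, 0.05, 0.01, 0.0002)
df <- data.frame(Name, Age, Pval)

# Hypothetical helper (not from any package): within one age group, give the
# smallest p-value a bonus of 1 and the largest a bonus of 0.
# Assumption: if every rank in the group is equal (including a group of one),
# everyone gets 0.5.
tie_bonus <- function(p) {
  r <- rank(p)
  if (max(r) == min(r)) return(rep(0.5, length(p)))
  1 - (r - min(r)) / (max(r) - min(r))
}

# ave() applies the helper within each Age group and keeps the row order.
df$weight <- df$Age + ave(df$Pval, df$Age, FUN = tie_bonus)

# Inspect the result, largest weight first.
print(df[order(-df$weight), c("Name", "Age", "Pval", "weight")])
```

With this data Luther ends up at 59 and Ben at 58, so Luther is drawn slightly bigger. Note that the bonus can reach 1, so if two age groups are exactly one year apart, the person with the lowest p-value in the younger group can tie with the person with the highest p-value in the older one. Scale the bonus down (for example, multiply it by 0.9) if that matters for your data.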

Fun and interesting question btw
