I have downloaded tweets and I am trying to plot the different hashtags and how often each one is tweeted.
Some data:
screen_name location text created_at hashtags
<chr> <chr> <chr> <dttm> <list>
1 Patrick33079201 "Canada" "Please sign Romans petition to stop vaccin~ 2021-09-24 23:36:33 <chr [1~
2 wakeupsleepers "Philippians 3:20 <U+271E>" "@cwt_news When will people wake up?\nhttps~ 2021-09-24 23:35:58 <chr [1~
3 keen_alice " UK" "Without scanning qr code vaccine passport~ 2021-09-24 23:34:57 <chr [1~
4 Sledgeh63514792 "" "Mike yeadon warned us about being on a com~ 2021-09-24 23:33:10 <chr [1~
5 PeterHu65796484 "" "Mike yeadon warned us about being on a com~ 2021-09-24 23:32:41 <chr [1~
6 thbransfield "here" "@ksorbs Wow.\n\nGet the vaccine. That way~ 2021-09-24 23:32:17 <chr [1~
ggplot(testdata, aes(x = count(unique(hashtags)))) +
  geom_bar()
I get this error:
Error in abs(x) : non-numeric argument to mathematical function
I want it to count all occurrences of the different hashtags that may be present for each user.
CodePudding user response:
Based on the input shown, 'hashtags' is a list column. We may need to unnest the column first before applying count. In addition, count requires a data.frame/tibble as input, not a vector or list.
library(dplyr)
library(tidyr)
library(ggplot2)
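testdata %>%
  unnest(c(hashtags)) %>%               # one row per hashtag instead of one list per tweet
  count(hashtags) %>%                   # frequency of each hashtag, stored in column n
  ggplot(aes(x = hashtags, y = n)) +    # ggplot layers are joined with +, not %>%
  geom_col()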
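To check the unnest and count steps without the original tweets, here is a minimal sketch on a made-up tibble. The name toy and its values are assumptions for illustration only, as is the idea that tweets without a hashtag are stored as NA; the filter step is an optional addition, not part of the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for testdata: one row per tweet, hashtags as a list column
toy <- tibble(
  screen_name = c("user_a", "user_b", "user_c"),
  hashtags = list(c("vaccine", "covid"), "covid", NA_character_)
)

hashtag_counts <- toy %>%
  unnest(hashtags) %>%            # one row per (tweet, hashtag) pair
  filter(!is.na(hashtags)) %>%    # drop tweets that had no hashtag (assumed to be NA)
  count(hashtags, sort = TRUE)    # frequency of each hashtag, most common first

hashtag_counts
```

The resulting tibble has one row per distinct hashtag and a count column n, which is the shape that ggplot(aes(x = hashtags, y = n)) with geom_col() expects.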
Or, if we need a base R plot, unlist the column, get the frequency count with table, and use barplot:
barplot(table(unlist(testdata$hashtags)))
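With many distinct hashtags the bars can become unreadable. A possible variation, sketched here on a made-up list (the object hashtags and the cutoff of 20 are assumptions, not from the question), sorts the counts and keeps only the most frequent ones:

```r
# Hypothetical list column, standing in for testdata$hashtags
hashtags <- list(c("vaccine", "covid"), "covid", NA_character_)

# table() drops NA by default; sort so the most frequent hashtag comes first
freq <- sort(table(unlist(hashtags)), decreasing = TRUE)

# Keep at most the 20 most frequent hashtags and rotate the axis labels
barplot(head(freq, 20), las = 2)
```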