Visualizing binary variable as count variable with time on x-axis, R

Time:06-10

I want to make a line or scatter plot (preferably with ggplot) that has time on the x-axis and the count of 1s in a binary variable on the y-axis. My data looks like this. Can anyone help me?

tweet_id    text                date          morality_binary
1           Lorem ipsum...      2014-01-01    1
2           Lorem ipsum...      2015-01-01    0
3           Lorem ipsum...      2016-01-01    1
4           Lorem ipsum...      2017-01-01    0
5           Lorem ipsum...      2018-01-01    0
6           Lorem ipsum...      2019-01-01    1

Here's a reproducible example of the actual data:

df <- structure(list(X.1 = 0:5, X = c(502026L, 198322L, 711188L, 563672L, 
993641L, 474508L), tweet_id = c(867481042428579840, 469268704732393536, 
915248573083553792, 689948979740725248, 1003463365811953664, 
958533305716101120), user_username = c("GerryConnolly", "SenatorMenendez", 
"RepJayapal", "RoyBlunt", "SenJeffMerkley", "RepChrisStewart"
), text = c(".@governorva demonstrates compassion that potus lacks. trump's immigration eo still threatens to tear this family apart. #freeliliana", 
"hoy,repet<ed> mi llamado a mis colegas rep. de la c<e1>mara para q hagan lo correcto y aprueben una #reformamigraotira #cir", 
"@repadamsmith @reproybalallard the incarceration system for immigrants operates in the shadows, at a huge profit for corporations. our bill phases them out in 3 years.", 
"now isn't the time to accept syrian &amp; iraqi refugees into our country w/o proper system for vetting. rt if you agree", 
"mr. president, the only <93>horrible law<94> is your policy. you have the power to change it. if you saw what i saw today, you would. never before has america deliberately inflicted cruelty on children to deter asylum seekers from finding refuge here. never. and we never should.", 
"republicans and democrats need to work together and reform our immigration policies. #sotu"
), created_at = c("2017-05-24", "2014-05-22", "2017-10-03", "2016-01-20", 
"2018-06-04", "2018-01-31"), user_name = c("Rep. Gerry Connolly", 
"Senator Bob Menendez", "Rep. Pramila Jayapal", "Senator Roy Blunt", 
"Senator Jeff Merkley", "Rep. Chris Stewart"), user_verified = c("True", 
"True", "True", "True", "True", "True"), morality_binary = c(0.78794396, 
0.06992793, 0.75065666, 0.7655833, 0.85510856, 0.52538866), morality = c(1, 
0, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")

And this is the final plot I am hoping to end up with:

CodePudding user response:

You can group by date, sum the 1s in the morality column for each date, and then plot the result with ggplot2.

library(tidyverse)

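df %>%
  # convert the character dates to Date so the x-axis is a proper time scale
  mutate(created_at = as.Date(created_at)) %>%
  group_by(created_at) %>%
  # morality is 0/1, so the sum is the number of 1s on each date
  summarise(count = sum(morality, na.rm = TRUE)) %>%
  ggplot(aes(x = created_at, y = count)) +
  geom_line() +
  labs(x = "Date", y = "moralizing text")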
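
Since the question asks for a line or scatter plot, the same chart can also carry points on top of the line. This is only a minimal sketch: the data frame daily and the object p are hypothetical names, and the counts are typed in by hand from the six sample rows rather than computed from the full data.

```r
library(ggplot2)

# Per-date counts copied by hand from the question's six sample rows
# (every date occurs once there, so each count equals that row's morality value)
daily <- data.frame(
  created_at = as.Date(c("2014-05-22", "2016-01-20", "2017-05-24",
                         "2017-10-03", "2018-01-31", "2018-06-04")),
  count = c(0, 1, 1, 1, 1, 1)
)

# geom_point() adds the scatter layer; drop geom_line() for points only
p <- ggplot(daily, aes(x = created_at, y = count)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "moralizing text")

print(p)
```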


Or, if you want the count for each year, extract the year from the date first and then summarise.

library(tidyverse)

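df %>%
  # pull the four-digit year out of the date and store it as a number
  mutate(Date = as.numeric(format(as.Date(created_at), "%Y"))) %>%
  group_by(Date) %>%
  # morality is 0/1, so the sum is the number of 1s in each year
  summarise(count = sum(morality, na.rm = TRUE)) %>%
  ggplot(aes(x = Date, y = count)) +
  geom_line() +
  labs(x = "Date", y = "moralizing text")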
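
Years with no tweets at all (2015 in the sample data) never show up in the summarised table, so the line runs straight from 2014 to 2016. One possible way to make such gaps explicit is tidyr's complete() together with full_seq(). The sketch below assumes that a year without tweets should count as zero; df_small, yearly and yearly_filled are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Only the two columns needed, copied from the question's sample data
df_small <- data.frame(
  created_at = c("2017-05-24", "2014-05-22", "2017-10-03",
                 "2016-01-20", "2018-06-04", "2018-01-31"),
  morality = c(1, 0, 1, 1, 1, 1)
)

# Same yearly aggregation as above, stopped before the plotting step
yearly <- df_small %>%
  mutate(year = as.numeric(format(as.Date(created_at), "%Y"))) %>%
  group_by(year) %>%
  summarise(count = sum(morality, na.rm = TRUE))

# Add a row for every year between the first and last one,
# filling the count with 0 where there were no tweets (assumption)
yearly_filled <- yearly %>%
  complete(year = full_seq(year, 1), fill = list(count = 0))

print(yearly_filled)
```

The filled table can then be piped into the same ggplot() call, using year in place of Date.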

