Converting month_year variable into week_year (dplyr) & (lubridate)

Time:11-12

I have a dataset structured as follows, in which I track collective action mentions by subreddit and month, relative to a policy treatment introduced on Feb 17th, 2012. As a result, the period "Feb 2012" appears twice in my dataset: the "pre" row covers the Feb 2012 days before treatment, and the "post" row covers the days from treatment onward.

treatment_status  month_year   collective_action_percentage
pre               Dec 2011     5%
pre               Jan 2012     8%
pre               Feb 2012     10%
post              Feb 2012     3%
post              March 2012   10%

I made the following graph, but I am not sure it is the best way to visualize this indicator. Since I am interested in showing how collective action mentions decline after treatment, would presenting the variable by week and year, rather than by month and year, be clearer?

ggplot(data = df1, aes(x = as.Date(month_year), fill = collective_action_percentage, y = collective_action_percentage)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  xlab("Criticism by individuals active before and after treatment") +
  theme_classic() +
  theme(plot.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 90, vjust = 0.5))


I created the month_year variable as follows, using the zoo package:

df <- df %>%
  mutate(month_year = zoo::as.yearmon(date))

Finally, I tried aggregating the data on a weekly basis as follows. However, because my dataset spans multiple years, I want to aggregate by week and year, not simply by week number:

df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
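
Grouping by isoweek() alone merges week 5 of 2011 with week 5 of 2012. One possible way to keep the year attached is sketched below. It assumes df2 has a Date column named time and a numeric column named values, as in the snippet above; the sample data frame and the column names iso_year, iso_week and week_start are made up for illustration. It groups on the ISO year together with the ISO week, and also keeps the Monday that starts each week, which is convenient as a date axis in ggplot2.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for df2: one row per day, spanning a year boundary
df2 <- data.frame(
  time   = seq(as.Date("2011-12-26"), as.Date("2012-01-08"), by = "day"),
  values = 1:14
)

weekly <- df2 %>%
  mutate(
    iso_year   = isoyear(time),   # year the ISO week belongs to
    iso_week   = isoweek(time),   # ISO week number within that year
    week_start = floor_date(time, unit = "week", week_start = 1)  # Monday of the week
  ) %>%
  group_by(iso_year, iso_week, week_start) %>%
  summarise(value = mean(values), .groups = "drop")

weekly
```

Using isoyear() rather than year() matters near January 1st: Sunday 2012-01-01 belongs to ISO week 52 of 2011, so pairing year() with isoweek() would place that day in a spurious "2012, week 52" group.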

Answer:

Plot a point for each row and connect them with a line so that it is clear what the order is. We also color the pre and post points differently and make treatment status a factor so that we can order the pre level before the post level.

library(ggplot2)
library(zoo)

df2 <- transform(df1, month_year = as.yearmon(month_year, "%b %Y"),
  treatment_status = factor(treatment_status, c("pre", "post")))
ggplot(df2, aes(month_year, collective_action_percentage)) +
  geom_point(aes(col = treatment_status), cex = 4) +
  geom_line()


Note

We assume df1 is as follows. The % signs have already been removed, so collective_action_percentage is an integer column.

df1 <- 
structure(list(treatment_status = c("pre", "pre", "pre", "post", 
"post"), month_year = c("Dec 2011", "Jan 2012", "Feb 2012", "Feb 2012", 
"March 2012"), collective_action_percentage = c(5L, 8L, 10L, 
3L, 10L)), class = "data.frame", row.names = c(NA, -5L))
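
If the column still holds strings such as "5%", the cleanup step mentioned above could look roughly like this. It is a minimal sketch: the vector raw is a hypothetical stand-in for the original column, not something taken from the question's data.

```r
# Hypothetical raw values as they appear in the question's table
raw <- c("5%", "8%", "10%", "3%", "10%")

# Drop the literal "%" and convert to integer
pct <- as.integer(sub("%", "", raw, fixed = TRUE))
pct
```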