Visualizing average sentiment by day&year (ggplot)

I would like to visualize average consumer sentiment by day and year, so that the same day can be compared across years. For example, I want to compare consumer sentiment on Dec 18th, 2011 with Dec 18th, 2012. So far I have been able to do this by month and year, but I want to visualize the data at a more granular level.

#Creating a month-year variable
valences_by_post<- valences_by_post %>%  
  mutate(month_year = zoo::as.yearmon(date))

#2011 & 2012
valence_11_12<-valences_by_post %>%
  filter(year == 2011 | year ==2012)%>%
  group_by(month_year) %>%
  summarize(mean_valence= mean(valence), n=n())

ggplot(valence_11_12, aes(x = factor(month_year), y = mean_valence, group = 1)) +
  geom_point() +
  geom_line() +
  geom_smooth() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Which produces:

However, to compute sentiment by day&year, and visualize across different years, I ran the following:

valences_by_post<- valences_by_post %>% 
  mutate(year_day = paste(lubridate::year(date), lubridate::yday(date), sep = "-"))

head(valences_by_post$year_day)
valence_day<-valences_by_post %>%
  filter(year == 2011| year == 2012)%>%
  group_by(year_day) %>%
  summarize(mean_valence= mean(valence), n=n())

I then tried to plot it, but I get "Error: Discrete value supplied to continuous scale" because the year_day variable is stored as character. Is there a workaround for this, or an equivalent of the zoo::as.yearmon(date) function in another package?

ggplot(valence_day, aes(x = year_day, y = mean_valence)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 365, 1)) +
  geom_smooth()

Here are data samples:

dput(head(valence_day,5))
structure(list(year_day = c("2011-175", "2011-176", "2011-177", 
"2011-182", "2011-189"), mean_valence = c(0, 0.0806100217864924, 
0.0714285714285714, 0, 0.5), n = c(1L, 9L, 1L, 1L, 1L)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

And

dput(head(valences_by_post,5))
structure(list(document = c("1", "2", "3", "4", "5"), positive = c(1, 
0, 2, 1, 1), negative = c(1, 1, 0, 0, 1), total_words = c(34, 
13, 4, 3, 6), valence = c(0, -0.0769230769230769, 0.5, 0.333333333333333, 
0), date = structure(c(1308873600, 1308960000, 1308960000, 1308960000, 
1308960000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    year = c(2011, 2011, 2011, 2011, 2011), month = c(6, 6, 6, 
    6, 6), year_day = c("2011-175", "2011-176", "2011-176", "2011-176", 
    "2011-176"), month_year = structure(c(2011.41666666667, 2011

User response:

IMHO there is no need to add a year_day column, since it carries the same information as the date. Instead, you could do your computations after converting your date (which is a datetime object) to a Date. To show the year and day of year on the axis, use the labels argument of scale_x_date:

library(dplyr)
library(ggplot2)

valence_day <- valences_by_post %>%
  filter(year %in% c(2011, 2012)) %>%
  group_by(date = as.Date(date)) %>%
  summarize(mean_valence = mean(valence), n = n())

ggplot(valence_day, aes(x = date, y = mean_valence)) +
  geom_point() +
  geom_line() +
  scale_x_date(labels = ~ paste(lubridate::year(.x), lubridate::yday(.x), sep = "-")) +
  geom_smooth()
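
If the goal is to compare the same calendar day across years (e.g. Dec 18th, 2011 against Dec 18th, 2012), one possible variation is to overlay the years on a shared month-day axis. The sketch below is not part of the original answer. It assumes a synthetic stand-in called toy_posts (made-up values, only so the example runs on its own) and a hypothetical helper column ref_day that moves every date into a common reference year. 2012 is chosen as the reference year because it is a leap year, so Feb 29th stays a valid date.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Synthetic stand-in for valences_by_post (made-up values, for illustration only)
set.seed(1)
toy_posts <- tibble(
  date = as.POSIXct("2011-01-01", tz = "UTC") +
    86400 * sample(0:730, 500, replace = TRUE),
  valence = runif(500, -1, 1)
)

valence_by_day <- toy_posts %>%
  mutate(
    day = as.Date(date),
    year = year(day),
    # Hypothetical helper: same month and day, moved into reference year 2012
    ref_day = as.Date(format(day, "2012-%m-%d"))
  ) %>%
  filter(year %in% c(2011, 2012)) %>%
  group_by(year, ref_day) %>%
  summarize(mean_valence = mean(valence), n = n(), .groups = "drop")

# One line per year, aligned on month and day
p <- ggplot(valence_by_day,
            aes(x = ref_day, y = mean_valence, colour = factor(year))) +
  geom_line() +
  geom_smooth(se = FALSE) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %d") +
  labs(x = "Day of year", colour = "Year")
p
```

Aligning on month and day rather than on lubridate::yday() avoids the one-day shift that a leap year introduces after February, so Dec 18th lands at the same x position in both years.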

DATA

valences_by_post <- structure(list(
  document = c("1", "2", "3", "4", "5"), positive = c(
    1,
    0, 2, 1, 1
  ), negative = c(1, 1, 0, 0, 1), total_words = c(
    34,
    13, 4, 3, 6
  ), valence = c(
    0, -0.0769230769230769, 0.5, 0.333333333333333,
    0
  ), date = structure(c(
    1308873600, 1308960000, 1308960000, 1308960000,
    1308960000
  ), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  year = c(2011, 2011, 2011, 2011, 2011), month = c(
    6, 6, 6,
    6, 6
  ), month_year = structure(c(
    2011.41666666667, 2011.41666666667,
    2011.41666666667, 2011.41666666667, 2011.41666666667
  ), class = "yearmon")
), row.names = c(
  NA,
  5L
), class = "data.frame")
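
As a side note on the year-day labels, base R's format() with the "%j" conversion (day of year, zero-padded to three digits) can produce the same kind of string without lubridate. This is a small sketch with dates picked for illustration; the first one matches the first row of the sample data.

```r
# Dates chosen for illustration; 2011-06-24 is the first date in the sample data
dates <- as.Date(c("2011-06-24", "2011-12-18", "2012-12-18"))

# "%Y" is the four-digit year, "%j" the day of year (001-366)
year_day <- format(dates, "%Y-%j")
year_day
# "2011-175" "2011-352" "2012-353"
```

Note that Dec 18th falls on day 352 in 2011 but on day 353 in 2012, because 2012 is a leap year.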