I would like to visualize consumer sentiment by day and year across different years. For example, I am interested in comparing consumer sentiment on Dec 18th, 2011 with Dec 18th, 2012. So far I have been able to do this by month and year, but I want to visualize the data at a more granular level.
#Creating a month-year variable
valences_by_post<- valences_by_post %>%
mutate(month_year = zoo::as.yearmon(date))
#2011 & 2012
valence_11_12<-valences_by_post %>%
filter(year == 2011 | year ==2012)%>%
group_by(month_year) %>%
summarize(mean_valence= mean(valence), n=n())
ggplot(valence_11_12, aes(x = factor(month_year), y = mean_valence, group = 1)) +
geom_point() +
geom_line() +
geom_smooth() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
To compute sentiment by day and year and visualize it across different years, I ran the following:
valences_by_post<- valences_by_post %>%
mutate(year_day = paste(lubridate::year(date), lubridate::yday(date), sep = "-"))
head(valences_by_post$year_day)
valence_day<-valences_by_post %>%
filter(year == 2011| year == 2012)%>%
group_by(year_day) %>%
summarize(mean_valence= mean(valence), n=n())
I then tried to plot it, but I receive the error "Error: Discrete value supplied to continuous scale" because the year_day variable is stored as character. Is there a workaround for this, or an equivalent of the zoo::as.yearmon(date) function in another package?
ggplot(valence_day, aes(x = year_day, y = mean_valence)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = seq(1, 365, 1)) +
geom_smooth()
Here are data samples:
dput(head(valence_day,5))
structure(list(year_day = c("2011-175", "2011-176", "2011-177",
"2011-182", "2011-189"), mean_valence = c(0, 0.0806100217864924,
0.0714285714285714, 0, 0.5), n = c(1L, 9L, 1L, 1L, 1L)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
And
dput(head(valences_by_post,5))
structure(list(document = c("1", "2", "3", "4", "5"), positive = c(1,
0, 2, 1, 1), negative = c(1, 1, 0, 0, 1), total_words = c(34,
13, 4, 3, 6), valence = c(0, -0.0769230769230769, 0.5, 0.333333333333333,
0), date = structure(c(1308873600, 1308960000, 1308960000, 1308960000,
1308960000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
year = c(2011, 2011, 2011, 2011, 2011), month = c(6, 6, 6,
6, 6), year_day = c("2011-175", "2011-176", "2011-176", "2011-176",
"2011-176"), month_year = structure(c(2011.41666666667, 2011
Answer:
IMHO there is no need to add a year_day column, since it carries the same information as the date. Instead, you could do your computations after converting your date (which is a datetime object) to a Date. To show the year and day of year on the axis, you can build those labels via the labels argument of scale_x_date:
library(dplyr)
library(ggplot2)
valence_day <- valences_by_post %>%
filter(year %in% c(2011, 2012)) %>%
group_by(date = as.Date(date)) %>%
summarize(mean_valence = mean(valence), n = n())
ggplot(valence_day, aes(x = date, y = mean_valence)) +
geom_point() +
geom_line() +
scale_x_date(labels = ~ paste(lubridate::year(.x), lubridate::yday(.x), sep = "-")) +
geom_smooth()
DATA
valences_by_post <- structure(list(
document = c("1", "2", "3", "4", "5"), positive = c(
1,
0, 2, 1, 1
), negative = c(1, 1, 0, 0, 1), total_words = c(
34,
13, 4, 3, 6
), valence = c(
0, -0.0769230769230769, 0.5, 0.333333333333333,
0
), date = structure(c(
1308873600, 1308960000, 1308960000, 1308960000,
1308960000
), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
year = c(2011, 2011, 2011, 2011, 2011), month = c(
6, 6, 6,
6, 6
), month_year = structure(c(
2011.41666666667, 2011.41666666667,
2011.41666666667, 2011.41666666667, 2011.41666666667
), class = "yearmon")
), row.names = c(
NA,
5L
), class = "data.frame")
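If the goal is to compare the same calendar day across years (e.g. Dec 18th, 2011 vs. Dec 18th, 2012), one possible variation is to overlay the years on a shared axis. The sketch below is not part of the answer above. It uses made-up toy data (toy_posts is a hypothetical stand-in for valences_by_post) and maps every date onto an arbitrary leap reference year (2000), so that the same month and day line up even when one of the years is a leap year. The column name day_in_year is likewise an assumption made for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for valences_by_post (values are made up)
toy_posts <- data.frame(
  date = as.POSIXct(
    c("2011-12-17", "2011-12-18", "2011-12-18", "2012-12-17", "2012-12-18"),
    tz = "UTC"
  ),
  valence = c(0.1, 0.3, 0.5, -0.2, 0.4)
)

daily <- toy_posts %>%
  mutate(
    day = as.Date(date),
    year = factor(format(day, "%Y")),
    # Project each date onto the reference year 2000 (a leap year, so Feb 29 stays valid)
    day_in_year = as.Date(format(day, "2000-%m-%d"))
  ) %>%
  group_by(year, day_in_year) %>%
  summarize(mean_valence = mean(valence), n = n(), .groups = "drop")

# One line per year, aligned by month and day
p <- ggplot(daily, aes(x = day_in_year, y = mean_valence, colour = year, group = year)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_labels = "%b %d")
p
```

Using lubridate::yday() for the x axis would also work, but day-of-year numbers shift by one after Feb 29 in leap years (Dec 18th is day 352 in 2011 and day 353 in 2012), which is why this sketch aligns on month and day instead.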