I am trying to plot the following data (df_input) as a stacked bar chart in which lines connecting the bars also show the change over time. Any idea how to do it?
df_input <- data.frame( Year= c(2010,2010,2010,2010,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15))
df_input_2 <- data.frame( Year= c(2010,2010,2010,2010,2015,2015,2015,2015,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15,20,10,30,40))
CodePudding user response:
One option to achieve that would be via a `geom_col` and a `geom_line`. For the `geom_line` you have to group by the variable mapped on `fill`, set `position` to `"stack"` and adjust the start/end positions to account for the widths of the bars. Additionally you have to manually set the `orientation` for the `geom_line` to `"y"`:
library(ggplot2)
width <- .6 # Bar width
ggplot(df_input, aes(share, factor(Year), fill = village)) +
  geom_col(width = width) +
  # Shift the line ends to the inner edges of the two bars
  geom_line(aes(x = share,
                y = as.numeric(factor(Year)) + ifelse(Year == 2020, -width / 2, width / 2),
                group = village), position = "stack", orientation = "y")
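To make the offset in the `geom_line` call more concrete, here is a minimal base R sketch of just that arithmetic, using the two-year data from the question. The column name `y_line` is made up for illustration and is not part of the plot code above.

```r
# Same two-year data as in the question
df_input <- data.frame(
  Year = c(2010, 2010, 2010, 2010, 2020, 2020, 2020, 2020),
  village = c("A", "B", "C", "D", "A", "B", "C", "D"),
  share = c(40, 30, 20, 10, 30, 30, 25, 15)
)
width <- 0.6 # Bar width

# Positions of the discrete axis: 2010 -> 1, 2020 -> 2
pos <- as.numeric(factor(df_input$Year))

# Move each line end half a bar width toward the gap between the bars
# (y_line is a hypothetical helper column, used only in this sketch)
df_input$y_line <- pos + ifelse(df_input$Year == 2020, -width / 2, width / 2)

unique(df_input[, c("Year", "y_line")])
```

With a width of 0.6 the lines should start at 1.3 (upper edge of the 2010 bar) and end at 1.7 (lower edge of the 2020 bar), so they span only the gap between the bars.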
EDIT: With more than two years things get a bit trickier. In that case I would switch to `geom_segment`. Additionally, we have to do some data wrangling to prepare the data for use with `geom_segment`:
library(ggplot2)
library(dplyr)
# Example data with three years
df_input_2 <- data.frame( Year= c(2010,2010,2010,2010,2015,2015,2015,2015,2020,2020,2020,2020), village= c("A","B","C","D","A","B","C","D","A","B","C","D"), share = c(40,30,20,10,30,30,25,15,20,10,30,40))
width <- .6 # Bar width
# Data wrangling
df_input_2 <- df_input_2 %>%
  # Cumulative share per year, accumulated in the order the bars are stacked
  group_by(Year) %>%
  arrange(desc(village)) %>%
  mutate(share_cum = cumsum(share)) %>%
  # For each village, look up the position in the following year
  group_by(village) %>%
  arrange(Year) %>%
  mutate(Year = factor(Year),
         Year_lead = lead(Year), share_cum_lead = lead(share_cum))
ggplot(df_input_2, aes(share, factor(Year), fill = village)) +
  geom_col(width = width) +
  geom_segment(aes(x = share_cum, xend = share_cum_lead, y = as.numeric(Year) + width / 2, yend = as.numeric(Year_lead) - width / 2, group = village))
#> Warning: Removed 4 rows containing missing values (geom_segment).
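The warning is expected: the last year has no following year, so `lead()` returns `NA` for one row per village. As a rough check, the same wrangling can be sketched in base R without dplyr. The data frame name `df` and the use of `ave()` are choices made for this sketch only.

```r
# Three-year example data (same values as df_input_2)
df <- data.frame(
  Year = rep(c(2010, 2015, 2020), each = 4),
  village = rep(c("A", "B", "C", "D"), times = 3),
  share = c(40, 30, 20, 10, 30, 30, 25, 15, 20, 10, 30, 40)
)

# Make sure rows are ordered by year, then village (A to D)
df <- df[order(df$Year, df$village), ]

# Cumulative share within each year, accumulated from D up to A
df$share_cum <- ave(df$share, df$Year,
                    FUN = function(s) rev(cumsum(rev(s))))

# Next year's cumulative share for the same village; NA for the last year
df$share_cum_lead <- ave(df$share_cum, df$village,
                         FUN = function(x) c(x[-1], NA))

# Number of segments that cannot be drawn
sum(is.na(df$share_cum_lead))
```

The count of missing values should equal the number of villages, which matches the four rows dropped by `geom_segment`. If the warning is unwanted, passing `na.rm = TRUE` to `geom_segment` should remove those rows silently.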