I have experience making alluvial plots with the ggalluvial package. However, I have run into an issue while trying to create an alluvial plot in which two different sources converge onto one variable.
Here is some example data:
library(dplyr)
library(ggplot2)
library(ggalluvial)
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)
Here is the code I use to make the plot:
# prep the data: count rows per shared_label, then reshape to long format
data <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n())
data <- reshape2::melt(data, id.vars = c("unique_alluvium_entires", "freq"))
# order the axes so that shared_label sits in the middle
data$variable <- factor(data$variable, levels = c("label_1", "shared_label", "label_2"))
# ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
(apparently I cannot embed images yet)
As you can see, I can remove the NA values, but the shared_label column does not properly "stack". The unique rows should stack on top of each other in the shared_label column. This would also fix the sizing issue so that they are of equal size along the y axis.
Any ideas on how to fix this? I have tried ggsankey, but the same issue arises and I cannot remove the NA values. Any tips are greatly appreciated!
Answer:
This plot is the expected result of the "flow" statistical transformation, which is the default for the "flow" graphical object. (That is, geom_flow() = geom_flow(stat = "flow").) It looks like what you want is to specify the "alluvium" statistical transformation instead. Below I've reused all of your code, but copied and edited only the ggplot() call.
# ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow(stat = "alluvium") + # <-- specify alternate stat
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2021-12-10 by the reprex package (v2.0.1)
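As a side note on the data prep step from the question: if reshape2 or dplyr is not available, the same long-format table can be sketched in base R. This is only an illustration under the assumption that the input is the example data frame above; the names label_cols and long are made up for this sketch, and the row order may differ from what reshape2::melt() returns.

```r
# Same example data as in the question
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)

# freq = number of rows sharing each shared_label
# (base R counterpart of group_by(shared_label) %>% mutate(freq = n()))
data$freq <- ave(seq_along(data$shared_label), data$shared_label, FUN = length)

# Stack the three label columns into long format, keeping the id and freq
# columns on every row (base R counterpart of the melt() call).
# label_cols and long are hypothetical names used only in this sketch.
label_cols <- c("label_1", "label_2", "shared_label")
long <- do.call(rbind, lapply(label_cols, function(col) {
  data.frame(
    unique_alluvium_entires = data$unique_alluvium_entires,
    freq = data$freq,
    variable = col,
    value = data[[col]]
  )
}))

# Order the axes so that shared_label sits in the middle
long$variable <- factor(long$variable,
                        levels = c("label_1", "shared_label", "label_2"))
```

The resulting long data frame has one row per alluvium and axis, including the rows where value is NA, so it can be passed to the ggplot() call in place of the melted data.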