Alluvial plot with 2 different sources but a converging/shared variable [R]

I have experience making alluvial plots with the ggalluvial package. However, I have run into an issue while trying to create an alluvial plot with two different sources that converge onto one variable.

Here is example data:

library(dplyr)
library(ggplot2)
library(ggalluvial)

data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)

Here is the code I use to make the plot:

#prep the data
data <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n())

data <- reshape2::melt(data, id.vars = c("unique_alluvium_entires", "freq"))
data$variable <- factor(data$variable, levels = c("label_1", "shared_label", "label_2"))

#ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )

Resulting plot (apparently I cannot embed images yet).

As you can see, I can remove the NA values, but the shared_label column does not "stack" properly. Each unique row should stack on top of the others in the shared_label column. This would also fix the sizing issue, so that the columns are equal in size along the y axis.

Any ideas how to fix this? I have tried ggsankey, but the same issue arises and I cannot remove the NA values. Any tips are greatly appreciated!

User response:

This plot is the expected result of the "flow" statistical transformation, which is the default for the "flow" graphical object. (That is, geom_flow() = geom_flow(stat = "flow").) It looks like what you want is to specify the "alluvium" statistical transformation instead. Below I've kept all of your code unchanged and only copied and edited the ggplot() call.

#ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow(stat = "alluvium") +   # <-- specify alternate stat
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
#> Warning: Removed 2 rows containing missing values (geom_text).

Created on 2021-12-10 by the reprex package (v2.0.1)
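
As a side note on the data-prep step from the question: reshape2 is a retired package, so the same long-format table could probably be built with tidyr instead. The following is only a sketch under that assumption. It swaps reshape2::melt() for tidyr::pivot_longer(), adds an ungroup() that the original code does not have, and uses the name long as a hypothetical placeholder for the reshaped data. The row order differs from melt()'s output, which should not matter for the plot since ggplot2 works from the variable factor levels.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; the original post used reshape2

# Same example data as in the question
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)

# `long` is a hypothetical name for the reshaped table
long <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n()) %>%   # number of rows sharing each shared_label value
  ungroup() %>%            # drop the grouping before reshaping
  pivot_longer(
    cols = c(label_1, shared_label, label_2),
    names_to = "variable",
    values_to = "value"
  ) %>%
  # fix the left-to-right order of the axes, as in the question
  mutate(variable = factor(variable,
                           levels = c("label_1", "shared_label", "label_2")))
```

The resulting table has the same four columns the ggplot() call expects (unique_alluvium_entires, freq, variable, value), so it should be usable as a drop-in for data in the plotting code above.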