Home > Software design >  Sankey diagram with ggplot
Sankey diagram with ggplot

Time:01-01

I am relatively experienced in pivoting tables, however this one is almost driving me mad. I want to generate a Sankey plot with flows connecting parameters between two experiments. This is very simple, yet I cannot figure out how the package wants the data laid out.

Please consider the following MWE:

tribble(
~parm, ~value,
"b1", 0.009,
"g1", 0.664,
"b2", 0.000,
"ra", 0.000,
"rc", 0.000,
"ax", 0.084,
"cx", 0.086,
"ex", 0.179,
"ay", 0.045,
"cy", 0.043,
"ey", 0.102
) -> doc1

doc2 <- tribble(
  ~parm, ~value,
  "b1", 0.181,
  "g1", 0.289,
  "b2", 0.181,
  "ra", 0.000,
  "rc", 0.000,
  "ax", 0.001,
  "cx", 0.001,
  "ex", 0.002,
  "ay", 0.001,
  "cy", 0.001,
  "ey", 0.002,
  "re", 0.000,
  "rf", 0.000,
  "b3", 0.289
)


doc1 <- doc1 %>% mutate(model = "exp")
doc2 <- doc2 %>% mutate(model = "exp2")
finalpow <- doc1 %>% full_join(doc2)
a <- pivot_wider(finalpow, names_from = model, values_from = value)

finalpow  <- a%>% make_long( exp, exp2) 

And the following code to generate the diagram:

ggplot(finalpow, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node))  
  geom_sankey(flow.alpha = 0.5, node.color = 1)  
  geom_sankey_label(size = 3.5, color = 1, fill = "white")  
  scale_fill_viridis_d()  
  theme_sankey(base_size = 16)  
  guides(fill = guide_legend(title = "Title"))

I am close, as the two bars are the experiments as desired. But the labels should be the different parameters names ax, cx, ex, ... and the "barplots" should be proportional to each parameter values. Here is what I get:

enter image description here

CodePudding user response:

If you can live with another package, I personally prefer the {ggalluvial} package - I find the syntax more intuitive and the examples in the documentation help understand the required data structure better.

I think the biggest challenge is here the labelling - as you can see in the code below, this takes the biggest chunk of code. Maybe you might want to refrain from direct labelling in this case.

library(tidyverse)
library(ggalluvial)

## using your data frame a
finalpow <- a %>% pivot_longer(cols = c(exp, exp2)) %>%
  ## you need to manually drop all NA, because geom_alluvium won't do that for you
  drop_na()
## I'd  create a separate data frame for annotation
df_lab <- finalpow %>%
  filter(name == "exp") %>%
  ## for labelling, you need to manually arrange labels in correct position
  arrange(desc(parm))

ggplot(mapping = aes(y = value, x = name))  
  geom_alluvium(
    data = finalpow,
    aes(fill = parm, alluvium = parm),
    color = "black", alpha = 1
  )  
  ggrepel::geom_label_repel(
    data = df_lab,
    aes(label = parm, y = value),
    ## you cannot just use the y value, you need to stack
    position = position_stack(vjust = .5)
  )

Created on 2022-12-28 with reprex v2.0.2

  • Related