Suppose I have this dataset (the actual dataset has 30 columns and thousands of ids):
df <- data.frame(id = 1:5,
                 admission = c("Severe", "Mild", "Mild", "Moderate", "Severe"),
                 d1 = c(NA, "Moderate", "Mild", "Moderate", "Severe"),
                 d2 = c(NA, "Moderate", "Mild", "Mild", "Moderate"),
                 d3 = c(NA, "Severe", "Mild", "Mild", "Severe"),
                 d4 = c(NA, NA, "Mild", "Mild", NA),
                 outcome = c("Dead", "Dead", "Alive", "Alive", "Dead"))
I want to make a Sankey diagram that illustrates the daily severity of the patients over time. However, once a patient's observations become NA (meaning that an outcome has been reached), I want the last observed node to link directly to the outcome.
This is what the diagram should look like:
Image fetched from the question asked by @qdread
Thanks in advance for the help.
CodePudding user response:
You just need to reshape your data "manually", since make_long() doesn't do what you need here.
library(dplyr)
library(ggplot2)
library(ggsankey)

# Build one link table per patient, dropping NA days so the last
# observed day links straight to the outcome
do.call(rbind, apply(df, 1, function(x) {
  x <- na.omit(x[-1])  # drop id, then drop missing days
  data.frame(x = names(x), node = x,
             next_x = dplyr::lead(names(x)),
             next_node = dplyr::lead(x), row.names = NULL)
})) %>%
  # keep the stages in their original column order
  mutate(x = factor(x, names(df)[-1]),
         next_x = factor(next_x, names(df)[-1])) %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = node,
             label = node)) +
  geom_sankey(flow.alpha = 0.5,
              node.color = NA,
              show.legend = TRUE) +
  geom_sankey_text(size = 3, color = "black", fill = NA, hjust = 0,
                   position = position_nudge(x = 0.1))
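To see what the per-row reshape produces, here is a minimal base R sketch applied to patient 2 only. The helper name to_links() is made up for illustration (it is not part of ggsankey or dplyr), and the sketch assumes the columns are character rather than factor.

```r
# Same example data as in the question
df <- data.frame(id = 1:5,
                 admission = c("Severe", "Mild", "Mild", "Moderate", "Severe"),
                 d1 = c(NA, "Moderate", "Mild", "Moderate", "Severe"),
                 d2 = c(NA, "Moderate", "Mild", "Mild", "Moderate"),
                 d3 = c(NA, "Severe", "Mild", "Mild", "Severe"),
                 d4 = c(NA, NA, "Mild", "Mild", NA),
                 outcome = c("Dead", "Dead", "Alive", "Alive", "Dead"),
                 stringsAsFactors = FALSE)

# to_links() is a hypothetical helper, named here only for illustration
to_links <- function(row) {
  x <- unlist(row[-1])   # drop id, keep a named character vector
  x <- x[!is.na(x)]      # remove missing days
  data.frame(x = names(x),
             node = unname(x),
             next_x = c(names(x)[-1], NA),      # following stage, NA at the end
             next_node = c(unname(x)[-1], NA),  # following value, NA at the end
             stringsAsFactors = FALSE)
}

links <- to_links(df[2, ])
print(links)
```

Because d4 is NA for patient 2, the d3 row gets next_x = "outcome", which is what makes the flow skip the missing day and go straight to the outcome node.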
CodePudding user response:
Carry the outcome backward (right to left) along each row so that it fills the NAs, then plot:
library(ggplot2)
library(dplyr)
library(ggsankey)
# fill NAs from last value
df[] <- t(apply(df, 1, zoo::na.locf, fromLast = TRUE))
head(df)
#   id admission       d1       d2     d3   d4 outcome
# 1  1    Severe     Dead     Dead   Dead Dead    Dead
# 2  2      Mild Moderate Moderate Severe Dead    Dead
# 3  3      Mild     Mild     Mild   Mild Mild   Alive
# 4  4  Moderate Moderate     Mild   Mild Mild   Alive
# 5  5    Severe   Severe Moderate Severe Dead    Dead
# then your existing code
df.sankey <- df %>%
  make_long(admission, d1, d2, d3, d4, outcome)
# ggplot...
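For readers unsure what the backward fill does to a single row, here is a small base R sketch of the same idea. fill_backward() is a hypothetical helper written for illustration, not a zoo function, and it assumes the last element of each row (the outcome) is never NA.

```r
# fill_backward() is a hypothetical helper, not part of zoo
fill_backward <- function(v) {
  # walk from the second-to-last element down to the first,
  # copying the value on the right into each NA
  for (i in rev(seq_along(v))[-1]) {
    if (is.na(v[i])) v[i] <- v[i + 1]
  }
  v
}

patient1 <- c(admission = "Severe", d1 = NA, d2 = NA, d3 = NA, d4 = NA,
              outcome = "Dead")
patient2 <- c(admission = "Mild", d1 = "Moderate", d2 = "Moderate",
              d3 = "Severe", d4 = NA, outcome = "Dead")

filled1 <- fill_backward(patient1)
filled2 <- fill_backward(patient2)
print(filled1)
print(filled2)
```

Note that this repeats the outcome in every remaining day column, so with this approach the flow passes through those columns as "Dead" or "Alive" nodes rather than skipping them.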