I am trying to use an alluvial plot (Sankey diagram) to show how different categories change between two time periods. I was able to create a plot that makes sense to me when all the factor levels are represented in both time periods (pre and post), but with my data the plot looks strange after I change the order of the factors. I would also like to use the same fill colors for the categories in both time periods, but I could only change them for the first time period (pre). When I draw the plot, the colors I specify are not assigned to the factor levels I intended, even though the order of the boxes (strata) is correct.
Any help or advice on how to improve the plot, and on how to order the factor levels in both groups when the categories are not fully represented in both time periods, would be very helpful.
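Here is the code:
# ggplot2 for plotting, ggalluvial for geom_alluvium() and geom_stratum()
library(ggplot2)
library(ggalluvial)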
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header=T)
head(db)
# Order factor levels
levels(db$pre) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
levels(db$post) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
# Set colors for the plot
colors.p <- c("#282828", "#7C7C7C", "#20AB5F", "#3EFF00",
"#FBFF00", "#FFBD00", "#FF0C00")
# Plot
p <- ggplot(db,
            aes(y = freq, axis1 = pre,
                axis2 = post)) +
  geom_alluvium(aes(fill = pre), show.legend = FALSE) +
  geom_stratum(aes(fill = pre), color = "black", alpha = 0.5) +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("previous", "current"),
                   expand = c(0.3, 0.01)) +
  scale_fill_manual(values = colors.p) +
  theme_void() +
  theme(
    panel.background = element_blank(),
    axis.text.y = element_blank(),
    axis.text.x = element_text(size = 15, face = "bold"),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "none"
  )
p
Answer:
I gave it a shot with a different package I am more familiar with (ggsankey).
EDIT: For your new data, the approach I posted previously still works. You need to add the additional level ("NE") to the factor releveling for the pre time point and give it a color of its own (blue in this example). What error do you get with this data?
library(tidyverse)
library(ggsankey)
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header=T)
db %>%
  uncount(freq) %>%                # one row per observation
  make_long(pre, post) %>%         # reshape to ggsankey's long format
  mutate(node = fct_relevel(node, "DD", "LC", "NT", "NE", "VU", "EN", "CR"),
         next_node = fct_relevel(next_node, "DD", "LC", "NT", "VU", "EN", "CR")) %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = factor(node))) +
  geom_alluvial() +
  # Named colors are matched to levels by name, not by position
  scale_fill_manual(values = c("DD" = "#7C7C7C", "LC" = "#20AB5F", "NT" = "#3EFF00", "VU" = "#FBFF00", "EN" = "#FFBD00", "CR" = "#FF0C00", "NE" = "blue"))
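A side note on the factor ordering in the question, as a minimal base-R sketch: assigning to `levels()` relabels the existing levels position by position rather than reordering them, whereas `factor(x, levels = ...)` changes only the order and leaves the values alone. The vector `x` below is hypothetical and not taken from the data above. This may be why the colors did not line up with the intended categories, assuming `pre` and `post` were read in as factors; in R 4.0 and later `read.table()` returns character columns by default, so they would need converting with `factor()` in any case.

```r
# Hypothetical example vector; default factor levels are alphabetical: DD, LC, NE
x <- factor(c("NE", "DD", "LC", "DD"))

# levels<- renames the levels in place, so the underlying values change
renamed <- x
levels(renamed) <- c("NE", "DD", "LC")
as.character(renamed)

# factor(..., levels = ) only reorders the levels; the values stay the same
reordered <- factor(x, levels = c("NE", "DD", "LC"))
as.character(reordered)
levels(reordered)
```

In the question's code the equivalent would be `db$pre <- factor(db$pre, levels = c("NE", "DD", "LC", "NT", "VU", "EN", "CR"))`, and likewise for `post`.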