I have a dataframe with three categorical variables:
- classification1 : "1", "2", "3", "4"
- classification2 : "Low", "Medium", "High"
- survival: "1" (alive), "2" (dead)
The dataframe has multiple rows, one for each individual. The first 20 rows:
```r
classification1 <- c(4, 3, 1, 2, 3, 1, 2, 2, 2, 2, 1, 1, 4, 2, 2, 1, 2, 1, 3, 2)
classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium", "Medium", "High", "Low", "Low", "Low", "High", "High", "Medium", "Low", "Medium", "Low", "Medium", "Medium")
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)
df <- data.frame(classification1, classification2, survival)
```
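Because `survival` and `classification1` are stored as numbers and `classification2` as plain strings, ggplot2 will treat `survival` as a continuous variable and will typically order the `classification2` strata alphabetically (High, Low, Medium). A minimal sketch, assuming you want readable labels and a Low/Medium/High order, is to convert the columns to factors before aggregating. The "alive"/"dead" labels come from the variable description above; the level orders are assumptions you can change.

```r
classification1 <- c(4, 3, 1, 2, 3, 1, 2, 2, 2, 2, 1, 1, 4, 2, 2, 1, 2, 1, 3, 2)
classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium", "Medium", "High", "Low", "Low", "Low", "High", "High", "Medium", "Low", "Medium", "Low", "Medium", "Medium")
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)
df <- data.frame(classification1, classification2, survival)

# Treat the numeric class codes as discrete categories
df$classification1 <- factor(df$classification1, levels = 1:4)

# Fix the stratum order instead of relying on alphabetical sorting (assumed order)
df$classification2 <- factor(df$classification2, levels = c("Low", "Medium", "High"))

# Replace the 1/2 survival codes with readable labels
df$survival <- factor(df$survival, levels = c(1, 2), labels = c("alive", "dead"))

str(df)
```

With factors in place, `fill = survival` gives a discrete fill scale directly, without wrapping the column in `factor()` inside `aes()`.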
I want to use ggalluvial with ggplot2 to build an alluvial plot, but I can't figure out how!
The following code returns an error (`Error in FUN(X[[i]], ...) : object 'Freq' not found`) because I can't figure out what `Freq` is supposed to be:
```r
library(ggalluvial)

ggplot(data = df, aes(axis1 = classification1, axis2 = classification2, y = Freq)) +
  scale_x_discrete(limits = c("classification1", "classification2"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = survival)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))
```
Any clues on how to format my dataframe to work with ggalluvial?
Answer:
You could aggregate your dataset, e.g. with `dplyr::count()`, to add a column with the frequency of each combination:
```r
library(ggalluvial)  # also attaches ggplot2
library(dplyr)

# Collapse to one row per combination, with the number of individuals in "Freq"
df <- df %>%
  count(classification1, classification2, survival, name = "Freq")

ggplot(data = df, aes(axis1 = classification1, axis2 = classification2, y = Freq)) +
  scale_x_discrete(limits = c("classification1", "classification2"), expand = c(.2, .05)) +
  geom_alluvium(aes(fill = factor(survival))) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))
```
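If you'd rather not depend on dplyr, base R's `table()` produces the same counts. Calling `as.data.frame()` on a table names the count column `Freq` by default, which is likely why ggalluvial examples built from tables such as `as.data.frame(UCBAdmissions)` use `y = Freq`. A minimal sketch of that alternative:

```r
classification1 <- c(4, 3, 1, 2, 3, 1, 2, 2, 2, 2, 1, 1, 4, 2, 2, 1, 2, 1, 3, 2)
classification2 <- c("Medium", "Medium", "Low", "High", "High", "Low", "Medium", "Medium", "High", "Low", "Low", "Low", "High", "High", "Medium", "Low", "Medium", "Low", "Medium", "Medium")
survival <- c(2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1)
df <- data.frame(classification1, classification2, survival)

# Cross-tabulate all three variables; as.data.frame() turns the table into
# one row per combination, with the count in a column named "Freq"
freq_df <- as.data.frame(table(df))

# table() also lists combinations that never occur (count 0); drop them
freq_df <- freq_df[freq_df$Freq > 0, ]
freq_df
```

`freq_df` can then be passed to the same `ggplot()` call as above. Since `table()` converts every column to a factor, `survival` is already discrete here, so `fill = survival` also works without `factor()`.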