I have an R data frame with 20 columns, one for each model. The rows of the data frame hold the statistics for a boxplot. I want to draw a boxplot for each model, using the rows of the data frame as the boxplot statistics.
Below is an example:
Model 1 Model 2 ... Model 20
min 1 5 15
q25 2 7 16
median 3 8 20
q75 4 9 21
max 5 10 22
As can be seen, the statistics are already calculated. I just need to pass them to the boxplot, but I have no idea how to do that.
Answer:
If you are willing to use ggplot2, you could try something like this.
First, set up a placeholder data frame. Apparently, ggplot() + geom_boxplot() needs one to run, even though the statistics are supplied directly:
library(ggplot2)

df <- data.frame("Model" = "Model 1")
Then you can set the individual boxplot components like this:
ggplot(df, aes(x = Model,
               ymin = 5,       # min
               lower = 20,     # q25
               middle = 25,    # median
               upper = 50,     # q75
               ymax = 100)) +  # max
  geom_boxplot(stat = "identity")  # use the supplied values, compute nothing
The same works for multiple models:
df <- data.frame("Model" = c("Model 1", "Model 2"))
ggplot(df, aes(x = Model,
               ymin = c(5, 9),         # min of each model
               lower = c(20, 46),      # q25
               middle = c(25, 55),     # median
               upper = c(50, 89),      # q75
               ymax = c(100, 111))) +  # max
  geom_boxplot(stat = "identity")
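Hard-coding the values becomes tedious with 20 models. A minimal sketch, assuming the statistics are stored as in the question (one row each for min, q25, median, q75 and max, one column per model), is to transpose the table so that each model becomes a row and then map those columns directly. The name box_stats is only illustrative.

```r
library(ggplot2)

# Wide table as in the question: rows are statistics, columns are models
dat <- read.table(header = TRUE, text = 'Model1 Model2 Model20
min 1 5 15
q25 2 7 16
median 3 8 20
q75 4 9 21
max 5 10 22')

# Transpose: one row per model, one column per statistic
box_stats <- as.data.frame(t(dat))
# Keep the original column order (otherwise "Model10" would sort before "Model2")
box_stats$Model <- factor(rownames(box_stats), levels = rownames(box_stats))

p <- ggplot(box_stats, aes(x = Model, ymin = min, lower = q25, middle = median,
                           upper = q75, ymax = max)) +
  geom_boxplot(stat = "identity")
print(p)
```

Inside aes(), the names min and max refer to the columns of box_stats rather than to the base R functions, because column names take precedence there.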
Answer:
What has not been explained so far is that bxp() needs a matrix, not a data frame (data frames are actually lists, which is why the error message refers to lists). I assume you also have the sample sizes somewhere; I rbind them here as a new row.
dat <- rbind(dat, n=c(20, 14, 60))
Then all you need to do is coerce the statistics to a matrix with as.matrix():
# rows 1-5: min, q25, median, q75, max; row 6: sample sizes
bxp(list(stats = as.matrix(dat[1:5, ]), n = unlist(dat[6, ]), names = names(dat)))
Data:
dat <- read.table(header=TRUE, text='Model1 Model2 Model20
min 1 5 15
q25 2 7 16
median 3 8 20
q75 4 9 21
max 5 10 22')
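The help page of bxp() says its first argument is usually the result of a call to boxplot(), but can be generated in any fashion. A small sketch may make that format concrete: boxplot() with plot = FALSE returns such a list (a 5-row stats matrix with one column per group, plus n, names and a few other components), and bxp() can redraw it. The samples below are made up purely for illustration.

```r
# Hypothetical raw samples, only to show the structure bxp() consumes
set.seed(42)
samples <- list(Model1 = rnorm(20), Model2 = rnorm(14, mean = 1),
                Model20 = rnorm(60, mean = 2))

# Compute the summaries without drawing anything
b <- boxplot(samples, plot = FALSE)

# stats: 5 rows (lower whisker, lower hinge, median, upper hinge, upper whisker)
# with one column per group; n and names hold one entry per group
print(b$stats)
print(b$n)

# Draw the boxes from the precomputed list
bxp(b)
```

A hand-built list such as the one in the answer above has the same stats, n and names components; bxp() only needs the others (out, conf, ...) for outliers and notches.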