How to boxplot the summary of a data frame in R

Time:10-17

I have a data frame containing 80 different features. Using summary(data) I can see the min, max, and mean of each column. What I would now like to do is visualise these with a plot.

I would like to see the min-max range of each column as well as the mean. The goal is to be able to visually spot outliers and the range of the data. I tried using a box plot, but I cannot find the right way to plot it.

Any help is appreciated. I already got the summary into a data frame as follows:

summary <- as.data.frame(apply(data[, 2:(ncol(data) - 1)], 2, summary))
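One way to plot the precomputed summary itself, rather than the raw values, is to transpose the `apply(..., summary)` result so each feature becomes a row, then draw a vertical line from the minimum to the maximum with the mean as a point. This is a minimal sketch, assuming ggplot2 is installed and using the four sample columns from the dput() output below in place of the full 80-feature `data`; the names `stats` and `summary_df` are illustrative only.

```r
library(ggplot2)

# Sample data: the four features from the question's dput() output
df <- data.frame(
  feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_4 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L)
)

# apply() returns a matrix: one row per statistic (Min., 1st Qu., Median,
# Mean, 3rd Qu., Max.) and one column per feature
stats <- apply(df, 2, summary)

# Transpose so each feature is a row, and keep the feature names as a column
summary_df <- as.data.frame(t(stats))
summary_df$feature <- rownames(summary_df)

# Line from Min. to Max. for each feature, with the mean drawn as a point
p <- ggplot(summary_df, aes(x = feature)) +
  geom_linerange(aes(ymin = Min., ymax = Max.)) +
  geom_point(aes(y = Mean), shape = 21, size = 3, fill = "white")
print(p)
```

Note that this view only shows the extremes and the mean; it cannot show individual outliers, which is why a boxplot of the raw (melted) data is usually more informative.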

Preview of the Data:

    f1  f2  f3  f4
1   1   0   0   0
2   0   0   0   0
3   0   0   0   0
4   1   0   0   1
5   0   0   0   0
6   2   1   0   0
7   2   0   0   0
8   0   0   0   0
9   0   0   0   0
10  0   0   0   0

structure(list(feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_2 = c(0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_4 = c(0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L
)), row.names = c(NA, 20L), class = "data.frame")

Answer:

This was my attempt using reshape2 and ggplot2:

df <- structure(list(feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_2 = c(0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_4 = c(0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L
)), row.names = c(NA, 20L), class = "data.frame")

Load the packages:

library(reshape2)
library(ggplot2)

Melt the data frame into long format (one variable/value row per cell):

df.melted <- melt(df)
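If you would rather avoid the reshape2 dependency (it is superseded by tidyr), base R's `stack()` produces the same long layout. This is a sketch, assuming all columns are plain numeric vectors; the names `stacked` and `df_long` are illustrative, and the columns are renamed to match what `melt()` returns.

```r
# Sample data: the four features from the question's dput() output
df <- data.frame(
  feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_4 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L)
)

# stack() returns two columns: `values` (the numbers) and `ind`
# (a factor naming the column each value came from)
stacked <- stack(df)

# Rename to the `variable` / `value` names that reshape2::melt() uses
df_long <- data.frame(variable = stacked$ind, value = stacked$values)

head(df_long)
```

The result can be passed to the same `ggplot()` call as `df.melted`.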

Plot with ggplot. Here alpha sets the transparency of the boxes, and stat_summary(fun = mean) adds each column's mean as a point:

# One box per feature; layers are joined with +
ggplot(df.melted, aes(factor(variable), value, fill = variable)) +
  geom_boxplot(alpha = 0.6) +
  stat_summary(fun = mean, geom = "point", shape = 21, size = 3)
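For comparison, base R can draw the same kind of plot without reshaping, because `boxplot()` accepts a data frame and draws one box per column. A minimal sketch, assuming the same sample data; `col_means` is an illustrative name, and the colours are arbitrary choices.

```r
# Sample data: the four features from the question's dput() output
df <- data.frame(
  feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_4 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L)
)

# One box per column; las = 2 makes the axis labels vertical, which helps
# readability when there are many features (80 in the question)
boxplot(df, las = 2, col = "lightblue")

# Boxes sit at x = 1, 2, ..., ncol(df), so overlay each column's mean there
col_means <- colMeans(df)
points(seq_along(col_means), col_means, pch = 21, bg = "red", cex = 1.5)
```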


See ?ggplot and ?stat_summary for more details.
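To list the values a boxplot flags as outliers instead of reading them off the plot, base R's `boxplot.stats()` returns them in its `out` element. It uses the same rule as `boxplot()`: points more than 1.5 times the box length beyond the box. ggplot2's `geom_boxplot()` uses the same 1.5 × IQR cut-off but computes the quartiles slightly differently, so borderline points may differ. A sketch on the sample data; `outliers` and `n_outliers` are illustrative names.

```r
# Sample data: the four features from the question's dput() output
df <- data.frame(
  feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_2 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  feat_4 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L)
)

# For each column, keep the values lying beyond the whiskers
outliers <- lapply(df, function(x) boxplot.stats(x)$out)

# Number of flagged values per feature
n_outliers <- lengths(outliers)

outliers
n_outliers
```

With data this sparse, the upper quartile is often 0, so every non-zero value in `feat_2` and `feat_3` is flagged; here the "outliers" mostly reflect the skew of the counts rather than unusual measurements.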
