I have a data frame containing 80 different features.
Using summary(data), I can see the min, max, and mean of each column. What I would now like to do is visualise these with a plot.
I would like to see the min-max range of each column as well as the mean. The goal is to be able to visually spot outliers and the range of the data. I tried using a box plot, but I could not find the right way to plot it.
Any help is appreciated. I already have the summary in a data frame, created as follows:
summary <- as.data.frame(apply(data[,2:(ncol(data)-1)],2,summary))
Preview of the Data:
f1 f2 f3 f4
1 1 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 0 0 1
5 0 0 0 0
6 2 1 0 0
7 2 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
structure(list(feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_2 = c(0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_4 = c(0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L
)), row.names = c(NA, 20L), class = "data.frame")
CodePudding user response:
This was my attempt using reshape2 and ggplot2:
df <- structure(list(feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_2 = c(0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), feat_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), feat_4 = c(0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L
)), row.names = c(NA, 20L), class = "data.frame")
Load the packages:
library(reshape2)
library(ggplot2)
Melt the data frame into long format:
df.melted <- melt(df)
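To make the reshaping step concrete, here is a minimal sketch of what melt() returns when no id variables are given. The data frame `toy` and its values are made up for illustration and are not the question's data.

```r
library(reshape2)

# Hypothetical toy data: 3 rows, 2 feature columns
toy <- data.frame(feat_1 = c(1L, 0L, 2L), feat_2 = c(0L, 1L, 0L))

# With no id variables, melt() treats every column as a measure variable
# and stacks them into two columns: `variable` (original column name,
# stored as a factor) and `value`
toy.melted <- melt(toy)

# 3 rows x 2 columns become 6 rows
str(toy.melted)
```

This long layout is what lets ggplot map `variable` to the x axis and draw one box per feature.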
Use ggplot. Here alpha is the transparency, and to get the mean use stat_summary with fun = mean:
ggplot(df.melted, aes(factor(variable), value, fill = variable)) +
  geom_boxplot(alpha = 0.6) +                                     # one box per feature
  stat_summary(fun = mean, geom = "point", shape = 21, size = 3)  # mark the mean
Output: [image: one box plot per feature, with the mean overlaid as a point]
See ?ggplot for more details.
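One caveat: by default the whiskers of geom_boxplot reach at most 1.5 times the IQR from the box, so they do not necessarily end at the min and max. If the exact min-max range with the mean is what you want, a possible alternative is to compute the three statistics yourself and draw them with geom_pointrange. This is only a sketch; the data frame below uses made-up values, and the name `stats` is my own choice.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real feature columns
df <- data.frame(feat_1 = c(1L, 0L, 0L, 1L, 0L, 2L),
                 feat_2 = c(0L, 0L, 0L, 0L, 0L, 1L))

# One row per feature with its min, mean, and max
stats <- data.frame(
  variable = names(df),
  min      = sapply(df, min),
  mean     = sapply(df, mean),
  max      = sapply(df, max)
)

# The vertical line spans min to max; the point marks the mean
p <- ggplot(stats, aes(x = variable, y = mean, ymin = min, ymax = max)) +
  geom_pointrange()
print(p)
```

With 80 features, adding coord_flip() may make the feature labels easier to read.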