I have a df "merged.blood.BP.anthrop_1row" that has data on 250 metabolites.
First, I'd like to count the outliers (> 4*SD) in each of the 250 metabolites.
I wrote the following function and thought I could easily save the output in a new df, but I couldn't.
Could you please help?
#create a function to know how many outliers we have in each metabolite
show_outliers <- function (x) {
p<-table(abs(x)>mean(x,na.rm=T)+4*sd(x,na.rm=T))
ifelse(is.na(p[2]),print (0),print(p[2]))
}
#create dataset
outliers_results<-as.data.frame(matrix(nrow = 250, ncol = 2))
names(outliers_results)<-c('metabolite', 'outliers_4_SD')
outliers_results[1:250,1]<-names(merged.blood.BP.anthrop_1row[c(1:500)])
####
for (q in c(1:250)) {
outliers_results[1:250,2]<- show_outliers(merged.blood.BP.anthrop_1row[,q])
}
But it doesn't seem to work.
I want to have a df like:
metabolites  outliers_4_SD
Acetate      0
HDL          2
LDL          1
Thank you in advance
Answer:
Here are some possibilities. Note that I've used 3*sd to get some TRUE values in a small dataset.
> # The function
> show_outliers <- function (x) {
return(abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE))
}
> # Fake data
> dat <- data.frame(met = rep(c('a', 'b', 'c'), each = 10), y = rnorm(30))
> # Add a couple outliers
> dat$y[sample(1:nrow(dat), 2)] <- 40
> # Finally, a table
> table(dat$met, show_outliers(dat$y))
    FALSE TRUE
  a     9    1
  b     9    1
  c    10    0
> # Or . . .
> dat$outlier <- show_outliers(dat$y)
> # Option 1
> by(dat$outlier, dat$met, sum)
dat$met: a
[1] 1
------------------------------------------------------------
dat$met: b
[1] 1
------------------------------------------------------------
dat$met: c
[1] 0
> # Option 2
> aggregate(dat$outlier, list(dat$met), sum)
  Group.1 x
1       a 1
2       b 1
3       c 0
> # Option 3
> aggregate(dat[, 'outlier', drop = FALSE], dat[, 'met', drop = FALSE], sum)
  met outlier
1   a       1
2   b       1
3   c       0
> # There are other similar approaches that would work
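If the data are in wide format, as in the question (one column per metabolite), counting per column may be closer to the requested output. The following is only a minimal sketch under that assumption: count_outliers and the toy data frame metab are hypothetical names introduced here for illustration, with metab standing in for merged.blood.BP.anthrop_1row, and the threshold is the 4*SD from the question, computed separately for each metabolite.

```r
# Count values lying more than k SDs from the column mean (NAs are ignored).
count_outliers <- function(x, k = 4) {
  sum(abs(x - mean(x, na.rm = TRUE)) > k * sd(x, na.rm = TRUE), na.rm = TRUE)
}

# Toy stand-in for the real data: one numeric column per metabolite.
set.seed(1)
metab <- data.frame(Acetate = rnorm(200), HDL = rnorm(200), LDL = rnorm(200))
metab$HDL[c(5, 50)] <- 40   # plant two extreme values in HDL
metab$LDL[10] <- -40        # plant one extreme value in LDL

# Apply the counter to every column, then build the two-column result.
counts <- sapply(metab, count_outliers)
outliers_results <- data.frame(
  metabolite    = names(counts),
  outliers_4_SD = unname(counts)
)
outliers_results
```

On the real data, something like sapply(merged.blood.BP.anthrop_1row[, 1:250], count_outliers) should give the same kind of result, assuming the first 250 columns are the numeric metabolite columns. As far as I can tell, the loop in the question fails mainly because it assigns to rows 1:250 on every iteration instead of row q, so each pass overwrites the whole column with a single value; it also takes names from 500 columns to fill 250 rows.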