How to save counts of logical values in a table in R

I have a df "merged.blood.BP.anthrop_1row" that has data on 250 metabolites.

First, I'd like to check how many outliers (values more than 4*SD away from the mean) each of the 250 metabolites has.
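A common reading of "> 4*SD", and the one the answer's function implements (with k = 3 instead of 4), flags an observation x_i of a metabolite when its distance from that metabolite's mean exceeds k sample standard deviations. Here n is the number of non-missing values, matching `na.rm = TRUE`, and s divides by n - 1, as R's `sd()` does:

```latex
\text{outlier}(x_i) \iff |x_i - \bar{x}| > k\,s,
\qquad \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})^2},
\qquad k = 4
```

This compares each value's distance from the mean with k*s, which is not the same test as comparing |x_i| with mean + k*s. Also, for n values the largest attainable |x_i - mean| / s is (n - 1) / sqrt(n), so with k = 4 nothing can be flagged unless a metabolite has at least 18 non-missing values.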

I wrote the following function and thought I could easily save its output in a new data frame, but I couldn't.

Could you please help?

#create a function to know how many outliers we have in each metabolite

show_outliers <- function (x) {
  p<-table(abs(x)>mean(x,na.rm=T)+4*sd(x,na.rm=T))
  ifelse(is.na(p[2]),print (0),print(p[2]))
}

#create dataset
outliers_results<-as.data.frame(matrix(nrow = 250, ncol = 2))
names(outliers_results)<-c('metabolite', 'outliers_4_SD') 
outliers_results[1:250,1]<-names(merged.blood.BP.anthrop_1row[c(1:500)])

####
for (q in c(1:250)) {
  outliers_results[1:250,2]<- show_outliers(merged.blood.BP.anthrop_1row[,q])
}

But it doesn't seem to work.
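For reference, the loop above assigns to rows 1:250 of the second column on every pass, so each iteration would overwrite the whole outliers_4_SD column, and every row would end up holding the count for the last column processed. Below is a minimal sketch of a corrected version. It assumes, hypothetically, that the metabolite columns can be listed by name in `metab_cols`. The small data frame `toy` and its columns `Acetate`, `HDL` and `LDL` are made up to stand in for merged.blood.BP.anthrop_1row, and `count_outliers` is a hypothetical helper. Summing the logical flags avoids the table()/p[2] workaround, because a vector with no TRUE values simply sums to 0.

```r
# Hypothetical helper: number of values more than k SDs from the mean
# (k = 4 as in the question); NAs are ignored
count_outliers <- function(x, k = 4) {
  dev <- abs(x - mean(x, na.rm = TRUE))
  sum(dev > k * sd(x, na.rm = TRUE), na.rm = TRUE)
}

# Made-up stand-in for merged.blood.BP.anthrop_1row: three metabolite columns
# plus one non-metabolite column; HDL and LDL each get one extreme value
toy <- data.frame(
  id      = 1:30,
  Acetate = rep(c(1, 2, 3), 10),
  HDL     = c(rep(1, 29), 100),
  LDL     = c(rep(5, 28), 50, NA)
)
metab_cols <- c("Acetate", "HDL", "LDL")  # hypothetical list of metabolite columns

# Pre-allocate one row per metabolite
outliers_results <- data.frame(metabolite    = metab_cols,
                               outliers_4_SD = NA_integer_,
                               stringsAsFactors = FALSE)

# Fill only row q (not rows 1:250) on each pass
for (q in seq_along(metab_cols)) {
  outliers_results[q, "outliers_4_SD"] <- count_outliers(toy[[metab_cols[q]]])
}

# The same counts without an explicit loop
counts <- sapply(toy[metab_cols], count_outliers)
outliers_results2 <- data.frame(metabolite    = names(counts),
                                outliers_4_SD = unname(counts),
                                stringsAsFactors = FALSE)

print(outliers_results)
```

Selecting columns by name rather than by position (1:250 vs 1:500) keeps the number of names and the number of rows in step, and it skips non-metabolite columns such as `id`.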

I'd like to end up with a data frame like this:

metabolite       outliers_4_SD
Acetate          0
HDL              2
LDL              1

Thank you in advance.

Answer:

Here are a few possibilities. Note that I've used 3 * sd instead of 4 * sd so that the small example dataset produces some TRUE values.

> # The function
> show_outliers <- function (x) {
    return(abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE))
  }

> # Fake data
> dat <- data.frame(met = rep(c('a', 'b', 'c'), each = 10), y = rnorm(30))
> # Add a couple outliers
> dat$y[sample(1:nrow(dat), 2)] <- 40

> # Finally, a table
> table(dat$met, show_outliers(dat$y))

    FALSE TRUE
  a     9    1
  b     9    1
  c    10    0

> # Or . . .
> dat$outlier <- show_outliers(dat$y)
> # Option 1
> by(dat$outlier, dat$met, sum)
dat$met: a
[1] 1
------------------------------------------------------------
dat$met: b
[1] 1
------------------------------------------------------------
dat$met: c
[1] 0

> # Option 2
> aggregate(dat$outlier, list(dat$met), sum)
  Group.1 x
1       a 1
2       b 1
3       c 0

> # Option 3
> aggregate(dat[, 'outlier', drop = FALSE], dat[, 'met', drop = FALSE], sum)
  met outlier
1   a       1
2   b       1
3   c       0
> # There are other similar approaches that would work
>
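One caveat with the transcript above: show_outliers(dat$y) computes a single mean and SD over all rows, pooling every metabolite. Real metabolites are measured on different scales, so each one probably needs its own mean and SD. The following is a minimal sketch, assuming the same long layout as `dat` (columns `met` and `y`). It computes the flag within each metabolite with base R's ave(), then sums with na.rm = TRUE so that a missing value does not turn a count into NA, and finally renames the columns to match the table requested in the question. The data values are made up, and the extra `k` argument (default 4, the question's threshold) is a modification of the answer's function.

```r
# Answer's flag with a hypothetical k argument: TRUE where |x - mean| > k * sd
show_outliers <- function(x, k = 4) {
  abs(x - mean(x, na.rm = TRUE)) > k * sd(x, na.rm = TRUE)
}

# Made-up long-format data on different scales
dat <- data.frame(
  met = rep(c("Acetate", "HDL", "LDL"), each = 20),
  y   = c(rep(c(0.9, 1.1), 10),      # Acetate: no outliers
          c(rep(50, 19), 500),       # HDL: one extreme value
          c(rep(3, 18), 30, NA)),    # LDL: one extreme value and one missing value
  stringsAsFactors = FALSE
)

# ave() applies show_outliers() separately within each metabolite; it returns
# the flags coerced to numeric (0/1/NA), so convert them back to logical
dat$outlier <- as.logical(ave(dat$y, dat$met, FUN = show_outliers))

# Count flags per metabolite, ignoring NAs, and rename to the requested columns
outliers_results <- aggregate(dat[, "outlier", drop = FALSE],
                              dat[, "met", drop = FALSE],
                              function(v) sum(v, na.rm = TRUE))
names(outliers_results) <- c("metabolite", "outliers_4_SD")

print(outliers_results)
```

ave() is used here because it returns a vector aligned with the original rows. That lets the flags be stored as a new column in the same way as the answer's `dat$outlier` step, and the answer's Option 3 aggregate() call then works unchanged apart from the summing function.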