Filter with map function in R

Time:09-21

I am trying to filter multiple columns (15) of a data frame. Specifically, I want to remove the outliers using the Q3 + 1.5 * IQR and Q1 - 1.5 * IQR method.

Toy example:

library(tidyverse)
aa <- c(2,3,4,3,2,2,1,6,5,4,3,1,15)
bb <- c(0.2,20,30,40,30,20,20,10,30,40,30,10,10)
cc <- c(-9,2,3,4,3,2,2,1,5,4,3,1,25)

df <- tibble(aa,bb,cc)

I tried without success:

i <- NULL
for(i in 1:ncol(fat)){
   po <- fat %>% 
     filter(.[[i]] >= (quantile(.[[i]], .25) - IQR(.[[i]]) * 1.5))
   
   po <- fat %>% 
     filter(.[[i]] <= (quantile(.[[i]], .75) + IQR(.[[i]]) * 1.5))
}

Can I use the filter and map functions to do this, and if so, how?

Many thanks GS

CodePudding user response:

We may use filter with if_all/across

library(dplyr)
df %>%
    filter(if_all(where(is.numeric), ~ (. >= (quantile(., .25) - IQR(.) * 1.5)) &
           (. <= (quantile(., .75) + IQR(.) * 1.5))))
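
Since the question asks about map specifically, here is a minimal sketch of a purrr-based variant. It assumes the toy data from the question; the helper name within_fences and the objects keep and df_clean are made up for illustration and are not part of any package.

```r
library(dplyr)
library(purrr)

# Toy data from the question
df <- tibble(
  aa = c(2, 3, 4, 3, 2, 2, 1, 6, 5, 4, 3, 1, 15),
  bb = c(0.2, 20, 30, 40, 30, 20, 20, 10, 30, 40, 30, 10, 10),
  cc = c(-9, 2, 3, 4, 3, 2, 2, 1, 5, 4, 3, 1, 25)
)

# Hypothetical helper: TRUE where x lies inside the 1.5 * IQR fences
within_fences <- function(x) {
  q <- quantile(x, c(.25, .75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])  # q[2] - q[1] is the IQR
  x >= q[1] - fence & x <= q[2] + fence
}

keep <- df %>%
  map(within_fences) %>%  # one logical vector per column
  reduce(`&`)             # TRUE only where every column is inside its fences

df_clean <- filter(df, keep)
```

Here map() builds one logical vector per column and reduce() combines them row-wise, so a row is kept only if none of its values is an outlier. This should give the same rows as the if_all() version on the toy data.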

CodePudding user response:

Here are a couple of base R options using sapply/lapply. We write a custom function to detect outliers, apply it to every column, and select only the rows that have no outlier in them.

is_outlier <- function(x) {
  x <= (quantile(x, .25) - IQR(x) * 1.5) | x >= (quantile(x, .75) + IQR(x) * 1.5)
} 

df[!Reduce(`|`, lapply(df, is_outlier)), ]

#      aa    bb    cc
#   <dbl> <dbl> <dbl>
# 1     3    20     2
# 2     4    30     3
# 3     3    40     4
# 4     2    30     3
# 5     2    20     2
# 6     1    20     2
# 7     6    10     1
# 8     5    30     5
# 9     4    40     4
#10     3    30     3
#11     1    10     1

Using sapply -

df[rowSums(sapply(df, is_outlier)) == 0, ]
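
To see why only the first and last rows are dropped, it can help to print the fences for each column. This is a small sketch in base R on the toy data from the question; the name fences and the row labels lower/upper are chosen here for illustration.

```r
# Toy data from the question
df <- data.frame(
  aa = c(2, 3, 4, 3, 2, 2, 1, 6, 5, 4, 3, 1, 15),
  bb = c(0.2, 20, 30, 40, 30, 20, 20, 10, 30, 40, 30, 10, 10),
  cc = c(-9, 2, 3, 4, 3, 2, 2, 1, 5, 4, 3, 1, 25)
)

# Lower fence = Q1 - 1.5 * IQR, upper fence = Q3 + 1.5 * IQR, per column
fences <- sapply(df, function(x) {
  quantile(x, c(.25, .75), names = FALSE) + c(-1.5, 1.5) * IQR(x)
})
rownames(fences) <- c("lower", "upper")

fences
# aa: -1 to 7, bb: -20 to 60, cc: -1 to 7
```

With the default quantile type, only 15 in aa (row 13) and -9 and 25 in cc (rows 1 and 13) fall outside these fences. The 0.2 in bb is inside its fences, so it is not what removes row 1.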