Tidyverse filter outliers - in one pipe

Time:07-15

I want to filter outliers in the tidyverse framework in one pipe. For this example, an outlier is any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR is the interquartile range, IQR = Q3 - Q1.

I managed to compute the upper and lower bounds for outliers, and I am familiar with the filter() function from dplyr. However, I do not know how to get the values calculated inside summarise() back into the complete data.frame within the same pipe operation.

iris %>% 
  group_by(Species) %>% 
  # filter(API_Psy_dm <=)
  summarise(IQR = IQR(Sepal.Length),
            O_upper =quantile(Sepal.Length, probs=c( .75), na.rm = FALSE) + 1.5*IQR,  
            O_lower =quantile(Sepal.Length, probs=c( .25), na.rm = FALSE)-1.5*IQR  
  )

Is this even possible? Or would I need a second pipe? Or is there a more convenient way than to calculate the upper and lower limit myself?

CodePudding user response:

Use mutate instead of summarize, and then filter:

iris %>% 
  group_by(Species) %>% 
  mutate(IQR = IQR(Sepal.Length),
            O_upper = quantile(Sepal.Length, probs=c( .75), na.rm = FALSE) + 1.5*IQR,  
            O_lower = quantile(Sepal.Length, probs=c( .25), na.rm = FALSE)-1.5*IQR  
  ) %>% 
  filter(O_lower <= Sepal.Length & Sepal.Length <= O_upper)
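
If you would rather not carry the helper columns around, filter() itself is evaluated per group on grouped data, so the bounds can be computed inside it. Here is a minimal sketch of that idea; within_fences() is a hypothetical helper written for this example, not a dplyr function, and it uses na.rm = TRUE, which is an assumption about how you want missing values handled:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): returns TRUE for values of x
# that lie inside [Q1 - k * IQR, Q3 + k * IQR].
within_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x >= q[[1]] - k * iqr & x <= q[[2]] + k * iqr
}

iris_clean <- iris %>%
  group_by(Species) %>%                    # bounds are computed per species
  filter(within_fences(Sepal.Length)) %>%  # keep only non-outlier rows
  ungroup()                                # drop the grouping afterwards
```

The result has the same columns as iris, with no IQR, O_upper, or O_lower columns to remove afterwards. The trade-off is that the bounds are no longer visible in the output, so the mutate() version is easier to inspect when you want to check them.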