I am trying to write a function to fix outliers in variables, but I am getting errors when writing it in dplyr form.
fn_outlier_fix <- function(x, df){
x = enquo(x)
Q1 = df %>% pull(!!x) %>% quantile(0.25) %>% unname()
Q3 = df %>% pull(!!x) %>% quantile(0.75) %>% unname()
IQR = Q3 - Q1
UC = Q3 + (1.5 * IQR)
LC = Q3 - (1.5 * IQR)
df <- df %>%
mutate(!!x := if_else(x > UC,UC,!!x),
!!x := if_else(x < LC,LC,!!x))
}
library(dplyr)
df_test <- tribble(
~sales, ~var1, ~var2,
22, 230.1, 37.8,
10, 44.5, 39.3,
9, 17.2, 45.9,
19, 151.5, 41.3,
13, 180.8, 10.8,
7, 8.7, 48.9,
12, 57.5, 32.8,
13, 120.2, 19.6,
5, 8.6, 2.1,
11, 199.8, 2.6)
fn_outlier_fix(x = var1, df = df_test)
Error:
Error in `mutate()`:
! Problem while computing `var1 = if_else(x > UC, UC, var1)`.
Caused by error in `if_else()`:
! Base operators are not defined for quosures. Do you need to unquote the quosure?
# Bad: myquosure > rhs
# Good: !!myquosure > rhs
Backtrace:
1. global fn_outlier_fix(x = var1, df = df_test)
9. rlang:::Ops.quosure(x, UC)
I don't know why it's so complicated to write functions in R with dplyr compared to Python. I managed to write the function in the form below, which works, but I still want the code above to work for my own understanding. I appreciate any help.
The base R version below works:
fn_outlier_fix <- function(x){
Q1 = quantile(x, 0.25)
Q3 = quantile(x, 0.75)
IQR = Q3 - Q1
UC = Q3 + (1.5 * IQR)
LC = Q3 - (1.5 * IQR)
x[x > UC] <- UC
x[x < LC] <- LC
x <- x
}
Answer:
You were nearly there; you just forgot to unquote x inside the if_else() calls, which is what the error message is pointing at (use !!x > UC rather than x > UC). This function works:
fn_outlier_fix <- function(x, df){
x = enquo(x)
Q1 = df %>% pull(!!x) %>% quantile(0.25) %>% unname()
Q3 = df %>% pull(!!x) %>% quantile(0.75) %>% unname()
IQR = Q3 - Q1
UC = Q3 + (1.5 * IQR)  # upper fence: 1.5 * IQR above Q3
LC = Q1 - (1.5 * IQR)  # lower fence: 1.5 * IQR below Q1
df <- df %>%
mutate(!!x := if_else(!!x > UC,UC,!!x),
!!x := if_else(!!x < LC,LC,!!x))
df
}
Writing functions for dplyr is complicated because of the non-standard evaluation (NSE) it uses to access variable names. There is a complete vignette about programming with dplyr.
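To make the non-standard evaluation point concrete, here is a minimal sketch of what enquo() and !! do. The helper names show_captured and col_max and the tibble df_demo are made up for illustration and are not part of the question's code.

```r
library(dplyr)

# Hypothetical demo data, not from the question.
df_demo <- tibble(var1 = c(1, 5, 100))

# enquo() captures the expression the caller typed, not its value.
show_captured <- function(x) {
  x <- enquo(x)
  rlang::as_label(x)  # turn the captured expression into a string
}

# !! inserts the captured expression back into a dplyr verb,
# so max() sees the column rather than a quosure object.
col_max <- function(df, x) {
  x <- enquo(x)
  df %>% summarise(result = max(!!x)) %>% pull(result)
}

captured_name <- show_captured(var1)   # "var1"
largest <- col_max(df_demo, var1)      # 100
```

Without the !!, max() would receive the quosure itself, which is the same kind of mistake that triggers the "Base operators are not defined for quosures" error in the question.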
The recommended way to work with NSE in dplyr has changed again: current best practice is to use the embrace operator {{ }}, which replaces the enquo() and !! pair. It would look like:
fn_outlier_fix_2 <- function(x, df){
Q1 = df %>% pull({{x}}) %>% quantile(0.25) %>% unname()
Q3 = df %>% pull({{x}}) %>% quantile(0.75) %>% unname()
IQR = Q3 - Q1
UC = Q3 + (1.5 * IQR)  # upper fence: 1.5 * IQR above Q3
LC = Q1 - (1.5 * IQR)  # lower fence: 1.5 * IQR below Q1
df <- df %>%
mutate({{x}} := if_else({{x}} > UC,UC,{{x}}),
{{x}} := if_else({{x}} < LC,LC,{{x}}))
df
}
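As an aside, the base R version in the question already works on a plain vector, so one way to sidestep tidy evaluation altogether is to keep the function vector-based and let mutate() with across() pick the columns. This is only a sketch of that alternative, not the approach from the answer above; cap_outliers and df_small are hypothetical names, and pmin()/pmax() stand in for the two if_else() calls.

```r
library(dplyr)

# Hypothetical helper: caps a numeric vector at the 1.5 * IQR fences.
cap_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr   # lower fence, measured from Q1
  upper <- q[2] + 1.5 * iqr   # upper fence, measured from Q3
  pmin(pmax(x, lower), upper) # clamp values into [lower, upper]
}

# Hypothetical demo data: column a has one obvious outlier.
df_small <- tibble(a = c(1, 2, 3, 4, 100),
                   b = c(10, 20, 30, 40, 50))

# across() applies the vector function to each selected column.
df_capped <- df_small %>%
  mutate(across(c(a, b), cap_outliers))
```

Here the fences for column a are -1 and 7, so the value 100 is capped to 7, while column b has no values outside its fences and is left unchanged.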