I am trying to write a function that takes a string and uses it as a column name in dplyr::mutate()
on both sides of the equals sign. Here is an example of what I'd like to automate:
cars %>%
  mutate(
    new_speed = speed + 5,
    revised_speed = case_when(new_speed < 12 ~ 0,
                              new_speed == 12 ~ 1,
                              new_speed > 12 ~ 1/new_speed),
  )
In order to automate this process for any dataset, I need to 1) attach the prefix "new" to whichever column name I specify, and 2) create an additional column with "improved" prefixed that depends on the values of the first column.
The function should look something like this, where ** ** is replaced with the proper syntax:
insert_names <- function(df, oldname, prefix_1, prefix_2){
  df %>% mutate(
    **prefix_1.oldname** = oldname + 5,
    **prefix_2.oldname** = case_when(**prefix_1.oldname** < 12 ~ 0,
                                     **prefix_1.oldname** == 12 ~ 1,
                                     **prefix_1.oldname** > 12 ~ 1/**prefix_1.oldname**),
  )
}
The correct function should reproduce the original output like this:
insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
though I could leave speed unquoted if that's easier.
CodePudding user response:
- We can use
library(dplyr)
library(data.table)
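insert_names <- function(df, oldname, prefix_1, prefix_2){
  # Build the final column names, e.g. "new.speed" and "improved.speed"
  pre1_old <- paste0(prefix_1 , "." , oldname)
  pre2_old <- paste0(prefix_2 , "." , oldname)
  # Compute under the temporary names x and y
  d <- df %>% mutate(
    x = !!sym(oldname) + 5,
    y = case_when(x < 12 ~ 0,
                  x == 12 ~ 1,
                  x > 12 ~ 1/x),
  )
  # setnames() renames by reference, so d itself is modified
  d %>% setnames(c("x" , "y") ,c(pre1_old ,pre2_old))
  d
}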
insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
- output
   speed dist new.speed improved.speed
1      4    2         9     0.00000000
2      4   10         9     0.00000000
3      7    4        12     1.00000000
4      7   22        12     1.00000000
5      8   16        13     0.07692308
6      9   10        14     0.07142857
7     10   18        15     0.06666667
8     10   26        15     0.06666667
9     10   34        15     0.06666667
10    11   17        16     0.06250000
11    11   28        16     0.06250000
12    12   14        17     0.05882353
13    12   20        17     0.05882353
14    12   24        17     0.05882353
15    12   28        17     0.05882353
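If loading data.table only for setnames() feels heavy, the renaming step can probably be done in dplyr alone. The following is a minimal sketch rather than the answer's own code: insert_names_rename is a hypothetical name, and it assumes df has no existing columns called x or y, since those are used as temporary names.

```r
library(dplyr)

# Hypothetical variant: same arguments as insert_names(), but renames
# with dplyr::rename() instead of data.table::setnames().
insert_names_rename <- function(df, oldname, prefix_1, prefix_2) {
  pre1_old <- paste0(prefix_1, ".", oldname)
  pre2_old <- paste0(prefix_2, ".", oldname)
  df %>%
    mutate(
      x = !!sym(oldname) + 5,   # assumes no existing column named x
      y = case_when(x < 12 ~ 0, # assumes no existing column named y
                    x == 12 ~ 1,
                    x > 12 ~ 1 / x)
    ) %>%
    # !!name := old injects the string as the new column name
    rename(!!pre1_old := x, !!pre2_old := y)
}

head(insert_names_rename(cars, "speed", "new", "improved"))
```

Unlike setnames(), rename() returns a new data frame instead of modifying one by reference, so the result can be piped onward directly.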
CodePudding user response:
A nice case for using rlang:
library(dplyr)
library(rlang)
insert_names <- function(df, oldname, prefix_1, prefix_2){
  new_name_1 <- paste(prefix_1, oldname, sep = ".")
  new_name_2 <- paste(prefix_2, oldname, sep = ".")
  df %>% mutate(
    # !!name := injects the string as the new column name;
    # !!sym() turns a string into a reference to an existing column
    !!new_name_1 := !!sym(oldname) + 5,
    !!new_name_2 := case_when(!!sym(new_name_1) < 12 ~ 0,
                              !!sym(new_name_1) == 12 ~ 1,
                              !!sym(new_name_1) > 12 ~ 1/!!sym(new_name_1)),
  )
}
insert_names(cars, "speed", "new", "newer")
#>    speed dist new.speed newer.speed
#> 1      4    2         9  0.00000000
#> 2      4   10         9  0.00000000
#> 3      7    4        12  1.00000000
#> 4      7   22        12  1.00000000
#> 5      8   16        13  0.07692308
#> 6      9   10        14  0.07142857
#> 7     10   18        15  0.06666667
#> 8     10   26        15  0.06666667
#> 9     10   34        15  0.06666667
#> 10    11   17        16  0.06250000
...
Edit
I see that the other answer, posted at about the same time, uses the same method. The minor difference is where the new columns get their names: either when they are created or just before the data frame is returned.
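When the column names arrive as strings, the .data pronoun is another way to refer to them, and some readers may find it easier to scan than repeated !!sym() calls. This is a sketch under that assumption, not a replacement for the code above: insert_names_data is a hypothetical name, and the arguments mirror insert_names().

```r
library(dplyr)

# Hypothetical variant: .data[[string]] looks up a column by its name,
# including a column created earlier in the same mutate() call.
insert_names_data <- function(df, oldname, prefix_1, prefix_2) {
  new_name_1 <- paste(prefix_1, oldname, sep = ".")
  new_name_2 <- paste(prefix_2, oldname, sep = ".")
  df %>% mutate(
    !!new_name_1 := .data[[oldname]] + 5,
    !!new_name_2 := case_when(
      .data[[new_name_1]] < 12 ~ 0,
      .data[[new_name_1]] == 12 ~ 1,
      .data[[new_name_1]] > 12 ~ 1 / .data[[new_name_1]]
    )
  )
}

head(insert_names_data(cars, "speed", "new", "newer"))
```

The left-hand side still needs !! and := because .data only helps when reading columns, not when naming new ones.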
CodePudding user response:
For this, you need the forcing and defusing operators. The double curly braces both defuse and force their argument, which lets you (1) use a string as a column name and (2) force the function argument. When one of these operators appears on the left-hand side of an assignment, you must use := instead of =. I also use get() to look up the column values from the name stored in a string. I'm not sure this is the most efficient way, and someone may well have better code, but it works.
(Note: !!, or "bang-bang", is a forcing operator; enquo() defuses; {{ }} does both and is the equivalent of !!enquo(). I'm not sure {{ }} is needed in every place I put it in this code.)
Here is working code:
insert_names <- function(df, oldname, prefix_1, prefix_2){
  col_name1 = paste0(prefix_1, "_", oldname)
  col_name2 = paste0(prefix_2, "_", oldname)
  df %>% mutate(
    # get() looks up the column whose name is stored in the string
    {{col_name1}} := get(!!oldname) + 5,
    {{col_name2}} := case_when(get(!!col_name1) < 12 ~ 0,
                               get(!!col_name1) == 12 ~ 1,
                               TRUE ~ 1/get(!!col_name1)
    ))
}
insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
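Since the question mentions that speed could be left unquoted, here is a sketch of that variant, where {{ }} is applied to a function argument that holds a bare column name. It is an illustration under stated assumptions, not the code above: insert_names_bare is a hypothetical name, the column must be passed as a bare name, and the "{name}" := form relies on the glue-style name injection available in recent versions of dplyr and rlang.

```r
library(dplyr)

# Hypothetical variant: oldname is a bare column name, e.g. speed.
insert_names_bare <- function(df, oldname, prefix_1, prefix_2) {
  # Capture the bare name as a string so the new names can be built
  old <- rlang::as_name(rlang::ensym(oldname))
  col_name1 <- paste0(prefix_1, "_", old)
  col_name2 <- paste0(prefix_2, "_", old)
  df %>% mutate(
    # {{ oldname }} defuses and injects the bare column
    "{col_name1}" := {{ oldname }} + 5,
    # .data[[string]] reads the column that was just created
    "{col_name2}" := case_when(
      .data[[col_name1]] < 12 ~ 0,
      .data[[col_name1]] == 12 ~ 1,
      TRUE ~ 1 / .data[[col_name1]]
    )
  )
}

head(insert_names_bare(cars, speed, "new", "improved"))
```

The new columns have to be referenced through strings here because their names only exist at run time, which is why .data[[ ]] is used on the right-hand side instead of {{ }}.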