How do I pass a column name to a function involving mutate?

Time:08-09

I am trying to write a function that takes a string and uses it as a column name in dplyr::mutate() on both sides of the equals sign. Here is an example of what I'd like to automate:

cars %>% 
  mutate(
    new_speed = speed + 5,
    revised_speed = case_when(new_speed < 12 ~ 0,
                              new_speed == 12 ~ 1,
                              new_speed > 12 ~ 1/new_speed)
  )

To automate this for any dataset, I need to 1) attach a prefix such as "new" to whichever column name I specify, and 2) create an additional column, prefixed with something like "improved", whose values depend on the first new column.

The function should look something like this, where ** ** is replaced with the proper syntax:

insert_names <- function(df, oldname, prefix_1, prefix_2){
  df %>% mutate(
    **prefix_1.oldname** = oldname + 5,
    **prefix_2.oldname** = case_when(**prefix_1.oldname** < 12 ~ 0,
                                     **prefix_1.oldname** == 12 ~ 1,
                                     **prefix_1.oldname** > 12 ~ 1/**prefix_1.oldname**),
    
  )
}

The correct function should reproduce the original output like this:

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")

though I could leave speed unquoted if that's easier.

CodePudding user response:

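We can refer to the column with !!sym(), build the new columns under temporary names, and then rename them with data.table's setnames():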
library(dplyr)
library(data.table)

insert_names <- function(df, oldname, prefix_1, prefix_2){
    pre1_old <- paste0(prefix_1 , "." , oldname)
    pre2_old <- paste0(prefix_2 , "." , oldname)
    d <- df %>% mutate(
        x = !!sym(oldname) + 5,
        y = case_when(x < 12 ~ 0,
                      x == 12 ~ 1,
                      x > 12 ~ 1/x),
        
    )
    d  %>% setnames(c("x" , "y") ,c(pre1_old ,pre2_old))
    d
}

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
Output:
  speed dist new.speed improved.speed
1      4    2         9     0.00000000
2      4   10         9     0.00000000
3      7    4        12     1.00000000
4      7   22        12     1.00000000
5      8   16        13     0.07692308
6      9   10        14     0.07142857
7     10   18        15     0.06666667
8     10   26        15     0.06666667
9     10   34        15     0.06666667
10    11   17        16     0.06250000
11    11   28        16     0.06250000
12    12   14        17     0.05882353
13    12   20        17     0.05882353
14    12   24        17     0.05882353
15    12   28        17     0.05882353
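
If you would rather not load data.table just for the renaming step, a rough alternative is dplyr's own rename() with all_of() and a named character vector (names are the new column names, values the temporary ones). This is only a sketch: the function name insert_names_rename is made up for illustration, it assumes dplyr 1.0 or later, and like the version above it assumes df has no existing columns called x or y.

```r
library(dplyr)

# Hypothetical variant of insert_names(): same temporary columns x and y,
# but renamed with dplyr::rename() instead of data.table::setnames().
insert_names_rename <- function(df, oldname, prefix_1, prefix_2) {
  # Named lookup vector: names = new column names, values = temporary names.
  lookup <- c("x", "y")
  names(lookup) <- paste0(c(prefix_1, prefix_2), ".", oldname)

  df %>%
    mutate(
      x = !!sym(oldname) + 5,
      y = case_when(x < 12 ~ 0,
                    x == 12 ~ 1,
                    x > 12 ~ 1 / x)
    ) %>%
    rename(all_of(lookup))  # returns a new data frame; df itself is untouched
}

head(insert_names_rename(cars, "speed", "new", "improved"))
```

Unlike setnames(), which changes names by reference, rename() returns a modified copy, so there is no separate "return d" step.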

CodePudding user response:

A nice case for using rlang:

library(dplyr)
library(rlang)

insert_names <- function(df, oldname, prefix_1, prefix_2){
  
  new_name_1 <- paste(prefix_1, oldname, sep = ".")
  new_name_2 <- paste(prefix_2, oldname, sep = ".")
  
  df %>% mutate(
    !!new_name_1 := !!sym(oldname) + 5,
    !!new_name_2 := case_when(!!sym(new_name_1) < 12 ~ 0,
                                     !!sym(new_name_1) == 12 ~ 1,
                                     !!sym(new_name_1) > 12 ~ 1/!!sym(new_name_1)),
  )
}

insert_names(cars, "speed", "new", "newer")
#>    speed dist new.speed newer.speed
#> 1      4    2         9  0.00000000
#> 2      4   10         9  0.00000000
#> 3      7    4        12  1.00000000
#> 4      7   22        12  1.00000000
#> 5      8   16        13  0.07692308
#> 6      9   10        14  0.07142857
#> 7     10   18        15  0.06666667
#> 8     10   26        15  0.06666667
#> 9     10   34        15  0.06666667
#> 10    11   17        16  0.06250000
...

Edit

I see that the other answer, posted at about the same time, uses the same method. The minor difference is where the new columns are named: either when they are created or just before the data frame is returned.
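
For what it's worth, the same idea can probably be written with glue-style names on the left of := and the .data pronoun on the right, which avoids most of the !!sym() calls. Treat this as a sketch rather than the canonical way: insert_names_glue is a made-up name, and it assumes a reasonably recent dplyr (1.0 or later) where string names with {} are interpolated.

```r
library(dplyr)

# Hypothetical variant: glue-style column names plus the .data pronoun.
insert_names_glue <- function(df, oldname, prefix_1, prefix_2) {
  # Name of the first new column, needed to look it up again below.
  new_name_1 <- paste(prefix_1, oldname, sep = ".")

  df %>% mutate(
    # "{...}" on the left of := is filled in from the function's variables.
    "{prefix_1}.{oldname}" := .data[[oldname]] + 5,
    # .data[[...]] looks a column up by its name given as a string,
    # including columns created earlier in the same mutate() call.
    "{prefix_2}.{oldname}" := case_when(
      .data[[new_name_1]] < 12 ~ 0,
      .data[[new_name_1]] == 12 ~ 1,
      .data[[new_name_1]] > 12 ~ 1 / .data[[new_name_1]]
    )
  )
}

head(insert_names_glue(cars, "speed", "new", "newer"))
```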

CodePudding user response:

For this, you need to use the forcing and defusing operators. The double curly braces defuse and then force what you give them, which allows you to (1) reference a column name held in a string and (2) force the function argument. When the injected name is on the left-hand side, you must use ":=" instead of "=" as the assignment operator. I also use get() to fetch the column values from the referenced string name. I'm not sure this is THE most efficient way, and someone may well have better code, but it works.

(Note: !!, or "bang-bang", is a forcing operator, enquo() defuses, and {{ }} does both, being the equivalent of !!enquo(). I'm not sure {{ }} is needed in every place I put it in this code.)

Here is working code:

insert_names <- function(df, oldname, prefix_1, prefix_2){
  col_name1 = paste0(prefix_1, "_", oldname)
  col_name2 = paste0(prefix_2, "_", oldname)
  df %>% mutate(
    {{col_name1}} := get(!!oldname) + 5,
    {{col_name2}} := case_when(get(!!col_name1) < 12 ~ 0,
                               get(!!col_name1) == 12 ~ 1,
                               TRUE ~ 1/get(!!col_name1))
  )
}

insert_names(cars, oldname = "speed", prefix_1 = "new", prefix_2 = "improved")
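
Since the question mentions that speed could also be left unquoted, here is one rough way that might be handled: capture the argument with ensym(), turn it into a string with as_name(), and carry on as before. This is a sketch under assumptions, not a definitive answer: insert_names_bare is a made-up name, and it uses the "." separator from the question rather than the "_" above. One caveat: ensym() captures what the caller typed, so passing a variable that holds the column name (col <- "speed"; insert_names_bare(cars, col, ...)) would look for a column called col.

```r
library(dplyr)

# Hypothetical variant that accepts the column either bare or as a string.
insert_names_bare <- function(df, oldname, prefix_1, prefix_2) {
  # ensym() captures the argument as typed (bare name or string literal);
  # as_name() converts the resulting symbol to a plain string.
  old <- rlang::as_name(rlang::ensym(oldname))
  col_name1 <- paste0(prefix_1, ".", old)
  col_name2 <- paste0(prefix_2, ".", old)

  df %>% mutate(
    !!col_name1 := .data[[old]] + 5,
    !!col_name2 := case_when(.data[[col_name1]] < 12 ~ 0,
                             .data[[col_name1]] == 12 ~ 1,
                             TRUE ~ 1 / .data[[col_name1]])
  )
}

# Both calls are meant to give the same result.
head(insert_names_bare(cars, speed, "new", "improved"))
head(insert_names_bare(cars, "speed", "new", "improved"))
```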