Using mutate in custom function with mutation condition as argument

Time:11-11

Is it possible to construct a function, say my_mut(df, condition) such that df is a dataframe, condition is a string describing a mutation, and somewhere in the function, the mutation of df according to condition is used?

For example, if df has a foo column and I call my_mut(df, "foo = 2*foo"), then somewhere within my_mut() there would be a line that produces the same dataframe as df %>% mutate(foo = 2*foo).

I managed to do something similar with filter using eval and parse.

update_filt <- function(df,
                        filt,
                        col){

  sub <- df %>%
    filter(eval(parse(text = filt))) %>%
    mutate("{{col}}" := 2*{{ col }})

  remain <- df %>%
    filter(eval(parse(
                text = paste0("!(",filt,")")
                ))
           )

  return(rbind(sub, remain))
}

I am not sure the update_filt function is foolproof, but it works in at least some cases; e.g., after library(gapminder), update_filt(gapminder, "year == 1952", pop) returns the expected outcome.

The same trick does not seem to work with mutate though. For example,

update_mut <- function(df, mutation){
  df %>% mutate(eval(parse(text = mutation)))
}

produces outcomes like

library(gapminder)
update_mut(gapminder, "year = 2*year")
# A tibble: 1,704 × 7
   country     continent  year lifeExp      pop gdpPercap `eval(parse(text = mutation))`
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>                          <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.                           3904
 2 Afghanistan Asia       1957    30.3  9240934      821.                           3914
 3 Afghanistan Asia       1962    32.0 10267083      853.                           3924
 4 Afghanistan Asia       1967    34.0 11537966      836.                           3934
 5 Afghanistan Asia       1972    36.1 13079460      740.                           3944
 6 Afghanistan Asia       1977    38.4 14880372      786.                           3954
 7 Afghanistan Asia       1982    39.9 12881816      978.                           3964
 8 Afghanistan Asia       1987    40.8 13867957      852.                           3974
 9 Afghanistan Asia       1992    41.7 16317921      649.                           3984
10 Afghanistan Asia       1997    41.8 22227415      635.                           3994
# … with 1,694 more rows

Instead of the expected

gapminder %>% mutate(year = 2*year)

# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <dbl>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       3904    28.8  8425333      779.
 2 Afghanistan Asia       3914    30.3  9240934      821.
 3 Afghanistan Asia       3924    32.0 10267083      853.
 4 Afghanistan Asia       3934    34.0 11537966      836.
 5 Afghanistan Asia       3944    36.1 13079460      740.
 6 Afghanistan Asia       3954    38.4 14880372      786.
 7 Afghanistan Asia       3964    39.9 12881816      978.
 8 Afghanistan Asia       3974    40.8 13867957      852.
 9 Afghanistan Asia       3984    41.7 16317921      649.
10 Afghanistan Asia       3994    41.8 22227415      635.
# … with 1,694 more rows

CodePudding user response:

If your formula always has the form original = do_something(original), this may help (requires dplyr version >= 1.0).

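library(dplyr)
library(stringr)
library(gapminder)

update_mut <- function(df, mutation){
  # Take the first whitespace-delimited word as the target column name;
  # this assumes a space before "=", as in "year = 2*year"
  xx <- word(mutation, 1)
  # eval() performs the assignment and returns the new value,
  # which the glue-style name "{xx}" stores under the target column
  df %>% 
    mutate("{xx}" := eval(parse(text = mutation)))
}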
update_mut(gapminder, "year = 2*year")

   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <dbl>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       3904    28.8  8425333      779.
 2 Afghanistan Asia       3914    30.3  9240934      821.
 3 Afghanistan Asia       3924    32.0 10267083      853.
 4 Afghanistan Asia       3934    34.0 11537966      836.
 5 Afghanistan Asia       3944    36.1 13079460      740.
 6 Afghanistan Asia       3954    38.4 14880372      786.
 7 Afghanistan Asia       3964    39.9 12881816      978.
 8 Afghanistan Asia       3974    40.8 13867957      852.
 9 Afghanistan Asia       3984    41.7 16317921      649.
10 Afghanistan Asia       3994    41.8 22227415      635.

CodePudding user response:

The problem is that mutate never sees the assignment, because the whole string is parsed and evaluated inside eval. mutate therefore treats eval(parse(...)) as an unnamed expression and uses the text of that expression as the column name.
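
A small sketch of this behaviour, assuming a toy data frame df with a single foo column (both names are made up for illustration): the string reaches mutate as one unnamed argument, so a new column is appended instead of foo being overwritten.

```r
library(dplyr)

# Toy data, hypothetical and not taken from the question
df <- data.frame(foo = 1:3)

# mutate() receives a single unnamed argument: the call eval(parse(...))
out <- df %>% mutate(eval(parse(text = "foo = 2*foo")))

ncol(out)   # 2: the original foo plus one automatically named column
names(out)  # the second name is derived from the text of the eval() call
```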

One way to circumvent this would be to eval the whole thing, including the mutate verb, as below.

update_mut <- function(df, mutation) {
  # Evaluate the mutation expression
  eval(parse(text = paste0("mutate(df, ", mutation, ")")))
}

Another way would be to split the mutation argument at the = character inside update_mut, obtaining the variable name and the expression separately, and then use dynamic variable assignment in mutate. However, this is only extra work, since the code above already solves the problem.
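
For completeness, the splitting approach could look roughly like the sketch below. The function name update_mut_split is hypothetical, and the sketch assumes the string always has the form name = expression, with the first = being the assignment. It uses rlang::parse_expr to turn the right-hand side into an expression and !! with := to inject the name and the expression into mutate.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: split "name = expression" at the first "="
update_mut_split <- function(df, mutation) {
  # Everything before the first "=" is the target column name
  name <- trimws(sub("=.*$", "", mutation))
  # Everything after the first "=" is the expression to compute
  rhs <- parse_expr(sub("^[^=]*=", "", mutation))
  # Inject both into mutate(); := allows a dynamic name on the left
  df %>% mutate(!!name := !!rhs)
}

# Toy data, made up for illustration
toy <- data.frame(foo = 1:3)
res <- update_mut_split(toy, "foo = 2*foo")
res$foo  # 2 4 6
```

Because only the first = is used for the split, a right-hand side containing == (for example "flag = foo == 2") should still be handled, though this has the same caveats as any approach that parses code from strings.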

CodePudding user response:

library(dplyr, warn.conflicts = FALSE)
my_mut <- function(df, df_filter, ...){
  df %>% 
    filter({{ df_filter }}) %>% 
    mutate(newvar = 'other function stuff',
           ...)
}

example_df <- data.frame(a = c('zebra', 'some value'),
                         b = 1:2)

example_df %>% 
  my_mut(df_filter = a == 'some value', 
         b = b*5)
#>            a  b               newvar
#> 1 some value 10 other function stuff

Created on 2021-11-11 by the reprex package (v2.0.1)

If you can't use ... because you're already using it in the function for something else, you could wrap the mutation argument in tibble when calling the function.

library(dplyr, warn.conflicts = FALSE)
my_mut <- function(df, df_filter, mutation){
  df %>% 
    filter({{ df_filter }}) %>% 
    mutate(newvar = 'other function stuff',
           {{ mutation }})
}

example_df <- data.frame(a = c('zebra', 'some value'),
                         b = 1:2)

example_df %>% 
  my_mut(df_filter = a == 'some value', 
         mutation = tibble(b = b*5))
#>            a  b               newvar
#> 1 some value 10 other function stuff

Created on 2021-11-11 by the reprex package (v2.0.1)
