Is it possible to construct a function, say my_mut(df, condition), such that df is a dataframe, condition is a string describing a mutation, and somewhere in the function, the mutation of df according to condition is used?
For example, if df has a foo column and I call my_mut(df, "foo = 2*foo"), then somewhere within my_mut() there would be a line that produces the same dataframe as df %>% mutate(foo = 2*foo).
I managed to do something similar with filter using eval and parse.
update_filt <- function(df, filt, col) {
  sub <- df %>%
    filter(eval(parse(text = filt))) %>%
    mutate("{{col}}" := 2*{{ col }})
  remain <- df %>%
    filter(eval(parse(text = paste0("!(", filt, ")"))))
  return(rbind(sub, remain))
}
I am not sure the update_filt function is foolproof, but it works in at least some cases. For example,
library(gapminder)
update_filt(gapminder, "year == 1952", pop)
returns the expected outcome.
The same trick does not seem to work with mutate, though. For example,
update_mut <- function(df, mutation) {
  df %>% mutate(eval(parse(text = mutation)))
}
produces outcomes like
library(gapminder)
update_mut(gapminder, "year = 2*year")
# A tibble: 1,704 × 7
country continent year lifeExp pop gdpPercap `eval(parse(text = mutation))`
<fct> <fct> <int> <dbl> <int> <dbl> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779. 3904
2 Afghanistan Asia 1957 30.3 9240934 821. 3914
3 Afghanistan Asia 1962 32.0 10267083 853. 3924
4 Afghanistan Asia 1967 34.0 11537966 836. 3934
5 Afghanistan Asia 1972 36.1 13079460 740. 3944
6 Afghanistan Asia 1977 38.4 14880372 786. 3954
7 Afghanistan Asia 1982 39.9 12881816 978. 3964
8 Afghanistan Asia 1987 40.8 13867957 852. 3974
9 Afghanistan Asia 1992 41.7 16317921 649. 3984
10 Afghanistan Asia 1997 41.8 22227415 635. 3994
# … with 1,694 more rows
Instead of the expected
gapminder %>% mutate(year = 2*year)
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Afghanistan Asia 3904 28.8 8425333 779.
2 Afghanistan Asia 3914 30.3 9240934 821.
3 Afghanistan Asia 3924 32.0 10267083 853.
4 Afghanistan Asia 3934 34.0 11537966 836.
5 Afghanistan Asia 3944 36.1 13079460 740.
6 Afghanistan Asia 3954 38.4 14880372 786.
7 Afghanistan Asia 3964 39.9 12881816 978.
8 Afghanistan Asia 3974 40.8 13867957 852.
9 Afghanistan Asia 3984 41.7 16317921 649.
10 Afghanistan Asia 3994 41.8 22227415 635.
# … with 1,694 more rows
CodePudding user response:
If your formula always has the form original = do_something_original(), this may help (requires dplyr version >= 1.0). Note that word() splits on spaces, so the column name must be followed by a space.
library(dplyr)
library(stringr)
update_mut <- function(df, mutation) {
  xx <- word(mutation, 1)
  df %>%
    mutate("{xx}" := eval(parse(text = mutation)))
}
update_mut(gapminder, "year = 2*year")
country continent year lifeExp pop gdpPercap
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Afghanistan Asia 3904 28.8 8425333 779.
2 Afghanistan Asia 3914 30.3 9240934 821.
3 Afghanistan Asia 3924 32.0 10267083 853.
4 Afghanistan Asia 3934 34.0 11537966 836.
5 Afghanistan Asia 3944 36.1 13079460 740.
6 Afghanistan Asia 3954 38.4 14880372 786.
7 Afghanistan Asia 3964 39.9 12881816 978.
8 Afghanistan Asia 3974 40.8 13867957 852.
9 Afghanistan Asia 3984 41.7 16317921 649.
10 Afghanistan Asia 3994 41.8 22227415 635.
CodePudding user response:
The problem is that mutate never sees the assignment, because the whole string is evaluated inside eval. So mutate treats the result as an unnamed expression and uses the whole text of the expression as the column name.
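To see the naming behaviour in isolation, here is a minimal sketch. The data frame toy_df is made up for illustration and is not part of the question.

```r
library(dplyr)

toy_df <- data.frame(foo = 1:3)  # hypothetical example data

# Unnamed argument: mutate() appends a new column labelled after the
# expression text instead of overwriting foo
unnamed <- toy_df %>% mutate(eval(parse(text = "foo = 2*foo")))
names(unnamed)

# Named argument: the existing foo column is overwritten
named <- toy_df %>% mutate(foo = 2 * foo)
names(named)
```

The first call leaves the data frame with two columns, the second with one, which is the difference seen in the question.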
One way to circumvent this would be to eval the whole thing, including the mutate verb, as below.
update_mut <- function(df, mutation) {
  # Evaluate the mutation expression
  eval(parse(text = paste0("mutate(df, ", mutation, ")")))
}
Another way would be to split the mutation parameter at the = character inside update_mut, obtaining the variable name and the expression separately. You could then use dynamic variable assignment in mutate. However, this is more work, since the code above already solves the problem.
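For completeness, the splitting approach could look roughly like the sketch below. The function name update_mut_split is hypothetical, and the sketch assumes the string contains exactly one assignment of the form name = expression. It splits at the first = only, so a comparison such as == on the right-hand side is left intact.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: split "name = expression" at the first "=" only,
# then build the mutate() call with a dynamic column name.
update_mut_split <- function(df, mutation) {
  # regexpr() finds the first "="; invert = TRUE returns the text
  # before and after that match
  parts <- regmatches(mutation,
                      regexpr("=", mutation, fixed = TRUE),
                      invert = TRUE)[[1]]
  new_name <- trimws(parts[1])
  new_expr <- parse_expr(trimws(parts[2]))
  df %>% mutate("{new_name}" := !!new_expr)
}

toy_df <- data.frame(foo = 1:3)  # hypothetical example data
doubled <- update_mut_split(toy_df, "foo = 2*foo")
flagged <- update_mut_split(toy_df, "flag = foo == 2")
```

Unlike the word() approach, this does not depend on spaces around the = sign. It still evaluates arbitrary user-supplied code, so the string should come from a trusted source.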
CodePudding user response:
library(dplyr, warn.conflicts = FALSE)
my_mut <- function(df, df_filter, ...) {
  df %>%
    filter({{ df_filter }}) %>%
    mutate(newvar = 'other function stuff',
           ...)
}
example_df <- data.frame(a = c('zebra', 'some value'),
                         b = 1:2)
example_df %>%
  my_mut(df_filter = a == 'some value',
         b = b*5)
#> a b newvar
#> 1 some value 10 other function stuff
Created on 2021-11-11 by the reprex package (v2.0.1)
If you can't use ... because you're already using it in the function for something else, you could wrap the mutation argument in tibble when calling the function.
library(dplyr, warn.conflicts = FALSE)
my_mut <- function(df, df_filter, mutation) {
  df %>%
    filter({{ df_filter }}) %>%
    mutate(newvar = 'other function stuff',
           {{ mutation }})
}
example_df <- data.frame(a = c('zebra', 'some value'),
                         b = 1:2)
example_df %>%
  my_mut(df_filter = a == 'some value',
         mutation = tibble(b = b*5))
#> a b newvar
#> 1 some value 10 other function stuff