I have several data.frames and I'd like to apply some transformations to their columns.
What I did at first is something like this:
require(dplyr)
df_1 = data.frame(
'a' = c('aa.aa', 'aa..a/a', 'aaa aa.'),
'b' = c('b..b/', 'bbb./b', '..bb/--b'),
'c' = c('ccc', 'cc/cc', 'ccc.-cc')
)
df_1
df_2 = data.frame(
'a' = c('aa.a..a', '//aa..a/a', 'aaa aa.'),
'b' = c('b../b/', 'bbb./b', '..bb/--b'),
'c' = c('cc//c', 'cc/c/c', 'c//cc.-cc')
)
df_2
# df_3, df_4, df_5, ...
# remove '.', ' ', '/', '-'
# replace with '_'
df_new <- df_1 %>%
mutate(a = toupper(a),
a = gsub('\\.', '_', a),
a = gsub('/', '_', a),
a = gsub(' ', '_', a),
a = gsub('-', '_', a))
df_new
Output:
> df_new
a b c
1 AA_AA b..b/ ccc
2 AA__A_A bbb./b cc/cc
3 AAA_AA_ ..bb/--b ccc.-cc
This replaces every special character in column 'a' of df_1. But I'd like to perform the same operations on other columns, so I was thinking of a function like this:
remove_special_characters <- function(df, var) {
df_new <- df %>%
mutate(var = gsub('\\.', '_', var),
var = gsub('/', '_', var),
var = gsub(' ', '_', var),
var = gsub('-', '_', var))
df_new
}
remove_special_characters(df_1, a)
Output:
Error: Problem with `mutate()` column `var`.
i `var = gsub("\\.", "_", var)`.
x object 'a' not found
Run `rlang::last_error()` to see where the error occurred.
remove_special_characters(df_2, b)
Output:
Error: Problem with `mutate()` column `var`.
i `var = gsub("\\.", "_", var)`.
x object 'b' not found
Run `rlang::last_error()` to see where the error occurred.
# ...
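But this doesn't work. I looked for the reason and found that the mutate function uses data-masking, so `var` is looked up as a column name instead of being replaced by the argument I passed. I searched for solutions and found this one: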
Use dynamic variable names in `dplyr`
But it does not solve my problem.
Is there any way to create a function that performs this operation?
CodePudding user response:
You need to embrace the argument with `{{ }}` (shorthand for `!!enquo(var)`) so it is evaluated as a column, and assign with `:=` instead of `=`:
remove_special_characters <- function(df, var) {
df_new <- df %>%
mutate({{var}} := toupper({{var}}),
{{var}} := gsub('\\.', '_', {{var}}),
{{var}} := gsub('/', '_', {{var}}),
{{var}} := gsub(' ', '_', {{var}}),
{{var}} := gsub('-', '_', {{var}}))
df_new
}
Ex:
> remove_special_characters(df_1, a)
a b c
1 AA_AA b..b/ ccc
2 AA__A_A bbb./b cc/cc
3 AAA_AA_ ..bb/--b ccc.-cc
>
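As a side note, the four gsub calls can probably be collapsed into one by using a character class. Here is a minimal sketch of that idea; the name remove_special_characters_2 is made up for illustration, and it assumes dplyr >= 1.0.0 so that across() accepts an embraced argument:

```r
library(dplyr)

df_1 <- data.frame(
  a = c('aa.aa', 'aa..a/a', 'aaa aa.'),
  b = c('b..b/', 'bbb./b', '..bb/--b'),
  c = c('ccc', 'cc/cc', 'ccc.-cc'),
  stringsAsFactors = FALSE
)

# Hypothetical variant of the function above.
# '[./ -]' matches a literal '.', '/', ' ' or '-'; inside [] the dot is not
# a wildcard, and a trailing '-' is taken literally.
remove_special_characters_2 <- function(df, var) {
  df %>%
    mutate(across({{ var }}, ~ gsub('[./ -]', '_', toupper(.x))))
}

res_a <- remove_special_characters_2(df_1, a)
res_a
```

Because var is passed to across(), the same call should also accept several columns at once, e.g. remove_special_characters_2(df_1, c(a, b)).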
CodePudding user response:
You can also use across to apply a function to several columns at once.
First, define a regular expression that matches any of the special characters to replace:
remove_vec = paste(c("\\.", "/", " ", "-"), collapse = "|")
df_1 %>%
mutate(across(.cols = c(a,b,c),.fns = ~ gsub(remove_vec,"_",.)))
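Since the question mentions several data.frames, the across call could be wrapped in a function and mapped over a list. This is only a sketch under a few assumptions: clean_columns and the list dfs are names I made up, the pattern uses a character class equivalent to remove_vec above, and dplyr >= 1.0.0 is installed.

```r
library(dplyr)

df_1 <- data.frame(
  a = c('aa.aa', 'aa..a/a', 'aaa aa.'),
  b = c('b..b/', 'bbb./b', '..bb/--b'),
  c = c('ccc', 'cc/cc', 'ccc.-cc'),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  a = c('aa.a..a', '//aa..a/a', 'aaa aa.'),
  b = c('b../b/', 'bbb./b', '..bb/--b'),
  c = c('cc//c', 'cc/c/c', 'c//cc.-cc'),
  stringsAsFactors = FALSE
)

# Matches '.', '/', ' ' or '-' (same characters as remove_vec)
pattern <- '[./ -]'

# Hypothetical helper: cols is a tidy-select expression such as c(a, b)
clean_columns <- function(df, cols) {
  df %>%
    mutate(across({{ cols }}, ~ gsub(pattern, '_', .x)))
}

# Apply the same cleaning to every data.frame in a named list
dfs <- list(df_1 = df_1, df_2 = df_2)
cleaned <- lapply(dfs, function(d) clean_columns(d, c(a, b)))
cleaned$df_2
```

The anonymous function inside lapply keeps the column selection explicit; columns not listed in cols (here c) are left untouched.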