I need to rename multiple variables using a replacement dataframe. This replacement dataframe also includes regex. I would like to use a similar solution proposed here, .e.g
df %>% rename_with(~ newnames, all_of(oldnames))
MWE:
df <- mtcars[, 1:5]
# works without regex
replace_df_1 <- tibble::tibble(
old = df %>% colnames(),
new = df %>% colnames() %>% toupper()
)
df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))
# with regex
replace_df_2 <- tibble::tibble(
old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
new = df %>% colnames() %>% toupper()
)
old new
<chr> <chr>
1 ^m MPG
2 cyl101|cyl CYL
3 disp DISP
4 hp HP
5 drat DRAT
# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))
EDIT 1:
The solution of @Mael works in general, but there seems to be index issue, e.g. consider the following example
replace_df_2 <- tibble::tibble(
old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
new = mtcars[,1:5] %>% colnames() %>% toupper()
)
mtcars[, 1:5] %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
Results in
mpg MPG disp hp drat
<dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9
meaning that the rename_with
function correctly finds the column, but replaces it with the first item in the replacement column. How can we tell the function to take the respective row where a replacement has been found?
So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL).
Thank you for any hints!
CodePudding user response:
matches
should be on the regex-y column:
df %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
MPG CYL DISP HP DRAT
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
#...
CodePudding user response:
If the task is simply to set all col names to upper-case, then this works:
sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG" "CYL" "DISP" "HP" "DRAT"
In dplyr
:
df %>%
rename_with( ~sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE))
CodePudding user response:
I found a solution using the idea of non standard evaluation from this question and @Maël's answer.
Using map_lgl
we create a logical vector that returns TRUE
if the column in replace_df_2$old
can be found inside the dataframe df
. Then we pass this logical vector to replace_df_2$new
to get the correct replacement.
df <- mtcars[, 1:5]
df %>%
rename_with(.fn = ~replace_df_2$new[map_lgl(replace_df_2$old,~ any(str_detect(., names(df))))],
.cols = matches(replace_df_2$old))
Result:
mpg CYL disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90