Home > Back-end >  Dplyr: Conditionally rename multiple variables with regex by name
Dplyr: Conditionally rename multiple variables with regex by name

Time:09-13

I need to rename multiple variables using a replacement dataframe. This replacement dataframe also includes regex. I would like to use a similar solution proposed here, .e.g

df %>% rename_with(~ newnames, all_of(oldnames))

MWE:

df <- mtcars[, 1:5]

# works without regex 
replace_df_1 <- tibble::tibble(
  old = df %>% colnames(),
  new = df %>% colnames() %>% toupper()
) 

df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))


# with regex

replace_df_2 <- tibble::tibble(
  old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
  new = df %>% colnames() %>% toupper()
)

  old        new  
  <chr>      <chr>
1 ^m         MPG  
2 cyl101|cyl CYL  
3 disp       DISP 
4 hp         HP   
5 drat       DRAT 

# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))

EDIT 1:

The solution of @Mael works in general, but there seems to be index issue, e.g. consider the following example

replace_df_2 <- tibble::tibble(
  old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
  new = mtcars[,1:5] %>% colnames() %>% toupper()
)


mtcars[, 1:5] %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))

Results in

 mpg   MPG  disp    hp  drat
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9 

meaning that the rename_with function correctly finds the column, but replaces it with the first item in the replacement column. How can we tell the function to take the respective row where a replacement has been found? So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL).

Thank you for any hints!

CodePudding user response:

matches should be on the regex-y column:

df %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))
                     MPG CYL  DISP  HP DRAT
Mazda RX4           21.0   6 160.0 110 3.90
Mazda RX4 Wag       21.0   6 160.0 110 3.90
Datsun 710          22.8   4 108.0  93 3.85
Hornet 4 Drive      21.4   6 258.0 110 3.08
Hornet Sportabout   18.7   8 360.0 175 3.15
Valiant             18.1   6 225.0 105 2.76
#...

CodePudding user response:

If the task is simply to set all col names to upper-case, then this works:

sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG"  "CYL"  "DISP" "HP"   "DRAT"

In dplyr:

df %>%
  rename_with( ~sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE))

CodePudding user response:

I found a solution using the idea of non standard evaluation from this question and @Maël's answer.

Using map_lgl we create a logical vector that returns TRUE if the column in replace_df_2$old can be found inside the dataframe df. Then we pass this logical vector to replace_df_2$new to get the correct replacement.

df <- mtcars[, 1:5]
df %>% 
  rename_with(.fn = ~replace_df_2$new[map_lgl(replace_df_2$old,~ any(str_detect(., names(df))))], 
              .cols = matches(replace_df_2$old))

Result:

                     mpg CYL  disp  hp drat
Mazda RX4           21.0   6 160.0 110 3.90
  • Related