Home > database >  Dplyr: Rename multiple variables with regex by name
Dplyr: Rename multiple variables with regex by name

Time:09-12

I need to rename multiple variables using a replacement dataframe. This replacement dataframe also includes regex. I would like to use a similar solution proposed here, .e.g

df %>% rename_with(~ newnames, all_of(oldnames))

MWE:

df <- mtcars[, 1:5]

# works without regex 
replace_df_1 <- tibble::tibble(
  old = df %>% colnames(),
  new = df %>% colnames() %>% toupper()
) 

df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))


# with regex

replace_df_2 <- tibble::tibble(
  old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
  new = df %>% colnames() %>% toupper()
)

  old        new  
  <chr>      <chr>
1 ^m         MPG  
2 cyl101|cyl CYL  
3 disp       DISP 
4 hp         HP   
5 drat       DRAT 

# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))

EDIT 1:

The solution of @Mael works in general, but there seems to be index issue, e.g. consider the following example

replace_df_2 <- tibble::tibble(
  old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
  new = mtcars[,1:5] %>% colnames() %>% toupper()
)


mtcars[, 1:5] %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))

Results in

 mpg   MPG  disp    hp  drat
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9 

meaning that the rename_with function correctly finds the column, but replaces it with the first item in the replacement column.

Thank you for any hints!

CodePudding user response:

matches should be on the regex-y column:

df %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))
                     MPG CYL  DISP  HP DRAT
Mazda RX4           21.0   6 160.0 110 3.90
Mazda RX4 Wag       21.0   6 160.0 110 3.90
Datsun 710          22.8   4 108.0  93 3.85
Hornet 4 Drive      21.4   6 258.0 110 3.08
Hornet Sportabout   18.7   8 360.0 175 3.15
Valiant             18.1   6 225.0 105 2.76
#...

CodePudding user response:

If the task is simply to set all col names to upper-case, then this works:

sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG"  "CYL"  "DISP" "HP"   "DRAT"

In dplyr:

df %>%
  rename_with( ~sub("^(. )$", "\\U\\1", colnames(df), perl = TRUE))
  • Related