Home > Software design >  How to rename multiple columns that have numeric column names using dplyr?
How to rename multiple columns that have numeric column names using dplyr?

Time:01-10

Setup

Suppose I have a dataframe in which several columns have actual numeric values as the column names instead of text-based column names. For example, the column names would be `2015` instead of "2015".

Here is a reproducible example:

my_df = structure(list(Col1 = c('a', 'b', 'c'), 
                       Col2 = c('d', 'e', 'f'), 
                       `2015` = c('g','h','i'), 
                       `2016` = c('j','k','l'), 
                       `2017` = c('m','n','o'), 
                       `2018` = c('p','q','r'), 
                       `2019` = c('s','t','u'), 
                       `2020` = c('v','w','x'), 
                       `2021` = c('y','z','zz')), 
               row.names = c(NA, -3L), 
               class = c("tbl_df", "tbl", "data.frame"))

Question

Suppose further that I want to rename all of the numeric column names to something friendlier. Say, for example, from `2015` to "XYZ_2015" (note the conversion to string) and similar conversions to columns `2016`, `2017`, ... `2021`.

How can I perform this column renaming in a way that uses dplyr's piping operator (%>%) and doesn't require me writing them all out manually?

My current solution

Up till now, my approach has been to do this "manually", renaming each column individually:

new_df = my_df %>%
  rename(XYZ_2015 = `2015`,
         XYZ_2016 = `2016`,
         XYZ_2017 = `2017`,
         XYZ_2018 = `2018`,
         XYZ_2019 = `2019`,
         XYZ_2020 = `2020`,
         XYZ_2021 = `2021`)

However, this method is cumbersome and prone to errors. Is there a way for me to do so in a more automated way? I feel like a for-loop would work here, but I can't figure out the syntax to make the for-loop's variable play nice with the the backticks.

For example, I've tried this:

for(year in 2015:2021){
  print(year)
  new_colname = paste0('XYZ_',year)
  my_df = my_df %>% rename(`new_colname` = `year`)
}

But this yields an error:

Error in `stop_subscript()`:
! Can't rename columns that don't exist.
x Location 2015 doesn't exist.
i There are only 9 columns.
---
Backtrace:
  1. my_df %>% rename(new_colname = year)
  3. dplyr:::rename.data.frame(., new_colname = year)
  4. tidyselect::eval_rename(expr(c(...)), .data)
  5. tidyselect:::rename_impl(...)
  6. tidyselect:::eval_select_impl(...)
 15. tidyselect:::vars_select_eval(...)
 16. tidyselect:::loc_validate(pos, vars, call = error_call)
 17. vctrs::vec_as_location(pos, n = length(vars))
 18. vctrs `<fn>`()
 19. vctrs:::stop_subscript_oob(...)
 20. vctrs:::stop_subscript(...)

CodePudding user response:

We can use rename_with and paste (str_c) the prefix XYZ_ with the column name (.x) only for those column names that matches with 4 digit (\\d{4}) column name from start (^) to end ($) of string

library(dplyr)
library(stringr)
my_df %>%
   rename_with(~ str_c("XYZ_", .x), matches("^\\d{4}$"))

-output

# A tibble: 3 × 9
  Col1  Col2  XYZ_2015 XYZ_2016 XYZ_2017 XYZ_2018 XYZ_2019 XYZ_2020 XYZ_2021
  <chr> <chr> <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1 a     d     g        j        m        p        s        v        y       
2 b     e     h        k        n        q        t        w        z       
3 c     f     i        l        o        r        u        x        zz      

Note that rename_with usage is

rename_with(.data, .fn, .cols = everything(), ...)

and .cols are specified as

.cols - <<tidy-select>> Columns to rename; defaults to all columns.

This implies that we can make use of any of the tidy-select helper functions (matches/starts_with/ends_with/everything()) etc for selecting the columns


Or with base R

names(my_df) <- sub("^X", "XYZ_", make.names(names(my_df)))

CodePudding user response:

Not as elegant and a little hacky way:

library(dplyr)
my_colnames <- paste("XYZ", colnames(my_df[-c(1:2)]))

my_df %>% 
  rename(!!!setNames(names(.[-c(1:2)]), my_colnames))
  Col1  Col2  `XYZ 2015` `XYZ 2016` `XYZ 2017` XYZ 2…¹ XYZ 2…² XYZ 2…³ XYZ 2…⁴
  <chr> <chr> <chr>      <chr>      <chr>      <chr>   <chr>   <chr>   <chr>  
1 a     d     g          j          m          p       s       v       y      
2 b     e     h          k          n          q       t       w       z      
3 c     f     i          l          o          r       u       x       zz     
# … with abbreviated variable names ¹​`XYZ 2018`, ²​`XYZ 2019`, ³​`XYZ 2020`,
  • Related