Setup
Suppose I have a dataframe in which several columns have actual numeric values as the column names instead of text-based column names. For example, the column names would be `2015`
instead of "2015"
.
Here is a reproducible example:
my_df = structure(list(Col1 = c('a', 'b', 'c'),
Col2 = c('d', 'e', 'f'),
`2015` = c('g','h','i'),
`2016` = c('j','k','l'),
`2017` = c('m','n','o'),
`2018` = c('p','q','r'),
`2019` = c('s','t','u'),
`2020` = c('v','w','x'),
`2021` = c('y','z','zz')),
row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame"))
Question
Suppose further that I want to rename all of the numeric column names to something friendlier. Say, for example, from `2015`
to "XYZ_2015"
(note the conversion to string) and similar conversions to columns `2016`
, `2017`
, ... `2021`
.
How can I perform this column renaming in a way that uses dplyr's piping operator (%>%
) and doesn't require me writing them all out manually?
My current solution
Up till now, my approach has been to do this "manually", renaming each column individually:
new_df = my_df %>%
rename(XYZ_2015 = `2015`,
XYZ_2016 = `2016`,
XYZ_2017 = `2017`,
XYZ_2018 = `2018`,
XYZ_2019 = `2019`,
XYZ_2020 = `2020`,
XYZ_2021 = `2021`)
However, this method is cumbersome and prone to errors. Is there a way for me to do so in a more automated way? I feel like a for-loop would work here, but I can't figure out the syntax to make the for-loop's variable play nice with the the backticks.
For example, I've tried this:
for(year in 2015:2021){
print(year)
new_colname = paste0('XYZ_',year)
my_df = my_df %>% rename(`new_colname` = `year`)
}
But this yields an error:
Error in `stop_subscript()`:
! Can't rename columns that don't exist.
x Location 2015 doesn't exist.
i There are only 9 columns.
---
Backtrace:
1. my_df %>% rename(new_colname = year)
3. dplyr:::rename.data.frame(., new_colname = year)
4. tidyselect::eval_rename(expr(c(...)), .data)
5. tidyselect:::rename_impl(...)
6. tidyselect:::eval_select_impl(...)
15. tidyselect:::vars_select_eval(...)
16. tidyselect:::loc_validate(pos, vars, call = error_call)
17. vctrs::vec_as_location(pos, n = length(vars))
18. vctrs `<fn>`()
19. vctrs:::stop_subscript_oob(...)
20. vctrs:::stop_subscript(...)
CodePudding user response:
We can use rename_with
and paste
(str_c
) the prefix XYZ_
with the column name (.x
) only for those column names that matches
with 4 digit (\\d{4}
) column name from start (^
) to end ($
) of string
library(dplyr)
library(stringr)
my_df %>%
rename_with(~ str_c("XYZ_", .x), matches("^\\d{4}$"))
-output
# A tibble: 3 × 9
Col1 Col2 XYZ_2015 XYZ_2016 XYZ_2017 XYZ_2018 XYZ_2019 XYZ_2020 XYZ_2021
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a d g j m p s v y
2 b e h k n q t w z
3 c f i l o r u x zz
Note that rename_with
usage is
rename_with(.data, .fn, .cols = everything(), ...)
and .cols
are specified as
.cols - <<tidy-select>> Columns to rename; defaults to all columns.
This implies that we can make use of any of the tidy-select
helper functions (matches/starts_with/ends_with/everything()
) etc for selecting the columns
Or with base R
names(my_df) <- sub("^X", "XYZ_", make.names(names(my_df)))
CodePudding user response:
Not as elegant and a little hacky way:
library(dplyr)
my_colnames <- paste("XYZ", colnames(my_df[-c(1:2)]))
my_df %>%
rename(!!!setNames(names(.[-c(1:2)]), my_colnames))
Col1 Col2 `XYZ 2015` `XYZ 2016` `XYZ 2017` XYZ 2…¹ XYZ 2…² XYZ 2…³ XYZ 2…⁴
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 a d g j m p s v y
2 b e h k n q t w z
3 c f i l o r u x zz
# … with abbreviated variable names ¹`XYZ 2018`, ²`XYZ 2019`, ³`XYZ 2020`,