Is there a way I can rename variables based on how the variable ends?

My dataset looks like this:

library(dplyr)

N <- 3  # assumed sample size; the question does not define N
a <- rnorm(N)
b <- rnorm(N)
c_04 <- rnorm(N)
d_04 <- rnorm(N)
e_04 <- rnorm(N)

df <- data.frame(a, b, c_04, d_04, e_04)

Is there a way I can use the rename_with function to drop the _04 from the variables that end with _04? In other words, the variables in df should be named just a, b, c, d, e.

Thank you.

CodePudding user response：

You can use str_remove():

library(stringr)
library(dplyr)

df %>% 
  rename_with( ~ str_remove(., "_04"))

Or, more generally, use str_remove() (or a similar function) with whatever pattern the problem calls for. For example, to remove an underscore followed by one or more digits:

df %>% 
  rename_with( ~ str_remove(., "_\\d+"))
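
As a small sketch of how the choice of pattern matters: an unanchored pattern removes the first match wherever it sits in the name, while adding $ restricts the match to the end of the name. The demo data frame and its column names are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical one-row data frame, used only to compare the two patterns
demo <- data.frame(a = 1, c_04 = 2, weird_04_name_05 = 3)

# Unanchored: removes the first "_<digits>" wherever it occurs in the name
unanchored <- demo %>% rename_with(~ str_remove(.x, "_\\d+"))
names(unanchored)

# Anchored with "$": removes "_<digits>" only when it ends the name
anchored <- demo %>% rename_with(~ str_remove(.x, "_\\d+$"))
names(anchored)
```

With the anchored version, a name such as weird_04_name_05 loses only its trailing _05 and keeps the _04 in the middle.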

CodePudding user response：

To add to the previous answers, I like to pass the .cols argument together with dplyr::ends_with() to make the code less error-prone. This is useful when the names are more complex. For example, you might have a column name that contains _04 somewhere other than at the end of the string. The previous answer would remove it regardless.

library(tidyverse)
N <- 1
a <- rnorm(N)
b <- rnorm(N)
c_04 <- rnorm(N)
d_04 <- rnorm(N)
e_04 <- rnorm(N)
weird_04_name_05 <- rnorm(N)

df <- data.frame(a, b, c_04, d_04, e_04, weird_04_name_05)

df %>%
  rename_with(.fn = ~ str_replace(.x, "_04", ""),
              .cols = ends_with("_04"))
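
One possible extension, not part of the answer above: if the suffix is not always _04, tidyselect's matches() can select columns by a regular expression, and the same anchored pattern can be reused in the renaming function. The demo data frame is hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical data with different numeric suffixes and one "_04" mid-name
demo <- data.frame(a = 1, c_04 = 2, d_05 = 3, weird_04_name = 4)

renamed <- demo %>%
  rename_with(.fn = ~ str_remove(.x, "_\\d+$"),   # strip only a trailing "_<digits>"
              .cols = matches("_\\d+$"))          # touch only columns that end that way
names(renamed)
```

Columns that do not end in an underscore plus digits, such as weird_04_name, are never passed to the renaming function, so they are left unchanged.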

CodePudding user response：

Or another option using stringr and set_names:

library(tidyverse)

df %>% 
  set_names(str_remove, "_.*")

Output

           a          b          c          d          e
1 -0.6706685 2.05351983 -0.7972316 -0.1520679 -0.7714376
2 -1.7739331 1.45570354 -0.6012567  0.2613097 -0.7914683
3 -0.7719231 0.04259273  0.3809469  1.2360435  0.8250286

Or in base R:

setNames(df, gsub("_.*", "", names(df)))
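
Note that "_.*" removes everything from the first underscore onward, which is fine for the example data but may be too broad for longer names. A minimal base R sketch of the difference, using a made-up data frame:

```r
# Hypothetical data frame with one longer column name
demo <- data.frame(a = 1, c_04 = 2, weird_04_name_05 = 3)

# Broad pattern: drops everything from the first underscore on
broad <- setNames(demo, gsub("_.*", "", names(demo)))
names(broad)

# Narrow pattern: drops only a trailing "_04"
narrow <- setNames(demo, sub("_04$", "", names(demo)))
names(narrow)
```

Which one is appropriate depends on whether the other underscores in the names carry meaning.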

Or with data.table:

library(data.table)

setnames(setDT(df), str_remove(names(df), "_.*"))

Data

df <- structure(list(a = c(0.894805325864747, -1.94185093341678, -1.00994988512899
), b = c(0.77908390827311, -0.0204816421929252, -0.346331859636578
), c_04 = c(-0.18087870239403, -0.275192762246937, -0.494661273775676
), d_04 = c(-0.206752223705721, -0.560550718406792, 1.45531474529632
), e_04 = c(-0.929914176494227, 1.76975758055254, -0.387603128597527
)), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

Sounds like you want to rename columns in a data.frame based on how some variable names begin (as opposed to your question title "[...] rename variables based on how the variable ends?"). If you want to do what you asked for (rename variables that you then use to build a data.frame), you can do it like this:

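N <- 3  # assumed sample size; N is not defined in the question
c_04 <- rnorm(N)
d_04 <- rnorm(N)
e_04 <- rnorm(N)
varnames <- c("c_04", "d_04", "e_04")
for (var in varnames) {
  name <- sub("_.*", "", var)  # drop everything from the first underscore on
  assign(name, get(var))       # copy the value to the new name
  rm(list = var)               # remove the old variable
}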

Just in case you want to do what you say you want to do. If not, there are plenty of other answers here.