Home > other >  Replace column names with the number from the original label
Replace column names with the number from the original label

Time:12-20

I want to rename multiple columns at once by using the digit(s) from the original column name. With one caveat though. Some questions have the same numbers and are differentiated by the addition of "other." In that case, I want to keep "other" next to the digit(s). Ideally, in front of every column name, I would add "item_". Bonus points if the solution uses the pipe operator (|>).

I have been looking for this answer on StackOverflow, but I could not find this exact situation. Please, feel free to re-direct me to questions already answered if this is a duplicate.

Toy dataset:

df_toy <- data.frame(
  ID = 1:20, 
  "variable 1 - language" = sample(1:4, 20, replace = TRUE),
  "variable 1 - other" = sample(1:7, 20, replace = TRUE), 
  "variable 2 - math" = sample(1:5, 20, replace = TRUE),
  "question 4 - writing" = sample(1:20, 20, replace = TRUE), 
  "question 4 other" = NA, check.names = FALSE
)

   ID variable 1 - language variable 1 - other variable 2 - math question 4 - writing question 4 other
1   1                     2                  2                 1                   14               NA
2   2                     2                  7                 4                   16               NA
3   3                     3                  1                 1                    7               NA
4   4                     2                  2                 1                    1               NA
5   5                     4                  1                 2                   19               NA
6   6                     1                  6                 4                    2               NA
7   7                     2                  7                 2                    3               NA
8   8                     1                  3                 4                   18               NA
9   9                     3                  3                 5                   18               NA

Desidered output:

   ID item_1 item_1 other item_2 item_4 item_4 other
1   1      1            3      2      3           NA
2   2      1            5      2     13           NA
3   3      4            6      1     13           NA
4   4      3            6      3     19           NA
5   5      4            3      5      6           NA
6   6      1            4      4      2           NA
7   7      3            7      3     12           NA
8   8      3            2      3     14           NA
9   9      4            7      1      7           NA
10 10      3            7      1      5           NA

CodePudding user response:

Or use rename_with and stringr:

library(dplyr)
library(stringr)

df_toy |>
  rename_with(~ paste0("item_", str_extract(., "\\d "), if_else(str_detect(., "other"), "_other", "")),
              .cols = -ID)

Output:

# A tibble: 20 × 6
      ID item_1 item_1_other item_2 item_4 item_4_other
   <int>  <int>        <int>  <int>  <int> <lgl>       
 1     1      4            1      2      3 NA          
 2     2      2            7      2     19 NA          
 3     3      3            6      4      8 NA          
 4     4      2            6      4     13 NA          
 5     5      3            2      4      1 NA          
 6     6      1            3      5      2 NA          
 7     7      4            1      5     11 NA          
 8     8      2            1      4     16 NA          
 9     9      3            2      4      5 NA          
10    10      1            2      2      4 NA          
11    11      4            4      5      3 NA          
12    12      2            3      2     15 NA          
13    13      3            2      2     10 NA          
14    14      2            2      4      7 NA          
15    15      1            5      5      7 NA          
16    16      3            7      3      2 NA          
17    17      3            1      3     18 NA          
18    18      4            7      2     18 NA          
19    19      4            3      2      3 NA          
20    20      3            7      1      8 NA

CodePudding user response:

We can use base R option with sub and grepl

v1 <- paste0("item_", sub("^[^0-9] (\\d ).*", "\\1", names(df_toy)[-1]))
names(df_toy)[-1] <- ifelse(grepl("other", names(df_toy)[-1]), paste0(v1, " other"), v1)

-output

> df_toy
   ID item_1 item_1 other item_2 item_4 item_4 other
1   1      2            1      1     18           NA
2   2      3            3      2      7           NA
3   3      3            4      1     18           NA
4   4      3            6      4      3           NA
5   5      1            7      4      4           NA
6   6      4            4      4      4           NA
7   7      3            1      4      3           NA
8   8      4            5      2     20           NA
9   9      2            3      1     10           NA
10 10      1            6      3     18           NA
11 11      2            2      4     17           NA
...

Or without ifelse

trimws(paste0("item_", trimws(gsub("^\\D |(\\d )\\s -\\s (other)", 
    "\\1 \\2", names(df_toy)[-1]), whitespace = "-.*|\\s ")))
[1] "item_1"       "item_1 other" "item_2"       "item_4"       "item_4 other"

Or slightly compact

library(stringr)
 str_c("item_", str_remove(str_extract(names(df_toy)[-1], 
    "\\d (.*other)?"), "-\\s "))
[1] "item_1"       "item_1 other" "item_2"       "item_4"       "item_4 other"
  • Related