Creating new column names using dplyr across and .names

Time:10-14

I have the following data frame:

df <- data.frame(A_TR1=sample(10:20, 8, replace = TRUE),A_TR2=seq(2, 16, by=2), A_TR3=seq(1, 16, by=2),
                 B_TR1=seq(1, 16, by=2),B_TR2=seq(2, 16, by=2), B_TR3=seq(1, 16, by=2))
> df
  A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3
1    11     2     1     1     2     1
2    12     4     3     3     4     3
3    18     6     5     5     6     5
4    11     8     7     7     8     7
5    17    10     9     9    10     9
6    17    12    11    11    12    11
7    14    14    13    13    14    13
8    11    16    15    15    16    15

What I would like to do is subtract B_TR1 from A_TR1, B_TR2 from A_TR2, and so on, and store the results in new columns, as below:

df$x_TR1 <- (df$A_TR1 - df$B_TR1)
df$x_TR2 <- (df$A_TR2 - df$B_TR2)
df$x_TR3 <- (df$A_TR3 - df$B_TR3)
> df
  A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1    12     2     1     1     2     1    11     0     0
2    11     4     3     3     4     3     8     0     0
3    19     6     5     5     6     5    14     0     0
4    13     8     7     7     8     7     6     0     0
5    12    10     9     9    10     9     3     0     0
6    16    12    11    11    12    11     5     0     0
7    16    14    13    13    14    13     3     0     0
8    18    16    15    15    16    15     3     0     0

I would like to name these columns "x TR1", "x TR2", etc. I tried to do the following:

xdf <- df%>%mutate(across(starts_with("A_TR"), -across(starts_with("B_TR")), .names="x TR{.col}"))

However, I get an error in mutate():

attempt to select less than one element in integerOneIndex

I also don't know how to create the proper column names, in terms of getting the numbers right -- I am not even sure the glue() syntax allows for it. Any help appreciated here.

Answer:

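We could use .names in the first across() to replace the substring 'A' with 'x' in the column names (.col), and then subtract the second set of columns from the first. The two sets are paired by position, so the A_ and B_ columns need to be in the same order.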

library(dplyr)
library(stringr)
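df <- df %>%
   mutate(
     # first across(): the A_TR* columns, renamed A -> x by the glue spec in .names
     across(starts_with("A_TR"),
       .names = "{str_replace(.col, 'A', 'x')}") -
       # second across(): the B_TR* columns, subtracted position by position
       across(starts_with("B_TR")))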

-output

df
   A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1    10     2     1     1     2     1     9     0     0
2    10     4     3     3     4     3     7     0     0
3    16     6     5     5     6     5    11     0     0
4    12     8     7     7     8     7     5     0     0
5    20    10     9     9    10     9    11     0     0
6    19    12    11    11    12    11     8     0     0
7    17    14    13    13    14    13     4     0     0
8    14    16    15    15    16    15    -1     0     0
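
For reference, the attempt in the question fails because the second argument of across() is .fns, so -across(starts_with("B_TR")) is passed as the function to apply rather than as a second operand. Also, {.col} expands to the full column name, so "x TR{.col}" would produce names like "x TRA_TR1". As a hedged alternative to the positional subtraction above, the sketch below pairs each A_ column with its B_ partner by name, using cur_column() inside the function. The small data frame and the name res are made up for illustration, and the lookup assumes every A_TRn column has a matching B_TRn column.

```r
library(dplyr)

# Small illustrative data frame (values are made up, not the ones from the question)
df <- data.frame(
  A_TR1 = c(11, 12, 18), A_TR2 = c(2, 4, 6),
  B_TR1 = c(1, 3, 5),    B_TR2 = c(2, 4, 6)
)

res <- df %>%
  mutate(across(
    starts_with("A_TR"),
    # cur_column() is the name of the A_ column being processed;
    # swap the leading "A" for "B" and fetch that column by name
    ~ .x - get(sub("^A", "B", cur_column())),
    # the glue spec may contain arbitrary R code; .col is the full column name
    .names = "{sub('^A', 'x', .col)}"
  ))

print(res)
```

Because the B_ column is looked up by name, this version does not depend on the A_ and B_ columns appearing in the same order. It only needs base R's sub(), so stringr is not required here.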