Extract a column's name to create a new column name

Time:05-11

I have the following simplified dataframe.

df <- data.frame("Task1_AI1" = 1:5, "Task1_AI2" = 6:10, "Task2_AI1" = 1:5, "Task2_AI2"= 6:10)
df

And it looks like this:

Task1_AI1 Task1_AI2 Task2_AI1 Task2_AI2
1 6 1 6
2 7 2 7
3 8 3 8
4 9 4 9
5 10 5 10

I want to write a function that does the following:

  1. sum the values of each task pair, i.e. Task1_AI1 + Task1_AI2 and Task2_AI1 + Task2_AI2
  2. save each pair's sum in a new column
  3. name each new column after the first 5 characters of the corresponding column names plus "_sum", i.e. "Task1_sum" and "Task2_sum"

The new dataframe would look like this:

Task1_AI1 Task1_AI2 Task2_AI1 Task2_AI2 Task1_sum Task2_sum
1 6 1 6 7 7
2 7 2 7 9 9
3 8 3 8 11 11
4 9 4 9 13 13
5 10 5 10 15 15

The function below is what I have so far, and it achieves my goal for one pair at a time. I would like to know how I can improve it.

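library(stringr)  # provides str_sub()

third_function <- function (df, x, y) {
  # new column name = first 5 characters of column x, followed by "_sum"
  df[[paste(str_sub(colnames(df[x]), 1, 5), "_sum", sep='')]] <- df[[x]] + df[[y]]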
  df
}

df <- third_function(df, "Task1_AI1", "Task1_AI2")

I would really appreciate your guidance!

CodePudding user response:

This is not a function, but should get your desired result.

library(tidyverse)

df2 <- df %>%
  mutate(Task1_sum = Task1_AI1 + Task1_AI2,
         Task2_sum = Task2_AI1 + Task2_AI2)
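
If a reusable function is wanted, the same idea can be sketched in base R without extra packages. This is only a sketch under the question's assumptions: every column belongs to a task, and the task name is the first 5 characters of the column name. The names add_task_sums and prefix_len are made up for illustration.

```r
# Hypothetical helper: adds one "<prefix>_sum" column per task prefix.
# Assumes all columns are numeric and the task name is the first
# `prefix_len` characters of each column name.
add_task_sums <- function(df, prefix_len = 5) {
  orig_cols <- names(df)  # remember the original columns only
  prefixes <- unique(substr(orig_cols, 1, prefix_len))
  for (p in prefixes) {
    # original columns that share this prefix
    cols <- orig_cols[startsWith(orig_cols, p)]
    # row-wise sum of those columns, stored as "<prefix>_sum"
    df[[paste0(p, "_sum")]] <- unname(rowSums(df[cols]))
  }
  df
}

df <- data.frame(Task1_AI1 = 1:5, Task1_AI2 = 6:10,
                 Task2_AI1 = 1:5, Task2_AI2 = 6:10)
res <- add_task_sums(df)
res
```

Because the prefixes are collected before any column is added, the new "_sum" columns are never picked up as inputs, and the function handles any number of tasks in one call.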

CodePudding user response:

With the help of data.table:

library(data.table) # load package

sumEquals = function(df){
    a = as.data.table(df)  # convert to a data.table; this makes a copy, so df itself is not modified by :=
    
    # Get the unique names of tasks (I assume there could be more than 2!)
    uniqueNames = unique(substr(names(a), start = 1, stop = 5))
    
    # for each of those unique names do: 
    for (i in uniqueNames){

        # sum row-wise the columns with alike names and assign that to a
        # new column that has the same prefix and the suffix is "_sum"
        a[, paste0(i, "_sum") := apply(.SD, 1, sum), .SDcols = grep(i, names(a), value = TRUE)]
    }

    # return the modified data.frame / data.table
    return(a)
}

Check if the function works:

b = sumEquals(df)

b
   Task1_AI1 Task1_AI2 Task2_AI1 Task2_AI2 Task1_sum Task2_sum
1:         1         6         1         6         7         7
2:         2         7         2         7         9         9
3:         3         8         3         8        11        11
4:         4         9         4         9        13        13
5:         5        10         5        10        15        15
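
One thing to watch with the fixed 5-character prefix: it stops working once task numbers reach two digits. The sketch below uses made-up column names (Task10_AI1 and Task10_AI2 are not in the question) to show the problem, and one possible alternative that takes everything before the first underscore as the task name.

```r
# Hypothetical column names; Task10_* is an assumption for illustration.
cols <- c("Task1_AI1", "Task1_AI2", "Task10_AI1", "Task10_AI2")

# Fixed-width prefix: Task1 and Task10 collapse into the same group
fixed <- substr(cols, 1, 5)

# Prefix up to the first underscore keeps them apart
by_underscore <- sub("_.*$", "", cols)

# Match on the prefix plus "_" so "Task1" does not also pick up "Task10"
task1_cols <- cols[startsWith(cols, paste0("Task1", "_"))]

fixed          # "Task1" "Task1" "Task1" "Task1"
by_underscore  # "Task1" "Task1" "Task10" "Task10"
task1_cols     # "Task1_AI1" "Task1_AI2"
```

The same applies to grep(i, names(a)) in the data.table answer, since an unanchored pattern matches "Task1" anywhere in a column name.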