I have the following simplified dataframe.
df <- data.frame("Task1_AI1" = 1:5, "Task1_AI2" = 6:10, "Task2_AI1" = 1:5, "Task2_AI2"= 6:10)
df
And it looks like this:
Task1_AI1 | Task1_AI2 | Task2_AI1 | Task2_AI2 |
---|---|---|---|
1 | 6 | 1 | 6 |
2 | 7 | 2 | 7 |
3 | 8 | 3 | 8 |
4 | 9 | 4 | 9 |
5 | 10 | 5 | 10 |
I want to write a function that does the following:
- sums the values of each task pair, i.e. Task1_AI1 + Task1_AI2 and Task2_AI1 + Task2_AI2
- saves each pair's sum in a new column
- names each new column with the first 5 characters of the corresponding column names plus "_sum", i.e. "Task1_sum" or "Task2_sum".
The new dataframe would look like this:
Task1_AI1 | Task1_AI2 | Task2_AI1 | Task2_AI2 | Task1_sum | Task2_sum |
---|---|---|---|---|---|
1 | 6 | 1 | 6 | 7 | 7 |
2 | 7 | 2 | 7 | 9 | 9 |
3 | 8 | 3 | 8 | 11 | 11 |
4 | 9 | 4 | 9 | 13 | 13 |
5 | 10 | 5 | 10 | 15 | 15 |
The function below is what I have, and it achieves my goals. I would like to know how I can improve it.
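library(stringr) # provides str_sub()
third_function <- function(df, x, y) {
  # name the new column after the first 5 characters of x, plus "_sum"
  df[[paste(str_sub(colnames(df[x]), 1, 5), "_sum", sep = '')]] <- df[[x]] + df[[y]]
  df
}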
df <- third_function(df, "Task1_AI1", "Task1_AI2")
I would really appreciate your guidance!
CodePudding user response:
This is not a function, but should get your desired result.
library(tidyverse)
df2 <- df %>%
  mutate(Task1_sum = Task1_AI1 + Task1_AI2,
         Task2_sum = Task2_AI1 + Task2_AI2)
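If a reusable function is preferred, the same sums can be sketched in base R by grouping the columns on their first five characters. This is only a sketch: `add_task_sums` and its `prefix_len` argument are hypothetical names, not something from the question, and it assumes that every column sharing a prefix is numeric and should be summed.

```r
# Sketch only: add_task_sums and prefix_len are hypothetical names.
add_task_sums <- function(df, prefix_len = 5) {
  cols <- names(df)
  # Group the column names by their first prefix_len characters, e.g. "Task1"
  groups <- split(cols, substr(cols, 1, prefix_len))
  for (prefix in names(groups)) {
    # Row-wise sum of every column that shares this prefix
    df[[paste0(prefix, "_sum")]] <- rowSums(df[groups[[prefix]]])
  }
  df
}

df <- data.frame(Task1_AI1 = 1:5, Task1_AI2 = 6:10,
                 Task2_AI1 = 1:5, Task2_AI2 = 6:10)
result <- add_task_sums(df)
```

Because the groups are computed before any column is added, the new `_sum` columns are never fed back into a later sum within the same call.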
CodePudding user response:
With the help of data.table:
library(data.table) # load package
sumEquals = function(df){
  # copy so the caller's object is untouched, then convert:
  # := only works on a data.table and modifies it by reference
  a = setDT(copy(df))
  # Get the unique names of tasks (I assume there could be more than 2!)
  uniqueNames = unique(substr(names(a), start = 1, stop = 5))
  # for each of those unique names do:
  for (i in uniqueNames){
    # sum row-wise the columns with alike names and assign that to a
    # new column that has the same prefix and the suffix is "_sum"
    a[, paste0(i, "_sum") := apply(.SD, 1, sum), .SDcols = grep(i, names(a), value = TRUE)]
  }
  # return the modified data.table
  return(a)
}
Check if the function works:
b = sumEquals(df)
b
Task1_AI1 Task1_AI2 Task2_AI1 Task2_AI2 Task1_sum Task2_sum
1: 1 6 1 6 7 7
2: 2 7 2 7 9 9
3: 3 8 3 8 11 11
4: 4 9 4 9 13 13
5: 5 10 5 10 15 15
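One caveat with the fixed five-character prefix: it only works while task numbers have a single digit. The sketch below shows how "Task10" would collapse into "Task1", and one possible alternative that splits on the first underscore instead. `task_prefix` is a hypothetical helper, and the sketch assumes every column name follows the `<task>_<suffix>` pattern from the question.

```r
# Sketch only: task_prefix is a hypothetical helper, not from the thread.
task_prefix <- function(col_names) {
  # Drop everything from the first underscore onward: "Task10_AI1" -> "Task10"
  sub("_.*$", "", col_names)
}

cols <- c("Task1_AI1", "Task1_AI2", "Task10_AI1", "Task10_AI2")
fixed_width  <- unique(substr(cols, 1, 5))  # "Task1" only: Task10 is merged into Task1
by_delimiter <- unique(task_prefix(cols))   # "Task1" "Task10"
```

The same caution applies to matching columns with `grep("Task1", ...)`, which would also pick up the "Task10" columns; comparing the derived prefixes for equality avoids that.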