I need to categorize information from column names and restructure a dataset. Here is how my sample dataset looks like:
df <- data.frame(id = c(111,112,113),
Demo_1_Color_Naming = c("Text1","Text1","Text1"),
Demo_1.Errors =c(0,1,2),
Item_1_Color_Naming = c("Text1","Text1","Text1"),
Item_1.Time_in_Seconds =c(10,NA, 44),
Item_1.Errors = c(0,1,NA),
Demo_2_Shape_Naming = c("Text1","Text1","Text1"),
Demo_2.Errors =c(4,1,5),
Item_2_Shape_Naming = c("Text1","Text1","Text1"),
Item_2.Time_in_Seconds =c(55,35, 22),
Item_2.Errors = c(5,2,NA))
> df
id Demo_1_Color_Naming Demo_1.Errors Item_1_Color_Naming Item_1.Time_in_Seconds Item_1.Errors Demo_2_Shape_Naming Demo_2.Errors
1 111 Text1 0 Text1 10 0 Text1 4
2 112 Text1 1 Text1 NA 1 Text1 1
3 113 Text1 2 Text1 44 NA Text1 5
Item_2_Shape_Naming Item_2.Time_in_Seconds Item_2.Errors
1 Text1 55 5
2 Text1 35 2
3 Text1 22 NA
The columns are grouped by the numbers 1,2,3,..
. Each number represesents a grouping name. For example number 1
in this dataset represents Color
grouping where number 2
represents Shape
grouping. I would like to keep Time_in_seconds
info and Errors
info. Then I need to sum both time and errors.
Additionally, this dataset is only limited to two grouping. The bigger dataset has more than 2 grouping. I need to handle this for a multi group/column.
How can I achieve this below:
> df1
id ColorTime ShapeTime ColorError ShapeError TotalTime TotalError
1 111 10 55 0 5 65 5
2 112 NA 35 1 2 35 3
3 113 44 22 NA NA 66 NA
CodePudding user response:
We may do
cbind(df['id'], do.call(cbind, lapply(setNames(c("Time_in_Seconds",
"Item.*Errors"), c("Time_in_Seconds", "Errors")), \(x) {
tmp <- df[grep(x, names(df), value = TRUE)]
out <- setNames(as.data.frame(sapply(split.default(tmp,
gsub("\\D ", "", names(tmp))), rowSums, na.rm = TRUE)), c("Color", "Shape"))
transform(out, Total = rowSums(out))
})))
-output
id Time_in_Seconds.Color Time_in_Seconds.Shape Time_in_Seconds.Total Errors.Color Errors.Shape Errors.Total
1 111 10 55 65 0 5 5
2 112 0 35 35 1 2 3
3 113 44 22 66 0 0 0