I have a multi-index data set with 100 cases, and each case has 5 questions. Each question was scored by 2 raters.
case question rater1 rater2
1 1 1 1
1 2 1 0
1 3 1 1
1 4 1 1
1 5 0 0
2 1 0 1
2 2 1 1
2 3 1 1
2 4 1 0
2 5 0 0
3 1 0 0
3 2 1 0
3 3 1 1
3 4 1 1
3 5 0 1
...
I want to sum question 1, 2, 3 in each case as A, and question 4, 5 in each case as B. Then insert the value at the end of each case, such as
case question rater1 rater2
1 1 1 1
1 2 1 0
1 3 1 1
1 4 1 1
1 5 0 0
1 A 3 2
1 B 1 1
2 1 0 1
2 2 1 1
2 3 1 1
2 4 1 0
2 5 0 0
2 A 2 3
2 B 1 0
3 1 0 0
3 2 1 0
3 3 1 1
3 4 1 1
3 5 0 1
3 A 2 1
3 B 1 2
...
I am unsure how to achieve it.
CodePudding user response:
You could summarize the data, and then bind it back to the original data and resort it. For example
library(dplyr)
dd %>%
group_by(case, grp = case_when(question %in% 1:3~"A", question %in% 4:5 ~ "B")) %>%
summarize(across(-question, sum)) %>%
ungroup() %>%
rename(question = grp) %>%
bind_rows(mutate(dd, question = as.character(question))) %>%
arrange(case, question)
CodePudding user response:
With data.table
library(data.table)
dt[
,.(
question = c(question, "A", "B"),
rater1 = c(rater1, sum(rater1[1:3]), sum(rater1[4:5])),
rater2 = c(rater2, sum(rater2[1:3]), sum(rater2[4:5]))
), case
][1:15]
#> case question rater1 rater2
#> 1: 1 1 1 0
#> 2: 1 2 1 1
#> 3: 1 3 0 0
#> 4: 1 4 0 0
#> 5: 1 5 0 1
#> 6: 1 A 2 1
#> 7: 1 B 0 1
#> 8: 2 1 0 0
#> 9: 2 2 0 1
#> 10: 2 3 0 1
#> 11: 2 4 1 1
#> 12: 2 5 0 0
#> 13: 2 A 0 2
#> 14: 2 B 1 1
#> 15: 3 1 0 0
Data
dt <- data.table(
case = rep(1:100, each = 5),
question = rep(1:5, 100),
rater1 = sample(0:1, 500, 1),
rater2 = sample(0:1, 500, 1)
)