Home > Enterprise >  How to cope with multi-index data in R?
How to cope with multi-index data in R?

Time:12-02

I have a multi-index data set with 100 cases, and each case has 5 questions. Each question was scored by 2 raters.

case  question  rater1  rater2
1     1         1       1
1     2         1       0 
1     3         1       1
1     4         1       1
1     5         0       0
2     1         0       1
2     2         1       1 
2     3         1       1
2     4         1       0
2     5         0       0
3     1         0       0
3     2         1       0 
3     3         1       1
3     4         1       1
3     5         0       1
...

I want to sum question 1, 2, 3 in each case as A, and question 4, 5 in each case as B. Then insert the value at the end of each case, such as

case  question  rater1  rater2
1     1         1       1
1     2         1       0 
1     3         1       1
1     4         1       1
1     5         0       0
1     A         3       2
1     B         1       1
2     1         0       1
2     2         1       1 
2     3         1       1
2     4         1       0
2     5         0       0
2     A         2       3
2     B         1       0
3     1         0       0
3     2         1       0 
3     3         1       1
3     4         1       1
3     5         0       1
3     A         2       1
3     B         1       2
...

I am unsure how to achieve it.

CodePudding user response:

You could summarize the data, and then bind it back to the original data and resort it. For example

library(dplyr)
dd %>% 
  group_by(case, grp = case_when(question %in% 1:3~"A", question %in% 4:5 ~ "B")) %>% 
  summarize(across(-question, sum)) %>% 
  ungroup() %>% 
  rename(question = grp) %>% 
  bind_rows(mutate(dd, question = as.character(question))) %>% 
  arrange(case, question)

CodePudding user response:

With data.table

library(data.table)

dt[
  ,.(
    question = c(question, "A", "B"),
    rater1 = c(rater1, sum(rater1[1:3]), sum(rater1[4:5])),
    rater2 = c(rater2, sum(rater2[1:3]), sum(rater2[4:5]))
  ), case
][1:15]
#>     case question rater1 rater2
#>  1:    1        1      1      0
#>  2:    1        2      1      1
#>  3:    1        3      0      0
#>  4:    1        4      0      0
#>  5:    1        5      0      1
#>  6:    1        A      2      1
#>  7:    1        B      0      1
#>  8:    2        1      0      0
#>  9:    2        2      0      1
#> 10:    2        3      0      1
#> 11:    2        4      1      1
#> 12:    2        5      0      0
#> 13:    2        A      0      2
#> 14:    2        B      1      1
#> 15:    3        1      0      0

Data

dt <- data.table(
  case = rep(1:100, each = 5),
  question = rep(1:5, 100),
  rater1 = sample(0:1, 500, 1),
  rater2 = sample(0:1, 500, 1)
)
  • Related