Home > Back-end >  How to manipulate multi-index data in R?
How to manipulate multi-index data in R?

Time:12-02

I have a multi-index data set with 100 cases, and each case has 5 questions. Each question was scored by 2 raters.

case  question  rater1  rater2
1     1         1       1
1     2         1       0 
1     3         1       1
1     4         1       1
1     5         0       0
2     1         0       1
2     2         1       1 
2     3         1       1
2     4         1       0
2     5         0       0
3     1         0       0
3     2         1       0 
3     3         1       1
3     4         1       1
3     5         0       1
...

I want to sum question 1, 2, 3 in each case as 6; question 4, 5 in each case as 7; question 1~5 in each case as 8. Then insert the value at the end of each case, such as

case  question  rater1  rater2
1     1         1       1
1     2         1       0 
1     3         1       1
1     4         1       1
1     5         0       0
1     6         3       2
1     7         1       1
1     8         4       3
2     1         0       1
2     2         1       1 
2     3         1       1
2     4         1       0
2     5         0       0
2     6         2       3
2     7         1       0
2     8         3       3
3     1         0       0
3     2         1       0 
3     3         1       1
3     4         1       1
3     5         0       1
3     6         2       1
3     7         1       2
3     8         3       3
...

I am unsure how to achieve it.

CodePudding user response:

Or, writing a function to do the grunt work and provide a degree of robustness and generality:

library(tidyverse)

addSummaryRow <- function(data, qFilter, newIndex) {
  data %>%
    bind_rows(
      data %>% 
        pivot_longer(starts_with("rater")) %>% 
        filter(question %in% qFilter) %>% 
        group_by(case, name) %>% 
        summarise(value=sum(value), .groups="drop") %>% 
        pivot_wider(id_cols=c(case), names_from=name, values_from=value) %>% 
        mutate(question=newIndex)
    ) %>%
    arrange(case, question)
}

d %>% 
  addSummaryRow(1:3, 6) %>% 
  addSummaryRow(4:5, 7) %>% 
  addSummaryRow(1:5, 8)
# A tibble: 24 × 4
    case question rater1 rater2
   <int>    <dbl>  <int>  <int>
 1     1        1      1      1
 2     1        2      1      0
 3     1        3      1      1
 4     1        4      1      1
 5     1        5      0      0
 6     1        6      3      2
 7     1        7      1      1
 8     1        8      4      3
 9     2        1      0      1
10     2        2      1      1
# … with 14 more rows

CodePudding user response:

This is a bit pedestrian, but it gets the job done:

library(dplyr)

df %>%
  group_by(case) %>%
  summarize(rater1 = c(rater1, 
                       sum(rater1[question %in% 1:3]),
                       sum(rater1[question %in% 4:5]),
                       sum(rater1)),
            rater2 = c(rater2, 
                       sum(rater2[question %in% 1:3]),
                       sum(rater2[question %in% 4:5]),
                       sum(rater2)),
            question = 1:8, .groups = 'drop') %>%
  select(1, 4, 2, 3) %>%
  as.data.frame()
#>    case question rater1 rater2
#> 1     1        1      1      1
#> 2     1        2      1      0
#> 3     1        3      1      1
#> 4     1        4      1      1
#> 5     1        5      0      0
#> 6     1        6      3      2
#> 7     1        7      1      1
#> 8     1        8      4      3
#> 9     2        1      0      1
#> 10    2        2      1      1
#> 11    2        3      1      1
#> 12    2        4      1      0
#> 13    2        5      0      0
#> 14    2        6      2      3
#> 15    2        7      1      0
#> 16    2        8      3      3
#> 17    3        1      0      0
#> 18    3        2      1      0
#> 19    3        3      1      1
#> 20    3        4      1      1
#> 21    3        5      0      1
#> 22    3        6      2      1
#> 23    3        7      1      2
#> 24    3        8      3      3
  • Related