Home > Back-end >  How do I make a conditional composite score based off a categorical variable in R?
How do I make a conditional composite score based off a categorical variable in R?

Time:10-11

Dataset

I have a dataset similar to this:

stack_tib <- structure(list(ID = 1:50, Phase = c("Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2"), ABC1 = c(NA, 
45, NA, 51, NA, 84, NA, 25, NA, 41, NA, 57, NA, 52, NA, 86, NA, 
11, NA, 41, NA, 46, NA, 35, NA, 16, NA, 53, NA, 75, NA, 44, NA, 
68, NA, 64, NA, 49, NA, 42, NA, 46, NA, 93, NA, 28, NA, 41, NA, 
48), ABC2 = c(NA, 49, NA, 77, NA, 80, NA, 62, NA, 54, NA, 40, 
NA, 30, NA, 56, NA, 51, NA, 91, NA, 4, NA, 36, NA, 71, NA, 26, 
NA, 47, NA, 58, NA, 63, NA, 57, NA, 59, NA, 73, NA, 61, NA, 37, 
NA, 38, NA, 81, NA, 29), ABC3 = c(NA, 55, NA, 43, NA, 49, NA, 
17, NA, 68, NA, 62, NA, 49, NA, 56, NA, 37, NA, 30, NA, 31, NA, 
45, NA, 37, NA, 52, NA, 49, NA, 59, NA, 42, NA, 73, NA, 65, NA, 
21, NA, 45, NA, 20, NA, 39, NA, 64, NA, 24), RABC1 = c(66, NA, 
57, NA, 48, NA, 61, NA, 70, NA, 71, NA, 25, NA, 42, NA, 63, NA, 
60, NA, 46, NA, 49, NA, 35, NA, 51, NA, 59, NA, 29, NA, 43, NA, 
45, NA, 72, NA, 65, NA, 54, NA, 52, NA, 24, NA, 62, NA, 38, NA
), RABC2 = c(94, NA, 45, NA, 42, NA, 34, NA, 83, NA, 52, NA, 
75, NA, 30, NA, 41, NA, 25, NA, 39, NA, 72, NA, 43, NA, 36, NA, 
68, NA, 89, NA, 54, NA, 39, NA, 46, NA, 56, NA, 34, NA, 80, NA, 
46, NA, 48, NA, 37, NA), RABC3 = c(42, NA, 43, NA, 82, NA, 72, 
NA, 48, NA, 40, NA, 51, NA, 96, NA, 47, NA, 42, NA, 67, NA, 64, 
NA, 67, NA, 53, NA, 59, NA, 17, NA, 58, NA, 52, NA, 54, NA, 46, 
NA, 73, NA, 73, NA, 90, NA, 87, NA, 50, NA)), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

Problem

This is what my data looks like:

# A tibble: 50 × 8
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42
 2     2 Wave_2    45    49    55    NA    NA    NA
 3     3 Wave_1    NA    NA    NA    57    45    43
 4     4 Wave_2    51    77    43    NA    NA    NA
 5     5 Wave_1    NA    NA    NA    48    42    82
 6     6 Wave_2    84    80    49    NA    NA    NA
 7     7 Wave_1    NA    NA    NA    61    34    72
 8     8 Wave_2    25    62    17    NA    NA    NA
 9     9 Wave_1    NA    NA    NA    70    83    48
10    10 Wave_2    41    54    68    NA    NA    NA
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

You can see here that there are 3 items for ABC, 3 items for RABC. However you may also notice that the ABC values are NA for Wave 1 and the RABC values are NA for Wave 2. It is not this uniform, but basically those who were tested at different times were given slightly different scales. As such, I need to find a way to run a composite for the specific ID of each person based on which version it is (basically which Wave in this case). So ideally it would look something like this:

# A tibble: 50 × 9
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3 Sum_ABC
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42     149
 2     2 Wave_2    45    49    55    NA    NA    NA     138
 3     3 Wave_1    NA    NA    NA    57    45    43     144
 4     4 Wave_2    51    77    43    NA    NA    NA     150
 5     5 Wave_1    NA    NA    NA    48    42    82     157
 6     6 Wave_2    84    80    49    NA    NA    NA     133
 7     7 Wave_1    NA    NA    NA    61    34    72     147
 8     8 Wave_2    25    62    17    NA    NA    NA     158
 9     9 Wave_1    NA    NA    NA    70    83    48     145
10    10 Wave_2    41    54    68    NA    NA    NA     152

CodePudding user response:

What about this?

stack_tib %>% 
  mutate(Sum_ABC = rowSums(.[-c(1, 2)], na.rm = T))

#More flexible:
stack_tib %>% 
  mutate(Sum_ABC = rowSums(select(., where(is.numeric)), na.rm = T))

#Select if contains 'ABC':
stack_tib %>% 
  mutate(Sum_ABC = rowSums(select(., contains("ABC")), na.rm = T))

output

# A tibble: 50 × 9
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3 Sum_ABC
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42     202
 2     2 Wave_2    45    49    55    NA    NA    NA     149
 3     3 Wave_1    NA    NA    NA    57    45    43     145
 4     4 Wave_2    51    77    43    NA    NA    NA     171
 5     5 Wave_1    NA    NA    NA    48    42    82     172
 6     6 Wave_2    84    80    49    NA    NA    NA     213
 7     7 Wave_1    NA    NA    NA    61    34    72     167
 8     8 Wave_2    25    62    17    NA    NA    NA     104
 9     9 Wave_1    NA    NA    NA    70    83    48     201
10    10 Wave_2    41    54    68    NA    NA    NA     163
# … with 40 more rows
  • Related