Subsetting a data frame based on a minimum number of responses per group of measures

Time:06-06

I hope someone can help me with this query. I have a large data set and am going to run analyses on a set of participants, provided they meet certain criteria. In this case, the criterion is that each participant provided at least one answer to the Measure 1 items AND at least one answer to the Measure 2 items (there are three items for Measure 1 and three items for Measure 2). So if a participant answers all three Measure 1 items but none of the Measure 2 items, they are removed from the data set. The same applies if they answer two items of one measure but none of the items belonging to the other measure. Consider the example below:

df <- data.frame(tester_ID = c("A1", "A2", "A3", "A4", "A5", "A6",
                               "A7", "A1", "A2", "A3", "A4", "A5", "A6", "A7"),
                 Phase = c("Phase1", "Phase1", "Phase1", "Phase1", "Phase1",
                           "Phase1", "Phase1", "Phase2", 
                           "Phase2", "Phase2", "Phase2", "Phase2", "Phase2", 
                           "Phase2"),
                 Item1Measure1 = c(5, NA, 3, 4, 4, 1, 4, 4, 5, NA, NA, NA, NA, NA),
                 Item2Measure1 = c(5, 3, NA, NA, 4, 1, NA, 4, 5, NA, NA, 3, NA, 1),
                 Item3Measure1 = c(NA, NA, NA, NA, 4, 1, NA, 4, 5, 1, 3, 5, NA, NA),
                 Item1Measure2 = c(NA, NA, NA, NA, NA, 1, NA, 4, 5, NA,NA, NA,NA,NA),
                 Item2Measure2 = c(5, NA, NA, 4, 4, 1, 4, NA, 5, 2, 4, 1, 2, 4),
                 Item3Measure2 = c(5, NA, 3, 4, 4, 1, 4, NA, 5, NA, NA, NA, NA, NA))

Created on 2022-06-05 by the reprex package (v2.0.1)

I am hoping to create a condition whereby only participants who provided AT LEAST one answer to a Measure1 item AND AT LEAST one answer to a Measure2 item are considered. For instance, tester_ID A2 in Phase 1 did not answer any Measure 2 items, so that tester would be excluded from the new data set. The same applies to tester_ID A6 in Phase 2, who answered Measure 2 items but no Measure 1 items. The remaining 12 rows meet the criterion of at least one answer per measure.
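To make the criterion concrete, here is a minimal base R sketch (not part of the original question) that counts how many items each row answered for each measure and flags the rows that fail. It assumes, as in the example, that every item column name ends in "Measure1" or "Measure2"; the names n_m1, n_m2, counts and keep are illustrative only.

```r
# Same example data as in the question (tester_ID and Phase written compactly)
df <- data.frame(
  tester_ID = rep(c("A1", "A2", "A3", "A4", "A5", "A6", "A7"), times = 2),
  Phase = rep(c("Phase1", "Phase2"), each = 7),
  Item1Measure1 = c(5, NA, 3, 4, 4, 1, 4, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure1 = c(5, 3, NA, NA, 4, 1, NA, 4, 5, NA, NA, 3, NA, 1),
  Item3Measure1 = c(NA, NA, NA, NA, 4, 1, NA, 4, 5, 1, 3, 5, NA, NA),
  Item1Measure2 = c(NA, NA, NA, NA, NA, 1, NA, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure2 = c(5, NA, NA, 4, 4, 1, 4, NA, 5, 2, 4, 1, 2, 4),
  Item3Measure2 = c(5, NA, 3, 4, 4, 1, 4, NA, 5, NA, NA, NA, NA, NA)
)

# Logical masks selecting the item columns of each measure (assumes the name suffixes above)
m1_cols <- endsWith(names(df), "Measure1")
m2_cols <- endsWith(names(df), "Measure2")

# Number of answered (non-NA) items per row for each measure
n_m1 <- rowSums(!is.na(df[m1_cols]))
n_m2 <- rowSums(!is.na(df[m2_cols]))

counts <- data.frame(df[c("tester_ID", "Phase")], n_m1, n_m2)

# A row passes only if it has at least one answer for BOTH measures
counts$keep <- n_m1 >= 1 & n_m2 >= 1

# Rows that would be excluded
counts[!counts$keep, ]
```

With the example data, only A2 in Phase1 (no Measure 2 answers) and A6 in Phase2 (no Measure 1 answers) are flagged, which matches the 12 rows the question expects to keep.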

Any help would be greatly appreciated.

Answer:

We can use if_any(): loop over the 'Measure1' columns and check for non-NA elements (complete.cases), and separately (&) loop over the 'Measure2' columns and do the same. Each if_any() call returns one TRUE/FALSE per row (TRUE if at least one of the selected columns is non-NA), and combining the two with & gives TRUE only when both are TRUE, i.e. when the row has at least one non-NA value in both sets of columns.

library(dplyr)
df %>% 
  # keep a row only if at least one Measure1 item AND at least one Measure2 item is non-NA
  filter(if_any(ends_with('Measure1'), complete.cases) & 
         if_any(ends_with('Measure2'), complete.cases))

Output:

 tester_ID  Phase Item1Measure1 Item2Measure1 Item3Measure1 Item1Measure2 Item2Measure2 Item3Measure2
1         A1 Phase1             5             5            NA            NA             5             5
2         A3 Phase1             3            NA            NA            NA            NA             3
3         A4 Phase1             4            NA            NA            NA             4             4
4         A5 Phase1             4             4             4            NA             4             4
5         A6 Phase1             1             1             1             1             1             1
6         A7 Phase1             4            NA            NA            NA             4             4
7         A1 Phase2             4             4             4             4            NA            NA
8         A2 Phase2             5             5             5             5             5             5
9         A3 Phase2            NA            NA             1            NA             2            NA
10        A4 Phase2            NA            NA             3            NA             4            NA
11        A5 Phase2            NA             3             5            NA             1            NA
12        A7 Phase2            NA             1            NA            NA             4            NA
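If dplyr is not available, or there are more than two measures, the same filter can be sketched in base R. The function keep_min_per_measure() below is a hypothetical helper, not from the original answer; it assumes each measure's items share a common column-name suffix and adds a min_n argument for the "minimum number of responses" in the title.

```r
# Same example data as in the question (tester_ID and Phase written compactly)
df <- data.frame(
  tester_ID = rep(c("A1", "A2", "A3", "A4", "A5", "A6", "A7"), times = 2),
  Phase = rep(c("Phase1", "Phase2"), each = 7),
  Item1Measure1 = c(5, NA, 3, 4, 4, 1, 4, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure1 = c(5, 3, NA, NA, 4, 1, NA, 4, 5, NA, NA, 3, NA, 1),
  Item3Measure1 = c(NA, NA, NA, NA, 4, 1, NA, 4, 5, 1, 3, 5, NA, NA),
  Item1Measure2 = c(NA, NA, NA, NA, NA, 1, NA, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure2 = c(5, NA, NA, 4, 4, 1, 4, NA, 5, 2, 4, 1, 2, 4),
  Item3Measure2 = c(5, NA, 3, 4, 4, 1, 4, NA, 5, NA, NA, NA, NA, NA)
)

# Hypothetical helper: keep rows with at least `min_n` non-NA answers in
# every measure; a measure's items are the columns whose names end with
# that measure's suffix (an assumption based on the example's naming)
keep_min_per_measure <- function(data, measures, min_n = 1) {
  ok <- lapply(measures, function(m) {
    cols <- endsWith(names(data), m)
    rowSums(!is.na(data[cols])) >= min_n
  })
  # a row is kept only if it passes the check for every measure
  data[Reduce(`&`, ok), , drop = FALSE]
}

res <- keep_min_per_measure(df, c("Measure1", "Measure2"))
nrow(res)
```

With min_n = 1 this keeps the same 12 rows as the dplyr version, although base R subsetting preserves the original row names instead of renumbering them. Passing min_n = 2 keeps only rows with at least two answers for each measure (4 rows in the example data).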