How to filter across multiples rows within the same item in R-CodePudding

I have data with the following structure:

Participant	Item	Region	Observation	RT
p1	Item 1	Region 1	1	3300
p1	Item 1	Region 2	0	NA
p1	Item 1	Region 3	0	NA
p1	Item 2	Region 1	1	5000
p1	Item 2	Region 2	0	NA
p1	Item 2	Region 3	1	7000

I have a set of participants responding to a range of items. For each item, there are three rows per participant, each row corresponding to a different region of the item. Observation denotes whether the participant looked at that region (0- no, 1- yes) I want to conduct an analysis looking at RT for Region 3 only in cases where there is an observation for Region 1 and Region 3, but no observation for Region 2. So in the example above Region 3 for item 1 would not be included, but Region 3 for item 2 would be included.

All of my attempts so far have not managed to successfully account for the values across item. I can create a code for whether each Region has an observation, but then once I take only Region 3 for analysis, I can't account for the value in Region 1 and 2 for each item (I end up with no valid observations).

I have tried grouping the data by item, but I really don't know where to go from there.

My data:

structure(list(Participant = c("p1", "p1", "p1", "p1", "p1", 
"p1"), Item = c("Item 1", "Item 1", "Item 1", "Item 2", "Item 2", 
"Item 2"), Region = c("Region 1", "Region 2", "Region 3", 
"Region 1", "Region 2", "Region 3"), Observation = c(1, 0, 
0, 1, 0, 1), RT = c(3300L, NA, NA, 5000L, NA, 7000L)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response：

You could remove NA's (or filter for zeros or ones) and filter per group:

library(dplyr)

df |>
  na.omit() |>
  group_by(Item) |>
  filter(all(c("Region1", "Region3") %in% Region),
         Region == "Region3") |>
  ungroup()

Output:

# A tibble: 1 × 5
  Participant Item  Region  Observation    RT
  <chr>       <chr> <chr>         <dbl> <dbl>
2 p1          Item2 Region3           1  7000

Data:

library(readr)

df <- read_table("Participant   Item    Region  Observation RT
p1  Item1   Region1 1   3300
p1  Item1   Region2 0   NA
p1  Item1   Region3 0   NA
p1  Item3   Region1 0   NA
p1  Item3   Region2 0   NA
p1  Item3   Region3 1   3300
p1  Item2   Region1 1   5000
p1  Item2   Region2 0   NA
p1  Item2   Region3 1   7000")

CodePudding user response：

An alternative version without changing the NAs to 0:

df <- structure(list(Participant = c("p1", "p1", "p1", "p1", "p1", 
                               "p1"), Item = c("Item 1", "Item 1", "Item 1", "Item 2", "Item 2", 
                                                "Item 2"), Region = c("Region 1", "Region 2", "Region 3", 
                                                                      "Region 1", "Region 2", "Region 3"), Observation = c(1, 0, 
                                                                                                                           0, 1, 0, 1), RT = c(3300L, NA, NA, 5000L, NA, 7000L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                      -6L))


df |>
  # subset only with needed data
  dplyr::select(-RT) |>
  tidyr::pivot_wider(names_from = Region, 
                     values_from = Observation) |>
  # filter
  dplyr::filter(`Region 1` == 1 & `Region 2` == 0 & `Region 3` == 1) |>
  dplyr::select(Participant, Item, `Region 3`) |>
  # back to original format
  tidyr::pivot_longer(`Region 3`, names_to = "Region", values_to = "Observation") |>
  dplyr::left_join(df)
#> Joining, by = c("Participant", "Item", "Region", "Observation")
#> # A tibble: 1 × 5
#>   Participant Item   Region   Observation    RT
#>   <chr>       <chr>  <chr>          <dbl> <int>
#> 1 p1          Item 2 Region 3           1  7000

^{Created on 2022-12-16 with reprex v2.0.2}