I have data with the following structure:
Participant | Item | Region | Observation | RT |
---|---|---|---|---|
p1 | Item 1 | Region 1 | 1 | 3300 |
p1 | Item 1 | Region 2 | 0 | NA |
p1 | Item 1 | Region 3 | 0 | NA |
p1 | Item 2 | Region 1 | 1 | 5000 |
p1 | Item 2 | Region 2 | 0 | NA |
p1 | Item 2 | Region 3 | 1 | 7000 |
I have a set of participants responding to a range of items. For each item, there are three rows per participant, each row corresponding to a different region of the item. Observation denotes whether the participant looked at that region (0- no, 1- yes) I want to conduct an analysis looking at RT for Region 3 only in cases where there is an observation for Region 1 and Region 3, but no observation for Region 2. So in the example above Region 3 for item 1 would not be included, but Region 3 for item 2 would be included.
All of my attempts so far have not managed to successfully account for the values across item. I can create a code for whether each Region has an observation, but then once I take only Region 3 for analysis, I can't account for the value in Region 1 and 2 for each item (I end up with no valid observations).
I have tried grouping the data by item, but I really don't know where to go from there.
My data:
structure(list(Participant = c("p1", "p1", "p1", "p1", "p1",
"p1"), Item = c("Item 1", "Item 1", "Item 1", "Item 2", "Item 2",
"Item 2"), Region = c("Region 1", "Region 2", "Region 3",
"Region 1", "Region 2", "Region 3"), Observation = c(1, 0,
0, 1, 0, 1), RT = c(3300L, NA, NA, 5000L, NA, 7000L)), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
You could remove NA's (or filter for zeros or ones) and filter per group:
library(dplyr)
df |>
na.omit() |>
group_by(Item) |>
filter(all(c("Region1", "Region3") %in% Region),
Region == "Region3") |>
ungroup()
Output:
# A tibble: 1 × 5
Participant Item Region Observation RT
<chr> <chr> <chr> <dbl> <dbl>
2 p1 Item2 Region3 1 7000
Data:
library(readr)
df <- read_table("Participant Item Region Observation RT
p1 Item1 Region1 1 3300
p1 Item1 Region2 0 NA
p1 Item1 Region3 0 NA
p1 Item3 Region1 0 NA
p1 Item3 Region2 0 NA
p1 Item3 Region3 1 3300
p1 Item2 Region1 1 5000
p1 Item2 Region2 0 NA
p1 Item2 Region3 1 7000")
CodePudding user response:
An alternative version without changing the NAs to 0:
df <- structure(list(Participant = c("p1", "p1", "p1", "p1", "p1",
"p1"), Item = c("Item 1", "Item 1", "Item 1", "Item 2", "Item 2",
"Item 2"), Region = c("Region 1", "Region 2", "Region 3",
"Region 1", "Region 2", "Region 3"), Observation = c(1, 0,
0, 1, 0, 1), RT = c(3300L, NA, NA, 5000L, NA, 7000L)), class = "data.frame", row.names = c(NA,
-6L))
df |>
# subset only with needed data
dplyr::select(-RT) |>
tidyr::pivot_wider(names_from = Region,
values_from = Observation) |>
# filter
dplyr::filter(`Region 1` == 1 & `Region 2` == 0 & `Region 3` == 1) |>
dplyr::select(Participant, Item, `Region 3`) |>
# back to original format
tidyr::pivot_longer(`Region 3`, names_to = "Region", values_to = "Observation") |>
dplyr::left_join(df)
#> Joining, by = c("Participant", "Item", "Region", "Observation")
#> # A tibble: 1 × 5
#> Participant Item Region Observation RT
#> <chr> <chr> <chr> <dbl> <int>
#> 1 p1 Item 2 Region 3 1 7000
Created on 2022-12-16 with reprex v2.0.2