Home > front end >  How to use string values in one column to ID and select rows from another column in the same datafra
How to use string values in one column to ID and select rows from another column in the same datafra

Time:05-11

I have a data frame that contains all possible order permutations that participants could be presented with, and a column that contains a permutation ID (i.e. presentation order) that each individual participant actually was presented with.

How do I use the permutation ID to select the values of the corresponding column for each participant?

To illustrate with dummy data, in this example, the ID for the presentation order each participant saw is given in column presentation_type. I would like to use that ID to select the values from the corresponding order* columns.

library(dplyr)

df1 <- tibble::tribble(
  ~participant_id, ~presentation_type, ~trial_number, ~response, ~order1, ~order2, ~order3,
             "p1",           "order3",            1L,     "yes",     "a",     "b",     "c",
             "p1",           "order3",            2L,     "yes",     "b",     "c",     "a",
             "p1",           "order3",            3L,      "no",     "c",     "a",     "b",
             "p2",           "order1",            1L,      "no",     "a",     "b",     "c",
             "p2",           "order1",            2L,     "yes",     "b",     "c",     "a",
             "p2",           "order1",            3L,      "no",     "c",     "a",     "b",
             "p3",           "order2",            1L,      "no",     "a",     "b",     "c",
             "p3",           "order2",            2L,     "yes",     "b",     "c",     "a",
             "p3",           "order2",            3L,     "yes",     "c",     "a",     "b"
  )

In other words, the desired outcome is a data fame with a column that contains the actual stimuli each participant saw (stimulus_presented), like this:

desired_outcome <- tibble::tribble(
                     ~participant_id, ~presentation_type, ~trial_number, ~response, ~stimulus_presented,
                                "p1",           "order3",            1L,     "yes",              "c",
                                "p1",           "order3",            2L,     "yes",              "a",
                                "p1",           "order3",            3L,      "no",              "b",
                                "p2",           "order1",            1L,      "no",              "a",
                                "p2",           "order1",            2L,     "yes",              "b",
                                "p2",           "order1",            3L,      "no",              "c",
                                "p3",           "order2",            1L,      "no",              "b",
                                "p3",           "order2",            2L,     "yes",              "c",
                                "p3",           "order2",            3L,     "yes",              "a"
                     )

I thought I could get there with something like the code below, but it just seems to assign the order for participant_id "p1" to all participants. Do I need to somehow group by/map for individual participants?

## Attempt so far - incorrect
# make single column of the stimulus presented on each row
stimuli_presented <- df1 %>% 
 select(stimulus_presented = .$presentation_type[1])

# bind our newly created column back onto the original data frame, then remove "order1", "order2", etc. 
df2 <- bind_cols(df1, stimuli_presented) %>% 
  relocate(stimulus_presented, .after = response) %>% 
  select(-c(starts_with("order")))

CodePudding user response:

With dplyr, you can use rowwise and get:

library(dplyr)
df1 %>% 
  rowwise() %>% 
  mutate(stimulus_presented = get(presentation_type)) %>% 
  select(-starts_with("order"))

# A tibble: 9 × 5
# Rowwise: 
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a                 

With base R:

diag(as.matrix(df1[match(df1$presentation_type, colnames(df1))]))
#or
unlist(sapply(seq(nrow(df1)), \(x) df1[x, match(df1$presentation_type[x], colnames(df1))]))

#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"

CodePudding user response:

You can pivot the data into a "long" format, then filter for Order that matched the presentation_type.

library(tidyverse)

df1 %>% 
  pivot_longer(starts_with("order"), names_to = "Order", values_to = "stimulus_presented") %>% 
  filter(presentation_type == Order) %>% 
  select(-Order)

# A tibble: 9 x 5
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a   
  • Related