How to create a new variable based on a list of ids in another data frame in R

I have my main data in data1:

    data1 <- structure(list(participant = c("DB", "DB", "DB", "TW", "TW", 
    "CF", "CF", "JH", "JH", "JH"), timepoint = c(1, 2, 3, 1, 2, 1, 
    2, 1, 2, 3), score = c(7, 8, 8, NA, 9, 9, 8, 10, 10, 10)), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -10L))

and I have a list of ids in data2:

    data2 <- structure(list(participant = c("DB", "CF")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -2L))

I would like to add a new column to data1: a binary variable (new_dummy) that equals 1 if the participant is in data2 and 0 if not. The result should look like this:

    structure(list(participant = c("DB", "DB", "DB", "TW", "TW", 
    "CF", "CF", "JH", "JH", "JH"), timepoint = c(1, 2, 3, 1, 2, 1, 
    2, 1, 2, 3), score = c(7, 8, 8, NA, 9, 9, 8, 10, 10, 10), new_dummy = c(1, 
    1, 1, 0, 0, 1, 1, 0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -10L))

CodePudding user response：

Here is a base R solution:

data1$new_dummy <- as.numeric(data1$participant %in% data2$participant)

This checks, for each participant in data1, whether it also appears in the participant column of data2. The result is a logical vector of TRUE and FALSE values. Converting it with as.numeric() turns these into 1 and 0, and that vector is then assigned to the new column.
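To make the two steps visible, here is a minimal sketch on plain vectors. The names participants, ids and is_listed are made up for illustration and stand in for data1$participant and data2$participant:

```r
# Small stand-ins for the participant columns of data1 and data2 (hypothetical names)
participants <- c("DB", "DB", "TW", "CF", "JH")
ids <- c("DB", "CF")

# Step 1: %in% returns one logical value per element of the left-hand side
is_listed <- participants %in% ids
print(is_listed)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Step 2: as.numeric() maps TRUE to 1 and FALSE to 0
new_dummy <- as.numeric(is_listed)
print(new_dummy)
# [1] 1 1 0 1 0

# A missing participant does not produce NA here: it simply fails to match
# (assuming ids itself contains no NA)
print(NA %in% ids)
# [1] FALSE
```

Because %in% never returns NA, the new column should contain only 0 and 1 even if some participant ids are missing.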

CodePudding user response：

A dplyr solution:

library(tidyverse)

data1 %>% 
  mutate(new_dummy = case_when(participant %in% data2$participant ~ 1, 
                               TRUE ~ 0))

# A tibble: 10 x 4
   participant timepoint score new_dummy
   <chr>           <dbl> <dbl>     <dbl>
 1 DB                  1     7         1
 2 DB                  2     8         1
 3 DB                  3     8         1
 4 TW                  1    NA         0
 5 TW                  2     9         0
 6 CF                  1     9         1
 7 CF                  2     8         1
 8 JH                  1    10         0
 9 JH                  2    10         0
10 JH                  3    10         0
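
Since the condition is already a logical vector, case_when() is not strictly required. A possible shorter variant is sketched below, using reduced versions of data1 and data2 (not the full data from the question) and assuming only dplyr is loaded. Note that mutate() returns a new tibble, so the result has to be assigned if the column should be kept, and that as.integer() gives an integer column rather than the double column shown above:

```r
library(dplyr)

# Reduced versions of data1 and data2 from the question (illustrative only)
data1 <- tibble(
  participant = c("DB", "DB", "TW", "CF", "JH"),
  timepoint   = c(1, 2, 1, 1, 1)
)
data2 <- tibble(participant = c("DB", "CF"))

# Assign the result back; mutate() does not modify data1 in place
data1 <- data1 %>%
  mutate(new_dummy = as.integer(participant %in% data2$participant))

# The new column holds 1 for DB and CF rows, 0 otherwise
print(data1$new_dummy)
```

Whether to prefer this over case_when() is mostly a matter of taste; case_when() becomes more useful once there are more than two categories.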