I have my main data in data1:
structure(list(participant = c("DB", "DB", "DB", "TW", "TW",
"CF", "CF", "JH", "JH", "JH"), timepoint = c(1, 2, 3, 1, 2, 1,
2, 1, 2, 3), score = c(7, 8, 8, NA, 9, 9, 8, 10, 10, 10)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
and I have a list of ids in data2:
structure(list(participant = c("DB", "CF")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L))
I would like to add a new binary column (new_dummy) to data1 that equals 1 if the participant is in data2 and 0 if not. The result should look like this:
structure(list(participant = c("DB", "DB", "DB", "TW", "TW",
"CF", "CF", "JH", "JH", "JH"), timepoint = c(1, 2, 3, 1, 2, 1,
2, 1, 2, 3), score = c(7, 8, 8, NA, 9, 9, 8, 10, 10, 10), new_dummy = c(1,
1, 1, 0, 0, 1, 1, 0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
CodePudding user response:
Here is a base R solution:
data1$new_dummy <- as.numeric(data1$participant %in% data2$participant)
This checks, for each participant in data1, whether it also exists in the participant column of data2. The output is a vector of TRUE and FALSE values. Converting the booleans to numerics gives 1 and 0 instead. This vector is then assigned to a new column.
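To make the two steps visible, here is a minimal sketch that splits the one-liner apart. It rebuilds the question's data as plain data frames rather than tibbles, and the intermediate name in_data2 is only illustrative (it is not part of the original answer).

```r
# The question's data, rebuilt as plain data frames (an assumption for
# self-containment; tibbles behave the same way here)
data1 <- data.frame(
  participant = c("DB", "DB", "DB", "TW", "TW", "CF", "CF", "JH", "JH", "JH"),
  timepoint   = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3),
  score       = c(7, 8, 8, NA, 9, 9, 8, 10, 10, 10)
)
data2 <- data.frame(participant = c("DB", "CF"))

# Step 1: one logical value per row of data1
# (in_data2 is a hypothetical helper name)
in_data2 <- data1$participant %in% data2$participant
in_data2
#  [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE

# Step 2: TRUE becomes 1, FALSE becomes 0, stored as a new column
data1$new_dummy <- as.numeric(in_data2)
data1$new_dummy
#  [1] 1 1 1 0 0 1 1 0 0 0
```

Note that %in% never returns NA, so the NA in score does not affect the result, and a participant missing from data2 simply gets 0.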
CodePudding user response:
A dplyr solution:
library(tidyverse)

# Column named new_dummy to match the name requested in the question
data1 %>%
  mutate(new_dummy = case_when(participant %in% data2$participant ~ 1,
                               TRUE ~ 0))
# A tibble: 10 x 4
   participant timepoint score new_dummy
   <chr>           <dbl> <dbl>     <dbl>
 1 DB                  1     7         1
 2 DB                  2     8         1
 3 DB                  3     8         1
 4 TW                  1    NA         0
 5 TW                  2     9         0
 6 CF                  1     9         1
 7 CF                  2     8         1
 8 JH                  1    10         0
 9 JH                  2    10         0
10 JH                  3    10         0