How do I create a new binary variable in a long dataset at t=0 to represent whether a value was ever

Time:04-08

With the example dataset below, I'd like to create a new column and fill it with a binary variable in the first row for each subject to represent if they ever had a measurement over 12.5 (yes/no) while keeping the format of the rest of the dataset.

Example using olddata_long from The R Cookbook:

olddata_long <- read.table(header=TRUE, text='
 subject sex condition measurement
       1   M   control         7.9
       1   M     cond1        12.3
       1   M     cond2        10.7
       2   F   control         6.3
       2   F     cond1        10.6
       2   F     cond2        11.1
       3   F   control         9.5
       3   F     cond1        13.1
       3   F     cond2        13.8
       4   M   control        11.5
       4   M     cond1        13.4
       4   M     cond2        12.9
')

CodePudding user response:

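library(dplyr)

# Flag only the first row per subject where measurement exceeds 12.5.
# cumsum() counts the exceedances so far, so any later ones are set to 0.
olddata_long %>%
  group_by(subject) %>%
  mutate(new_col = as.integer(measurement > 12.5 & cumsum(measurement > 12.5) < 2))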
# # A tibble: 12 × 5
# # Groups:   subject [4]
#    subject sex   condition measurement new_col
#      <int> <chr> <chr>           <dbl>   <int>
#  1       1 M     control           7.9       0
#  2       1 M     cond1            12.3       0
#  3       1 M     cond2            10.7       0
#  4       2 F     control           6.3       0
#  5       2 F     cond1            10.6       0
#  6       2 F     cond2            11.1       0
#  7       3 F     control           9.5       0
#  8       3 F     cond1            13.1       1
#  9       3 F     cond2            13.8       0
# 10       4 M     control          11.5       0
# 11       4 M     cond1            13.4       1
# 12       4 M     cond2            12.9       0
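
To see why only the first exceedance is flagged, the expression can be taken apart for a single subject. This is a minimal base R sketch using subject 4's measurements from the example data; the variable names x, over and flag are only illustrative.

```r
# Measurements for subject 4 (control, cond1, cond2)
x <- c(11.5, 13.4, 12.9)

# TRUE wherever the measurement exceeds the threshold
over <- x > 12.5

# Running count of exceedances within the subject
running <- cumsum(over)

# Keep only the first exceedance: it is the one where the running count is still below 2
flag <- as.integer(over & running < 2)
print(flag)
```

Because the flag lands on the first row that exceeds 12.5 rather than on the first row of the subject, its position varies from subject to subject.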

CodePudding user response:

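May I suggest any()? Within each subject it returns a single TRUE or FALSE, so every row of a subject gets 1 if any of that subject's measurements exceeds 12.5.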

library(dplyr)
# any() is evaluated once per subject and recycled to all of that subject's rows
olddata_long %>%
  group_by(subject) %>%
  mutate(new_col = as.integer(any(measurement > 12.5)))

# A tibble: 12 x 5
# Groups:   subject [4]
   subject sex   condition measurement new_col
     <int> <chr> <chr>           <dbl>   <int>
 1       1 M     control           7.9       0
 2       1 M     cond1            12.3       0
 3       1 M     cond2            10.7       0
 4       2 F     control           6.3       0
 5       2 F     cond1            10.6       0
 6       2 F     cond2            11.1       0
 7       3 F     control           9.5       1
 8       3 F     cond1            13.1       1
 9       3 F     cond2            13.8       1
10       4 M     control          11.5       1
11       4 M     cond1            13.4       1
12       4 M     cond2            12.9       1
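
If the flag is wanted only in the first row of each subject, as the question describes, the any() approach can be combined with row_number(). This is a minimal sketch, not a definitive solution: filling the remaining rows with NA and the result name first_row_flag are assumptions, since the question does not say what the other rows should contain.

```r
library(dplyr)

olddata_long <- read.table(header=TRUE, text='
 subject sex condition measurement
       1   M   control         7.9
       1   M     cond1        12.3
       1   M     cond2        10.7
       2   F   control         6.3
       2   F     cond1        10.6
       2   F     cond2        11.1
       3   F   control         9.5
       3   F     cond1        13.1
       3   F     cond2        13.8
       4   M   control        11.5
       4   M     cond1        13.4
       4   M     cond2        12.9
')

# Assumption: rows after the first one in each subject are left as NA
first_row_flag <- olddata_long %>%
  group_by(subject) %>%
  mutate(new_col = if_else(row_number() == 1L,
                           as.integer(any(measurement > 12.5)),
                           NA_integer_)) %>%
  ungroup()

print(first_row_flag)
```

With this layout the rest of the long dataset keeps its shape, and the per-subject flag can be read from the rows where new_col is not NA.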