With the example dataset below, I'd like to create a new column holding a binary variable, in the first row for each subject, that indicates whether the subject ever had a measurement over 12.5 (yes/no), while keeping the rest of the dataset unchanged.
Example using olddata_long from The R Cookbook:
olddata_long <- read.table(header=TRUE, text='
subject sex condition measurement
1 M control 7.9
1 M cond1 12.3
1 M cond2 10.7
2 F control 6.3
2 F cond1 10.6
2 F cond2 11.1
3 F control 9.5
3 F cond1 13.1
3 F cond2 13.8
4 M control 11.5
4 M cond1 13.4
4 M cond2 12.9
')
CodePudding user response:
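library(dplyr)
# Flag only the first measurement above 12.5 within each subject:
# cumsum() counts the exceedances so far, so "< 2" keeps just the first one.
olddata_long %>%
  group_by(subject) %>%
  mutate(new_col = as.integer(measurement > 12.5 & cumsum(measurement > 12.5) < 2))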
# # A tibble: 12 × 5
# # Groups: subject [4]
# subject sex condition measurement new_col
# <int> <chr> <chr> <dbl> <int>
# 1 1 M control 7.9 0
# 2 1 M cond1 12.3 0
# 3 1 M cond2 10.7 0
# 4 2 F control 6.3 0
# 5 2 F cond1 10.6 0
# 6 2 F cond2 11.1 0
# 7 3 F control 9.5 0
# 8 3 F cond1 13.1 1
# 9 3 F cond2 13.8 0
# 10 4 M control 11.5 0
# 11 4 M cond1 13.4 1
# 12 4 M cond2 12.9 0
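To see why only one row per subject is flagged, here is a minimal base R sketch of the same logic on subject 3's measurements alone. The variable names (m, over, running, first_over) are made up for illustration.

```r
# Measurements for subject 3, copied from the example data
m <- c(9.5, 13.1, 13.8)

over <- m > 12.5          # FALSE TRUE TRUE
running <- cumsum(over)   # 0 1 2: count of exceedances so far

# Keep a row only if it exceeds 12.5 and is the first one to do so
first_over <- as.integer(over & running < 2)
first_over                # 0 1 0
```

Note that this marks the first row where the threshold is exceeded, which is not necessarily the first row of the subject.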
CodePudding user response:
May I suggest any()?
library(dplyr)
# any() is TRUE for every row of a subject with at least one measurement over 12.5
olddata_long %>%
  group_by(subject) %>%
  mutate(new_col = as.integer(any(measurement > 12.5)))
# # A tibble: 12 x 5
# # Groups: subject [4]
# subject sex condition measurement new_col
# <int> <chr> <chr> <dbl> <int>
# 1 1 M control 7.9 0
# 2 1 M cond1 12.3 0
# 3 1 M cond2 10.7 0
# 4 2 F control 6.3 0
# 5 2 F cond1 10.6 0
# 6 2 F cond2 11.1 0
# 7 3 F control 9.5 1
# 8 3 F cond1 13.1 1
# 9 3 F cond2 13.8 1
# 10 4 M control 11.5 1
# 11 4 M cond1 13.4 1
# 12 4 M cond2 12.9 1
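If the flag should appear only in the first row of each subject, as the question describes, one possible sketch combines any() with row_number(). Filling the remaining rows with NA is an assumption here, since the question does not say what those rows should hold, and the name result is only for illustration.

```r
library(dplyr)

# Example data from the question
olddata_long <- read.table(header = TRUE, text = '
 subject sex condition measurement
       1   M   control         7.9
       1   M     cond1        12.3
       1   M     cond2        10.7
       2   F   control         6.3
       2   F     cond1        10.6
       2   F     cond2        11.1
       3   F   control         9.5
       3   F     cond1        13.1
       3   F     cond2        13.8
       4   M   control        11.5
       4   M     cond1        13.4
       4   M     cond2        12.9
')

result <- olddata_long %>%
  group_by(subject) %>%
  # First row of each subject gets the 0/1 flag; other rows get NA (assumed fill)
  mutate(new_col = if_else(row_number() == 1L,
                           as.integer(any(measurement > 12.5)),
                           NA_integer_)) %>%
  ungroup()

result$new_col
# 0 NA NA 0 NA NA 1 NA NA 1 NA NA
```

Swapping NA_integer_ for 0L would put zeros in the other rows instead, but then a 0 in a later row could be mistaken for "never over 12.5".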