I have a dataframe with columns seq (sequence) and num, e.g.:
seq num
1 0.1
2 0.1
3 0.2
1 0
2 0
3 0
1 0.5
2 2
3 6
4 9
5 12
1 0
2 0
3 0
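To make the example reproducible, the table above can be typed in as a data frame. This is a minimal sketch: the name df matches the loop further down, and storing seq as an integer vector is an assumption (it matches the <int> type shown in the answer's output).

```r
# Example data from the question, entered by hand (seq stored as integer)
df <- data.frame(
  seq = c(1:3, 1:3, 1:5, 1:3),  # four sequences of lengths 3, 3, 5 and 3
  num = c(0.1, 0.1, 0.2, 0, 0, 0, 0.5, 2, 6, 9, 12, 0, 0, 0)
)
```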
I need to create a new binary column, state, that is 1 for every sequence that contains a num value greater than 7.5.
In other words, state=1 should start from the closest seq=1 preceding the num > 7.5 value and cover that whole sequence:
seq num state
1 0.1 0
2 0.1 0
3 0.2 0
1 0 0
2 0 0
3 0 0
1 0.5 1
2 2 1
3 6 1
4 9 1
5 12 1
1 0 0
2 0 0
3 0 0
This seems like it should be simple, but I've been struggling with it for a few days.
To state the obvious: if I just use a row-wise condition on num > 7.5, I don't get state=1 for the full sequence:
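df$state <- 0  # start with every row flagged 0
for (i in seq_len(nrow(df))) {
  if (df$num[i] > 7.5) {
    df$state[i] <- 1  # flags only this row, not the rest of its sequence
  }
}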
seq num state
1 0.1 0
2 0.1 0
3 0.2 0
1 0 0
2 0 0
3 0 0
1 0.5 0
2 2 0
3 6 0
4 9 1
5 12 1
1 0 0
2 0 0
3 0 0
Thank you!
Answer:
We can define a grouping variable that is the cumulative count of 1s in the seq column, and then assign state by group:
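library(dplyr)
df %>%
  # every seq == 1 starts a new sequence, so the running count of 1s labels each sequence
  group_by(grp = cumsum(seq == 1)) %>%
  # set state to 1 on all rows of a sequence that has at least one num > 7.5
  mutate(state = as.integer(any(num > 7.5))) %>%
  ungroup()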
# # A tibble: 14 × 4
# seq num grp state
# <int> <dbl> <int> <int>
# 1 1 0.1 1 0
# 2 2 0.1 1 0
# 3 3 0.2 1 0
# 4 1 0 2 0
# 5 2 0 2 0
# 6 3 0 2 0
# 7 1 0.5 3 1
# 8 2 2 3 1
# 9 3 6 3 1
# 10 4 9 3 1
# 11 5 12 3 1
# 12 1 0 4 0
# 13 2 0 4 0
# 14 3 0 4 0
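For comparison, the same rule can be written in base R without dplyr. This sketch swaps in ave(), a base R function that applies FUN within each group and spreads the result back over that group's rows, in place of group_by()/mutate(). It rebuilds the example data so it runs on its own and, like the dplyr answer, assumes every sequence begins with seq == 1.

```r
# Base R sketch (no dplyr); example data rebuilt so the block is self-contained
df <- data.frame(
  seq = c(1:3, 1:3, 1:5, 1:3),
  num = c(0.1, 0.1, 0.2, 0, 0, 0, 0.5, 2, 6, 9, 12, 0, 0, 0)
)

# Same grouping idea: assumes each sequence starts at seq == 1
grp <- cumsum(df$seq == 1)
print(grp)
# [1] 1 1 1 2 2 2 3 3 3 3 3 4 4 4

# ave() evaluates FUN per group and recycles the single TRUE/FALSE over the
# group's rows; as.integer() turns the resulting 0/1 doubles into integers
df$state <- as.integer(ave(df$num, grp, FUN = function(x) any(x > 7.5)))
df
```

Only the third group contains a value above 7.5 (9 and 12), so rows 7 to 11 get state = 1 and all other rows get 0, matching the dplyr result.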