I have a dataframe like so:
ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
df <- data.frame(ID, EXP_P, BRR, val1)
Output:
ID EXP_P BRR val1
1 A 62 61 1
2 A 62 57 1
3 A 62 66 1
4 A 62 53 1
5 A 62 54 1
6 A 62 50 1
7 A 62 55 1
8 A 64 65 2
9 A 64 71 2
10 A 64 53 2
11 A 67 51 3
12 A 67 50 3
13 A 67 58 3
14 A 67 54 3
15 A 67 55 3
16 A 67 57 3
I would like to add a new column BPP
that is dependent on each of the other columns.
Rules:
- When
val1
changes and it does not equal the row above, say from1
to2
,BPP
should be same asEXP_P
. - When the second occurrance of
val1
appears (by groupID
),BPP
should be the previousBRR
value. - When the third occurrance of
val1
appears (by groupID
),BPP
should be the max of the last two previousBRR
values. - When the fourth occurrance of
val1
appears (by groupID
), and all subsequent occurrances of thatval1
, BPP should be the max of the last three previousBRR
values.
Desired output:
ID EXP_P BRR val1 BPP
1 A 62 61 1 62
2 A 62 57 1 61
3 A 62 66 1 61
4 A 62 53 1 66
5 A 62 54 1 66
6 A 62 50 1 66
7 A 62 55 1 54
8 A 64 65 2 64
9 A 64 71 2 65
10 A 64 53 2 71
11 A 67 51 3 67
12 A 67 50 3 51
13 A 67 58 3 51
14 A 67 54 3 58
15 A 67 55 3 58
16 A 67 57 3 58
Ideally I am looking for a dplyr
solution with mutate
but if another one is possible, this would be good also.
CodePudding user response:
Here's a pretty direct solution. If you need to generalize to more lags you could use rolling max from the RcppRoll
package, but with 3 at most this is pretty manageable to hard-code:
df %>%
group_by(ID, val1) %>%
mutate(occ = row_number()) %>%
group_by(ID) %>%
mutate(
BPP = case_when(
occ == 1 ~ EXP_P,
occ == 2 ~ lag(BRR),
occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
)
) %>%
ungroup()
# # A tibble: 16 × 6
# ID EXP_P BRR val1 occ BPP
# <chr> <dbl> <dbl> <dbl> <int> <dbl>
# 1 A 62 61 1 1 62
# 2 A 62 57 1 2 61
# 3 A 62 66 1 3 61
# 4 A 62 53 1 4 66
# 5 A 62 54 1 5 66
# 6 A 62 50 1 6 66
# 7 A 62 55 1 7 54
# 8 A 64 65 2 1 64
# 9 A 64 71 2 2 65
# 10 A 64 53 2 3 71
# 11 A 67 51 3 1 67
# 12 A 67 50 3 2 51
# 13 A 67 58 3 3 51
# 14 A 67 54 3 4 58
# 15 A 67 55 3 5 58
# 16 A 67 57 3 6 58