Home > Blockchain >  Add column in R that is dependent on multiple variables
Add column in R that is dependent on multiple variables

Time:11-30

I have a dataframe like so:

ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)

df <- data.frame(ID, EXP_P, BRR, val1)

Output:

   ID EXP_P BRR val1
1   A    62  61    1
2   A    62  57    1
3   A    62  66    1
4   A    62  53    1
5   A    62  54    1
6   A    62  50    1
7   A    62  55    1
8   A    64  65    2
9   A    64  71    2
10  A    64  53    2
11  A    67  51    3
12  A    67  50    3
13  A    67  58    3
14  A    67  54    3
15  A    67  55    3
16  A    67  57    3

I would like to add a new column BPP that is dependent on each of the other columns.

Rules:

  1. When val1 changes and it does not equal the row above, say from 1 to 2, BPP should be same as EXP_P.
  2. When the second occurrance of val1 appears (by group ID), BPP should be the previous BRR value.
  3. When the third occurrance of val1 appears (by group ID), BPP should be the max of the last two previous BRR values.
  4. When the fourth occurrance of val1 appears (by group ID), and all subsequent occurrances of that val1, BPP should be the max of the last three previous BRR values.

Desired output:

   ID EXP_P BRR val1 BPP
1   A    62  61    1  62
2   A    62  57    1  61
3   A    62  66    1  61
4   A    62  53    1  66
5   A    62  54    1  66
6   A    62  50    1  66
7   A    62  55    1  54
8   A    64  65    2  64
9   A    64  71    2  65
10  A    64  53    2  71
11  A    67  51    3  67
12  A    67  50    3  51
13  A    67  58    3  51
14  A    67  54    3  58
15  A    67  55    3  58
16  A    67  57    3  58

Ideally I am looking for a dplyr solution with mutate but if another one is possible, this would be good also.

CodePudding user response:

Here's a pretty direct solution. If you need to generalize to more lags you could use rolling max from the RcppRoll package, but with 3 at most this is pretty manageable to hard-code:

df %>%
  group_by(ID, val1) %>%
  mutate(occ = row_number()) %>%
  group_by(ID) %>%
  mutate(
    BPP = case_when(
      occ == 1 ~ EXP_P,
      occ == 2 ~ lag(BRR),
      occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
      TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
    )
  ) %>%
  ungroup()
# # A tibble: 16 × 6
#    ID    EXP_P   BRR  val1   occ   BPP
#    <chr> <dbl> <dbl> <dbl> <int> <dbl>
#  1 A        62    61     1     1    62
#  2 A        62    57     1     2    61
#  3 A        62    66     1     3    61
#  4 A        62    53     1     4    66
#  5 A        62    54     1     5    66
#  6 A        62    50     1     6    66
#  7 A        62    55     1     7    54
#  8 A        64    65     2     1    64
#  9 A        64    71     2     2    65
# 10 A        64    53     2     3    71
# 11 A        67    51     3     1    67
# 12 A        67    50     3     2    51
# 13 A        67    58     3     3    51
# 14 A        67    54     3     4    58
# 15 A        67    55     3     5    58
# 16 A        67    57     3     6    58
  • Related