Home > Enterprise >  Add column in R that is dependent on several variables
Add column in R that is dependent on several variables

Time:01-06

This is a follow up to my question here.

I have a dataframe like so:

ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
CURR_PR <- c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3)


df <- data.frame(ID, EXP_P, BRR, val1, CURR_PR)

Output:

   ID EXP_P BRR val1 CURR_PR
1   A    62  61    1       1
2   A    62  57    1       1
3   A    62  66    1       1
4   A    62  53    1       1
5   A    62  54    1       1
6   A    62  50    1       1
7   A    62  55    1       1
8   A    64  65    2       2
9   A    64  71    2       2
10  A    64  53    2       2
11  A    67  51    3       2
12  A    67  50    3       3
13  A    67  58    3       3
14  A    67  54    3       3
15  A    67  55    3       3
16  A    67  57    3       3

I have now added the new column BPP that is dependent on each of the other columns using the rules specified in my initial question thanks to that response:

    df %>%
      group_by(ID, val1) %>%
      mutate(occ = row_number()) %>%
      group_by(ID) %>%
      mutate(
        BPP = case_when(
          occ == 1 ~ EXP_P,
          occ == 2 ~ lag(BRR),
          occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
          TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
        )
      ) %>%
      ungroup()
# # A tibble: 16 × 6
#    ID    EXP_P   BRR  val1   occ   BPP
#    <chr> <dbl> <dbl> <dbl> <int> <dbl>
#  1 A        62    61     1     1    62
#  2 A        62    57     1     2    61
#  3 A        62    66     1     3    61
#  4 A        62    53     1     4    66
#  5 A        62    54     1     5    66
#  6 A        62    50     1     6    66
#  7 A        62    55     1     7    54
#  8 A        64    65     2     1    64
#  9 A        64    71     2     2    65
# 10 A        64    53     2     3    71
# 11 A        67    51     3     1    67
# 12 A        67    50     3     2    51
# 13 A        67    58     3     3    51
# 14 A        67    54     3     4    58
# 15 A        67    55     3     5    58
# 16 A        67    57     3     6    58

Now I have another variable (CURR_PR) and when this changes, I need the value in BRR to be the same as it was under the previous value in BRR.

For example, when CURR_PR is 1, the first value for BRR is 61. When CURR_PR changes to 2, I want BRR to be the same value as it was for the first BRR value when CURR_PR is 1 (61).

When CURR_PR is 2, the first value for BRR after the update will be 61. When CURR_PR changes to 3, I want BRR to be the same value as it was for the first BRR value when CURR_PR is 2 (61).

Note that this value will update through the calcs shown above in the mutate and case_when values, so it won't always be 61.

Desired output:

   ID EXP_P BRR val1 CURR_PR
1   A    62  61    1       1
2   A    62  57    1       1
3   A    62  66    1       1
4   A    62  53    1       1
5   A    62  54    1       1
6   A    62  50    1       1
7   A    62  55    1       1
8   A    64  61    2       2
9   A    64  71    2       2
10  A    64  53    2       2
11  A    67  51    3       2
12  A    67  61    3       3
13  A    67  58    3       3
14  A    67  54    3       3
15  A    67  55    3       3
16  A    67  57    3       3

CodePudding user response:

I was a little confused, basically every time that CURR_PR change, BRR will receive the first value of BRR where CURR_PR was 1? If so:

Code

df %>% 
  group_by(CURR_PR) %>% 
  mutate(aux = row_number()) %>% 
  ungroup() %>% 
  mutate(BRR = if_else(aux == 1 & CURR_PR > 1, BRR[aux == 1 & CURR_PR == 1],BRR))

Output

# A tibble: 16 x 6
   ID    EXP_P   BRR  val1 CURR_PR   aux
   <chr> <dbl> <dbl> <dbl>   <dbl> <int>
 1 A        62    61     1       1     1
 2 A        62    57     1       1     2
 3 A        62    66     1       1     3
 4 A        62    53     1       1     4
 5 A        62    54     1       1     5
 6 A        62    50     1       1     6
 7 A        62    55     1       1     7
 8 A        64    61     2       2     1
 9 A        64    71     2       2     2
10 A        64    53     2       2     3
11 A        67    51     3       2     4
12 A        67    61     3       3     1
13 A        67    58     3       3     2
14 A        67    54     3       3     3
15 A        67    55     3       3     4
16 A        67    57     3       3     5

CodePudding user response:

We may group by 'ID' and use replace

library(dplyr)
df %>%
   group_by(ID) %>% 
   mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR))) %>% 
   ungroup

-output

# A tibble: 16 × 5
   ID    EXP_P   BRR  val1 CURR_PR
   <chr> <dbl> <dbl> <dbl>   <dbl>
 1 A        62    61     1       1
 2 A        62    57     1       1
 3 A        62    66     1       1
 4 A        62    53     1       1
 5 A        62    54     1       1
 6 A        62    50     1       1
 7 A        62    55     1       1
 8 A        64    61     2       2
 9 A        64    71     2       2
10 A        64    53     2       2
11 A        67    51     3       2
12 A        67    61     3       3
13 A        67    58     3       3
14 A        67    54     3       3
15 A        67    55     3       3
16 A        67    57     3       3

If we are using the devel version of dplyr, can use .by in mutate

 df %>%
   mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR)), .by = "ID")

-output

   ID EXP_P BRR val1 CURR_PR
1   A    62  61    1       1
2   A    62  57    1       1
3   A    62  66    1       1
4   A    62  53    1       1
5   A    62  54    1       1
6   A    62  50    1       1
7   A    62  55    1       1
8   A    64  61    2       2
9   A    64  71    2       2
10  A    64  53    2       2
11  A    67  51    3       2
12  A    67  61    3       3
13  A    67  58    3       3
14  A    67  54    3       3
15  A    67  55    3       3
16  A    67  57    3       3
  • Related