Home > Blockchain >  Return the max value by group with multiple criteria in R
Return the max value by group with multiple criteria in R

Time:11-30

I have a dataframe like so:

ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
BRR <- c(62,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
val2 <- c(0,0,0,0,0,0,0,1,1,1,2,2,2,2,2,2)

df <- data.frame(ID, BRR, val1, val2)

Output:

   ID BRR val1 val2
1   A  62    1    0
2   A  57    1    0
3   A  66    1    0
4   A  53    1    0
5   A  54    1    0
6   A  50    1    0
7   A  55    1    0
8   A  65    2    1
9   A  71    2    1
10  A  53    2    1
11  A  51    3    2
12  A  50    3    2
13  A  58    3    2
14  A  54    3    2
15  A  55    3    2
16  A  57    3    2

I would like to get the max value of BRR by group (ID, val1, val2). In this case, ID has the same value. Val2 will always be 1 less than val1 so I am not sure that this column is even required.

So when val2 is 0, max_val_KP will be 0. When val2 is 1, I would like the max value from val1 grouping, like so:

   ID BRR val1 val2 Max_val_KP
1   A  62    1    0          0
2   A  57    1    0          0
3   A  66    1    0          0
4   A  53    1    0          0
5   A  54    1    0          0
6   A  50    1    0          0
7   A  55    1    0          0
8   A  65    2    1         66
9   A  71    2    1         66
10  A  53    2    1         66
11  A  51    3    2         71
12  A  50    3    2         71
13  A  58    3    2         71
14  A  54    3    2         71
15  A  55    3    2         71
16  A  57    3    2         71

I tried:

require(dplyr)
df <- df %>%
  filter(va1 == val2) %>%
  group_by(ID, val2) %>%
  mutate(max_val_KP = max(BRR))

I also tried:

require(dplyr)
df <- df %>%
  group_by(ID, val1 == val2) %>%
  mutate(max_val_KP = max(BRR))

CodePudding user response:

You can do:

library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  mutate(max_val_KP = replace_na(ave(BRR, val1, FUN = max)[match(val2, val1)], 0)) %>%
  ungroup()

# A tibble: 16 × 5
   ID      BRR  val1  val2 max_val_KP
   <chr> <dbl> <dbl> <dbl>      <dbl>
 1 A        62     1     0          0
 2 A        57     1     0          0
 3 A        66     1     0          0
 4 A        53     1     0          0
 5 A        54     1     0          0
 6 A        50     1     0          0
 7 A        55     1     0          0
 8 A        65     2     1         66
 9 A        71     2     1         66
10 A        53     2     1         66
11 A        51     3     2         71
12 A        50     3     2         71
13 A        58     3     2         71
14 A        54     3     2         71
15 A        55     3     2         71
16 A        57     3     2         71

CodePudding user response:

This is by no means the most elegant and efficient way to solve the problem, but this occurs to me.

ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
BRR <- c(62,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
val2 <- c(0,0,0,0,0,0,0,1,1,1,2,2,2,2,2,2)

df <- data.frame(ID, BRR, val1, val2)


Group <- df%>%group_by(val1)%>%summarise(Max_val_KP = max(BRR))
colnames(Group)[1] <- "val2"

Final <- left_join(df, Group)
Final$Max_val_KP[is.na(Final$Max_val_KP)] <- 0

As I understood you want the value of Val2 when it is equal to Val1 - 1, that's why I change the name of the column in the Group data frame, so the left join matches Val2 and not Val1.

This give me the same result as you desire output.

CodePudding user response:

If val2 is always val1 1, then val1 == val2 will return nothing because they are never the same. I suggest removing that condition and then also using the mutate() function from dplyr.

Try this

library(dplyr)
df |> group_by(val1, val2) |> 
  mutate(Max_val_KP = max(BRR)) |> ungroup()

enter image description here

  • Related