Compare two observations and keep the higher Vale-CodePudding

I have a DF that looks like this;

ID      Station   Date       Value
111-001 Place1   2022-05-01  39.1
111-002 Place2   2022-05-01  0.01
222-001 Place1   2022-05-02  8.5
222-002 Place2   2022-05-02  40.3

I'd like to compare the similar IDs and keep the higher of the two. How is the best method to achieve this? I prefer using packages to base and any help is appreciated.

CodePudding user response：

Use susbtr to group_by the 3 first numbers of ID, then use max:

library(dplyr)
dat %>% 
  group_by(gp = substr(ID, 1, 3)) %>% 
  mutate(max = max(Value))

# A tibble: 4 × 6
# Groups:   gp [2]
  ID      Station Date       Value gp      max
  <chr>   <chr>   <chr>      <dbl> <chr> <dbl>
1 111-001 Place1  2022-05-01 39.1  111    39.1
2 111-002 Place2  2022-05-01  0.01 111    39.1
3 222-001 Place1  2022-05-02  8.5  222    40.3
4 222-002 Place2  2022-05-02 40.3  222    40.3

In base R, use ave:

with(dat, ave(Value, substr(ID, 1, 3), FUN = max))
#[1] 39.1 39.1 40.3 40.3

CodePudding user response：

Using base R

transform(dat, Max = ave(Value, trimws(ID, whitespace = "-\\d "), FUN = max))

CodePudding user response：

data.table option:

library(data.table)
setDT(df)[, .SD[which.max(Value)], by=ID]

Output:

        ID Station       Date Value
1: 111-001  Place1 2022-05-01  39.1
2: 111-002  Place2 2022-05-02  40.3

Data:

df <- data.frame(ID = c("111-001", "111-002", "111-001", "111-002"),
                 Station = c("Place1", "Place2", "Place1", "Place2"),
                 Date = c("2022-05-01", "2022-05-01", "2022-05-02", "2022-05-02"),
                 Value = c(39.1, 0.01, 8.5, 40.3))

CodePudding user response：

You may use aggregate() in baseR. Change ID to factor

df$ID <- as.factor(df$ID)
# and do 
aggregate(.~ID, df, FUN = max)

gives

       ID Value
1 111-001  39.1
2 111-002  40.3

data

df <- data.frame(ID = c("111-001", "111-002", "111-001", "111-002"), 
                 Value = c(39.1, 0.01, 8.5, 40.3))

CodePudding user response：

Similar to Maël's answer I used this code to achieve my goal.

dat %>% 
  group_by(gp = substr(ID, 1, 3)) %>% 
  slice_max(order_by = Value)