I have a DF that looks like this;
ID Station Date Value
111-001 Place1 2022-05-01 39.1
111-002 Place2 2022-05-01 0.01
222-001 Place1 2022-05-02 8.5
222-002 Place2 2022-05-02 40.3
I'd like to compare the similar IDs and keep the higher of the two. How is the best method to achieve this? I prefer using packages to base and any help is appreciated.
CodePudding user response:
Use susbtr
to group_by
the 3 first numbers of ID, then use max
:
library(dplyr)
dat %>%
group_by(gp = substr(ID, 1, 3)) %>%
mutate(max = max(Value))
# A tibble: 4 × 6
# Groups: gp [2]
ID Station Date Value gp max
<chr> <chr> <chr> <dbl> <chr> <dbl>
1 111-001 Place1 2022-05-01 39.1 111 39.1
2 111-002 Place2 2022-05-01 0.01 111 39.1
3 222-001 Place1 2022-05-02 8.5 222 40.3
4 222-002 Place2 2022-05-02 40.3 222 40.3
In base R, use ave
:
with(dat, ave(Value, substr(ID, 1, 3), FUN = max))
#[1] 39.1 39.1 40.3 40.3
CodePudding user response:
Using base R
transform(dat, Max = ave(Value, trimws(ID, whitespace = "-\\d "), FUN = max))
CodePudding user response:
data.table
option:
library(data.table)
setDT(df)[, .SD[which.max(Value)], by=ID]
Output:
ID Station Date Value
1: 111-001 Place1 2022-05-01 39.1
2: 111-002 Place2 2022-05-02 40.3
Data:
df <- data.frame(ID = c("111-001", "111-002", "111-001", "111-002"),
Station = c("Place1", "Place2", "Place1", "Place2"),
Date = c("2022-05-01", "2022-05-01", "2022-05-02", "2022-05-02"),
Value = c(39.1, 0.01, 8.5, 40.3))
CodePudding user response:
You may use aggregate()
in baseR. Change ID
to factor
df$ID <- as.factor(df$ID)
# and do
aggregate(.~ID, df, FUN = max)
gives
ID Value
1 111-001 39.1
2 111-002 40.3
data
df <- data.frame(ID = c("111-001", "111-002", "111-001", "111-002"),
Value = c(39.1, 0.01, 8.5, 40.3))
CodePudding user response:
Similar to Maël's answer I used this code to achieve my goal.
dat %>%
group_by(gp = substr(ID, 1, 3)) %>%
slice_max(order_by = Value)