I have a dataset with lot numbers, and flooding in inches by day. A lot of the lot numbers are duplicated, and I want to find the highest value for a given lot, for a given day. So, as example data, I have:
day1= c( 1, 1.2, 1.5, 4, 6.2, NA, 7, 3.4, 3.4, 3.2)
day2= c( NA, 2.2, 1.9, 4.7, 6.9, NA, 11.1, 3.9, 3.9, 3.4)
day3= c( .9, 1.2, .8, 4, 5.9, NA, 5.2, 1.2, 1.2, .8)
lot= c("id-1", "id-1", "id-1", "id-2", "id-3", "id-4", "id-4", "id-5", "id-5", "id-5")
df <-data.frame(day1, day2, day3, lot)
and what I want to get is something along the lines of:
> head(df_slim, n=10)
day1 day2 day3 lot
1 1.5 2.2 1.2 id-1
2 4.0 4.7 4.0 id-2
3 6.2 6.9 5.9 id-3
4 7.0 11.1 5.2 id-4
5 3.4 3.9 1.2 id-5
I am trying the code below, but I know I'm not close, and was hoping ya'll could point me in the right direction! Thanks!
df_slim <- df %>% group_by(lot) %>% df[which.max(df$day1),]
CodePudding user response:
This code will give the maximum value for each day by lot
library(dplyr)
df %>%
group_by(lot) %>%
summarise(across(.cols = starts_with("day"),.fns = ~max(.,na.rm = TRUE)))
# A tibble: 5 x 4
lot day1 day2 day3
<chr> <dbl> <dbl> <dbl>
1 id-1 1.5 2.2 1.2
2 id-2 4 4.7 4
3 id-3 6.2 6.9 5.9
4 id-4 7 11.1 5.2
5 id-5 3.4 3.9 1.2
CodePudding user response:
If this needs to be done on each column separately
library(dplyr)
df %>%
group_by(lot) %>%
summarise(across(everything(), ~ .[which.max(.)]))
-output
# A tibble: 5 × 4
lot day1 day2 day3
<chr> <dbl> <dbl> <dbl>
1 id-1 1.5 2.2 1.2
2 id-2 4 4.7 4
3 id-3 6.2 6.9 5.9
4 id-4 7 11.1 5.2
5 id-5 3.4 3.9 1.2
Or using summarise_all
df %>%
group_by(lot) %>%
summarise_all(max, na.rm = TRUE)
# A tibble: 5 × 4
lot day1 day2 day3
<chr> <dbl> <dbl> <dbl>
1 id-1 1.5 2.2 1.2
2 id-2 4 4.7 4
3 id-3 6.2 6.9 5.9
4 id-4 7 11.1 5.2
5 id-5 3.4 3.9 1.2
Or using data.table
library(data.table)
setDT(df)[, lapply(.SD, max, na.rm = TRUE), lot]
lot day1 day2 day3
1: id-1 1.5 2.2 1.2
2: id-2 4.0 4.7 4.0
3: id-3 6.2 6.9 5.9
4: id-4 7.0 11.1 5.2
5: id-5 3.4 3.9 1.2
Or with collapse
library(collapse)
fmax(nv(df), g = df$lot)
day1 day2 day3
id-1 1.5 2.2 1.2
id-2 4.0 4.7 4.0
id-3 6.2 6.9 5.9
id-4 7.0 11.1 5.2
id-5 3.4 3.9 1.2
Or this can be done in base R
with aggregate
aggregate(.~ lot, df, max)
lot day1 day2 day3
1 id-1 1.5 2.2 1.2
2 id-2 4.0 4.7 4.0
3 id-3 6.2 6.9 5.9
4 id-4 7.0 11.1 5.2
5 id-5 3.4 3.9 1.2