groupby and then find max across many columns-CodePudding

I have a dataset with lot numbers, and flooding in inches by day. A lot of the lot numbers are duplicated, and I want to find the highest value for a given lot, for a given day. So, as example data, I have:

day1= c( 1, 1.2, 1.5, 4, 6.2, NA, 7, 3.4, 3.4, 3.2)
day2= c( NA, 2.2, 1.9, 4.7, 6.9, NA, 11.1, 3.9, 3.9, 3.4)
day3= c( .9, 1.2, .8, 4, 5.9, NA, 5.2, 1.2, 1.2, .8)
lot=  c("id-1", "id-1", "id-1", "id-2", "id-3", "id-4", "id-4", "id-5", "id-5", "id-5")

df <-data.frame(day1, day2, day3, lot)

and what I want to get is something along the lines of:

> head(df_slim, n=10)
  day1 day2 day3  lot
1  1.5  2.2  1.2 id-1
2  4.0  4.7  4.0 id-2
3  6.2  6.9  5.9 id-3
4  7.0 11.1  5.2 id-4
5  3.4  3.9  1.2 id-5

I am trying the code below, but I know I'm not close, and was hoping ya'll could point me in the right direction! Thanks!

df_slim <-  df %>% group_by(lot) %>% df[which.max(df$day1),]

CodePudding user response：

This code will give the maximum value for each day by lot

library(dplyr)

df %>% 
  group_by(lot) %>% 
  summarise(across(.cols = starts_with("day"),.fns = ~max(.,na.rm = TRUE)))

# A tibble: 5 x 4
 lot     day1  day2  day3
<chr>    <dbl> <dbl> <dbl>
1 id-1    1.5   2.2   1.2
2 id-2    4     4.7   4  
3 id-3    6.2   6.9   5.9
4 id-4    7    11.1   5.2
5 id-5    3.4   3.9   1.2

CodePudding user response：

If this needs to be done on each column separately

library(dplyr)
df %>% 
   group_by(lot) %>% 
   summarise(across(everything(), ~ .[which.max(.)]))

-output

# A tibble: 5 × 4
  lot    day1  day2  day3
  <chr> <dbl> <dbl> <dbl>
1 id-1    1.5   2.2   1.2
2 id-2    4     4.7   4  
3 id-3    6.2   6.9   5.9
4 id-4    7    11.1   5.2
5 id-5    3.4   3.9   1.2

Or using summarise_all

df %>%
    group_by(lot) %>% 
    summarise_all(max, na.rm = TRUE)
# A tibble: 5 × 4
  lot    day1  day2  day3
  <chr> <dbl> <dbl> <dbl>
1 id-1    1.5   2.2   1.2
2 id-2    4     4.7   4  
3 id-3    6.2   6.9   5.9
4 id-4    7    11.1   5.2
5 id-5    3.4   3.9   1.2

Or using data.table

library(data.table)
setDT(df)[, lapply(.SD, max, na.rm = TRUE), lot]
    lot day1 day2 day3
1: id-1  1.5  2.2  1.2
2: id-2  4.0  4.7  4.0
3: id-3  6.2  6.9  5.9
4: id-4  7.0 11.1  5.2
5: id-5  3.4  3.9  1.2

Or with collapse

library(collapse)
fmax(nv(df), g = df$lot)
     day1 day2 day3
id-1  1.5  2.2  1.2
id-2  4.0  4.7  4.0
id-3  6.2  6.9  5.9
id-4  7.0 11.1  5.2
id-5  3.4  3.9  1.2

Or this can be done in base R with aggregate

aggregate(.~ lot, df, max)
   lot day1 day2 day3
1 id-1  1.5  2.2  1.2
2 id-2  4.0  4.7  4.0
3 id-3  6.2  6.9  5.9
4 id-4  7.0 11.1  5.2
5 id-5  3.4  3.9  1.2