How to find the row containing the maximum value and its associated year when the Year column contains multiple years

Time:09-25

How do I find the row containing the maximum value, and its associated year, when the Year column contains multiple years? My data frame contains monthly river discharge data from January 2013 to December 2020. For example, I would like to find the row containing the maximum discharge for 2013, that is, both the maximum discharge for 2013 and the date (day, month and year) on which it occurred. How would I do that in R?

Year        Discharge
1/1/2013    23
2/1/2013    45
...         ...
12/31/2020  80

Answer:

We can convert the 'Year' column to Date class, extract the year as a separate grouping column, group by it, and keep the row with the maximum Discharge in each group with slice_max():

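library(dplyr)
library(lubridate)
df1 %>%
    # parse the "m/d/Y" strings with mdy() and group by the extracted year
    group_by(year = year(mdy(Year))) %>%
    # keep the row with the largest Discharge in each year (tied rows are all kept)
    slice_max(n = 1, order_by = Discharge) %>%
    ungroup()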

Output:

# A tibble: 2 x 3
  Year       Discharge  year
  <chr>          <int> <dbl>
1 2/1/2013          45  2013
2 12/31/2020        80  2020
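The question specifically asks about 2013. A minimal sketch, assuming the same month/day/year strings as the sample data: filter to a single year first, then take the maximum row, which returns both the discharge value and the date it occurred. The helper column `Date` and the object name `max_2013` are illustrative only.

```r
library(dplyr)
library(lubridate)

# Same toy data as df1 in this answer
df1 <- data.frame(Year = c("1/1/2013", "2/1/2013", "12/31/2020"),
                  Discharge = c(23L, 45L, 80L),
                  stringsAsFactors = FALSE)

# Parse the dates, keep only 2013, then take the row with the largest Discharge
max_2013 <- df1 %>%
    mutate(Date = mdy(Year)) %>%
    filter(year(Date) == 2013) %>%
    slice_max(order_by = Discharge, n = 1, with_ties = FALSE)

max_2013
```

For this toy data the result is the single row for 2/1/2013 with a Discharge of 45; replacing 2013 with another year reuses the same pipeline.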

If the 'Year' column mixes several date formats, use parse_date() from the parsedate package instead of mdy():

library(parsedate)
df1 %>%
    # parse_date() guesses the format of each string and returns a POSIXct
    group_by(year = year(parse_date(Year))) %>%
    slice_max(n = 1, order_by = Discharge) %>%
    ungroup()
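An alternative that stays within lubridate is parse_date_time(), which accepts a list of candidate orders. This is a swapped-in technique, not the parsedate approach above; the data frame `df_mixed`, its values, and the chosen orders ("mdy", "ymd") are hypothetical and should be adjusted to whatever formats actually occur in your column.

```r
library(dplyr)
library(lubridate)

# Hypothetical data mixing two date formats in the same column
df_mixed <- data.frame(Year = c("1/15/2013", "2013-03-01", "12/31/2020"),
                       Discharge = c(23, 45, 80),
                       stringsAsFactors = FALSE)

# parse_date_time() accepts several candidate orders and applies
# whichever one matches each string
res <- df_mixed %>%
    mutate(Date = parse_date_time(Year, orders = c("mdy", "ymd"))) %>%
    group_by(year = year(Date)) %>%
    slice_max(order_by = Discharge, n = 1, with_ties = FALSE) %>%
    ungroup()

res
```

Listing the orders explicitly avoids relying on automatic format guessing, at the cost of having to know the formats in advance.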

Update

Based on the dput() output posted in the comments, the actual data has a 'Date' column that is already of class Date, so no parsing is needed:

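df1 %>%
    # 'Date' is already of class Date, so year() can be applied directly
    group_by(year = year(Date)) %>%
    # with_ties = FALSE guarantees exactly one row per year
    slice_max(n = 1, order_by = Discharge, with_ties = FALSE) %>%
    ungroup()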

Output:

# A tibble: 1 x 3
  Date       Discharge  year
  <date>         <dbl> <dbl>
1 2018-06-07    0.0116  2018
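The with_ties argument matters when two days in the same year share the maximum discharge. A minimal sketch with hypothetical values (`df_ties` is illustrative, not the OP's data) showing the difference between the default and with_ties = FALSE:

```r
library(dplyr)

# Hypothetical data: two days in 2018 share the same maximum Discharge
df_ties <- data.frame(Date = as.Date(c("2018-06-07", "2018-07-01", "2018-08-15")),
                      Discharge = c(0.5, 0.5, 0.2))

# Default with_ties = TRUE keeps every row tied for the maximum
all_max <- df_ties %>% slice_max(order_by = Discharge, n = 1)

# with_ties = FALSE returns exactly n rows, dropping the extra tie
one_max <- df_ties %>% slice_max(order_by = Discharge, n = 1, with_ties = FALSE)

nrow(all_max)  # 2
nrow(one_max)  # 1
```

Keep the default if you want to see every date that reached the yearly maximum; use with_ties = FALSE if downstream code expects one row per year.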

Data (reproducible sample used in the first example):

df1 <- structure(list(Year = c("1/1/2013", "2/1/2013", "12/31/2020"), 
    Discharge = c(23L, 45L, 80L)), class = "data.frame", row.names = c(NA, 
-3L))
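For readers who prefer not to load dplyr, a base-R sketch of the same group-then-max idea. It swaps in split() and which.max() in place of slice_max(), and assumes every Year string is in month/day/year form; the column names `Date` and `year` and the object `per_year` are illustrative.

```r
# Same toy data as df1 above
df1 <- data.frame(Year = c("1/1/2013", "2/1/2013", "12/31/2020"),
                  Discharge = c(23L, 45L, 80L),
                  stringsAsFactors = FALSE)

# Parse the m/d/Y strings and extract the year without extra packages
df1$Date <- as.Date(df1$Year, format = "%m/%d/%Y")
df1$year <- as.integer(format(df1$Date, "%Y"))

# For each year, keep the row where Discharge is largest
# (which.max() returns the first one if there are ties)
per_year <- do.call(rbind, lapply(split(df1, df1$year), function(d) {
    d[which.max(d$Discharge), ]
}))

per_year
```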