How to find the row containing the maximum value and its associated date when the Year column contains multiple years?

My data frame contains monthly river discharge data from January 2013 to December 2020. For example, I want to find the row containing the maximum discharge for 2013, i.e. both the maximum discharge for 2013 and the date (day/month/year) on which it occurred. How would I do that in R?
| Year | Discharge |
|---|---|
| 1/1/2013 | 23 |
| 2/1/2013 | 45 |
| ... | ... |
| 12/31/2020 | 80 |
Answer:
Convert the 'Year' column to `Date` class, extract the year into a separate column, group by that year, and use `slice_max()` to keep the row with the maximum discharge in each group:
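```r
library(dplyr)
library(lubridate)

df1 %>%
  # parse the month/day/year strings and group by calendar year
  group_by(year = year(mdy(Year))) %>%
  # keep the row with the largest Discharge in each year
  slice_max(n = 1, order_by = Discharge) %>%
  ungroup()
```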
Output:

```
# A tibble: 2 x 3
  Year       Discharge  year
  <chr>          <int> <dbl>
1 2/1/2013          45  2013
2 12/31/2020        80  2020
```
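The question also asks for the maximum of a single year such as 2013. A minimal sketch, assuming the same month/day/year strings as the sample data (the added `Date` column name is an illustrative choice), is to filter on the year before slicing:

```r
library(dplyr)
library(lubridate)

# Sample data in the same shape as the question (month/day/year strings)
df1 <- data.frame(
  Year = c("1/1/2013", "2/1/2013", "12/31/2020"),
  Discharge = c(23L, 45L, 80L)
)

max_2013 <- df1 %>%
  # keep the parsed date so the result shows when the maximum occurred
  mutate(Date = mdy(Year)) %>%
  # restrict to the year of interest
  filter(year(Date) == 2013) %>%
  # single row with the largest discharge in that year
  slice_max(order_by = Discharge, n = 1, with_ties = FALSE)

max_2013
```

Changing `2013` in the `filter()` call selects any other year.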
If the 'Year' column mixes several date formats, use `parse_date()` from the parsedate package instead:
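```r
library(parsedate)

df1 %>%
  # parsedate::parse_date() tries to recognise the format of each string;
  # the explicit namespace avoids a clash with readr::parse_date()
  group_by(year = year(parsedate::parse_date(Year))) %>%
  slice_max(n = 1, order_by = Discharge) %>%
  ungroup()
```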
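As an alternative to parsedate (a swapped-in technique, not what the answer above uses), lubridate's `parse_date_time()` accepts several candidate `orders`. The sketch below assumes the column mixes month/day/year and ISO year-month-day strings; `df_mixed` is hypothetical data.

```r
library(dplyr)
library(lubridate)

# Hypothetical data mixing m/d/Y strings and ISO Y-m-d strings
df_mixed <- data.frame(
  Year = c("1/1/2013", "2013-02-01", "12/31/2020"),
  Discharge = c(23L, 45L, 80L)
)

result <- df_mixed %>%
  # orders lists every format that may appear; lubridate tries them and
  # returns POSIXct (UTC), which as.Date() turns into a plain Date
  mutate(Date = as.Date(parse_date_time(Year, orders = c("mdy", "ymd")))) %>%
  group_by(year = year(Date)) %>%
  slice_max(order_by = Discharge, n = 1, with_ties = FALSE) %>%
  ungroup()

result
```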
Update

Based on the `dput` output posted in the comments, the data actually has a 'Date' column that is already of class `Date`, so it can be used directly:
```r
df1 %>%
  group_by(year = year(Date)) %>%
  slice_max(n = 1, order_by = Discharge, with_ties = FALSE) %>%
  ungroup()
```
Output:

```
# A tibble: 1 x 3
  Date       Discharge  year
  <date>         <dbl> <dbl>
1 2018-06-07    0.0116  2018
```
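The `with_ties = FALSE` argument matters when two months in the same year share the maximum discharge. A small sketch with hypothetical tied data (`df_ties` is illustrative, not from the question) shows the difference:

```r
library(dplyr)
library(lubridate)

# Hypothetical data in which 2013 has two months with the same maximum
df_ties <- data.frame(
  Date = as.Date(c("2013-01-01", "2013-02-01", "2013-03-01", "2020-12-01")),
  Discharge = c(45, 23, 45, 80)
)

# Default with_ties = TRUE: both tied 2013 rows are kept
all_ties <- df_ties %>%
  group_by(year = year(Date)) %>%
  slice_max(order_by = Discharge, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per year, even when values tie
one_per_year <- df_ties %>%
  group_by(year = year(Date)) %>%
  slice_max(order_by = Discharge, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(all_ties)      # 3
nrow(one_per_year)  # 2
```

Keeping the default is useful if every date that reached the maximum should be reported; `with_ties = FALSE` guarantees one row per year.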
Data:

```r
df1 <- structure(list(Year = c("1/1/2013", "2/1/2013", "12/31/2020"),
    Discharge = c(23L, 45L, 80L)), class = "data.frame", row.names = c(NA,
    -3L))
```
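Without dplyr, the same per-year maximum can be sketched in base R with `split()` and `which.max()`. This is a substitute approach, not the answer's method, and it assumes the month/day/year format of the sample data.

```r
# Same sample data as above
df1 <- data.frame(
  Year = c("1/1/2013", "2/1/2013", "12/31/2020"),
  Discharge = c(23L, 45L, 80L)
)

# Parse the month/day/year strings with base R and extract the year
df1$Date <- as.Date(df1$Year, format = "%m/%d/%Y")
df1$year <- as.integer(format(df1$Date, "%Y"))

# For each year, keep the row where Discharge is largest
# (which.max() returns the first position if there are ties)
per_year <- split(df1, df1$year)
max_rows <- do.call(rbind, lapply(per_year, function(d) d[which.max(d$Discharge), ]))

max_rows
```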