I have a dataset with 30 years of daily air temperature for a city. The dataset looks like:
Year Julian_date temperature
1991 1 2.1
1991 2 2.2
... ... ...
1991 365 2.3
1992 1 2.1
... ... ...
1992 365 2.5
... ... ...
2020 366 2.5
I would like to calculate the 90th percentile value for each Julian date (across the different years) and return the results, like:
Julian_date value(the 90th percentile)
1 2.4
2 2.6
... ...
365 2.5
How should I write the code in R?
CodePudding user response:
You can first group by Julian_date, then use the quantile function inside summarise to set the probability.
library(tidyverse)
df %>%
  group_by(Julian_date) %>%
  summarise("value (the 90th percentile)" = quantile(temperature, probs = 0.9, na.rm = TRUE))
Output
Julian_date `value (the 90th percentile)`
<int> <dbl>
1 1 2.1
2 2 2.2
3 365 2.5
Data
df <- structure(list(Year = c(1991L, 1991L, 1991L, 1992L, 1992L, 2020L
), Julian_date = c(1L, 2L, 365L, 1L, 365L, 365L), temperature = c(2.1,
2.2, 2.3, 2.1, 2.5, 2.5)), class = "data.frame", row.names = c(NA,
-6L))
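For comparison, the same grouping can be sketched in base R without loading the tidyverse. This is an alternative that is not part of the answer above; it reuses the sample data and relies on aggregate() passing extra arguments (here probs and names) through to quantile(). The object name p90 is only illustrative.

```r
# Sample data from the answer above
df <- data.frame(
  Year = c(1991L, 1991L, 1991L, 1992L, 1992L, 2020L),
  Julian_date = c(1L, 2L, 365L, 1L, 365L, 365L),
  temperature = c(2.1, 2.2, 2.3, 2.1, 2.5, 2.5)
)

# One row per Julian_date; all years are pooled within each date.
# The formula interface drops rows with NA by default (na.action = na.omit).
p90 <- aggregate(temperature ~ Julian_date, data = df,
                 FUN = quantile, probs = 0.9, names = FALSE)
print(p90)
```

The result is a plain data frame with the columns Julian_date and temperature, where temperature holds the 90th percentile for that date.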
CodePudding user response:
You can use the quantile() function. If "(from different years)" in your question means each year should have a separate calculation, then you need to group the data frame by Year and Julian_date. If instead it means the different years are combined, you need to group the data frame only by Julian_date, as @AndrewGB and @benson23 showed.
library(dplyr)
yourdf %>%
  group_by(Year, Julian_date) %>%
  summarise(value_90th_percentile = quantile(temperature, 0.9, na.rm = TRUE))
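One thing worth checking before choosing the per-year grouping: if the data holds a single reading per day per year, as the sample in the question suggests, each Year and Julian_date group contains exactly one value, and its 90th percentile is just that value. A minimal base R sketch of that check, using the sample data from the first answer (the names group_sizes, per_year and merged are illustrative):

```r
df <- data.frame(
  Year = c(1991L, 1991L, 1991L, 1992L, 1992L, 2020L),
  Julian_date = c(1L, 2L, 365L, 1L, 365L, 365L),
  temperature = c(2.1, 2.2, 2.3, 2.1, 2.5, 2.5)
)

# Number of readings in each Year x Julian_date group
group_sizes <- aggregate(temperature ~ Year + Julian_date, data = df, FUN = length)

# 90th percentile within each Year x Julian_date group
per_year <- aggregate(temperature ~ Year + Julian_date, data = df,
                      FUN = quantile, probs = 0.9, names = FALSE)

# Line the per-group percentile up against the original readings
merged <- merge(df, per_year, by = c("Year", "Julian_date"),
                suffixes = c("", "_p90"))

all(group_sizes$temperature == 1)                      # every group has one reading
all.equal(merged$temperature, merged$temperature_p90)  # percentile equals the reading
```

If both checks hold for your data, grouping only by Julian_date is the version that gives a percentile across years.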