I have a dataframe representing a two-year daily time series of temperature for two rivers. I have flagged whether each day falls before (below_peak) or after (after_peak) the day of peak temperature in that year. I have also created a run-length ID column that tracks whether temperature is above or below a threshold of 10 degrees.
How can I get, for each site and year, the first day of year that meets each of the following conditions:
- maximum run-length & below peak = TRUE
- maximum run-length & above peak = TRUE
Example Data:
library(ggplot2)
library(lubridate)
library(dplyr)
library(dataRetrieval)
siteNumber <- c("01432805","01388000") # United States Geological Survey site numbers
parameterCd <- "00010" # temperature
statCd <- "00003" # mean
startDate <- "1996-01-01"
endDate <- "1997-12-31"
dat <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd) # obtains the timeseries from the USGS
dat <- dat[,c(2:4)]
colnames(dat)[3] <- "temperature"
# To view the time series
ggplot(data = dat, aes(x = Date, y = temperature)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~site_no)
To create the columns described above:
dat <- dat %>%
  mutate(year = year(Date),
         doy = yday(Date)) %>% # doy = day of year
  group_by(site_no, year) %>%
  mutate(lt_10 = temperature <= 10,               # at or below the 10-degree threshold
         peak_doy = doy[which.max(temperature)],  # day of year of the annual peak
         below_peak = doy < peak_doy,             # days before the peak
         after_peak = doy > peak_doy,             # days after the peak
         run = data.table::rleid(lt_10))          # run-length ID of lt_10
View(dat)
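For readers unfamiliar with the run column, here is a minimal sketch of what a run-length ID is, using a made-up logical vector in place of lt_10. It uses base R's rle() so it runs without data.table; my understanding is that data.table::rleid() yields the same numbering for a single vector.

```r
# Toy logical vector standing in for lt_10 (hypothetical values, not from the USGS data)
lt_10 <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# rle() finds runs of consecutive equal values;
# repeating each run's index by its length gives one ID per row
r <- rle(lt_10)
run <- rep(seq_along(r$lengths), r$lengths)

print(run)  # 1 1 2 2 2 3
```

Each change in lt_10 starts a new run, so a larger run value always refers to a later stretch of the series within the group.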
The ideal output would look as follows:
site_no year doy_below doy_after
1 01388000 1996 111 317
2 01388000 1997 112 312
3 01432805 1996 137 315
4 01432805 1997 130 294
doy_after = the first row for after_peak == TRUE & max(run) when group_by(site_no, year)
doy_below = the first row for below_peak == TRUE & max(run) when group_by(site_no, year)
- For site_no = 01388000 in year = 1996, the max(run) when below_peak == TRUE is 4. The first row when run = 4 and below_peak == TRUE corresponds with date 1996-04-20, which has a doy = 111.
CodePudding user response:
As the data is already grouped, just summarise: subset 'run' to the rows where 'below_peak' (or 'after_peak') is TRUE and take its max, extract the 'doy' values where 'run' equals that max, and get the first element of 'doy'.
library(dplyr)
dat %>%
  summarise(doy_below = first(doy[run == max(run[below_peak])]),
            doy_above = first(doy[run == max(run[after_peak])]),
            .groups = 'drop')
-output
# A tibble: 4 × 4
site_no year doy_below doy_above
<chr> <dbl> <dbl> <dbl>
1 01388000 1996 111 317
2 01388000 1997 112 312
3 01432805 1996 137 315
4 01432805 1997 130 294
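To see how the indexing inside summarise works, here is a small base R sketch for a single hypothetical site-year group. All values below are invented for illustration and are not taken from the USGS data; only the column names follow the question.

```r
# One hypothetical site-year group with eight days
doy      <- 1:8
run      <- c(1, 1, 2, 2, 2, 3, 3, 3)   # run-length IDs, as rleid() would number them
peak_doy <- 4                            # assumed day of peak temperature

below_peak <- doy < peak_doy             # TRUE for doy 1..3
after_peak <- doy > peak_doy             # TRUE for doy 5..8

# Largest run ID seen before the peak, then the first doy in that run
max_run_below <- max(run[below_peak])            # 2
doy_below     <- doy[run == max_run_below][1]    # 3

# Largest run ID seen after the peak, then the first doy in that run
max_run_after <- max(run[after_peak])            # 3
doy_above     <- doy[run == max_run_after][1]    # 6

print(c(doy_below = doy_below, doy_above = doy_above))
```

Here `[1]` plays the role of dplyr's first(). Note that the selected run is not itself filtered by after_peak, so if the last run of a year began before the peak, doy_above would be that earlier day; adding `& after_peak` inside the index would be one way to guard against this.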