Identify first unique value when multiple conditions are met using dplyr in R

Time:10-29

I have a dataframe representing a two-year daily time series of temperature for two rivers. For each day, I have flagged whether it falls before or after the day of peak temperature. I have also created a run-length ID column that numbers consecutive runs of days on which the temperature is either at or below, or above, a threshold of 10 degrees.

How can I get, for each site and year, the first day of year that meets each of the following conditions?

  1. maximum run ID & below_peak == TRUE
  2. maximum run ID & after_peak == TRUE

Example Data:

library(ggplot2)
library(lubridate)
library(dplyr)
library(dataRetrieval)

siteNumber <- c("01432805","01388000") # United States Geological Survey site numbers
parameterCd <- "00010" # temperature
statCd <- "00003" # mean
startDate <- "1996-01-01"
endDate <- "1997-12-31"

dat <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd) # obtains the timeseries from the USGS
dat <- dat[,c(2:4)]
colnames(dat)[3] <- "temperature"

# To view the time series
ggplot(data = dat, aes(x = Date, y = temperature)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~site_no)

To create the columns described above:

dat <- dat %>%
  mutate(year = year(Date),
         doy = yday(Date)) %>% # doy = day of year
  group_by(site_no, year) %>%
  mutate(lt_10 = temperature <= 10,
         peak_doy = doy[which.max(temperature)],
         below_peak = doy < peak_doy,
         after_peak = doy > peak_doy,
         run = data.table::rleid(lt_10))
View(dat)
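For readers unfamiliar with data.table::rleid(), here is a minimal sketch of what the run column holds. The vector x is made up for illustration; the base R lines are an equivalent built from rle(), not part of the original code. Because rleid() is called inside a grouped mutate() above, the run IDs restart at 1 for every site and year.

```r
# Hypothetical logical vector standing in for lt_10 within one group
x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)

# rleid() gives each consecutive run of identical values its own ID
ids_dt <- data.table::rleid(x)
ids_dt
# 1 1 2 3 3 4

# Base R equivalent: repeat each run's index by the length of that run
r <- rle(x)
ids_base <- rep(seq_along(r$lengths), r$lengths)
ids_base
# 1 1 2 3 3 4
```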

The ideal output would look as follows:

   site_no year doy_below doy_after
1 01388000 1996       111       317
2 01388000 1997       112       312
3 01432805 1996       137       315
4 01432805 1997       130       294

doy_after = the first row for after_peak == TRUE & max(run) when group_by(site_no,year)

doy_below = the first row for below_peak == TRUE & max(run) when group_by(site_no,year)

  • For site_no = 01388000 in year = 1996, the max(run) when below_peak == TRUE is 4. The first row when run == 4 and below_peak == TRUE corresponds to date 1996-04-20, which has doy = 111.

Answer:

Because the data is already grouped by 'site_no' and 'year', a single summarise() is enough. For each group, subset 'run' to the rows where 'below_peak' (or 'after_peak') is TRUE, take the max of that subset, and return the first 'doy' among the rows that have that run ID.

library(dplyr)
dat %>% 
 summarise(doy_below = first(doy[run == max(run[below_peak])]), 
           doy_above = first(doy[run == max(run[after_peak])]), .groups = 'drop')

-output

# A tibble: 4 × 4
  site_no   year doy_below doy_above
  <chr>    <dbl>     <dbl>     <dbl>
1 01388000  1996       111       317
2 01388000  1997       112       312
3 01432805  1996       137       315
4 01432805  1997       130       294
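The same logic can be checked without downloading the USGS data. The sketch below uses a made-up ten-day series for a single hypothetical site ("A", year 2000); the temperatures are invented purely for illustration, and the column names follow the question.

```r
library(dplyr)

# Hypothetical toy data: one site, one year, ten days (values are made up)
toy <- tibble(
  site_no     = "A",
  year        = 2000,
  doy         = 1:10,
  temperature = c(8, 9, 11, 9, 12, 15, 12, 9, 11, 8)
)

# Same derived columns as in the question, computed per site and year
toy <- toy %>%
  group_by(site_no, year) %>%
  mutate(lt_10      = temperature <= 10,
         peak_doy   = doy[which.max(temperature)],  # doy 6 here
         below_peak = doy < peak_doy,
         after_peak = doy > peak_doy,
         run        = data.table::rleid(lt_10))

# First doy of the highest-numbered run that has rows before / after the peak
res <- toy %>%
  summarise(doy_below = first(doy[run == max(run[below_peak])]),
            doy_above = first(doy[run == max(run[after_peak])]),
            .groups = 'drop')
res
# doy_below = 5, doy_above = 10
```

Note that the subset inside max() only selects which run ID to use; the outer doy[run == ...] then looks at every row of that run. In the toy data, run 4 covers doy 5 to 7 and so spans the peak, and doy_below is its first day, 5.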