Home > Blockchain >  In R, if there are multiple TRUE answers, select the first TRUE answer
In R, if there are multiple TRUE answers, select the first TRUE answer

Time:10-28

I have a dataframe that represents a two-year daily time series of temperature for one river. For this river, I would like to know what day of year (doy):

  1. temperature is sustained greater than or equal to 10 degrees
  • sustained is when there are no more dips below 10 until after the peak temperature of the year, such as during autumn or winter
  1. temperature is sustained less than or equal to 10 degrees
  • sustained is when there are no more peaks above 10 until the following year

I run into errors when I try to calculate 2 because there are multiple TRUE answers for the code to choose from. I would like to know how I can make the code go with the first TRUE answer if there are multiple TRUE answers.

Example Dataset

library(ggplot2)
library(lubridate)
library(dplyr)
library(dataRetrieval)

siteNumber <- "01417500"
parameterCd <- "00010" # water temperature
statCd <- "00003" # mean
startDate <- "2015-01-01"
endDate <- "2016-12-31"

dat <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd)
dat <- dat[,c(2:4)]
colnames(dat)[3] <- "temperature"

# Visually inspect the time series
ggplot(data = dat, aes(x = Date, y = temperature))  
  geom_point()  
  theme_bw()

Code for 1 & 2 where 2 is having issues because there are multiple TRUE statements to choose from

dat %>%
  mutate(year = year(Date),
         doy = yday(Date)) %>%
  group_by(year) %>%
  mutate(gt_10 = temperature >= 10, # greater than or equal to 10 degrees
         lt_10 = temperature <= 10, # less than or equal to 10 degrees
         peak_doy = doy[which.max(temperature)], # what doy is max temperature
         below_peak = doy < peak_doy, # is the observed doy less than the peak temperature doy
         after_peak = doy > peak_doy, # is the observed doy greater than the peak temperature doy
         test_above = ave(gt_10, cumsum(!gt_10), FUN = cumsum), # counts number of days above 10 degree threshold
         test_below = ave(lt_10, cumsum(!lt_10), FUN = cumsum)) %>% # counts number of days below 10 degree threshold
  summarise(first_above_10_sustained = doy[below_peak == T & test_above == 14]-13, # answer to 1 
            first_below_10_sustained = doy[after_peak == T & test_below == 14]-13) # answer to 2
  • To answer 2, the code is looking at those times when temperature is after the annual peak temperature (i.e., after_peak == T) and when temperature has been below the 10 threshold for 14 consecutive days (i.e., test_below == 14). The test_below == 14 is where the error lies because there are multiple times when this occurs. Yes, you could change the threshold of consecutive days to some value > 14 but that is beside the point. How can I get the code to accept the first TRUE answer is there are multiple TRUE answers?

I have a similar SO question here but my answer only works when there is not multiple TRUE answers to choose from.

CodePudding user response:

There are a couple of tricks I would employ here:

  1. Create a column that specifies whether the temperature is above or below 10 degrees.
  2. Get the rleid of this column, which will group all of the consecutive days above or below the threshold of 10 degrees.
  3. Find the maximum temperature of each year, and store it in a column.
  4. The rleid which contains that maximum temperature will be the dates where temperature is sustained > 10 degrees for that year according to your definition
  5. Do the same thing with minimum temperatures, but subtract 6 months when calculating the year to group by when calculating the minimum for that year. This will not make a difference to the end result but allows a calculation of the minimum temperature in the winter period:
df <- dat %>%
  mutate(year = year(Date)) %>%
  group_by(year) %>%
  mutate(max_temp = max(temperature)) %>%
  ungroup() %>%
  mutate(above_ten = temperature >= 10,
         run = factor(data.table::rleid(above_ten))) %>%
  group_by(run) %>%
  mutate(sustained_hi = max(temperature) == max(max_temp)) %>%
  ungroup() %>%
  mutate(year = year(Date - months(6))) %>%
  group_by(year) %>%
  mutate(min_temp = min(temperature)) %>%
  group_by(run) %>%
  mutate(sustained_lo = min(temperature) == min(min_temp)) %>%
  mutate(group = ifelse(sustained_hi, 'High', 
                        ifelse(sustained_lo, 'Low', 
                               'Unsustained'))) %>%
  select(site_no, Date, temperature, group, run)

This results in:

df
#> # A tibble: 731 x 5
#> # Groups:   run [27]
#>    site_no  Date       temperature group run  
#>    <chr>    <date>           <dbl> <chr> <fct>
#>  1 01417500 2015-01-01         0.7 Low   1    
#>  2 01417500 2015-01-02         1.1 Low   1    
#>  3 01417500 2015-01-03         1   Low   1    
#>  4 01417500 2015-01-04         2.5 Low   1    
#>  5 01417500 2015-01-05         2   Low   1    
#>  6 01417500 2015-01-06         0.3 Low   1    
#>  7 01417500 2015-01-07         0.2 Low   1    
#>  8 01417500 2015-01-08         0.2 Low   1    
#>  9 01417500 2015-01-09         0.3 Low   1    
#> 10 01417500 2015-01-10         0.3 Low   1    
#> # ... with 721 more rows
#> # i Use `print(n = ...)` to see more rows

And we can see the result by plotting like this:

ggplot(df, aes(x = Date, y = temperature, color = group))  
  geom_point()  
  scale_color_manual(limits = c('High', 'Unsustained', 'Low'),
                     values = c('orange', 'gray', 'steelblue'))  
  theme_bw()

enter image description here

And we can get a nice little data frame of the start and end dates of our sustained high / low temperatures by doing:

df %>% 
  filter(group != 'Unsustained') %>%
  group_by(run) %>%
  summarize(Date = c(first(Date), last(Date)),
            Event = paste('Sustained', first(group), c('Start', 'End'))) %>%
  ungroup() %>%
  select(-run)
#> # A tibble: 10 x 2
#>    Date       Event               
#>    <date>     <chr>               
#>  1 2015-01-01 Sustained Low Start 
#>  2 2015-04-28 Sustained Low End   
#>  3 2015-04-29 Sustained High Start
#>  4 2015-07-16 Sustained High End  
#>  5 2015-11-08 Sustained Low Start 
#>  6 2016-03-31 Sustained Low End   
#>  7 2016-05-18 Sustained High Start
#>  8 2016-10-09 Sustained High End  
#>  9 2016-10-23 Sustained Low Start 
#> 10 2016-12-31 Sustained Low End 
  •  Tags:  
  • r
  • Related