Is it possible to get the specific index of occurence when working with period.apply?-CodePudding

Assuming I worked with meteorological observations and wanted to know not only the daily maximum value but also the relevant timestamp, describing when this value was observed, is it possible to accomplish this without significant overhead, e.g. by setting some sort of parameter?

library(xts)

set.seed(42)

# init data
datetimes <- seq(from = as.POSIXct("2022-01-01"), 
                 to = as.POSIXct("2022-01-08"),
                 by = "10 mins")

values <- length(datetimes) |> runif() |> sin()

data <- xts(x = values,
            order.by = datetimes)

# 
ep <- endpoints(data, "days")

data_max <- period.apply(data, INDEX = ep, FUN = max)

head(data_max)
#>                          [,1]
#> 2022-01-01 23:50:00 0.8354174
#> 2022-01-02 23:50:00 0.8396034
#> 2022-01-03 23:50:00 0.8364624
#> 2022-01-04 23:50:00 0.8376930
#> 2022-01-05 23:50:00 0.8392988
#> 2022-01-06 23:50:00 0.8372780

Obviously, this would not work with summarizing functions like mean and median where you would want to specify the interval width considered, but when working with e.g. min and max, how would I proceed when I wanted to know the exact index of the value in question observed.

At the moment, I'm just looping over my xts subsets to determine the relevant index, but maybe there is a more elegant approach, maybe even an argument when using period.apply() I haven't noticed to get this information.

sub <- "2022-01-04"

ind <- which(data[sub] == max(data[sub]))

data[sub][ind]
#>                         [,1]
#> 2022-01-04 23:20:00 0.837693

My desired output would look like this:

#>                          [,1]
#> 2022-01-01 03:40:00 0.8354174
#> 2022-01-02 15:00:00 0.8396034
#> 2022-01-03 05:10:00 0.8364624
#> 2022-01-04 23:20:00 0.8376930
#> 2022-01-05 02:50:00 0.8392988
#> 2022-01-06 06:40:00 0.8372780

Thanks a lot in advance!

CodePudding user response：

1) Use ave and subset that down to the rows with the maxima. The last line takes the first maximum in each day and could be omitted if there are no duplicate maxima within a day such as for a strictly increasing input or if all maxima were desired. The format is to avoid time zone problems.

library(magrittr)
library(xts)

data %>%
  subset(., ave(., as.Date(format(time)), FUN = max) == .) %>%
  aggregate(., as.Date(format(time(.))), head, 1)

##                          [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385

2) Another possibility is to aggregate using tapply with which.max to get the time of the first maximum in each date and then subset the data to those times.

data %>%
  subset(time %in% tapply(time, as.Date(format(time)), \(x) x[which.max(.[x])] ))

##                          [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385

CodePudding user response：

Using your data and inserting some NAs

set.seed(42)
datetimes <- seq(from = as.POSIXct('2022-01-01'),
to = as.POSIXct('2022-01-09'), by = '10 mins')


values <- length(datetimes) |> runif() |> sin()
ep <- endpoints(data, 'days')

values[which(1:length(values) %% 10 == 0)] <- NA
data <- xts(x = values,
   order.by = datetimes)

data_max <- period.apply(data, INDEX=ep, FUN = max, na.rm = TRUE)
> data[which(data %in% data_max == TRUE)]
                         [,1]
2022-01-01 03:40:00 0.8354174
2022-01-02 15:00:00 0.8396034
2022-01-03 05:10:00 0.8364624
2022-01-04 23:20:00 0.8376930
2022-01-05 02:50:00 0.8392988
2022-01-06 06:40:00 0.8372780
2022-01-07 08:40:00 0.8406546
2022-01-08 06:40:00 0.8411353
2022-01-09 00:00:00 0.8247520

Seems like period.apply is the way to go, and index of 'moment' can be found. And helps to use the same time period 01 -> 09, not 10.