Suppose I work with meteorological observations and want to know not only the daily maximum value but also the timestamp at which it was observed. Is it possible to accomplish this without significant overhead, e.g. by setting some sort of parameter?
library(xts)
set.seed(42)
# init data
datetimes <- seq(from = as.POSIXct("2022-01-01"),
                 to = as.POSIXct("2022-01-08"),
                 by = "10 mins")
values <- length(datetimes) |> runif() |> sin()
data <- xts(x = values,
            order.by = datetimes)
#
ep <- endpoints(data, "days")
data_max <- period.apply(data, INDEX = ep, FUN = max)
head(data_max)
#> [,1]
#> 2022-01-01 23:50:00 0.8354174
#> 2022-01-02 23:50:00 0.8396034
#> 2022-01-03 23:50:00 0.8364624
#> 2022-01-04 23:50:00 0.8376930
#> 2022-01-05 23:50:00 0.8392988
#> 2022-01-06 23:50:00 0.8372780
Obviously, this would not work with summarizing functions like mean and median, where you would want to specify the interval width considered, but when working with e.g. min and max, how would I proceed if I wanted to know the exact index of the value in question? At the moment, I'm just looping over my xts subsets to determine the relevant index, but maybe there is a more elegant approach, maybe even an argument to period.apply() I haven't noticed that returns this information.
sub <- "2022-01-04"
ind <- which(data[sub] == max(data[sub]))
data[sub][ind]
#> [,1]
#> 2022-01-04 23:20:00 0.837693
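For completeness, a minimal sketch of what that loop over all days could look like (just one way to write the workaround, splitting the series by day and binding the per-day maxima back together):
# sketch of the workaround: keep the row holding the (first) maximum
# of every daily subset and bind the results back together
daily_max <- do.call(rbind, lapply(split(data, f = "days"),
                                   function(x) x[which.max(x)]))
daily_max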
My desired output would look like this:
#> [,1]
#> 2022-01-01 03:40:00 0.8354174
#> 2022-01-02 15:00:00 0.8396034
#> 2022-01-03 05:10:00 0.8364624
#> 2022-01-04 23:20:00 0.8376930
#> 2022-01-05 02:50:00 0.8392988
#> 2022-01-06 06:40:00 0.8372780
Thanks a lot in advance!
CodePudding user response:
1) Use ave and subset that down to the rows with the maxima. The last line takes the first maximum in each day; it could be omitted if there are no duplicate maxima within a day (such as for a strictly increasing input) or if all maxima were desired. The format() call is to avoid time zone problems.
library(magrittr)
library(xts)
data %>%
  subset(., ave(., as.Date(format(time(.))), FUN = max) == .) %>%
  aggregate(., as.Date(format(time(.))), head, 1)
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385
2) Another possibility is to aggregate using tapply with which.max to get the time of the first maximum in each date and then subset the data to those times.
data %>%
  subset(time(.) %in% tapply(time(.), as.Date(format(time(.))), \(x) x[which.max(.[x])]))
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385
CodePudding user response:
Using your data and inserting some NAs:
set.seed(42)
datetimes <- seq(from = as.POSIXct('2022-01-01'),
                 to = as.POSIXct('2022-01-09'), by = '10 mins')
values <- length(datetimes) |> runif() |> sin()
values[which(1:length(values) %% 10 == 0)] <- NA
data <- xts(x = values,
            order.by = datetimes)
ep <- endpoints(data, 'days')
data_max <- period.apply(data, INDEX = ep, FUN = max, na.rm = TRUE)
> data[data %in% data_max]
[,1]
2022-01-01 03:40:00 0.8354174
2022-01-02 15:00:00 0.8396034
2022-01-03 05:10:00 0.8364624
2022-01-04 23:20:00 0.8376930
2022-01-05 02:50:00 0.8392988
2022-01-06 06:40:00 0.8372780
2022-01-07 08:40:00 0.8406546
2022-01-08 06:40:00 0.8411353
2022-01-09 00:00:00 0.8247520
Seems like period.apply is the way to go, and the index of the 'moment' can be found. It also helps to use the same time period, 01 -> 09, not 10.
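A minimal sketch of that idea, staying entirely within period.apply(): there is no built-in argument that returns the timestamp of the maximum directly, but FUN = which.max yields the position of the maximum within each period, which can be translated back into row numbers of the original series (assuming a single-column xts object called data, as above):
ep  <- endpoints(data, "days")
# position of the (first) maximum within each daily period; NAs are ignored
pos <- period.apply(data, INDEX = ep, FUN = which.max)
# head(ep, -1) gives the offset preceding each period's first row,
# so offset + within-period position is the global row number
data[head(ep, -1) + as.numeric(pos)]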