I have a time series dataset and I am trying to calculate the maximum rate of change to estimate greenup date on NDVI data. My data is below:
date NDVI
1 2021-01-01 0.307
2 2021-01-17 0.291
3 2021-02-02 0.287
4 2021-02-18 0.247
5 2021-03-06 0.286
6 2021-03-22 0.268
7 2021-04-07 0.299
8 2021-04-23 0.295
9 2021-05-09 0.349
10 2021-05-25 0.402
11 2021-06-10 0.359
12 2021-06-26 0.432
13 2021-07-12 0.564
14 2021-07-28 0.654
15 2021-08-13 0.699
16 2021-08-29 0.614
17 2021-09-14 0.588
18 2021-09-30 0.553
19 2021-10-16 0.450
20 2021-11-01 0.377
21 2021-11-17 0.341
22 2021-12-03 0.331
23 2021-12-19 0.323
#I plot my dataset and fit a curve
p1 <- ggplot(data,aes(x = date, y = NDVI)) stat_smooth(method = "lm", formula = y ~ ns(x,3), color="blue") geom_point()
p1
I now want to be able to calculate the maximum rate of curve change to identify when the vegetation is starting to greenup (my guess based on the figure is sometime in May).
Thanks for your help.
CodePudding user response:
The maximum rate of increase in this model occurs at 9:13pm on 11th June 2021. A simple way to get this result is to create your model outside of ggplot:
library(splines)
mod <- lm(NDVI ~ ns(date, 3), data = data)
Now use this to predict NDVI for each day in the year:
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
predictions <- predict(mod, newdata = list(date = dates))
The maximum rate of change occurs at the maximum difference between consecutive days' predictions:
dates[which.max(diff(predictions))]
# [1] "2021-06-11"
If you need the precision down to the hour and minute (which isn't justified based on the number of data points), you would do:
data$date <- as.POSIXct(data$date)
mod <- lm(NDVI ~ ns(date, 3), data = data)
dates <- seq(as.POSIXct("2021-01-01"), as.POSIXct("2021-12-31"), by = "min")
predictions <- predict(mod, newdata = list(date = dates))
dates[which.max(diff(predictions))]
#> [1] "2021-06-11 21:13:00 GMT"
CodePudding user response:
One possible approach could be to build the ggplot, extract the spline, and then calculate on its control points:
library(dplyr); library(ggplot2)
a <- mtcars %>%
ggplot(aes(drat, mpg))
stat_smooth(method = "lm", formula = y ~ splines::ns(x,3), color="blue")
geom_point()
b <- ggplot_build(a)
spline_x <- b[["data"]][[1]][["x"]]
spline_y <- b[["data"]][[1]][["y"]]
change_rate <- c(NA, diff(spline_y)/ diff(spline_x))
a
geom_line(data = data.frame(spline_x, change_rate),
aes(spline_x, change_rate), color = "red")
Then we could extract the value with max change_rate:
data.frame(spline_x, change_rate) %>%
filter(change_rate == max(change_rate, na.rm = TRUE))
spline_x change_rate
1 3.748861 10.68438
CodePudding user response:
The Savitsky-Golay filter (here using signal::sgolayfilt()
) will smooth your data and can optionally return the m-th derivative. So you can run that to get the 1st derivative and just pull the date at which that is at a maximum. You can tweak the length
of the filter to determine how smooth vs jagged you want it to be.
library(tidyverse)
library(signal)
d <- structure(list(date = structure(c(18628L, 18644L, 18660L, 18676L, 18692L, 18708L, 18724L, 18740L, 18756L, 18772L, 18788L, 18804L, 18820L, 18836L, 18852L, 18868L, 18884L, 18900L, 18916L, 18932L, 18948L, 18964L, 18980L), class = c("IDate", "Date")), NDVI = c(0.307, 0.291, 0.287, 0.247, 0.286, 0.268, 0.299, 0.295, 0.349, 0.402, 0.359, 0.432, 0.564, 0.654, 0.699, 0.614, 0.588, 0.553, 0.45, 0.377, 0.341, 0.331, 0.323)), row.names = c(NA, -23L), class = "data.frame")
# add sg smooth and derivative
d <- d %>%
mutate(date = as.Date(date)) %>%
mutate(sg = sgolayfilt(NDVI, n = 9)) %>%
mutate(dydx = sgolayfilt(NDVI, n = 9, m = 1))
# plot original data with smooth and derivative and annotate date of max slope
d %>%
ggplot(aes(date))
geom_point(aes(y = NDVI))
geom_line(aes(y = sg), color = "blue", size = 1)
geom_line(aes(y = dydx), color = "red", size = 1)
geom_vline(aes(xintercept = date[which.max(dydx)]), linetype = "dashed", size = 1.5)
annotate("text", x = min(d$date), y = max(d$NDVI), hjust = 0,
label = paste0("Max rate of change on: ", d$date[which.max(d$dydx)]))
Created on 2022-04-01 by the reprex package (v2.0.1)