I have a particular data set that includes two variables. One is numeric and the other is a character that identifies the season and year that the numeric data came from. This is what the head of the data looks like:
SeasonYear mean
<chr> <dbl>
1 winter2000 0.957
2 spring2000 0.943
3 summer2000 1.03
4 fall2000 0.981
5 winter2001 1.06
6 spring2001 1.05
7 summer2001 1.02
8 fall2001 1.03
9 winter2002 1.02
10 spring2002 1.05
Now I am looking to pull a lag of this data but only for the previous spring so that my data looks something like this:
SeasonYear mean lag
<chr> <dbl> <dbl>
1 winter2000 0.957 NA
2 spring2000 0.943 NA
3 summer2000 1.03 0.943
4 fall2000 0.981 0.943
5 winter2001 1.06 0.943
6 spring2001 1.05 0.943
7 summer2001 1.02 1.05
8 fall2001 1.03 1.05
9 winter2002 1.02 1.05
10 spring2002 1.05 1.05
I am also looking to go back 2 springs as well so that my data looks something like this:
SeasonYear mean lag
<chr> <dbl> <dbl>
1 winter2000 0.957 NA
2 spring2000 0.943 NA
3 summer2000 1.03 NA
4 fall2000 0.981 NA
5 winter2001 1.06 NA
6 spring2001 1.05 NA
7 summer2001 1.02 0.943
8 fall2001 1.03 0.943
9 winter2002 1.02 0.943
10 spring2002 1.05 0.943
I know I can use the lag()
function to get previous data in the dataframe but I am looking for a way to specify a function that pulls a particular type of lag like the one I mentioned.
CodePudding user response:
One option to achieve your desired result may look like so:
- Split your SeasonYear into season and year
- Add a column with the value for the spring in each year
- Get the n-th lag taking into account that for fall and summer its the (n-1)-th lag
library(tidyr)
library(dplyr)
lag_spring <- function(x, y, n = 1) {
data.frame(x = x, season_year = y) %>%
tidyr::extract(season_year, into = c("season", "year"), regex = "^(. ?)(\\d{4})$") %>%
group_by(year) %>%
mutate(springmean = x[season == "spring"]) %>%
ungroup() %>%
group_by(season) %>%
mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
ungroup() %>%
pull(lag)
}
dd %>%
mutate(lag = lag_spring(mean, SeasonYear))
#> SeasonYear mean lag
#> 1 winter2000 0.957 NA
#> 2 spring2000 0.943 NA
#> 3 summer2000 1.030 0.943
#> 4 fall2000 0.981 0.943
#> 5 winter2001 1.060 0.943
#> 6 spring2001 1.050 0.943
#> 7 summer2001 1.020 1.050
#> 8 fall2001 1.030 1.050
#> 9 winter2002 1.020 1.050
#> 10 spring2002 1.050 1.050
dd %>%
mutate(lag = lag_spring(mean, SeasonYear, n = 2))
#> SeasonYear mean lag
#> 1 winter2000 0.957 NA
#> 2 spring2000 0.943 NA
#> 3 summer2000 1.030 NA
#> 4 fall2000 0.981 NA
#> 5 winter2001 1.060 NA
#> 6 spring2001 1.050 NA
#> 7 summer2001 1.020 0.943
#> 8 fall2001 1.030 0.943
#> 9 winter2002 1.020 0.943
#> 10 spring2002 1.050 0.943
DATA
dd <- structure(list(SeasonYear = c(
"winter2000", "spring2000", "summer2000",
"fall2000", "winter2001", "spring2001", "summer2001", "fall2001",
"winter2002", "spring2002"
), mean = c(
0.957, 0.943, 1.03, 0.981,
1.06, 1.05, 1.02, 1.03, 1.02, 1.05
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"
))