I have created a data frame;
city_name Date temp hum
London 2022-01-01 0 10
London 2022-01-02 7 8
London 2022-01-03 3 6
London 2022-01-04 7 7
London 2022-01-05 10 8
Paris 2022-01-01 11 3
Paris 2022-01-02 NA 2
Paris 2022-01-03 9 NA
Paris 2022-01-04 10 7
Paris 2022-01-05 11 9
what I am trying to do is lag temp and hum by 1 and 2 by city_name and put NA to lagged values if the Value is NA in temp or hum.
city_name Date temp hum temp_lag1 temp_lag2 hum_lag1 hum_lag2
London 2022-01-01 0 10 NA NA NA NA
London 2022-01-02 7 8 0 NA 10 NA
London 2022-01-03 3 6 7 0 8 10
London 2022-01-04 7 7 3 7 6 8
London 2022-01-05 10 8 7 3 7 6
Paris 2022-01-01 11 3 NA NA NA NA
Paris 2022-01-02 NA 2 11 NA 3 NA
Paris 2022-01-03 9 NA NA 11 2 3
Paris 2022-01-04 10 7 9 NA NA 2
Paris 2022-01-05 11 9 10 9 7 NA
What I have tried doing is;
df <- df %>%
group_by(city_name) %>%
dplyr::mutate(temp_lag1 = dplyr::lag(temp, n = 1, default=NA)) %>%
as.data.frame()
df <- df %>%
group_by(city_name)
dplyr::mutate(temp_lag2 = dplyr::lag(temp, n = 2, default=NA)) %>%
as.data.frame()
df <- df %>%
group_by(city_name) %>%
dplyr::mutate(hum_lag1 = dplyr::lag(hum, n = 1, default=NA)) %>%
as.data.frame()
df <- df %>%
group_by(city_name) %>%
dplyr::mutate(rltv_hum_lag2 = dplyr::lag(rltv_hum, n = 2, default=NA)) %>%
as.data.frame()
it gives an error of: Error in dplyr::lag(temp, n = 1, default = NA) : object 'temp' not found
CodePudding user response:
Instead of doing this in multiple steps, use across
to loop over the columns and return a list
of columns
library(dplyr)
df <- df %>%
group_by(City_Name) %>%
mutate(across(c(temp, hum), list(lag1 = ~lag(.x, n = 1),
lag2 = ~ lag(.x, n = 2)))) %>%
ungroup
-output
# A tibble: 10 × 8
City_Name Date temp hum temp_lag1 temp_lag2 hum_lag1 hum_lag2
<chr> <chr> <int> <int> <int> <int> <int> <int>
1 London 2022-01-01 0 10 NA NA NA NA
2 London 2022-01-02 7 8 0 NA 10 NA
3 London 2022-01-03 3 6 7 0 8 10
4 London 2022-01-04 7 7 3 7 6 8
5 London 2022-01-05 10 8 7 3 7 6
6 Paris 2022-01-01 11 3 NA NA NA NA
7 Paris 2022-01-02 NA 2 11 NA 3 NA
8 Paris 2022-01-03 9 NA NA 11 2 3
9 Paris 2022-01-04 10 7 9 NA NA 2
10 Paris 2022-01-05 11 9 10 9 7 NA
data
df <- structure(list(City_Name = c("London", "London", "London", "London",
"London", "Paris", "Paris", "Paris", "Paris", "Paris"), Date = c("2022-01-01",
"2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-01",
"2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"), temp = c(0L,
7L, 3L, 7L, 10L, 11L, NA, 9L, 10L, 11L), hum = c(10L, 8L, 6L,
7L, 8L, 3L, 2L, NA, 7L, 9L)), class = "data.frame", row.names = c(NA,
-10L))