I have a dataframe with ~25 areas and ~45 variables across different timeframes. For each area I am wanting to create a rolling sum which I have been able to do for single columns with sum_run
df%>%
group_by(Area) %>%
mutate(Measure_A_sum = sum_run(`Measure A`, k =4, na_pad = T, lag = 0))
However, I want to do this to all the relevant columns (39:71), I have tried to use a for loop to loop through all the relevant columns, however I keep getting different errors, currently it is an unexpected = error.
for (col in 39:length(df)) {
test <- df %>%
group_by(Area) %>%
mutate(paste(df[col], ' - Rolling', sep = '') = sum_run(df[col], k = 4, na_pad = T, lag = 0))
}
Here is a cut of my data , any help would be greatly appreciated on setting up this for loop
structure(list(Area = c("A", "A",
"A", "A", "A",
"A", "A", "A",
"B", "B", "B", "B", "B",
"B", "B", "B", "C", "C", "C",
"C", "C", "C", "C", "C"), Quarter = c(2022.1,
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4, 2022.1,
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4, 2022.1,
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4), Year = c(2021L,
2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2021L, 2021L,
2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2021L, 2021L, 2021L,
2022L, 2022L, 2022L, 2022L, 2023L), Measure A` = c(3457L,
3348L, 3279L, 3095L, 3155L, NA, NA, NA, 3416L, 3325L, 3248L,
3076L, 3187L, NA, NA, NA, 2815L, 1942L, 1960L, 1863L, 2033L,
NA, NA, NA), `Measure B` = c(NA,
NA, NA, NA, NA, NA, NA, NA, 168L, 230L, 221L, 253L, 318L, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `Measure C` = c(374L,
408L, 404L, 497L, 455L, NA, NA, NA, 814L, 853L, 711L, 689L, 534L,
NA, NA, NA, 1430L, 1335L, 1551L, 1352L, 1512L, NA, NA, NA)), row.names = c(NA,
24L), class = "data.frame")
CodePudding user response:
You can use across
within mutate
to select relevant columns, and apply a function/operation to a set of columns (see e.g. ?across
for examples and details). There's no need for an explicit for
loop.
library(runner)
df %>%
mutate(across(starts_with("Measure"), sum_run, k = 4, na_pad = T, lag = 0))
# Area Quarter Year Measure A Measure B Measure C
#1 A 2022.1 2021 NA NA NA
#2 A 2022.2 2021 NA NA NA
#3 A 2022.3 2021 NA NA NA
#4 A 2022.4 2022 13179 NA 1683
#5 A 2023.1 2022 12877 NA 1764
#6 A 2023.2 2022 9529 NA 1356
#7 A 2023.3 2022 6250 NA 952
#8 A 2023.4 2023 3155 NA 455
#9 B 2022.1 2021 3416 168 814
#10 B 2022.2 2021 6741 398 1667
#11 B 2022.3 2021 9989 619 2378
#12 B 2022.4 2022 13065 872 3067
#13 B 2023.1 2022 12836 1022 2787
#14 B 2023.2 2022 9511 792 1934
#15 B 2023.3 2022 6263 571 1223
#16 B 2023.4 2023 3187 318 534
#17 C 2022.1 2021 2815 NA 1430
#18 C 2022.2 2021 4757 NA 2765
#19 C 2022.3 2021 6717 NA 4316
#20 C 2022.4 2022 8580 NA 5668
#21 C 2023.1 2022 7798 NA 5750
#22 C 2023.2 2022 5856 NA 4415
#23 C 2023.3 2022 3896 NA 2864
#24 C 2023.4 2023 2033 NA 1512
across
also accepts column indices, e.g. across(4:6, sum_run, ...)
applies sum_run
on columns 4-6.