Home > Blockchain >  For loop with sum_run
For loop with sum_run

Time:10-17

I have a dataframe with ~25 areas and ~45 variables across different timeframes. For each area I am wanting to create a rolling sum which I have been able to do for single columns with sum_run

df%>%
  group_by(Area) %>%
  mutate(Measure_A_sum = sum_run(`Measure A`, k =4, na_pad = T, lag = 0))

However, I want to do this to all the relevant columns (39:71), I have tried to use a for loop to loop through all the relevant columns, however I keep getting different errors, currently it is an unexpected = error.

for (col in 39:length(df)) {
  test <- df %>%
  group_by(Area) %>%
  mutate(paste(df[col], ' - Rolling', sep = '') = sum_run(df[col], k = 4, na_pad = T, lag = 0))

}

Here is a cut of my data , any help would be greatly appreciated on setting up this for loop

structure(list(Area = c("A", "A", 
"A", "A", "A", 
"A", "A", "A", 
"B", "B", "B", "B", "B", 
"B", "B", "B", "C", "C", "C", 
"C", "C", "C", "C", "C"), Quarter = c(2022.1, 
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4, 2022.1, 
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4, 2022.1, 
2022.2, 2022.3, 2022.4, 2023.1, 2023.2, 2023.3, 2023.4), Year = c(2021L, 
2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2021L, 2021L, 
2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2021L, 2021L, 2021L, 
2022L, 2022L, 2022L, 2022L, 2023L), Measure A` = c(3457L, 
3348L, 3279L, 3095L, 3155L, NA, NA, NA, 3416L, 3325L, 3248L, 
3076L, 3187L, NA, NA, NA, 2815L, 1942L, 1960L, 1863L, 2033L, 
NA, NA, NA), `Measure B` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, 168L, 230L, 221L, 253L, 318L, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `Measure C` = c(374L, 
408L, 404L, 497L, 455L, NA, NA, NA, 814L, 853L, 711L, 689L, 534L, 
NA, NA, NA, 1430L, 1335L, 1551L, 1352L, 1512L, NA, NA, NA)), row.names = c(NA, 
24L), class = "data.frame")

CodePudding user response:

You can use across within mutate to select relevant columns, and apply a function/operation to a set of columns (see e.g. ?across for examples and details). There's no need for an explicit for loop.

library(runner)
df %>%
    mutate(across(starts_with("Measure"), sum_run, k = 4, na_pad = T, lag = 0))
#   Area Quarter Year Measure A Measure B Measure C
#1     A  2022.1 2021        NA        NA        NA
#2     A  2022.2 2021        NA        NA        NA
#3     A  2022.3 2021        NA        NA        NA
#4     A  2022.4 2022     13179        NA      1683
#5     A  2023.1 2022     12877        NA      1764
#6     A  2023.2 2022      9529        NA      1356
#7     A  2023.3 2022      6250        NA       952
#8     A  2023.4 2023      3155        NA       455
#9     B  2022.1 2021      3416       168       814
#10    B  2022.2 2021      6741       398      1667
#11    B  2022.3 2021      9989       619      2378
#12    B  2022.4 2022     13065       872      3067
#13    B  2023.1 2022     12836      1022      2787
#14    B  2023.2 2022      9511       792      1934
#15    B  2023.3 2022      6263       571      1223
#16    B  2023.4 2023      3187       318       534
#17    C  2022.1 2021      2815        NA      1430
#18    C  2022.2 2021      4757        NA      2765
#19    C  2022.3 2021      6717        NA      4316
#20    C  2022.4 2022      8580        NA      5668
#21    C  2023.1 2022      7798        NA      5750
#22    C  2023.2 2022      5856        NA      4415
#23    C  2023.3 2022      3896        NA      2864
#24    C  2023.4 2023      2033        NA      1512

across also accepts column indices, e.g. across(4:6, sum_run, ...) applies sum_run on columns 4-6.

  • Related