I am cumulatively summing selected columns of a data frame df, say
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528 1 1 1997 1 15 10 573 0 0 0 0 0 0 0 0 0 0 0
2135181 1 1 1999 1 15 10 439 0 0 0 0 0 0 0 0 0 0 0
3005060 1 1 2001 1 15 10 46 0 0 0 0 0 0 0 0 0 0 0
3140988 1 1 2001 1 15 10 349 0 0 0 0 0 0 0 0 0 0 0
4280242 1 1 2004 1 15 10 345 0 0 0 0 0 0 0 0 0 0 0
4992637 1 1 2005 1 15 10 811 0 0 0 0 0 0 0 0 0 0 0
by running
df[7:18] <- do.call("cbind", Reduce(`+`, df[7:18], accumulate = TRUE))
so that the Pay columns hold running totals:
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528 1 1 1997 1 15 10 573 573 573 573 573 573 573 573 573 573 573 573
2135181 1 1 1999 1 15 10 439 439 439 439 439 439 439 439 439 439 439 439
3005060 1 1 2001 1 15 10 46 46 46 46 46 46 46 46 46 46 46 46
3140988 1 1 2001 1 15 10 349 349 349 349 349 349 349 349 349 349 349 349
4280242 1 1 2004 1 15 10 345 345 345 345 345 345 345 345 345 345 345 345
4992637 1 1 2005 1 15 10 811 811 811 811 811 811 811 811 811 811 811 811
I have several dfs like this (contained in list df_list) and I've been using lapply()
to apply other changes on each of them in the following format
result <- lapply(df_list, \(x) ...)
However, I can't figure out how to apply this change only to columns 7:18 of each df.
How do I do this?
CodePudding user response:
We can use the same code, with x being the individual dataset: subset the columns with the Pay prefix (in the updated data), use Reduce with accumulate = TRUE, assign the result back, and return the data x
result <- lapply(df_list, \(x) {
  # positions of the columns whose names start with "Pay"
  nm <- grep("^Pay", names(x))
  x[nm] <- do.call("cbind", Reduce(`+`, x[nm], accumulate = TRUE))
  x
})
-output
result
$`1.1.1.15.10`
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528 1 1 1997 1 15 10 573 573 573 573 573 573 573 573 573 573 573 573
2135181 1 1 1999 1 15 10 439 439 439 439 439 439 439 439 439 439 439 439
3005060 1 1 2001 1 15 10 46 46 46 46 46 46 46 46 46 46 46 46
3140988 1 1 2001 1 15 10 349 349 349 349 349 349 349 349 349 349 349 349
4280242 1 1 2004 1 15 10 345 345 345 345 345 345 345 345 345 345 345 345
4992637 1 1 2005 1 15 10 811 811 811 811 811 811 811 811 811 811 811 811
$`2.1.1.15.10`
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
434863 2 1 1995 1 15 10 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046
923365 2 1 1996 1 15 10 0 0 0 0 0 0 0 0 0 0 0 0
1225196 2 1 1996 1 15 10 0 0 0 0 0 0 0 0 0 0 0 0
4295570 2 1 2004 1 15 10 375 375 375 375 375 375 375 375 375 375 375 375
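The sample rows above have payments only in Pay00, so the running totals look like a simple copy of the first column. A minimal sketch on made-up data may make the accumulation easier to see. The data frames, their values, and the helper name cumulate_pay are hypothetical and only serve as an illustration; the check against cumsum is an added sanity test, not part of the original answer.

```r
# Hypothetical toy data: two small data frames with three Pay columns each
df_list <- list(
  a = data.frame(LoB = c(1, 1), Pay00 = c(573, 439), Pay01 = c(10, 0), Pay02 = c(0, 5)),
  b = data.frame(LoB = c(2, 2), Pay00 = c(4046, 0), Pay01 = c(0, 0), Pay02 = c(100, 0))
)

# Hypothetical helper wrapping the same steps as the lapply() body above
cumulate_pay <- function(x) {
  nm <- grep("^Pay", names(x))  # positions of the Pay columns
  # Reduce() with accumulate = TRUE returns a list of running column sums:
  # Pay00, Pay00 + Pay01, Pay00 + Pay01 + Pay02
  x[nm] <- do.call("cbind", Reduce(`+`, x[nm], accumulate = TRUE))
  x
}

result <- lapply(df_list, cumulate_pay)

# Sanity check: each row of the Pay columns should equal the row-wise cumsum
# of the original values
expected_a <- t(apply(as.matrix(df_list$a[c("Pay00", "Pay01", "Pay02")]), 1, cumsum))
stopifnot(all(as.matrix(result$a[c("Pay00", "Pay01", "Pay02")]) == expected_a))

result$a
```

Columns that do not match the Pay prefix (here LoB) are left untouched, which is the point of selecting by name rather than by the fixed positions 7:18.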
Or with tidyverse
library(purrr)
library(dplyr)
result <- map(df_list, ~ .x %>%
  # or use pick in newer version
  # mutate(pick(starts_with("Pay")) %>%
  mutate(across(starts_with("Pay")) %>%
    accumulate(`+`) %>%
    bind_cols))
-output
result
$`1.1.1.15.10`
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528 1 1 1997 1 15 10 573 573 573 573 573 573 573 573 573 573 573 573
2135181 1 1 1999 1 15 10 439 439 439 439 439 439 439 439 439 439 439 439
3005060 1 1 2001 1 15 10 46 46 46 46 46 46 46 46 46 46 46 46
3140988 1 1 2001 1 15 10 349 349 349 349 349 349 349 349 349 349 349 349
4280242 1 1 2004 1 15 10 345 345 345 345 345 345 345 345 345 345 345 345
4992637 1 1 2005 1 15 10 811 811 811 811 811 811 811 811 811 811 811 811
$`2.1.1.15.10`
LoB cc AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
434863 2 1 1995 1 15 10 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046 4046
923365 2 1 1996 1 15 10 0 0 0 0 0 0 0 0 0 0 0 0
1225196 2 1 1996 1 15 10 0 0 0 0 0 0 0 0 0 0 0 0
4295570 2 1 2004 1 15 10 375 375 375 375 375 375 375 375 375 375 375 375