How to apply changes on selected columns for multiple data frames in R?

Time:12-30

I am accumulating (taking running sums across) selected columns of a data frame df, say

        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528   1  1 1997  1  15       10   573     0     0     0     0     0     0     0     0     0     0     0
2135181   1  1 1999  1  15       10   439     0     0     0     0     0     0     0     0     0     0     0
3005060   1  1 2001  1  15       10    46     0     0     0     0     0     0     0     0     0     0     0
3140988   1  1 2001  1  15       10   349     0     0     0     0     0     0     0     0     0     0     0
4280242   1  1 2004  1  15       10   345     0     0     0     0     0     0     0     0     0     0     0
4992637   1  1 2005  1  15       10   811     0     0     0     0     0     0     0     0     0     0     0

using

df[7:18] <- do.call("cbind", Reduce(`+`, df[7:18], accumulate = TRUE))

so that

        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528   1  1 1997  1  15       10   573   573   573   573   573   573   573   573   573   573   573   573
2135181   1  1 1999  1  15       10   439   439   439   439   439   439   439   439   439   439   439   439
3005060   1  1 2001  1  15       10    46    46    46    46    46    46    46    46    46    46    46    46
3140988   1  1 2001  1  15       10   349   349   349   349   349   349   349   349   349   349   349   349
4280242   1  1 2004  1  15       10   345   345   345   345   345   345   345   345   345   345   345   345
4992637   1  1 2005  1  15       10   811   811   811   811   811   811   811   811   811   811   811   811
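
In the sample above only Pay00 is non-zero, so the running sum is hard to see. As a minimal sketch, here is the same idiom on a toy data frame; the name toy and its values are made up for illustration and are not from the data above.

```r
# Toy data with made-up values (not the data from the question)
toy <- data.frame(id = 1:2,
                  Pay00 = c(100, 50),
                  Pay01 = c(20, 0),
                  Pay02 = c(5, 10))

# Reduce() walks the columns left to right; accumulate = TRUE keeps every
# intermediate sum, so we get one vector per original column
steps <- Reduce(`+`, toy[2:4], accumulate = TRUE)

# cbind() turns the list of vectors into a matrix, which is then
# assigned back to the same columns
toy[2:4] <- do.call("cbind", steps)

# Row 1 now holds 100, 120, 125; row 2 holds 50, 50, 60
toy
```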

I have several dfs like this (contained in a list, df_list), and I've been using lapply() to apply other changes to each of them in the following format:

result <- lapply(df_list, \(x) ...)

However, I can't figure out how to apply this change only to columns 7:18 of each df.

How do I do this?

CodePudding user response:

We can use the same code inside lapply(), with x standing for each individual data frame: select the columns whose names start with Pay (rather than hard-coding 7:18), apply Reduce with accumulate = TRUE, assign the result back to those columns, and return x.

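result <- lapply(df_list, \(x) {
           # positions of the columns whose names start with "Pay"
           nm <- grep("^Pay", names(x))
           # running sum across those columns, written back in place
           x[nm] <- do.call("cbind", Reduce(`+`, x[nm], accumulate = TRUE))
           x
        })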

-output

result
$`1.1.1.15.10`
        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528   1  1 1997  1  15       10   573   573   573   573   573   573   573   573   573   573   573   573
2135181   1  1 1999  1  15       10   439   439   439   439   439   439   439   439   439   439   439   439
3005060   1  1 2001  1  15       10    46    46    46    46    46    46    46    46    46    46    46    46
3140988   1  1 2001  1  15       10   349   349   349   349   349   349   349   349   349   349   349   349
4280242   1  1 2004  1  15       10   345   345   345   345   345   345   345   345   345   345   345   345
4992637   1  1 2005  1  15       10   811   811   811   811   811   811   811   811   811   811   811   811

$`2.1.1.15.10`
        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
434863    2  1 1995  1  15       10  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046
923365    2  1 1996  1  15       10     0     0     0     0     0     0     0     0     0     0     0     0
1225196   2  1 1996  1  15       10     0     0     0     0     0     0     0     0     0     0     0     0
4295570   2  1 2004  1  15       10   375   375   375   375   375   375   375   375   375   375   375   375
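
For anyone who wants to try this without the original data, here is a self-contained sketch. The contents of df_list and the helper name cumulate_pay are hypothetical and only there for illustration; the body of the helper is the same Reduce idiom as above.

```r
# Hypothetical stand-in for df_list: two small data frames, made-up values
df_list <- list(
  a = data.frame(LoB = 1, AY = c(1997, 1999),
                 Pay00 = c(573, 439), Pay01 = c(10, 0), Pay02 = c(0, 5)),
  b = data.frame(LoB = 2, AY = c(1995, 2004),
                 Pay00 = c(4046, 375), Pay01 = c(0, 25), Pay02 = c(4, 0))
)

# Hypothetical helper: running sum across the columns whose names start
# with "Pay"; all other columns are left untouched
cumulate_pay <- function(x) {
  nm <- grep("^Pay", names(x))
  x[nm] <- do.call("cbind", Reduce(`+`, x[nm], accumulate = TRUE))
  x
}

result <- lapply(df_list, cumulate_pay)

# In result$a, Pay02 is now 583 and 444; LoB and AY are unchanged
result$a
```

Selecting by name pattern instead of by position (7:18) means the same function still works if a data frame in the list has its columns in a different order or has a different number of Pay columns.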

Or with tidyverse

library(purrr)
library(dplyr)
result <- map(df_list, ~ .x %>%
           # or use pick() in newer versions of dplyr (1.1.0 and later)
           # mutate(pick(starts_with("Pay")) %>%
            mutate(across(starts_with("Pay")) %>%
                      accumulate(`+`) %>%
                      bind_cols))

-output

result
$`1.1.1.15.10`
        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
1603528   1  1 1997  1  15       10   573   573   573   573   573   573   573   573   573   573   573   573
2135181   1  1 1999  1  15       10   439   439   439   439   439   439   439   439   439   439   439   439
3005060   1  1 2001  1  15       10    46    46    46    46    46    46    46    46    46    46    46    46
3140988   1  1 2001  1  15       10   349   349   349   349   349   349   349   349   349   349   349   349
4280242   1  1 2004  1  15       10   345   345   345   345   345   345   345   345   345   345   345   345
4992637   1  1 2005  1  15       10   811   811   811   811   811   811   811   811   811   811   811   811

$`2.1.1.15.10`
        LoB cc   AY AQ age inj_part Pay00 Pay01 Pay02 Pay03 Pay04 Pay05 Pay06 Pay07 Pay08 Pay09 Pay10 Pay11
434863    2  1 1995  1  15       10  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046  4046
923365    2  1 1996  1  15       10     0     0     0     0     0     0     0     0     0     0     0     0
1225196   2  1 1996  1  15       10     0     0     0     0     0     0     0     0     0     0     0     0
4295570   2  1 2004  1  15       10   375   375   375   375   375   375   375   375   375   375   375   375
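
As a quick sanity check on either version, the column-wise Reduce can be compared with a row-wise cumsum(). This is only a sketch on made-up data; cumsum() via apply() is a different technique from the one used above and is shown here purely for comparison. The names x, via_reduce, via_cumsum and same are hypothetical.

```r
# Made-up data for the check (hypothetical values)
x <- data.frame(id = 1:3,
                Pay00 = c(5, 0, 7),
                Pay01 = c(1, 2, 0),
                Pay02 = c(0, 3, 4))
nm <- grep("^Pay", names(x))

# Column-wise Reduce, as in the answer above
via_reduce <- do.call("cbind", Reduce(`+`, x[nm], accumulate = TRUE))

# Row-wise cumsum; apply() returns one column per input row,
# so transpose to get back to rows x columns
via_cumsum <- t(apply(x[nm], 1, cumsum))

# Compare the values only, ignoring any dimnames
same <- isTRUE(all.equal(unname(via_reduce), unname(via_cumsum)))
same
```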