Home > database >  Row-wise cumulative sum over some columns, followed by row-wise division
Row-wise cumulative sum over some columns, followed by row-wise division

Time:08-09

I have the following data.frame:

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       10         100       110     1.5
50              200       10      10          40        60     1.8

First, I need to do a row-wise cumulative sum for columns 3 ~ 5:

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       10         110       110     1.5
50              200       10      20          60        60     1.8

Then I need to divide the cumulative sum by the Spend value of that row:

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       0.1         1.1       110     1.5
50              200       0.05    0.1         0.3        60     1.8

How can I do this?

CodePudding user response:

dat <- structure(list(New_Sign_ups = c(100L, 50L), Spend = c(100L, 200L
), `2021-05` = c(0L, 10L), `2021-06` = c(10L, 10L), `2021-07` = c(100L, 
40L), MRR_sum = c(110L, 60L), time = c(1.5, 1.8)), class = "data.frame", 
row.names = c(NA, -2L))

Without using any packages

dat[3:5] <- do.call("cbind", Reduce(` `, dat[3:5], accumulate = TRUE)) / dat$Spend
#  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
#1          100   100    0.00     0.1     1.1     110  1.5
#2           50   200    0.05     0.1     0.3      60  1.8

Using package matrixStats

library(matrixStats)
dat[3:5] <- rowCumsums(as.matrix(dat[3:5])) / dat$Spend
#  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
#1          100   100    0.00     0.1     1.1     110  1.5
#2           50   200    0.05     0.1     0.3      60  1.8

CodePudding user response:

Here is also a tidyverse solution with mostly purrr package functions. Thank you Zheyuan Li for the data:

library(dplyr)
library(purrr)

dat %>%
  mutate(pmap_df(dat, ~ {
    accumulate(c(...)[str_detect(names(dat), '\\d ')], ` `) / c(...)['Spend' == names(dat)]}))

  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
1          100   100    0.00     0.1     1.1     110  1.5
2           50   200    0.05     0.1     0.3      60  1.8

CodePudding user response:

  • We can use rowwise paired with c_across from dplyr library , then update the values of the data.frame
library(dplyr)

df <- cohort_table3 |> rowwise() |> 
      mutate(ans = list(cumsum(c_across(3:5))/Spend))

cohort_table3[3:5] <- do.call(rbind , df$ans)
  • output
 New_Sign_ups Spend X2021.05 X2021.06 X2021.07 MRR_sum time
1          100   100     0.00      0.1      1.1     110  1.5
2           50   200     0.05      0.1      0.3      60  1.8
  • Related