Cumulative count for a column using R-CodePudding

I got data like this

structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1), drug_2 = c(0, 1, 1, 1, 1, 0, 
1, 0, 0, 1, 0, 1)), class = "data.frame", row.names = c(NA, -12L
))

I would like to get the cumulative count of each column for each id and get the data like this

structure(list(id2 = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1_b = c(0, 
0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 2), drug_2_b = c(0, 1, 2, 3, 4, 
0, 5, 0, 0, 1, 0, 2)), class = "data.frame", row.names = c(NA, 
-12L))

CodePudding user response：

Base R,

ave(df$drug_2, df$id, FUN = function(z) ifelse(z == 0, z, cumsum(z)))
#  [1] 0 1 2 3 4 0 5 0 0 1 0 2

CodePudding user response：

Edit Simplified the solution after reading r2evans' approach.

You could use

library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(across(starts_with("drug"), 
                ~ifelse(.x == 0, 0, cumsum(.x)))) %>%
  ungroup()

This returns

# A tibble: 12 x 3
      id drug_1 drug_2
   <dbl>  <dbl>  <dbl>
 1     1      0      0
 2     1      0      1
 3     1      0      2
 4     1      0      3
 5     1      0      4
 6     1      1      0
 7     1      2      5
 8     2      0      0
 9     2      0      0
10     2      1      1
11     2      0      0
12     2      2      2

CodePudding user response：

You can get a cumulative sum with cumsum.

To split data.frame into subsets, you can use split and then lapply cumsum over the list of the data.frames and again over the list of the columns, or you can use the ave function which does exactly that:

data = structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1), drug_2 = c(0, 1, 1, 1, 1, 0, 
1, 0, 0, 1, 0, 1)), class = "data.frame", row.names = c(NA, -12L
))
data[-1] = ave(data[-1], data$id, FUN=cumsum)

edit: I assumed that the cumulative sum is requested (as per instructions) and that there is a mistake in the example data. If the example data is correct, then the condition is If the count is zero, don't do cumulative sum and leave at zero or ifelse(x == 0, 0, cumsum(x)) (as per @r2evans). However, this construct doesn't work when applied for the data.frame. A more complex helper function is required:

data[-1] = ave(data[-1], data$id, FUN=function(x){
    y = cumsum(x)
    y[x == 0] = 0
    y
    })

We can now compare it with the requested (renamed) data:

result = structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 2), drug_2 = c(0, 1, 2, 3, 4, 
0, 5, 0, 0, 1, 0, 2)), class = "data.frame", row.names = c(NA, 
-12L))
identical(data, result)

CodePudding user response：

Base R solution:

# Resolve the names of vectors we want to cumulatively sum:
# drug_vec_names => character vector
drug_vec_names <- grep( "^drug\\_", colnames(df), value = TRUE)

# Resolve the names of vectors we want to keep: 
# not_drug_vec_names => character vector
not_drug_vec_names <- names(df)[!(names(df) %in% drug_vec_names)]

# Calculate the result: res => data.frame
res <- setNames(
  cbind(
    df[,not_drug_vec_names],
    replace(
      ave(
        df[,drug_vec_names], 
        df[,not_drug_vec_names],
        FUN = cumsum
      ),
      df[,drug_vec_names] == 0,
      0
    )
  ),
  c(not_drug_vec_names, drug_vec_names)
)

CodePudding user response：

If you have binary values (1/0) in drug columns, you can multiply the cumulative sum with itself to get 0 for 0 values.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(across(starts_with('drug'), ~cumsum(.) * .)) %>%
  ungroup

#      id drug_1 drug_2
#   <dbl>  <dbl>  <dbl>
# 1     1      0      0
# 2     1      0      1
# 3     1      0      2
# 4     1      0      3
# 5     1      0      4
# 6     1      1      0
# 7     1      2      5
# 8     2      0      0
# 9     2      0      0
#10     2      1      1
#11     2      0      0
#12     2      2      2