Calculating the cumulative sum of the columns in a dataframe with NAs-CodePudding

I have a dataframe that has some NAs in it, and want to create a new set of columns with the cumulative sum of a subset of the original columns, with NAs being ignored. A minimal example follows:

x = data.frame(X1 = c(NA, NA, 1,2,3),
                X2 = 1:5)
> x
  X1 X2
1 NA  1
2 NA  2
3  1  3
4  2  4
5  3  5

If I now write

> cumsum(x)
  X1 X2
1 NA  1
2 NA  3
3 NA  6
4 NA 10
5 NA 15

I tried using ifelse

> cumsum(ifelse(is.na(x), 0, x))
Error: 'list' object cannot be coerced to type 'double'

but I have no difficulty working with one column at a time

> cumsum(ifelse(is.na(x$X1), 0, x$X1))
[1] 0 0 1 3 6

I suppose I could loop through the columns in my chosen subset, create a cumulative sum for each one, and then assign it to a new column in the dataframe, but this seems tedious, If I have a vector with the names of the columns whose cumulative sum I want to compute, how can I do so while ignoring the NAs (i,e, treating them as 0), and add the resulting set of cumulative sums to the dataframe with new names?

Sincerely

Thomas Philips

CodePudding user response：

We could do

library(dplyr)
x %>%
   mutate(across(everything(), 
   ~ replace(.x, complete.cases(.x), cumsum(.x[complete.cases(.x)]))))

-output

Or more compactly with fcumsum from collapse

library(collapse)
fcumsum(x)
  X1 X2
1 NA  1
2 NA  3
3  1  6
4  3 10
5  6 15

Or using base R with replace

cumsum(replace(x, is.na(x), 0))
  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15

CodePudding user response：

library(dplyr)
mutate(x, across(everything(), ~cumsum(coalesce(.x, 0))))

  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15

x[is.na(x)] <- 0
cumsum(x)

# but we lose the NA's
  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15