Home > Enterprise >  Calculating the cumulative sum of the columns in a dataframe with NAs
Calculating the cumulative sum of the columns in a dataframe with NAs

Time:08-25

I have a dataframe that has some NAs in it, and want to create a new set of columns with the cumulative sum of a subset of the original columns, with NAs being ignored. A minimal example follows:

x = data.frame(X1 = c(NA, NA, 1,2,3),
                X2 = 1:5)
> x
  X1 X2
1 NA  1
2 NA  2
3  1  3
4  2  4
5  3  5

If I now write

> cumsum(x)
  X1 X2
1 NA  1
2 NA  3
3 NA  6
4 NA 10
5 NA 15

I tried using ifelse

> cumsum(ifelse(is.na(x), 0, x))
Error: 'list' object cannot be coerced to type 'double'

but I have no difficulty working with one column at a time

> cumsum(ifelse(is.na(x$X1), 0, x$X1))
[1] 0 0 1 3 6

I suppose I could loop through the columns in my chosen subset, create a cumulative sum for each one, and then assign it to a new column in the dataframe, but this seems tedious, If I have a vector with the names of the columns whose cumulative sum I want to compute, how can I do so while ignoring the NAs (i,e, treating them as 0), and add the resulting set of cumulative sums to the dataframe with new names?

Sincerely

Thomas Philips

CodePudding user response:

We could do

library(dplyr)
x %>%
   mutate(across(everything(), 
   ~ replace(.x, complete.cases(.x), cumsum(.x[complete.cases(.x)]))))

-output

  X1 X2
1 NA  1
2 NA  3
3  1  6
4  3 10
5  6 15

Or more compactly with fcumsum from collapse

library(collapse)
fcumsum(x)
  X1 X2
1 NA  1
2 NA  3
3  1  6
4  3 10
5  6 15

Or using base R with replace

cumsum(replace(x, is.na(x), 0))
  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15

CodePudding user response:

library(dplyr)
mutate(x, across(everything(), ~cumsum(coalesce(.x, 0))))

  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15

Or

x[is.na(x)] <- 0
cumsum(x)

# but we lose the NA's
  X1 X2
1  0  1
2  0  3
3  1  6
4  3 10
5  6 15
  • Related