I have a dataframe that has some NAs in it, and want to create a new set of columns with the cumulative sum of a subset of the original columns, with NAs being ignored. A minimal example follows:
x = data.frame(X1 = c(NA, NA, 1,2,3),
X2 = 1:5)
> x
X1 X2
1 NA 1
2 NA 2
3 1 3
4 2 4
5 3 5
If I now write
> cumsum(x)
X1 X2
1 NA 1
2 NA 3
3 NA 6
4 NA 10
5 NA 15
I tried using ifelse
> cumsum(ifelse(is.na(x), 0, x))
Error: 'list' object cannot be coerced to type 'double'
but I have no difficulty working with one column at a time
> cumsum(ifelse(is.na(x$X1), 0, x$X1))
[1] 0 0 1 3 6
I suppose I could loop through the columns in my chosen subset, create a cumulative sum for each one, and then assign it to a new column in the dataframe, but this seems tedious, If I have a vector with the names of the columns whose cumulative sum I want to compute, how can I do so while ignoring the NAs (i,e, treating them as 0), and add the resulting set of cumulative sums to the dataframe with new names?
Sincerely
Thomas Philips
CodePudding user response:
We could do
library(dplyr)
x %>%
mutate(across(everything(),
~ replace(.x, complete.cases(.x), cumsum(.x[complete.cases(.x)]))))
-output
X1 X2
1 NA 1
2 NA 3
3 1 6
4 3 10
5 6 15
Or more compactly with fcumsum
from collapse
library(collapse)
fcumsum(x)
X1 X2
1 NA 1
2 NA 3
3 1 6
4 3 10
5 6 15
Or using base R
with replace
cumsum(replace(x, is.na(x), 0))
X1 X2
1 0 1
2 0 3
3 1 6
4 3 10
5 6 15
CodePudding user response:
library(dplyr)
mutate(x, across(everything(), ~cumsum(coalesce(.x, 0))))
X1 X2
1 0 1
2 0 3
3 1 6
4 3 10
5 6 15
Or
x[is.na(x)] <- 0
cumsum(x)
# but we lose the NA's
X1 X2
1 0 1
2 0 3
3 1 6
4 3 10
5 6 15