Replacing leading NAs by group with 0s, but Keep other NAs

Time:10-11

I have a COVID data frame grouped by state with 60 columns. Because COVID started at different times across states, each state has NAs before its first values. Different indicators (columns) also start reporting at different times. Below is a sample df I made for demonstration.

state <- c(rep("A", 6), rep("B", 6))
time <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6)
x1 <- c(NA, NA, NA, 4, 5, 6, NA, NA, 3, 4, 5, NA)
x2 <- c(NA, 2, 3, NA, 5, 6, NA, NA, NA, 4, 5, 6)
x3 <- c(NA, NA, 3, 4, 5, NA, NA, 2, NA, 4, 5, 6)
df <- data.frame(state, time, x1, x2, x3)
df

   state time x1 x2 x3
1      A    1 NA NA NA
2      A    2 NA  2 NA
3      A    3 NA  3  3
4      A    4  4 NA  4
5      A    5  5  5  5
6      A    6  6  6 NA
7      B    1 NA NA NA
8      B    2 NA NA  2
9      B    3  3 NA NA
10     B    4  4  4  4
11     B    5  5  5  5
12     B    6 NA  6  6

I'm trying to replace all the leading NAs with 0 within each state, but keep the other NAs. The result should look like this:

   state time x1 x2 x3
1      A    1  0  0  0
2      A    2  0  2  0
3      A    3  0  3  3
4      A    4  4 NA  4
5      A    5  5  5  5
6      A    6  6  6 NA
7      B    1  0  0  0
8      B    2  0  0  2
9      B    3  3  0 NA
10     B    4  4  4  4
11     B    5  5  5  5
12     B    6 NA  6  6

One solution I came up with is to replace NAs wherever the running cumulative sum is still zero, as below:

library(dplyr)  # %>%, group_by(), mutate(), if_else()
library(tidyr)  # replace_na()

df1 <- df %>% 
  group_by(state) %>% 
  mutate(
    check.sum1 = cumsum(replace_na(x1, 0)),
    x1 = if_else(check.sum1 != 0, x1, 0),
    check.sum2 = cumsum(replace_na(x2, 0)),
    x2 = if_else(check.sum2 != 0, x2, 0),
    check.sum3 = cumsum(replace_na(x3, 0)),
    x3 = if_else(check.sum3 != 0, x3, 0)
  )
df1

This method worked fine. But since there are 60 columns, I want to wrap it in a function and/or use apply(). Both attempts below produce errors:

df2 <- df %>% 
  group_by(state) %>% 
  apply(
    df[3:5], MARGIN = 2, FUN = function(x) mutate(
      check.sum = cumsum(replace_na(x, 0)),
      x = if_else(check.sum != 0, x, 0)
    ) 
  )

Error in FUN(newX[, i], ...) : unused argument (df[3:5])

#or
func <- function(x) {
  mutate(
    check.sum = cumsum(replace_na(x, 0)),
    x = if_else(check.sum != 0, x, 0)
  )
}

df3 <- df %>% 
  group_by(state) %>% 
  apply(
    df[3:5], MARGIN = 2, func
  )

Error in match.fun(FUN) : 
  'df[3:5]' is not a function, character or symbol

So there are three specific questions:

  1. How do I create a user-defined function that takes columns as arguments?
  2. How do I use the apply() function here?
  3. Are there other ways to do this with existing functions, such as na.locf() or na.trim()?

Thank you!

CodePudding user response:

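Using by to split the data by state, then, within each column, replacing an NA when the is.na flag has not just switched from FALSE to TRUE, i.e. when the difference of the boolean is.na vector is less than or equal to zero. Note that this only protects the first NA of a later run: if a column has two or more consecutive non-leading NAs, the second and following ones would also be set to 0. The sample data has no such runs.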

do.call(rbind, by(df, df$state, \(x) {
  x[] <- lapply(x, \(z) {z[is.na(z) & c(0, diff(is.na(z))) <= 0] <- 0; z})
  return(x)
}))
#      state time x1 x2 x3
# A.1      A    1  0  0  0
# A.2      A    2  0  2  0
# A.3      A    3  0  3  3
# A.4      A    4  4 NA  4
# A.5      A    5  5  5  5
# A.6      A    6  6  6 NA
# B.7      B    1  0  0  0
# B.8      B    2  0  0  2
# B.9      B    3  3  0 NA
# B.10     B    4  4  4  4
# B.11     B    5  5  5  5
# B.12     B    6 NA  6  6

Note: The \(x) function shorthand requires R >= 4.1; on older versions write function(x) instead.
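
A variant that does not depend on how later NAs are spaced is to test whether any non-NA value has been seen yet. The following is a minimal base R sketch, not the answerer's code: `fill_leading_na` is a hypothetical helper name, and it assumes the indicator columns sit in positions 3 to 5 as in the sample df.

```r
df <- data.frame(
  state = c(rep("A", 6), rep("B", 6)),
  time  = rep(1:6, 2),
  x1 = c(NA, NA, NA, 4, 5, 6, NA, NA, 3, 4, 5, NA),
  x2 = c(NA, 2, 3, NA, 5, 6, NA, NA, NA, 4, 5, 6),
  x3 = c(NA, NA, 3, 4, 5, NA, NA, 2, NA, 4, 5, 6)
)

# Hypothetical helper: set NAs that precede the first non-NA value to 0.
fill_leading_na <- function(z) {
  seen_value <- cumsum(!is.na(z)) > 0  # TRUE from the first non-NA value onward
  z[!seen_value] <- 0                  # positions before it are all NA by construction
  z
}

cols <- 3:5  # assumption: indicator columns of the sample df
res <- df
# ave() applies the helper within each state and keeps the original row order.
res[cols] <- lapply(df[cols], function(col) ave(col, df$state, FUN = fill_leading_na))
res
```

This also touches on the apply() question: in the attempts above the pipe already passes the data frame as the first argument of apply(), so df[3:5] gets matched to a later argument (FUN, or an extra argument handed on to FUN), which is what the two error messages report. Looping over columns with lapply(), with the grouping handled by ave() or by(), avoids that.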

CodePudding user response:

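Using dplyr, we can apply the replacement to every matching column with across(). cumsum(!is.na(.)) stays at 0 until the first non-NA value, so !cumsum(!is.na(.)) is TRUE only for the leading NAs: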

library(dplyr)
df %>%
    group_by(state) %>% 
    mutate(across(starts_with('x'), ~ replace(., !cumsum(!is.na(.)), 0))) %>% 
    ungroup
# A tibble: 12 × 5
   state  time    x1    x2    x3
   <chr> <dbl> <dbl> <dbl> <dbl>
 1 A         1     0     0     0
 2 A         2     0     2     0
 3 A         3     0     3     3
 4 A         4     4    NA     4
 5 A         5     5     5     5
 6 A         6     6     6    NA
 7 B         1     0     0     0
 8 B         2     0     0     2
 9 B         3     3     0    NA
10 B         4     4     4     4
11 B         5     5     5     5
12 B         6    NA     6     6
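
For the question about user-defined functions, the lambda above can be pulled out into a named function that takes a column vector and is then passed to across(). This is a sketch assuming dplyr >= 1.0 is installed; `zero_leading_na` is a hypothetical name, not a dplyr function.

```r
library(dplyr)

df <- data.frame(
  state = c(rep("A", 6), rep("B", 6)),
  time  = rep(1:6, 2),
  x1 = c(NA, NA, NA, 4, 5, 6, NA, NA, 3, 4, 5, NA),
  x2 = c(NA, 2, 3, NA, 5, 6, NA, NA, NA, 4, 5, 6),
  x3 = c(NA, NA, 3, 4, 5, NA, NA, 2, NA, 4, 5, 6)
)

# Hypothetical helper: takes one column (a vector) and returns it with
# the NAs before the first non-NA value replaced by 0.
zero_leading_na <- function(x) {
  replace(x, cumsum(!is.na(x)) == 0, 0)
}

out <- df %>%
  group_by(state) %>%
  # x1:x3 selects the sample columns; with 60 columns a selection such as
  # -time or starts_with("x") may be more convenient.
  mutate(across(x1:x3, zero_leading_na)) %>%
  ungroup()
out
```

Because mutate() runs on grouped data, the helper sees one state's values at a time, so it needs no knowledge of the grouping itself.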