How to use dplyr to return the grouped sum of all numeric columns when there are NA values?-CodePudding

I'm was attempting to sum all numeric columns using dplyr's group_by and summarise functions as below. I didn't understand the error returned from the summarise function and cannot seem to find a similar example on stack overflow ... however after two members pointed out my error in making the example data I found that the code I had to prepared to provide a grouped summary sum report was correct!

    # Dummy data
    a <- c(1, NA, 1, NA, 1, 1)
    b <- c( NA, 1, NA, 1, NA, NA)
    c <- c( 1, 1, 1, NA, 1, 1)
    d <- c( 1, 1, 1, NA, 1, NA)
    e <- c( NA, 1, 1, NA, 1, 1)
    f <- c( 1, NA, 1, NA, 1, 1)
    
# Make a tibble
tmp <- bind_cols(a, b, c, d, e) 
names(tmp) <- c("A", "B", "C", "D", "E")

ID <- c("X", "X", "Y", "Y", "Z", "Z")

tmp <-bind_cols(ID, tmp)
names(tmp)[1] <- "ID"

    # Return a sum report
    tmp %>% 
      group_by(ID) %>% 
      summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

    # A tibble: 3 × 6
      ID        A     B     C     D     E
      <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 X         1     1     2     2     1
    2 Y         1     1     1     1     1
    3 Z         2     0     2     1     2

CodePudding user response：

It's best to avoid defining a vector with different data types because R will convert the vector to a single data type.

I think you might want to create your data like this:

tmp = tibble(
         ID = c('X', 'X', 'Y', 'Y', 'Z', 'Z'),
         A = c(1, NA, 1, 1, NA, 1),
         B = c(NA, 1, 1, 1, 1, NA),
         C = c(1, NA, 1, 1, 1, 1),
         D = c(NA, 1, NA, NA, NA, NA),
         E = c(1, NA, 1, 1, 1, 1))

And then do:

tmp %>%
  group_by(ID) %>% 
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

To get:

# A tibble: 3 x 6
  ID        A     B     C     D     E
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 X         1     1     1     1     1
2 Y         2     2     2     0     2
3 Z         1     1     2     0     2