Home > Software design >  Sum of specific columns using mutate and sum with NA values
Sum of specific columns using mutate and sum with NA values

Time:12-11

I have the following data frame Data:

a b c
1 1 2
2 NA 4
NA 3 NA
NA NA NA

I want to calculate the sum of columns a to c using dplyrs mutate function and sum to get the following:

a b c d
1 1 2 5
2 NA 4 6
NA 3 NA 3
NA NA NA NA

The important thing is for NAs to be treated like 0 basically except when they are all NA then it will return the sum as NA.

I think I can do this:

Data<-Data %>%
mutate(d=sum(a,b,c,na.rm=TRUE))

The issue is I dont want to list all the variables a b and c, but want to make use of the : functionality so that I can list the variables like this a:c.

I was thinking of writing the code like this:

Data<-Data %>%
mutate(d=sum(a:c,na.rm=TRUE))

but this doesn't work. I don't know if I should put a quotation around a and around c?

I swear I've done this million of times before, but I can't find previous code I've used this in and for some reason my brain froze today.

CodePudding user response:

Just keep it simple:

 df %>% 
    mutate(
        d = rowSums(across(everything()), na.rm =T),
        d = na_if(d, 0)
    )
# A tibble: 4 x 4
      a     b     c     d
  <int> <int> <int> <dbl>
1     1     1     2     4
2     2    NA     4     6
3    NA     3    NA     3
4    NA    NA    NA    NA

CodePudding user response:

You can change everything for any tidyverse selector like matches or starts_with

library(tidyverse)

df_example <- tibble::tribble(
  ~a, ~b, ~c,
  1L, 1L, 2L,
  2L, NA, 4L,
  NA, 3L, NA,
  NA, NA, NA
  )

df_example |> 
  rowwise() |> 
  mutate(d = c_across(everything()) |> sum(na.rm = TRUE)) |> 
  ungroup()
#> # A tibble: 4 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     1     1     2     4
#> 2     2    NA     4     6
#> 3    NA     3    NA     3
#> 4    NA    NA    NA     0

Created on 2021-12-10 by the reprex package (v2.0.1)

A less general solution is to use a rowwise function like rowSums

library(tidyverse)

df_example <- tibble::tribble(
  ~a, ~b, ~c,
  1L, 1L, 2L,
  2L, NA, 4L,
  NA, 3L, NA,
  NA, NA, NA
  )

df_example |> 
  mutate(d = across(everything()) |> rowSums(na.rm = TRUE))
#> # A tibble: 4 x 4
#>       a     b     c     d
#>   <int> <int> <int> <dbl>
#> 1     1     1     2     4
#> 2     2    NA     4     6
#> 3    NA     3    NA     3
#> 4    NA    NA    NA     0

Created on 2021-12-10 by the reprex package (v2.0.1)

CodePudding user response:

Like so:

df  %>%
  rowwise() %>%
  mutate(d=sum(across(a:c),na.rm=TRUE)) %>%
  ungroup()

Also rowSums could be an option

  • Related