How to average per row based on number of NAs columns

Time:07-19

I have the following dataset:

df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

I want to compute the row-wise average of all these variables, but I want R to return NA when a row has only one non-NA value.

How can I do that?

The result should look like this:

a  b  c  d  average
NA NA NA 3  NA
NA NA 5  NA NA
1  2  NA NA 1.5
5  3  5  NA 4.33

Thanks a lot

CodePudding user response:

Another option is to first find the rows that have more than one non-NA value and then calculate rowMeans() for those rows only, like this:

df <- data.frame(a = c(NA, NA, 1,5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

index <- rowSums(!is.na(df)) > 1 
df[index, "average"] <- rowMeans(df[index, ], na.rm = TRUE)
df
#>    a  b  c  d  average
#> 1 NA NA NA  3       NA
#> 2 NA NA  5 NA       NA
#> 3  1  2 NA NA 1.500000
#> 4  5  3  5 NA 4.333333

Created on 2022-07-18 by the reprex package (v2.0.1)
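If the same rule is needed in several places, the index approach above could be wrapped in a small function. This is only a sketch: the name `row_mean_min_n()` and its `min_n` argument are hypothetical, chosen for illustration. With the default `min_n = 1` it should give the same averages as the answer above.

```r
# Hypothetical helper (not part of base R): row means that stay NA unless
# a row has more than `min_n` non-missing values.
row_mean_min_n <- function(data, min_n = 1) {
  n_present <- rowSums(!is.na(data))        # non-NA count per row
  out <- rep(NA_real_, nrow(data))          # start with NA for every row
  keep <- n_present > min_n                 # rows with enough values
  out[keep] <- rowMeans(data[keep, , drop = FALSE], na.rm = TRUE)
  out
}

df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

# Pass only the value columns, so an existing `average` column is ignored
df$average <- row_mean_min_n(df[, c("a", "b", "c", "d")])
df
```

Raising `min_n` (for example to 2) would also blank out rows that have exactly two values.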

CodePudding user response:

Similar to SamR's answer:

library(tidyverse)

df <- tibble(a = c(NA, NA, 1, 5),
             b = c(NA, NA, 2, 3),
             c = c(NA, 5, NA, 5),
             d = c(3, NA, NA, NA))

df %>% 
  rowwise() %>% 
  # cur_data() is deprecated since dplyr 1.1.0; c_across() counts the row's values
  mutate(average = ifelse(sum(!is.na(c_across(everything()))) <= 1, 
                          NA_real_, 
                          mean(c_across(where(is.numeric)), na.rm = TRUE)
         ))

# A tibble: 4 × 5
# Rowwise: 
      a     b     c     d average
  <dbl> <dbl> <dbl> <dbl>   <dbl>
1    NA    NA    NA     3   NA   
2    NA    NA     5    NA   NA   
3     1     2    NA    NA    1.5 
4     5     3     5    NA    4.33

With case_when()

df %>%  
  rowwise() %>% 
  mutate(average = case_when(sum(!is.na(c_across(everything()))) <= 1 ~ NA_real_, 
                             TRUE ~ mean(c_across(where(is.numeric)), na.rm = TRUE)))

Averaging only selected columns (here a and b):

df %>%  
  rowwise() %>% 
  # count non-NA values in the same columns that are averaged
  mutate(average = case_when(sum(!is.na(c_across(c(a, b)))) <= 1 ~ NA_real_, 
                             TRUE ~ rowMeans(across(c(a, b)), na.rm = TRUE)
                             )
         )

df %>% 
  select(a, b) %>% 
  rowwise() %>% 
  mutate(average = ifelse(sum(!is.na(c_across(everything()))) <= 1, 
                          NA_real_, 
                          mean(c_across(where(is.numeric)), na.rm = TRUE)
  ))
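The rowwise() versions above evaluate the expression once per row, which tends to be slow on large data. A minimal vectorized sketch of the same rule, assuming dplyr 1.0 or later, computes the counts and means for all rows at once with rowSums() and rowMeans(); the helper column `n_present` is a hypothetical name that is dropped at the end.

```r
library(dplyr)

df <- tibble(a = c(NA, NA, 1, 5),
             b = c(NA, NA, 2, 3),
             c = c(NA, 5, NA, 5),
             d = c(3, NA, NA, NA))

result <- df %>%
  mutate(
    # number of non-NA values in columns a to d, for every row at once
    n_present = rowSums(!is.na(across(a:d))),
    # keep the row mean only where more than one value is present
    average   = if_else(n_present > 1,
                        rowMeans(across(a:d), na.rm = TRUE),
                        NA_real_)
  ) %>%
  select(-n_present)   # drop the hypothetical helper column

result
```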
     

CodePudding user response:

You can use ifelse() to set average to NA in rows that have zero or one non-NA values, and to the rowMeans() result otherwise:

df$average  <- ifelse(
  rowSums(!is.na(df)) <=1, 
  NA,
  rowMeans(df, na.rm = TRUE)
)

df
#    a  b  c  d  average
# 1 NA NA NA  3       NA
# 2 NA NA  5 NA       NA
# 3  1  2 NA NA 1.500000
# 4  5  3  5 NA 4.333333
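The condition relies on `!is.na(df)` being a logical matrix in which TRUE counts as 1, so rowSums() of it is the number of non-NA values in each row. A small sketch of that intermediate step on a fresh copy of the question's data:

```r
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

present <- !is.na(df)          # logical matrix: TRUE where a value exists
n_present <- rowSums(present)  # TRUE counts as 1, giving the non-NA count per row
n_present                      # rows 1 and 2 have a single value, so they get NA
```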

CodePudding user response:

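In the tidyverse, we can build the condition inside case_when(): count the non-missing values per row and keep rowMeans() only where that count is greater than 1 (rows that match no condition get NA).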

library(dplyr)
library(purrr)

df %>%
  mutate(average = case_when(
    across(everything(), complete.cases) %>%   # TRUE where each value is present
      reduce(`+`) %>%                          # add up per row: non-NA count
      magrittr::is_greater_than(1) ~ rowMeans(across(everything()), na.rm = TRUE)
  ))

Output:

   a  b  c  d  average
1 NA NA NA  3       NA
2 NA NA  5 NA       NA
3  1  2 NA NA 1.500000
4  5  3  5 NA 4.333333
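The condition in this answer adds up, column by column, the flags returned by complete.cases(), which for a single vector are TRUE wherever the value is not missing. The same count can be sketched in base R with Reduce(), without purrr:

```r
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

# complete.cases() on one column flags its non-missing entries;
# adding those logical vectors gives the non-NA count for each row.
n_present <- Reduce(`+`, lapply(df, complete.cases))
n_present

# Same count as the rowSums(!is.na(df)) used in the other answers
stopifnot(all(n_present == rowSums(!is.na(df))))
```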