I have the following dataset:
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))
I want to average all these variables per row, but I want R to return NA when a row has only one non-missing value.
How can I do that?
The result should look like this:
 a  b  c  d average
NA NA NA  3      NA
NA NA  5 NA      NA
 1  2 NA NA     1.5
 5  3  5 NA    4.33
Thanks a lot
Answer:
Another option is to first find the rows that have more than one non-NA value and then compute rowMeans() only for those rows:
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))
index <- rowSums(!is.na(df)) > 1
df[index, "average"] <- rowMeans(df[index, ], na.rm = TRUE)
df
#> a b c d average
#> 1 NA NA NA 3 NA
#> 2 NA NA 5 NA NA
#> 3 1 2 NA NA 1.500000
#> 4 5 3 5 NA 4.333333
Created on 2022-07-18 by the reprex package (v2.0.1)
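If the threshold might change later (say, at least three values per row), the same idea can be wrapped in a small function. This is a minimal sketch; row_mean_min_n() and its min_n argument are hypothetical names chosen for illustration, not part of base R:

```r
# Hypothetical helper: row means that are NA unless a row has
# at least `min_n` non-missing values
row_mean_min_n <- function(data, min_n = 2) {
  n_ok <- rowSums(!is.na(data))          # non-NA values per row
  out  <- rowMeans(data, na.rm = TRUE)   # NaN for rows that are all NA
  out[n_ok < min_n] <- NA_real_          # blank out rows with too few values
  out
}

df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

df$average <- row_mean_min_n(df)
df
```

Setting min_n = 3 would keep only the last row's mean for this data.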
Answer:
Similar to SamR's answer:
library(tidyverse)
df <- tibble(a = c(NA, NA, 1, 5),
             b = c(NA, NA, 2, 3),
             c = c(NA, 5, NA, 5),
             d = c(3, NA, NA, NA))
df %>%
  rowwise() %>%
  mutate(average = ifelse(sum(!is.na(cur_data())) <= 1,
                          NA,
                          mean(c_across(where(is.numeric)), na.rm = TRUE)))
# A tibble: 4 × 5
# Rowwise:
a b c d average
<dbl> <dbl> <dbl> <dbl> <dbl>
1 NA NA NA 3 NA
2 NA NA 5 NA NA
3 1 2 NA NA 1.5
4 5 3 5 NA 4.33
With case_when():
df %>%
  rowwise() %>%
  mutate(average = case_when(sum(!is.na(cur_data())) <= 1 ~ NA_real_,
                             TRUE ~ mean(c_across(where(is.numeric)), na.rm = TRUE)))
Selecting the columns yourself (counting and averaging over the same columns):
df %>%
  rowwise() %>%
  mutate(average = case_when(sum(!is.na(c_across(c(a, b)))) <= 1 ~ NA_real_,
                             TRUE ~ rowMeans(across(c(a, b)), na.rm = TRUE)))
df %>%
  select(a, b) %>%
  rowwise() %>%
  mutate(average = ifelse(sum(!is.na(cur_data())) <= 1,
                          NA,
                          mean(c_across(where(is.numeric)), na.rm = TRUE)))
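rowwise() evaluates the expression once per row, which can get slow on larger data. A vectorised variant that stays in dplyr but applies rowSums() and rowMeans() to the selected columns is sketched here; it assumes dplyr 1.0.0 or later (for across()), and n_ok is only a temporary helper column introduced for this example:

```r
library(dplyr)

df <- tibble(a = c(NA, NA, 1, 5),
             b = c(NA, NA, 2, 3),
             c = c(NA, 5, NA, 5),
             d = c(3, NA, NA, NA))

res <- df %>%
  mutate(
    # number of non-missing values in columns a to d for each row
    n_ok = rowSums(!is.na(across(a:d))),
    # mean over a to d, kept only when more than one value is present
    average = if_else(n_ok > 1, rowMeans(across(a:d), na.rm = TRUE), NA_real_)
  ) %>%
  select(-n_ok)

res
```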
Answer:
You can use ifelse() to set average to NA where a row has one or zero non-NA values, and to rowMeans() otherwise.
df$average <- ifelse(
  rowSums(!is.na(df)) <= 1,
  NA,
  rowMeans(df, na.rm = TRUE)
)
df
# a b c d average
# 1 NA NA NA 3 NA
# 2 NA NA 5 NA NA
# 3 1 2 NA NA 1.500000
# 4 5 3 5 NA 4.333333
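The same rule can also be written per row with apply(), which spells out the "more than one value" check explicitly. This is a minimal base R sketch; it is typically slower than rowSums()/rowMeans() on large data because the function is called once per row:

```r
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

# apply() converts df to a numeric matrix and passes each row as a vector x
df$average <- apply(df, 1, function(x) {
  if (sum(!is.na(x)) > 1) mean(x, na.rm = TRUE) else NA_real_
})

df
```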
Answer:
In the tidyverse, we may build the condition inside case_when():
library(dplyr)
library(purrr)
df %>%
  mutate(average = case_when(across(everything(), complete.cases) %>%
                               reduce(`+`) %>%
                               magrittr::is_greater_than(1)
                             ~ rowMeans(across(everything()), na.rm = TRUE)))
Output:
a b c d average
1 NA NA NA 3 NA
2 NA NA 5 NA NA
3 1 2 NA NA 1.500000
4 5 3 5 NA 4.333333
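The counting step in the answer above (one logical vector per column from complete.cases(), then reduce() with the + operator) can be reproduced in base R with lapply() and Reduce(), which may make it easier to see what it computes. A minimal sketch using only base R:

```r
df <- data.frame(a = c(NA, NA, 1, 5),
                 b = c(NA, NA, 2, 3),
                 c = c(NA, 5, NA, 5),
                 d = c(3, NA, NA, NA))

# One logical vector per column: TRUE where the value is present
present <- lapply(df, function(col) !is.na(col))

# Adding the logical vectors element-wise gives the non-NA count per row
n_ok <- Reduce(`+`, present)
n_ok

# Keep the row mean only where more than one value is present
df$average <- ifelse(n_ok > 1, rowMeans(df, na.rm = TRUE), NA)
df
```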