I am a seldom user of R and have some very basic coding skills. I believe I am completely misunderstanding how to solve this problem. Assuming following data
data <- tibble(a = letters[1:4], b1 = 1:4, b2 = 4:1,
b3 = c(NA, 2, NA,4), b4 = c(NA, 1, 3, 6))
data <- data %>%
mutate(average = rowMeans(select(., starts_with("b")), na.rm = TRUE),
NA_count = rowSums(is.na(select(., starts_with("b"))))
)
data
# A tibble: 4 × 7
a b1 b2 b3 b4 average NA_count
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 a 1 4 NA NA 2.5 2
2 b 2 3 2 1 2 0
3 c 3 2 NA 3 2.67 1
4 d 4 1 4 6 3.75 0
I am trying to replace NA
with the value in average
only if there is one NA
on the line (I defined NA_count
as helper column). I've tried the below code, but I think I am misunderstanding how to use it, as I get multiple errors.
Also, I am trying to find a solution using dplyr and/or the tidyverse package.
data %>%
mutate(across(starts_with("b"), ~ if_else(NA_count == 1, average, .)))
Thanks for your help !
CodePudding user response:
It is due to the column type difference as b1
, b2
are integer
type and average
is numeric
. We may need to use as.numeric
on if_else
as if_else
checks for the type for each of the yes, no output and it needs to be the same (same for case_when
, but it can work for ifelse
as there is type coersion). In addition, the test condition needs to also check for the NA location with is.na
library(dplyr)
data %>%
mutate(across(starts_with("b"), ~ if_else(NA_count ==1 & is.na(.x),
average, as.numeric(.x))))
-output
# A tibble: 4 × 7
a b1 b2 b3 b4 average NA_count
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 4 NA NA 2.5 2
2 b 2 3 2 1 2 0
3 c 3 2 2.67 3 2.67 1
4 d 4 1 4 6 3.75 0