I have a dataframe that has many columns describing the number of a particular asset that a household owns. I would like set all missing values NA
equal to 0 conditional on there being at least one non NA
value in at least one other asset_
column. For example, in the dataframe:
dat <- data.frame(asset_1 = c(NA, 2, NA), asset_2 = c(1,3,NA), asset_3 = c(NA, NA, NA))
asset_1 | asset_2 | asset_3 |
---|---|---|
NA | 3 | NA |
0 | 1 | NA |
NA | NA | NA |
I would to achieve:
asset_1 | asset_2 | asset_3 |
---|---|---|
0 | 3 | 0 |
0 | 1 | 0 |
NA | NA | NA |
Is there any nice way to do this with Dplyr? Any more efficient alternatives in any other package are also welcome.
My guess would be to use mutate
in combination with across
and if_else
but I can't figure out a way to consider all columns except the one that is being mutated.
CodePudding user response:
Please try
library(dplyr)
dat <- data.frame(asset_1 = c(NA, 2, NA), asset_2 = c(1,3,NA), asset_3 = c(NA, NA, NA))
dat2 <- dat %>% mutate(new=coalesce(asset_1,asset_2,asset_3), across(starts_with('asset'), ~ ifelse(!is.na(new) & is.na(.x), 0, .x))) %>% dplyr::select(-new)
CodePudding user response:
library(tidyverse)
dat %>%
mutate(helper = apply(across(everything()), 1, function(x) sum(is.na(x))),
across(-helper, ~ifelse(helper != 3 & is.na(.), 0, .)))
asset_1 asset_2 asset_3 helper
1 0 1 0 2
2 2 3 0 1
3 NA NA NA 3