I'm using the mutate
and case_when
function to create a new variable with the result dependent on whether certain conditions are met.
The following code works until it gets to the NA values.
Running the code is successful, as in it does not produce any error or warning messages, and when I inspect the dataset it has worked where the conditions == 1 or 2, but it has not worked for the NA values.
I have tried putting NA in "" but this doesn't help
mutate(Variable1 = case_when
(VariableA == 1 & VariableB ==1 ~ result.y,
VariableA == 1 & VariableB ==2 ~ result.y,
VariableA == 2 & VariableB ==1 ~ result.y,
VariableA == 2 & VariableB ==2 ~ result.x,
VariableA == 2 & VariableB ==NA ~ result.x,
VariableA == NA & VariableB ==1 ~ result.y,
VariableA == NA & VariableB ==2 ~ result.y,
VariableA == NA & VariableB ==NA ~ result.x,
))
VariableA and VariableB have values of 1, 2 and NA result.y and result.x are variables already in the dataset
First time posting, and fairly new to using R so apologies if I've not included all the information required. Thanks in advance for any suggestions
Sample of Data
VariableA VariableB result.x result.y
<dbl lbl> <dbl lbl> <chr> <chr>
1 2 valid postcode valid postcode
1 2 valid postcode valid postcode
1 NA valid postcode NA
1 NA partial postcode NA
1 2 valid postcode valid postcode
1 1 valid postcode valid postcode
NA NA valid postcode NA
2 2 partial postcode NA
2 NA valid postcode NA
2 1 valid postcode valid postcode
Output from
> dput(head(Test2))
structure(list(VariableA = structure(c(NA, NA, 2, NA, 2, 2), label = "[VariableA]--Have you moved in to your current house since June of this year?", format.spss = "F1.0", labels = c(Yes = 1,
No = 2), class = c("haven_labelled", "vctrs_vctr", "double")),
VariableB = structure(c(NA, NA, 2, NA, 2, NA), label = "[VariableB]-- Have you moved into your current house since January 2021?", format.spss = "F1.0", labels = c(Yes = 1,
No = 2), class = c("haven_labelled", "vctrs_vctr", "double"
)), result.x = structure(c("vaild postcode", "vaild postcode",
"vaild postcode", "vaild postcode", "vaild postcode", "vaild postcode"
), label = "[category]--", format.spss = "A16", display_width = 16L),
result.y = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
According to the example provided, VariableA
and VariableB
are of class dbl lbl
which cannot be used in a logical with numeric values.
You need to extract the dbl
information form VariableA and VariableB with zap_labels()
from haven
package. Additionally, it is best practice to use is.na()
as recommended in comments.
install.packages("haven")
library(haven)
Test2 %>% zap_labels() %>% mutate(Variable1 = case_when
(VariableA == 1 & VariableB ==1 ~ result.y,
VariableA == 1 & VariableB ==2 ~ result.y,
VariableA == 2 & VariableB ==1 ~ result.y,
VariableA == 2 & VariableB ==2 ~ result.x,
VariableA == 2 & is.na(VariableB) ~ result.x,
is.na(VariableA) & VariableB ==1 ~ result.y,
is.na(VariableA) & VariableB ==2 ~ result.y,
is.na(VariableA) & is.na(VariableB) ~ result.x,
))
# A tibble: 6 × 5
VariableA VariableB result.x result.y Variable1
<dbl> <dbl> <chr> <chr> <chr>
1 NA NA vaild postcode NA vaild postcode
2 NA NA vaild postcode NA vaild postcode
3 2 2 vaild postcode NA vaild postcode
4 NA NA vaild postcode NA vaild postcode
5 2 2 vaild postcode NA vaild postcode
6 2 NA vaild postcode NA vaild postcode