I'm trying to make a catch-all variable: if a respondent answers "yes" to at least one of the 9 yes/no variables, then they will be placed into the "yes" category in the overall variable.
I've done this by:
overallvariable <- ifelse(df$v1 == "yes" | df$v2 == "yes" | df$v3 == "yes" | df$v4 == "yes" |df$v5 == "yes" | df$v6 == "yes" | df$v7 == "yes" | df$v8 == "yes" | df$v9 == "yes", "yes", "no")
However, table(overallvariable) comes up with:
| no |
|---|
| ## |
instead of
| yes | no |
|---|---|
| ### | ## |
Thank you for your help!
Note: everything seems to work until I add v9. I also played around with where v9 goes in the condition, and v9 on its own doesn't seem to be the problem, as it produces the output I needed. So it seems to be an issue with adding a ninth condition.
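A chain of nine (or more) | conditions is not a problem in itself; R does not limit how many conditions can be combined this way. The sketch below uses made-up data (the data frame, its values and the missing answer are assumptions, not the asker's data) to show that the exact chain from the question does pick up a "yes" in v9, and that a row with a missing answer and no "yes" becomes NA, which table() silently drops unless useNA is set. Whether missing values explain the output in the question can't be told without seeing the real data.

```r
# Hypothetical data: four respondents, nine yes/no columns named v1..v9
vals <- matrix("no", nrow = 4, ncol = 9,
               dimnames = list(NULL, paste0("v", 1:9)))
vals[1, 9] <- "yes"  # respondent 1: only v9 is "yes"
vals[2, 3] <- "yes"  # respondent 2: only v3 is "yes"
vals[3, 5] <- NA     # respondent 3: one missing answer and no "yes"
# respondent 4: all "no"
df <- as.data.frame(vals, stringsAsFactors = FALSE)

# The same nine-way chain as in the question
overallvariable <- ifelse(df$v1 == "yes" | df$v2 == "yes" | df$v3 == "yes" |
                          df$v4 == "yes" | df$v5 == "yes" | df$v6 == "yes" |
                          df$v7 == "yes" | df$v8 == "yes" | df$v9 == "yes",
                          "yes", "no")
overallvariable                          # "yes" "yes" NA "no"

table(overallvariable)                   # the NA row is not counted
table(overallvariable, useNA = "ifany")  # shows the NA count as well
```

If useNA = "ifany" reveals NAs, replacing x == "yes" with x %in% "yes" treats a missing answer as "not yes" instead of propagating NA.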
Answer:
Here is a data.table approach, and also an instruction on how to create some sample data ;-)
sample data
set.seed(123)
mydata <- data.frame(id = 1:15,
v1 = sample(c("yes", "no"), 15, replace = TRUE),
v2 = sample(c("yes", "no"), 15, replace = TRUE),
v3 = sample(c("yes", "no"), 15, replace = TRUE),
v4 = sample(c("yes", "no"), 15, replace = TRUE))
code
library(data.table)
# convert to data.table format
setDT(mydata)
# columns to look in (anchored so that e.g. "v10" or "id" are not matched)
cols <- grep("^v[1-4]$", names(mydata), value = TRUE)
# initialise overallvariable to "no"
mydata[, overallvariable := "no"]
# if 1 or more columns in cols have the value "yes", set overallvariable to "yes"
mydata[ rowSums(mydata[, ..cols] == "yes", na.rm = TRUE) >= 1,
overallvariable := "yes"]
output
# id v1 v2 v3 v4 overallvariable
# 1: 1 yes yes yes yes yes
# 2: 2 yes no no yes yes
# 3: 3 yes yes yes no yes
# 4: 4 no yes no yes yes
# 5: 5 yes yes no yes yes
# 6: 6 no yes yes no yes
# 7: 7 no no yes yes yes
# 8: 8 no yes yes yes yes
# 9: 9 yes yes yes yes yes
#10: 10 yes yes no yes yes
#11: 11 no yes yes no yes
#12: 12 no no no no no
#13: 13 no no no yes yes
#14: 14 yes yes yes no yes
#15: 15 no no yes yes yes
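The key step above is rowSums() on a logical matrix: each TRUE counts as 1, so the row sum is the number of "yes" answers in that row. A small base R sketch on a made-up matrix (the column names and values are hypothetical) shows what na.rm = TRUE changes when an answer is missing:

```r
# Hypothetical answers: three respondents, three columns, one missing value
m <- cbind(v1 = c("yes", "no", "no"),
           v2 = c("no",  "no", NA),
           v3 = c("yes", "no", "no"))

hits <- m == "yes"                       # logical matrix, NA where the answer is missing

with_na <- rowSums(hits)                 # 2 0 NA: one NA makes the whole row NA
counts  <- rowSums(hits, na.rm = TRUE)   # 2 0 0: missing answers are ignored

overall <- ifelse(counts >= 1, "yes", "no")
overall                                  # "yes" "no" "no"
```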
Answer:
Base R:
# 1 + FALSE = 1 picks 'no', 1 + TRUE = 2 picks 'yes'
df$overallvariable <- c('no','yes')[1 + (rowSums(df == "yes") > 0)]
data:
df <- structure(list(V1 = c("no", "no", "no", "no", "no", "no", "no",
"no", "no", "no"), V2 = c("yes", "yes", "yes", "no", "no", "no",
"yes", "no", "yes", "yes"), V3 = c("yes", "yes", "yes", "no",
"no", "no", "yes", "no", "yes", "yes"), V4 = c("no", "no", "no",
"no", "no", "no", "no", "no", "no", "no"), V5 = c("yes", "yes",
"yes", "no", "no", "no", "yes", "no", "yes", "yes"), V6 = c("no",
"no", "no", "no", "no", "no", "no", "no", "no", "no"), V7 = c("no",
"no", "no", "no", "no", "no", "no", "no", "no", "no"), V8 = c("yes",
"yes", "yes", "no", "no", "no", "yes", "no", "yes", "yes"), V9 = c("no",
"no", "no", "no", "no", "no", "no", "no", "no", "no"), V10 = c("no",
"no", "no", "no", "no", "no", "no", "no", "no", "no")), class = "data.frame", row.names = c(NA,
-10L))
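The one-liner relies on R coercing a logical vector to numbers when 1 is added, so the result can be used as positions into c('no','yes'). A minimal sketch with a hypothetical logical vector standing in for rowSums(df == "yes") > 0:

```r
# Hypothetical result of rowSums(df == "yes") > 0 for three rows
any_yes <- c(TRUE, FALSE, TRUE)

# Adding 1 turns the logical into numeric positions: FALSE -> 1, TRUE -> 2
idx <- 1 + any_yes

# Position 1 is "no", position 2 is "yes"
labels <- c("no", "yes")[idx]
labels                                   # "yes" "no" "yes"

# Without na.rm, any NA in a row makes rowSums() return NA, which gives an NA label
missing_label <- c("no", "yes")[1 + NA]
missing_label
```

If the data can contain missing answers, rowSums(df == "yes", na.rm = TRUE) avoids those NA labels.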
Answer:
The dplyr package has the perfect function for the desired transformation: if_any.
library(dplyr)
df %>% mutate(overallvariable = if_any(V1:V10, ~ .x=='yes') %>% ifelse('yes', 'no'))
We can also use purrr::reduce:
library(purrr)
library(dplyr)
df %>% mutate(overallvariable = reduce(across(V1:V10, ~.x=='yes'), `|`) %>% ifelse('yes', 'no'))
output using the data from @TarJae:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 overallvariable
1 no yes yes no yes no no yes no no yes
2 no yes yes no yes no no yes no no yes
3 no yes yes no yes no no yes no no yes
4 no no no no no no no no no no no
5 no no no no no no no no no no no
6 no no no no no no no no no no no
7 no yes yes no yes no no yes no no yes
8 no no no no no no no no no no no
9 no yes yes no yes no no yes no no yes
10 no yes yes no yes no no yes no no yes
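The purrr::reduce version has a direct base R counterpart in Reduce(), which folds a list of logical vectors with |. A minimal sketch on a small made-up data frame (columns and values are hypothetical); it also swaps == for %in% so that a missing answer counts as "not yes" instead of turning the row into NA:

```r
# Hypothetical data frame with three yes/no columns and one missing value
df <- data.frame(V1 = c("no", "yes", "no"),
                 V2 = c("no", "no", NA),
                 V3 = c("yes", "no", "no"),
                 stringsAsFactors = FALSE)

# One logical vector per column; %in% gives FALSE (never NA) for missing values
is_yes <- lapply(df, function(x) x %in% "yes")

# Base R's Reduce() combines the vectors with |, like purrr::reduce(..., `|`)
any_yes <- Reduce(`|`, is_yes)

df$overallvariable <- ifelse(any_yes, "yes", "no")
df$overallvariable                       # "yes" "yes" "no"
```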