I have two dummy variables, physical_violence and sexual_violence.
I tried to combine them with the ifelse() function and the | operator to create a dummy variable that returns 1 if at least one type of violence has occurred.
The following approach gives results I did not expect:
df <- mutate(df, physical_violence = ifelse(e03bidummy == 1 | e03cidummy == 1 |
  e03didummy == 1 | e03eidummy == 1 | e03fidummy == 1 |
  e03gidummy == 1 | e03hidummy == 1 | e03iidummy == 1 |
  e03jidummy == 1, 1, 0))
df <- mutate(df, sexual_violence = ifelse(e04aidummy == 1 |
e04bidummy == 1 | e04cidummy == 1 | e04didummy == 1, 1, 0))
The code for the dummy combining the two variables above:
df <- mutate(df, physical_sexual_violence =
ifelse(physical_violence == 1 | sexual_violence == 1, 1, 0))
The results I got are:
table(df$physical_sexual_violence)
# 875 "yes", 26.614 "no"
This contradicts:
table(df$physical_violence)
# 846 "yes" (3.07%) and 26.643 "no"
table(df$sexual_violence)
# 634 "yes" and 26.855 "no".
I expect 1.480 cases of violence.
Could anyone please help me identify what I am doing wrong?
CodePudding user response:
Whenever we have rowwise logical operations that can be simplified into a single TRUE/FALSE per row, we can use dplyr::if_any or dplyr::if_all.
-) First mutate(): if any (if_any) of the variables whose names match (matches) the regex "e03[b-j]idummy" satisfies .x==1, physical_violence will be TRUE (this evaluates to 1).
-) The second mutate uses the same logic, with the other variables you gave.
-) The third mutate outputs 1 if either (if_any) of the two new columns is 1.
dummy data
e03bidummy e03cidummy e04aidummy e04bidummy
1 1 0 0 0
2 0 1 0 0
3 0 0 1 1
4 0 0 0 0
solution with dplyr
library(dplyr)
df %>% mutate(physical_violence = if_any(matches("e03[b-j]idummy"), ~.x==1),
sexual_violence = if_any(matches("e04[a-d]idummy"), ~.x==1),
physical_sexual_violence= if_any(contains('violence')))
e03bidummy e03cidummy e04aidummy e04bidummy physical_violence sexual_violence physical_sexual_violence
1 1 0 0 0 1 0 1
2 0 1 0 0 1 0 1
3 0 0 1 1 0 1 1
4 0 0 0 0 0 0 0
If all the dummy variables are strictly 0 or 1, the code can be further simplified by omitting the .x==1 part, because 0 and 1 are implicitly coerced to FALSE and TRUE in logical operations:
df %>% mutate(physical_violence = if_any(matches("e03[b-j]idummy")),
sexual_violence = if_any(matches("e04[a-d]idummy")),
physical_sexual_violence= if_any(contains('violence')))
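As for the counts in the question: | flags a row only once even when both dummies are 1, so the combined dummy counts the union of the two groups, not their sum. Below is a minimal sketch with hypothetical toy data (the `toy` data frame and the `n_*` names are made up for illustration), assuming there are no missing values:

```r
# Hypothetical toy data: two 0/1 dummies where some rows have both set.
toy <- data.frame(
  physical_violence = c(1, 1, 0, 0, 1),
  sexual_violence   = c(1, 0, 1, 0, 1)
)

# Cross-tabulate the two dummies to see how many rows have both.
overlap_table <- table(physical = toy$physical_violence,
                       sexual   = toy$sexual_violence)
print(overlap_table)

n_physical <- sum(toy$physical_violence == 1)
n_sexual   <- sum(toy$sexual_violence == 1)
n_both     <- sum(toy$physical_violence == 1 & toy$sexual_violence == 1)
n_either   <- sum(toy$physical_violence == 1 | toy$sexual_violence == 1)

# Inclusion-exclusion: rows with either = physical + sexual - both.
stopifnot(n_either == n_physical + n_sexual - n_both)

# Applied to the counts reported in the question (assuming no NAs):
implied_both <- 846 + 634 - 875
```

With the figures from the question, 846 + 634 - 875 = 605 rows would have both flags set, which would explain why the combined dummy has 875 ones rather than 1.480. Running table(df$physical_violence, df$sexual_violence) on the real data should confirm or refute this.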
CodePudding user response:
Does this help? Of course, you need to adapt it to your variable names.
Sample dataframe:
# just a synthetic sample dataframe
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1), # assuming no NAs
sexual_violence = c(0, 1, 1, 1, 0)) # assuming no NAs
for-loop if-else statement:
for(i in 1:nrow(df)){
df$dummy[i] <- NA
if(df$physical_violence[i]== 0 & df$sexual_violence[i]== 0) {
df$dummy[i] <- FALSE
} else {
df$dummy[i] <- TRUE
}
}
Output:
df
#> physical_violence sexual_violence dummy
#> 1 0 0 FALSE
#> 2 0 1 TRUE
#> 3 1 1 TRUE
#> 4 0 1 TRUE
#> 5 1 0 TRUE
Created on 2021-09-13 by the reprex package (v2.0.1)
Note, this approach is neither the fastest nor the safest way, but the syntax is easy to understand for beginners.
EDIT: If you need 0/1, just replace TRUE with 1 and FALSE with 0. (Do not forget to convert df$dummy to a factor variable if needed.)
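For comparison, the loop above can probably be replaced by a single vectorized expression, since | works on whole columns at once. This is a sketch on the same synthetic data, again assuming no NAs. The column name `dummy01` is made up for illustration:

```r
# Same synthetic data as above (assuming no NAs).
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1),
                 sexual_violence   = c(0, 1, 1, 1, 0))

# `|` is vectorized, so it compares the two columns element by element.
df$dummy <- df$physical_violence == 1 | df$sexual_violence == 1

# Convert TRUE/FALSE to 1/0 if a numeric dummy is needed.
df$dummy01 <- as.integer(df$dummy)

print(df)
```

This gives the same dummy column as the loop without iterating over rows.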