Combining two dummy variables into a new one

Time:09-17

I have 2 dummy variables

  1. physical_violence and
  2. sexual_violence.

I tried to combine them with the ifelse() function and the | operator to create a dummy variable that is 1 if at least one type of violence has occurred. However, the following approach gives results I did not expect:

df <- mutate(df, physical_violence = ifelse(e03bidummy == 1 | e03cidummy == 1 |
                                              e03didummy == 1 | e03eidummy == 1 | e03fidummy == 1 |
                                              e03gidummy == 1 | e03hidummy == 1 | e03iidummy == 1 |
                                              e03jidummy == 1, 1, 0))
df <- mutate(df, sexual_violence = ifelse(e04aidummy == 1 |
                                            e04bidummy == 1 | e04cidummy == 1 | e04didummy == 1, 1, 0))

The code for the dummy variable combining the two variables above:

df <- mutate(df, physical_sexual_violence = 
ifelse(physical_violence == 1 | sexual_violence == 1, 1, 0))

The results I got are table(df$physical_sexual_violence): # 875 "yes", 26.614 "no". This seems to contradict:

  1. table(df$physical_violence): # 846 "yes" (3.07%) and 26.643 "no"
  2. table(df$sexual_violence): # 634 "yes" and 26.855 "no".

I expect 1.480 cases of violence.

Could anyone please help me identify what I am doing wrong?
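The three tables may in fact be consistent: 846 + 634 = 1.480 counts every respondent who reported both kinds of violence twice, while the combined dummy counts each such row once. All three tables cover the same 27.489 rows, so the reported numbers would imply 846 + 634 - 875 = 605 respondents with both. A minimal sketch on made-up toy data (not the survey data) of how to check the overlap with a cross-tabulation, assuming 0/1 dummies without NAs:

```r
# Hypothetical toy data: 6 respondents, 0/1 dummies, no NAs
df <- data.frame(physical_violence = c(1, 1, 0, 0, 1, 0),
                 sexual_violence   = c(1, 0, 1, 0, 0, 0))

df$physical_sexual_violence <- ifelse(df$physical_violence == 1 |
                                        df$sexual_violence == 1, 1, 0)

# Cross-tabulation: the cell ("1", "1") counts respondents with both types
both <- table(df$physical_violence, df$sexual_violence)["1", "1"]

n_physical <- sum(df$physical_violence == 1)
n_sexual   <- sum(df$sexual_violence == 1)
n_any      <- sum(df$physical_sexual_violence == 1)

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_physical + n_sexual - both == n_any
```

On real data, table(df$physical_violence, df$sexual_violence) shows the overlap directly.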

Answer:

Whenever a row-wise logical operation can be reduced to a single TRUE/FALSE per row, we can use dplyr::if_any() or dplyr::if_all().
-) The first argument of mutate(): if any of the variables whose names match the regex "e03[b-j]idummy" satisfies .x == 1, physical_violence is TRUE (which corresponds to 1).
-) The second argument uses the same logic with the "e04[a-d]idummy" variables you gave.
-) The third argument returns 1 if either of the two new columns is 1.

dummy data

  e03bidummy e03cidummy e04aidummy e04bidummy
1          1          0          0          0
2          0          1          0          0
3          0          0          1          1
4          0          0          0          0
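To make the example reproducible, here is a sketch that rebuilds the dummy data frame above; the values are copied from the printed table, and the numeric (double) column type is an assumption:

```r
# Reconstruction of the dummy data printed above
df <- data.frame(e03bidummy = c(1, 0, 0, 0),
                 e03cidummy = c(0, 1, 0, 0),
                 e04aidummy = c(0, 0, 1, 0),
                 e04bidummy = c(0, 0, 1, 0))
df
```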

solution with dplyr

library(dplyr)

df %>% mutate(physical_violence        = as.integer(if_any(matches("e03[b-j]idummy"), ~ .x == 1)),
              sexual_violence          = as.integer(if_any(matches("e04[a-d]idummy"), ~ .x == 1)),
              physical_sexual_violence = as.integer(if_any(contains('violence'), ~ .x == 1)))

  e03bidummy e03cidummy e04aidummy e04bidummy physical_violence sexual_violence physical_sexual_violence
1          1          0          0          0                 1               0                        1
2          0          1          0          0                 1               0                        1
3          0          0          1          1                 0               1                        1
4          0          0          0          0                 0               0                        0

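If all the dummy variables are strictly 0 or 1, the code can be simplified further by omitting the .x == 1 part, since 0 is treated as FALSE and 1 as TRUE in logical operations. Depending on your dplyr version, if_any() may require logical input; if the version below throws an error, keep the .x == 1 part: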

df %>% mutate(physical_violence =  if_any(matches("e03[b-j]idummy")),
              sexual_violence =  if_any(matches("e04[a-d]idummy")),
              physical_sexual_violence=  if_any(contains('violence')))
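If you would rather not depend on dplyr, the same row-wise "any" can be sketched in base R with rowSums(): count, per row, how many of the selected columns equal 1 and test whether that count is positive. This assumes the 0/1 coding and the column-name patterns used above; na.rm = TRUE treats an NA as "not 1", which differs from |, where NA | FALSE is NA, so revisit that choice if your data contain NAs.

```r
# Same dummy data as above
df <- data.frame(e03bidummy = c(1, 0, 0, 0),
                 e03cidummy = c(0, 1, 0, 0),
                 e04aidummy = c(0, 0, 1, 0),
                 e04bidummy = c(0, 0, 1, 0))

# Column sets selected by the same regexes used with matches()
phys_cols <- grep("e03[b-j]idummy", names(df), value = TRUE)
sex_cols  <- grep("e04[a-d]idummy", names(df), value = TRUE)

# Row-wise "any equals 1": count the 1s per row, then test for > 0.
# as.integer() turns TRUE/FALSE into 1/0 (and drops the row names).
df$physical_violence <- as.integer(rowSums(df[phys_cols] == 1, na.rm = TRUE) > 0)
df$sexual_violence   <- as.integer(rowSums(df[sex_cols]  == 1, na.rm = TRUE) > 0)
df$physical_sexual_violence <- as.integer(df$physical_violence == 1 |
                                            df$sexual_violence == 1)
df
```

The resulting three columns match the dplyr output shown above.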

Answer:

Does this help? Of course, you need to adapt it to your variable names.

Sample dataframe:

# just a synthetic sample dataframe
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1), # assuming no NAs
                 sexual_violence = c(0, 1, 1, 1, 0)) # assuming no NAs 

for-loop if-else statement:

for(i in 1:nrow(df)){
  df$dummy[i] <- NA
  if(df$physical_violence[i]== 0 & df$sexual_violence[i]== 0) { 
    df$dummy[i] <- FALSE
  } else {
    df$dummy[i] <- TRUE
  }
}

Output:

df
#>   physical_violence sexual_violence dummy
#> 1                 0               0 FALSE
#> 2                 0               1  TRUE
#> 3                 1               1  TRUE
#> 4                 0               1  TRUE
#> 5                 1               0  TRUE

Created on 2021-09-13 by the reprex package (v2.0.1)

Note that this approach is neither the fastest nor the safest, but the syntax is easy for beginners to follow. EDIT: If you need 0/1 instead, replace TRUE with 1 and FALSE with 0. (Don't forget to convert df$dummy to a factor if needed.)
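Because | is already vectorized in R, the loop can also be replaced by a single comparison over whole columns. A minimal sketch with the same synthetic sample data, assuming 0/1 coding and no NAs; the column name dummy01 is just an illustrative choice:

```r
# Same synthetic sample data as above (assumed 0/1, no NAs)
df <- data.frame(physical_violence = c(0, 0, 1, 0, 1),
                 sexual_violence   = c(0, 1, 1, 1, 0))

# Logical version: same TRUE/FALSE as the loop for 0/1 data
df$dummy <- df$physical_violence == 1 | df$sexual_violence == 1

# 0/1 version: as.integer() maps FALSE to 0 and TRUE to 1
df$dummy01 <- as.integer(df$dummy)
df
```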
