Creating a dummy variable from ordinal data in R

Time:04-19

I have an ordinal variable with the following categories

very favorable (1)
somewhat favorable (2)
somewhat unfavorable (3)
very unfavorable (4)
don't know (8)
refuse to answer (9)

I want my output binary variable to display:

favorable (1) unfavorable (0)

I want to do that by grouping "very favorable" and "somewhat favorable" into a new "favorable" outcome coded as "1", and grouping "very unfavorable" and "somewhat unfavorable" into a new "unfavorable" outcome coded as "0".

So basically I want to turn "1" = "1" and "2" = "1" then "3" = "0" and "4" = "0"

CodePudding user response:

There are lots of ways to do this; the easiest I can think of uses %in%.

For example, in base R:

x = as.character(data$column_to_recode) # convert first: as.numeric() on a factor returns its internal level codes, not the original values
recoded = rep(NA, length(x)) # anything that isn't 1 through 4 (here 8 and 9) stays NA; change this if you want to treat those values differently
recoded[x %in% c("1", "2")] = 1
recoded[x %in% c("3", "4")] = 0
data$column_to_recode = recoded # assign once at the end, so the new 0s are never tested against 1:4 and turned into NA

Then if you really want bonus points you could coerce this back into a factor variable, but I find this is usually excessive.

data$column_to_recode = factor(data$column_to_recode,levels=c(0,1),ordered = TRUE)
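The point about factors in the code comment above can be checked on a toy vector. This is only a sketch: the data frame and the new column name `favorable` are made up for illustration, and it assumes the survey codes are stored as a factor.

```r
# Toy data (hypothetical): survey codes stored as a factor
data <- data.frame(column_to_recode = factor(c(1, 2, 3, 4, 8, 9)))

# Pitfall: as.numeric() on a factor returns the internal level index
as.numeric(data$column_to_recode)                # 1 2 3 4 5 6
as.numeric(as.character(data$column_to_recode))  # 1 2 3 4 8 9

# Recode from the character version so the original codes are compared
x <- as.character(data$column_to_recode)
data$favorable <- ifelse(x %in% c("1", "2"), 1,
                  ifelse(x %in% c("3", "4"), 0, NA))

data$favorable  # 1 1 0 0 NA NA
```

Writing the result to a new column keeps the original codes available, which makes it easy to cross-tabulate old against new values and confirm the recode did what you expected.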

I couldn't tell from your original question if the numeric codes were fine or if you wanted to use character strings instead, but the same logic applies, e.g:

data$column_to_recode[data$column_to_recode %in% c("very favorable (1)","somewhat favorable (2)")] = "Favorable"

should get you what you need.
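If the column holds text labels, the full recode for both groups might look like the sketch below. The exact label strings, the toy data frame and the column name `recoded` are assumptions; adjust them to match whatever your data actually contains.

```r
# Hypothetical data: responses stored as text labels rather than numeric codes
data <- data.frame(
  column_to_recode = c("very favorable", "somewhat favorable",
                       "somewhat unfavorable", "very unfavorable",
                       "don't know", "refuse to answer"),
  stringsAsFactors = FALSE
)

favorable_labels   <- c("very favorable", "somewhat favorable")
unfavorable_labels <- c("somewhat unfavorable", "very unfavorable")

# Start from NA so "don't know" and "refuse to answer" stay missing
recoded <- rep(NA_character_, nrow(data))
recoded[data$column_to_recode %in% favorable_labels]   <- "Favorable"
recoded[data$column_to_recode %in% unfavorable_labels] <- "Unfavorable"
data$recoded <- recoded
```

Note that %in% compares whole strings exactly, so stray whitespace or different capitalization in the data would leave those rows as NA.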

CodePudding user response:

Here's a dplyr solution with case_when() which is really useful for creating dummies.

My starting data is as follows:

  # A tibble: 6 x 2
      participant category              
            <int> <chr>                 
    1           1 somewhat favorable (2)
    2           2 very unfavorable (4)  
    3           3 very favorable (1)    
    4           4 don't know (8)        
    5           5 very favorable (1)    
    6           6 somewhat favorable (2)

So, basically, when it detects 1 or 2, it will convert the row value into "favorable (1)" and 3 or 4 into "unfavorable (0)"

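library(dplyr)   # mutate(), case_when(), %>%
library(stringr) # str_detect()

data %>%  
  mutate(category = case_when(
    str_detect(category, "\\((1|2)\\)") ~ "favorable (1)",   # escape the parentheses so they match literally
    str_detect(category, "\\((3|4)\\)") ~ "unfavorable (0)"))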

Since (8) and (9) are not specified, the code returns NA for them. The final dataset is as follows:

# A tibble: 10 x 2
   participant category       
         <int> <chr>          
 1           1 favorable (1)  
 2           2 unfavorable (0)
 3           3 favorable (1)  
 4           4 NA             
 5           5 favorable (1)  
 6           6 favorable (1)  
 7           7 unfavorable (0)
 8           8 unfavorable (0)
 9           9 favorable (1)  
10          10 unfavorable (0)
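If you would rather end up with a numeric 0/1 column than a text label, the same case_when() pattern can return numbers. This is a sketch under assumptions: the data frame below just retypes the six starting rows shown earlier, and the column name `favorable` is made up.

```r
library(dplyr)
library(stringr)

# Hypothetical data mirroring the six starting rows shown earlier
data <- data.frame(
  participant = 1:6,
  category = c("somewhat favorable (2)", "very unfavorable (4)",
               "very favorable (1)", "don't know (8)",
               "very favorable (1)", "somewhat favorable (2)")
)

result <- data %>%
  mutate(favorable = case_when(
    str_detect(category, "\\((1|2)\\)") ~ 1,  # literal "(1)" or "(2)"
    str_detect(category, "\\((3|4)\\)") ~ 0   # literal "(3)" or "(4)"
    # no match, e.g. (8) or (9): case_when() returns NA
  ))

result$favorable  # 1 0 1 NA 1 1
```

Keeping the original `category` column alongside the new one makes it easy to verify the mapping row by row.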