How to convert an ordinal variable to a binary variable

Time:11-09

I have a question about creating a binary variable. As the structure below shows, I have an ordinal variable with 4 categories, and I need to apply machine-learning classification algorithms to it. How can I turn this variable into a binary variable? Could you help me write the necessary code in R?

str(belonging)
 dbl+lbl [1:2993] 1, 1, 1, 4, 1, 3, 2, 2, 3, 3, 2, 2, 3, 1, 2, 2, 1, 2, 4, 1, 3, 2, 1, 2, 1, 4, 1, 2, 1, 1, 3, 1, 3,...
 @ label      : chr "GEN\AGREE\BELONG AT SCHOOL"
 @ format.spss: chr "F1.0"
 @ labels     : Named num [1:5] 1 2 3 4 9
  ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...

The levels of the variable are "agree a lot", "agree a little", "disagree a lot", and "disagree a little". I want to collapse these into "agree" and "disagree", and then code the agree level as 1 and the disagree level as 0.

CodePudding user response:

Sample data

# A tibble: 15 x 2
      id result           
   <int> <chr>            
 1     1 agree a little   
 2     2 agree a lot      
 3     3 agree a lot      
 4     4 agree a lot      
 5     5 agree a lot      
 6     6 agree a lot      
 7     7 disagree alot    
 8     8 agree a little   
 9     9 disagree alot    
10    10 disagree alot    
11    11 disagree alot    
12    12 agree a lot      
13    13 agree a little   
14    14 agree a lot      
15    15 disagree a little

Wherever "disagree" is detected in result, the value is set to 0; every other value is set to 1.

library(tidyverse)

df %>%  
  mutate(result = case_when(str_detect(result, "disagree") ~ 0, 
                            TRUE ~ 1) %>% 
           as_factor)

# A tibble: 15 x 2
      id result
   <int> <fct> 
 1     1 1     
 2     2 1     
 3     3 1     
 4     4 1     
 5     5 1     
 6     6 1     
 7     7 0     
 8     8 1     
 9     9 0     
10    10 0     
11    11 0     
12    12 1     
13    13 1     
14    14 1     
15    15 0     
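One caveat with the rule above: str_detect() with a plain pattern is case-sensitive, while the labels in the question are capitalised ("Agree a lot", "Disagree a lot"). The following is a minimal base-R sketch of the same rule with case-insensitive matching. The data frame and the column names result_bin and result_fct are made up for illustration.

```r
# Hypothetical sample data mixing lower-case and capitalised labels
df <- data.frame(
  id = 1:6,
  result = c("Agree a lot", "agree a little", "Disagree a little",
             "disagree alot", "Disagree a lot", "agree a lot"),
  stringsAsFactors = FALSE
)

# "disagree" contains "agree", so test for "disagree" and treat the rest as agree.
# ignore.case = TRUE makes "Disagree" and "disagree" match alike.
df$result_bin <- ifelse(grepl("disagree", df$result, ignore.case = TRUE), 0L, 1L)

# Optional factor version for classifiers that expect a factor outcome
df$result_fct <- factor(df$result_bin, levels = c(0L, 1L))

print(df)
```

Note that this rule sends anything that does not contain "disagree" to 1, including unexpected or misspelled values, so it is worth checking table(df$result) first.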

CodePudding user response:

Thank you very much for your quick response. What I actually mean is: how can I convert an ordinal variable with 4 categories into a binary variable? I want to group the labels "agree a little" and "agree a lot" together as "agree". I am copying the structure of the variable here. In this case, should I use the code you mentioned?

str(belonging)
 dbl+lbl [1:2993] 1, 1, 1, 4, 1, 3, 2, 2, 3, 3, 2, 2, 3, 1, 2, 2, 1, 2, 4, 1, 3, 2, 1, 2, 1, 4, 1, 2, 1, 1, 3, 1, 3,...
 @ label      : chr "GEN\AGREE\BELONG AT SCHOOL"
 @ format.spss: chr "F1.0"
 @ labels     : Named num [1:5] 1 2 3 4 9
  ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...

CodePudding user response:

Since you didn't provide a reproducible example, here is some simulated data that shows how to do what you are asking for:

# 1. GENERATE RANDOM VALUES FOR THE 4 DIFFERENT CATEGORIES

  # 1.1 Different values
    values <- c("agree a lot", "agree a little", "disagree a lot", "disagree a little")
  # 1.2 Fix a seed [to make the example reproducible]
    set.seed(12345)
  # 1.3 Generate VAR
    VAR <- sample(values, size = 100, replace = TRUE)
  # 1.4 Visualize the result
    table(VAR)


# 2. REPLACE THE VALUES WITH "AGREE" AND "DISAGREE"

  # 2.1 Map the four levels onto two labels
    VAR2 <- factor(VAR, labels = c("AGREE", "AGREE", "DISAGREE", "DISAGREE"))
  # 2.2 Visualize the result
    table(VAR2)

To understand what this does, keep in mind that when you turn a variable into a factor, R sorts its levels alphabetically; in this case "agree a little", "agree a lot", "disagree a little", "disagree a lot". The labels written above are matched to the levels in that order, which is why there are two "AGREE" followed by two "DISAGREE".
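The dependence on the sorted level order can be made explicit. The sketch below reuses the simulated VAR from above, prints the default level order, and then passes levels together with labels so the mapping no longer relies on alphabetical sorting. Repeated labels in factor() are assumed to be supported by the installed R version (this was introduced in R 3.4.0).

```r
values <- c("agree a lot", "agree a little", "disagree a lot", "disagree a little")
set.seed(12345)
VAR <- sample(values, size = 100, replace = TRUE)

# Default level order is sorted, not the order in which `values` was written
print(levels(factor(VAR)))

# Stating the levels explicitly ties each label to a known level
VAR2 <- factor(
  VAR,
  levels = c("agree a lot", "agree a little", "disagree a lot", "disagree a little"),
  labels = c("AGREE", "AGREE", "DISAGREE", "DISAGREE")
)

# Cross-tabulate to confirm every original value landed in the intended group
print(table(VAR, VAR2))
```

Writing the levels out also protects against a category that happens to be absent from the data, which would otherwise shift the label positions.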

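If you want 1/0 codes instead, the only thing to change from the previous code is the labels. Note that the result is still a factor whose levels are "1" and "0"; wrap it in as.numeric(as.character(...)) if you need a true numeric vector: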

VAR2 <- factor(VAR, labels = c(1,1,0,0))
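The variable in the question is stored as numeric codes with value labels rather than as text, so the recoding can also be done directly on the codes. This is a minimal sketch under stated assumptions: belonging_codes is a made-up stand-in for the real column, codes 1 and 2 are taken as the two "Agree" answers and 3 and 4 as the two "Disagree" answers (as listed in the str() output), and the fifth code, 9, whose label is cut off in that output, is treated as missing.

```r
# Hypothetical stand-in for the labelled variable: codes 1-4 plus 9
belonging_codes <- c(1, 1, 4, 3, 2, 9, NA, 2)

# Assumption: 1, 2 = agree; 3, 4 = disagree; anything else (such as 9) = missing
belonging_bin <- ifelse(belonging_codes %in% c(1, 2), 1L,
                 ifelse(belonging_codes %in% c(3, 4), 0L, NA_integer_))

# Optional factor version for classifiers that expect a factor outcome
belonging_fct <- factor(belonging_bin, levels = c(0L, 1L),
                        labels = c("disagree", "agree"))

print(table(belonging_fct, useNA = "ifany"))
```

If the real column is a labelled vector imported from SPSS, converting it with as.numeric(belonging) before recoding is one way to get plain codes; whether that step is needed depends on how the data was read in.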