I have a question about a binary variable. As you can see in the photo, I have an ordinal variable with 4 categories, and I need to apply machine-learning classification algorithms to it. How can I turn this variable into a binary variable? Can you help me write the necessary R code?
str(belonging)
 dbl+lbl [1:2993] 1, 1, 1, 4, 1, 3, 2, 2, 3, 3, 2, 2, 3, 1, 2, 2, 1, 2, 4, 1, 3, 2, 1, 2, 1, 4, 1, 2, 1, 1, 3, 1, 3,...
 @ label      : chr "GEN\AGREE\BELONG AT SCHOOL"
 @ format.spss: chr "F1.0"
 @ labels     : Named num [1:5] 1 2 3 4 9
  ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
The levels of the variable are "agree a lot", "agree a little", "disagree a lot" and "disagree a little". I want to collapse these into "agree" and "disagree", and then code agree as 1 and disagree as 0.
CodePudding user response:
Sample data
# A tibble: 15 x 2
id result
<int> <chr>
1 1 agree a little
2 2 agree a lot
3 3 agree a lot
4 4 agree a lot
5 5 agree a lot
6 6 agree a lot
7 7 disagree alot
8 8 agree a little
9 9 disagree alot
10 10 disagree alot
11 11 disagree alot
12 12 agree a lot
13 13 agree a little
14 14 agree a lot
15 15 disagree a little
When "disagree" is detected in result, the value is changed to 0; every other value becomes 1.
library(tidyverse)
df %>%
  mutate(result = case_when(str_detect(result, "disagree") ~ 0,
                            TRUE ~ 1) %>%
           as_factor)
# A tibble: 15 x 2
id result
<int> <fct>
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 0
8 8 1
9 9 0
10 10 0
11 11 0
12 12 1
13 13 1
14 14 1
15 15 0
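The same recoding can be sketched in base R without the tidyverse. The data frame below is a hypothetical five-row stand-in for the sample data above, and the column name binary is only illustrative.

```r
# Hypothetical sample data mirroring the tibble above (same spellings).
df <- data.frame(
  id = 1:5,
  result = c("agree a little", "agree a lot", "disagree alot",
             "disagree a little", "agree a lot"),
  stringsAsFactors = FALSE
)

# Test for "disagree" rather than "agree": "agree" is a substring of
# "disagree", so grepl("agree", ...) would match every row.
df$binary <- ifelse(grepl("disagree", df$result, fixed = TRUE), 0, 1)

# Store the 0/1 codes as a factor with an explicit level order.
df$binary <- factor(df$binary, levels = c(0, 1))

# Cross-tabulate to check that each original category landed where expected.
table(df$result, df$binary)
```

Cross-tabulating the old and new columns is a quick way to confirm the mapping before fitting any model.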
CodePudding user response:
Thank you very much for your quick response. What I actually mean is: how can I convert an ordinal variable with 4 categories into a binary variable? I want to group the labels "agree a little" and "agree a lot" together as "agree". I am copying the structure of the variable here. In this case, should I use the code you mentioned?
str(belonging)
 dbl+lbl [1:2993] 1, 1, 1, 4, 1, 3, 2, 2, 3, 3, 2, 2, 3, 1, 2, 2, 1, 2, 4, 1, 3, 2, 1, 2, 1, 4, 1, 2, 1, 1, 3, 1, 3,...
 @ label      : chr "GEN\AGREE\BELONG AT SCHOOL"
 @ format.spss: chr "F1.0"
 @ labels     : Named num [1:5] 1 2 3 4 9
  ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
CodePudding user response:
Since you didn't provide a reproducible example, here is some simulated data to illustrate what you are asking for:
# 1. GENERATE RANDOM VALUES FOR THE 4 DIFFERENT CATEGORIES
# 1.1 Different values
values <- c("agree a lot", "agree a little", "disagree a lot", "disagree a little")
# 1.2 Fix a seed [to make it reproducible]
set.seed(12345)
# 1.3 Generate VAR
VAR <- sample(values, size = 100, replace = TRUE)
# 1.4 Visualize the result
table(VAR)
# 2. REPLACE THE VALUES WITH "AGREE" AND "DISAGREE"
VAR2 <- factor(VAR, labels = c("AGREE", "AGREE", "DISAGREE", "DISAGREE"))
# 2.1 Visualize the result
table(VAR2)
To understand what this does, it is important to note that when you convert a variable to a factor, R sorts its levels alphabetically; in this case "agree a little", "agree a lot", "disagree a little", "disagree a lot". The labels written above are matched to the levels in that order, which is why there are two "AGREE" followed by two "DISAGREE".
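The level ordering described above can be checked directly. As a sketch, the code below reuses the simulated VAR and also passes the levels explicitly, so the mapping no longer depends on alphabetical sorting. Duplicated labels in factor() require R 3.4.0 or later.

```r
values <- c("agree a lot", "agree a little", "disagree a lot", "disagree a little")
set.seed(12345)
VAR <- sample(values, size = 100, replace = TRUE)

# Default level order is the sorted order of the unique values,
# not the order in which they appear in `values`.
levels(factor(VAR))

# Naming the levels explicitly ties each label to a known category,
# so the result does not depend on how R sorts the strings.
VAR2 <- factor(VAR,
               levels = values,
               labels = c("AGREE", "AGREE", "DISAGREE", "DISAGREE"))

# Cross-tabulate original and collapsed categories to verify the mapping.
table(VAR, VAR2)
```

Passing levels and labels together is a design choice for robustness: if a category is spelled differently or is missing from the data, the mapping stays tied to the names rather than to positions.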
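If you want the codes 1 and 0 instead, the only thing to change from the previous code is the labels:
VAR2 <- factor(VAR, labels = c(1, 1, 0, 0))
Note that the result is still a factor with levels "1" and "0"; as.numeric(as.character(VAR2)) turns it into a numeric vector.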
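The variable in the question is stored as numeric codes rather than strings, so the collapsing can also be done on the codes themselves. This is only a sketch: the short belonging vector is a hypothetical stand-in for the real data, the names belong_bin and belong_fct are illustrative, and treating every code outside 1 to 4 (such as 9) as missing is an assumption, since the label for 9 is cut off in the str() output.

```r
# Hypothetical stand-in for `belonging`, using the codes from the str() output:
# 1 = Agree a lot, 2 = Agree a little, 3 = Disagree a little, 4 = Disagree a lot.
belonging <- c(1, 1, 1, 4, 1, 3, 2, 2, 3, 3, 9)

# Codes 1-2 become 1 (agree), codes 3-4 become 0 (disagree).
# Assumption: any other code (e.g. 9) is set to NA.
belong_bin <- ifelse(belonging %in% c(1, 2), 1,
              ifelse(belonging %in% c(3, 4), 0, NA))

# Check the mapping, including how many values ended up missing.
table(belonging, belong_bin, useNA = "ifany")

# Optional: a labelled factor version for use as a classification target.
belong_fct <- factor(belong_bin, levels = c(0, 1),
                     labels = c("disagree", "agree"))
```

If the variable was imported with haven and still carries value labels, wrapping it in as.numeric() before recoding should strip them; this has not been checked against the original data.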