How to combine several binary variables into a new categorical variable

Time:08-13

I am trying to combine several binary variables into one categorical variable. I have ten binary variables, each describing a task of a job.

Data looks something like this:

Personal_Help <- c(1,1,2,1,2,1)
PR <- c(2,1,1,2,1,2)
Fundraising <- c(1,2,1,2,2,1)
# etc.

My goal is to combine them into one variable in which the value 1 (= Yes) of each binary variable becomes a separate level of the categorical variable.

To illustrate what I have in mind (pseudocode, not valid R):

If Personal_Help = 1 -> Jobcontent = 1
If PR = 1 -> Jobcontent = 2
If Fundraising = 1 -> Jobcontent = 3

etc.

Thank you very much in advance!

Answer:

If you're interested only in the first occurrence of 1 among your variables:

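# Transpose so that each respondent becomes a column
df <- data.frame(t(data.frame(Personal_Help, PR, Fundraising)))
# For each respondent, the position of the first variable equal to 1 (NA if none)
result <- sapply(df, function(x) which(x == 1)[1])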

X1 X2 X3 X4 X5 X6 
 1  1  2  1  2  1 

Of course, whether this is what you want depends on how you want to handle rows where more than one variable is 1, as asked in the comments.
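Before choosing a rule, it may help to check how often a respondent actually has more than one task coded 1. A minimal sketch on the three example vectors, assuming 1 = Yes as in the question; `tasks` and `n_yes` are illustrative names, not part of the original answer:

```r
Personal_Help <- c(1, 1, 2, 1, 2, 1)
PR            <- c(2, 1, 1, 2, 1, 2)
Fundraising   <- c(1, 2, 1, 2, 2, 1)
tasks <- data.frame(Personal_Help, PR, Fundraising)

# Number of tasks coded 1 (= Yes) for each respondent (row)
n_yes <- rowSums(tasks == 1)
n_yes

# Rows where a "first 1" rule would silently discard information
which(n_yes > 1)

# Rows with no task coded 1; which(x == 1)[1] returns NA for these
which(n_yes == 0)
```

In this small example, four of the six respondents have more than one task, so the choice of rule matters.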

Answer:

Since there are three variables and each can take one of two values, there are 2^3 = 8 possible combinations of the three variables, and each combination can be given its own number.

One way to do this is to treat each column as a digit of a three-digit binary number. Subtracting 1 from each column gives 0 for "yes" and 1 for "no". The eight possible combinations and their decimal values are then as follows, with the binary digits read as Fundraising, PR, Personal_Help (the first column is the lowest-order digit):

binary    decimal
0 0 0   = 0
0 0 1   = 1
0 1 0   = 2
0 1 1   = 3
1 0 0   = 4
1 0 1   = 5
1 1 0   = 6
1 1 1   = 7

This scheme works for any number of columns and can be implemented as follows:

Personal_Help <- c(1,1,2,1,2,1)
PR <- c(2,1,1,2,1,2)
Fundraising <- c(1,2,1,2,2,1)
df <- data.frame(Personal_Help, PR, Fundraising)

New_var <- 0

# Column i adds 2^(i - 1) when it is 2 ("no") and nothing when it is 1 ("yes")
for(i in seq_along(df)) New_var <- New_var + (2^(i - 1)) * (df[[i]] - 1)

df$New_var <- New_var

The end result would then be:

df
#>   Personal_Help PR Fundraising New_var
#> 1             1  2           1       2
#> 2             1  1           2       4
#> 3             2  1           1       1
#> 4             1  2           2       6
#> 5             2  1           2       5
#> 6             1  2           1       2
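The loop can also be written as a single matrix product, and wrapping the result in factor() stores it as the categorical variable the question asks for. A minimal sketch, assuming the same 1 = Yes / 2 = No coding; `weights` and `Jobcombo` are illustrative names, not from the original answer:

```r
Personal_Help <- c(1, 1, 2, 1, 2, 1)
PR            <- c(2, 1, 1, 2, 1, 2)
Fundraising   <- c(1, 2, 1, 2, 2, 1)
df <- data.frame(Personal_Help, PR, Fundraising)

# Place value of each column: 1, 2, 4, ... (first column is the lowest digit)
weights <- 2^(seq_along(df) - 1)

# Recode 1/2 to 0/1 and take the weighted sum of every row in one step
New_var <- drop((as.matrix(df) - 1) %*% weights)

# Store it as a factor with all 2^3 possible codes as levels
df$Jobcombo <- factor(New_var, levels = 0:(2^ncol(df) - 1))
```

With the ten real columns the levels would run from 0 to 1023, most of which may never occur; `droplevels()` removes the unused ones.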

In your actual data, with ten task variables, there are 2^10 = 1024 possible combinations, so New_var will range from 0 to 1023. Because of how it is built, this single number can be decoded back into the entire row, as long as you know the original column order.
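Decoding amounts to reading off each binary digit of the number. A minimal sketch, assuming the column order used when encoding; `decode_tasks` is a hypothetical helper, and it returns the original 1/2 codes:

```r
# Recover the 1/2 code of every task from a single New_var value.
# `task_names` must be in the same order as the columns used for encoding.
decode_tasks <- function(code, task_names) {
  bits <- seq_along(task_names) - 1     # digit position of each column
  is_no <- (code %/% 2^bits) %% 2 == 1  # a digit of 1 means the column was 2 ("no")
  setNames(ifelse(is_no, 2, 1), task_names)
}

decode_tasks(6, c("Personal_Help", "PR", "Fundraising"))
```

For example, 6 decodes to Personal_Help = 1, PR = 2, Fundraising = 2, which is row 4 of the example data.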

Answer:

As @ulfelder commented, you need to clarify how you want to handle cases where more than one column is 1.

Assuming you want to use the first column equal to 1, you can use which.min(), applied by row:

data <- data.frame(Personal_Help, PR, Fundraising)

data$Jobcontent <- apply(data, MARGIN = 1, which.min)

Result:

  Personal_Help PR Fundraising Jobcontent
1             1  2           1          1
2             1  1           2          1
3             2  1           1          2
4             1  2           2          1
5             2  1           2          2
6             1  2           1          1
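One caveat: which.min() returns the position of the first minimum, so a row with no 1 at all (for example 2, 2, 2) would also be assigned 1. A minimal sketch that returns NA for such rows instead; `first_yes` is a hypothetical helper name:

```r
Personal_Help <- c(1, 1, 2, 1, 2, 1)
PR            <- c(2, 1, 1, 2, 1, 2)
Fundraising   <- c(1, 2, 1, 2, 2, 1)
data <- data.frame(Personal_Help, PR, Fundraising)

# Position of the first task coded 1 (= Yes), or NA if no task is coded 1
first_yes <- function(row) {
  hits <- which(row == 1)
  if (length(hits) > 0) hits[1] else NA_integer_
}

data$Jobcontent <- apply(data, MARGIN = 1, first_yes)

# A row without any task: which.min() would give 1, first_yes() gives NA
first_yes(c(2, 2, 2))
```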

If you’d like Jobcontent to include the name of each task, you can index into names(data):

# Use only the three task columns, so an existing Jobcontent column is ignored
data$Jobcontent <- names(data)[apply(data[, 1:3], MARGIN = 1, which.min)]

Result:

  Personal_Help PR Fundraising    Jobcontent
1             1  2           1 Personal_Help
2             1  1           2 Personal_Help
3             2  1           1            PR
4             1  2           2 Personal_Help
5             2  1           2            PR
6             1  2           1 Personal_Help
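Since the goal is a categorical variable, the names can also be stored as a factor whose levels follow the original column order, so that a task nobody has as their first task still appears as a level. A minimal sketch on the same three example columns; `task_cols` and `first` are illustrative names:

```r
Personal_Help <- c(1, 1, 2, 1, 2, 1)
PR            <- c(2, 1, 1, 2, 1, 2)
Fundraising   <- c(1, 2, 1, 2, 2, 1)
data <- data.frame(Personal_Help, PR, Fundraising)
task_cols <- names(data)   # remember the task columns before adding new ones

# Position of the first task coded 1 in each row (same rule as above)
first <- apply(data[task_cols], MARGIN = 1, which.min)

# One level per task, in column order; Fundraising is never first in this example
data$Jobcontent <- factor(task_cols[first], levels = task_cols)

table(data$Jobcontent)
```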