How to create a categorical variable in R using 2 binary variables that takes on 1 of 4 possible values

Time:12-10

I'm trying to construct a multinomial logit regression model using this categorical variable as my dependent variable.

In my data, the two binary variables represent whether an individual lives in a metropolitan area (RESMETRO) and whether an individual works in a metropolitan area (JOBMETRO).

There are four possible location outcomes when combining the two binary variables.

I'm struggling to get these four possible combinations into one variable:

  • (RESMETRO == TRUE & JOBMETRO == TRUE)
  • (RESMETRO == TRUE & JOBMETRO == FALSE)
  • (RESMETRO == FALSE & JOBMETRO == TRUE)
  • (RESMETRO == FALSE & JOBMETRO == FALSE)
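Instead of spelling out the four combinations by hand, base R's interaction() crosses two variables into a single factor whose levels are every combination. A minimal sketch, assuming the two columns are logical vectors named RESMETRO and JOBMETRO in a hypothetical data frame called df:

```r
# Hypothetical data frame with the two logical columns from the question
df <- data.frame(
  RESMETRO = c(TRUE, TRUE, FALSE, FALSE),
  JOBMETRO = c(TRUE, FALSE, TRUE, FALSE)
)

# interaction() crosses the two variables into one factor with 4 levels;
# sep = "_" controls how the two labels are joined
df$location <- interaction(df$RESMETRO, df$JOBMETRO, sep = "_")

df
levels(df$location)
```

Unused combinations are kept as levels by default (drop = FALSE). In the level order the first variable varies fastest, so check levels() before renaming them to friendlier labels.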

I've tried creating a new variable, but I've only managed to create another binary variable.

Answer:

Do you mean something like this? I simulated your data, added a continuous predictor variable, and used dplyr's case_when() (a vectorised if/else) on your two binary variables, which produces four outcomes in a single column.

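#### Simulate Data ####
library(dplyr)  # provides tibble(), mutate(), case_when() and the %>% pipe

# Two independent 0/1 indicators plus a continuous predictor
# (run set.seed() first if you need the same draws every time)
resmetro <- rbinom(n = 100,
                   size = 1,
                   prob = .5)

jobmetro <- rbinom(n = 100,
                   size = 1,
                   prob = .5)

predictor <- rnorm(n = 100,
                   mean = 50,
                   sd = 10)

tib <- tibble(resmetro,
              jobmetro,
              predictor)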
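Before building the outcome it can be worth confirming that all four combinations actually occur, so that every category of the dependent variable has observations to estimate from. A small base-R sketch of that check (the seed is an arbitrary assumption, used only so the draws repeat):

```r
set.seed(1)  # arbitrary seed so the simulated draws repeat
resmetro <- rbinom(n = 100, size = 1, prob = 0.5)
jobmetro <- rbinom(n = 100, size = 1, prob = 0.5)

# 2 x 2 cross-tabulation: one cell per combination of the two binaries
cell_counts <- table(resmetro = resmetro, jobmetro = jobmetro)
cell_counts

# Every cell should be non-zero before fitting a four-category model
all(cell_counts > 0)
```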

You can then use case_when to make the new variable.

#### Use Case When ####
tib_2 <- tib %>% 
  mutate(metro_type = case_when(
    (resmetro == 0) & (jobmetro == 0) ~ "No Metro",
    (resmetro == 0) & (jobmetro == 1) ~ "Only Job",
    (resmetro == 1) & (jobmetro == 0) ~ "Only Res",
    (resmetro == 1) & (jobmetro == 1) ~ "Full Metro"
  ))
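nnet::multinom() treats the first level of the response as the reference category. With a character column that is the alphabetically first label, "Full Metro", which is why it has no row in the coefficient table further down. A sketch that turns metro_type into a factor with an explicit level order, assuming you would rather use "No Metro" as the baseline (the toy data and the metro_levels name are illustrative):

```r
library(dplyr)

# Toy data covering each combination once (illustrative values)
tib <- tibble(resmetro = c(0, 0, 1, 1),
              jobmetro = c(0, 1, 0, 1))

metro_levels <- c("No Metro", "Only Job", "Only Res", "Full Metro")

tib_2 <- tib %>%
  mutate(
    metro_type = case_when(
      resmetro == 0 & jobmetro == 0 ~ "No Metro",
      resmetro == 0 & jobmetro == 1 ~ "Only Job",
      resmetro == 1 & jobmetro == 0 ~ "Only Res",
      resmetro == 1 & jobmetro == 1 ~ "Full Metro"
    ),
    # The first level listed becomes the reference category in multinom()
    metro_type = factor(metro_type, levels = metro_levels)
  )

levels(tib_2$metro_type)
```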

The result looks like this:

# A tibble: 100 × 4
   resmetro jobmetro predictor metro_type
      <int>    <int>     <dbl> <chr>     
 1        1        1      58.3 Full Metro
 2        0        0      54.2 No Metro  
 3        0        1      39.9 Only Job  
 4        1        1      54.1 Full Metro
 5        0        0      31.5 No Metro  
 6        1        0      43.3 Only Res  
 7        0        0      30.1 No Metro  
 8        1        1      53.3 Full Metro
 9        1        0      46.4 Only Res  
10        0        1      51.3 Only Job  
# … with 90 more rows

Then fit the model with nnet::multinom():

fit <- nnet::multinom(metro_type ~ predictor, tib_2)
summary(fit)
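Coefficients on the log-odds scale are hard to read directly, so predicted probabilities at chosen predictor values are often easier to interpret. A sketch using predict() with type = "probs" on a model fitted to freshly simulated data; the seed, the index-based labelling (a base-R stand-in for the case_when() step) and the predictor values 40/50/60 are illustrative assumptions:

```r
library(nnet)

set.seed(42)  # arbitrary seed for reproducible simulated data
n <- 100
dat <- data.frame(
  resmetro  = rbinom(n, size = 1, prob = 0.5),
  jobmetro  = rbinom(n, size = 1, prob = 0.5),
  predictor = rnorm(n, mean = 50, sd = 10)
)

# Map each 0/1 pair to a label: index = 1 + jobmetro + 2 * resmetro
metro_levels <- c("No Metro", "Only Job", "Only Res", "Full Metro")
dat$metro_type <- factor(metro_levels[1 + dat$jobmetro + 2 * dat$resmetro],
                         levels = metro_levels)

# trace = FALSE silences the optimiser's iteration log
fit <- multinom(metro_type ~ predictor, data = dat, trace = FALSE)

# One row per new observation, one column per outcome category
new_x <- data.frame(predictor = c(40, 50, 60))
probs <- predict(fit, newdata = new_x, type = "probs")
round(probs, 3)
```

Each row of probs sums to 1 across the four location outcomes.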

The summary output:

Call:
nnet::multinom(formula = metro_type ~ predictor, data = tib_2)

Coefficients:
         (Intercept)    predictor
No Metro  0.05991963  0.004357301
Only Job -0.97875054  0.021891747
Only Res  0.39298053 -0.006230505

Std. Errors:
         (Intercept)  predictor
No Metro    1.416491 0.02797334
Only Job    1.493587 0.02901240
Only Res    1.467119 0.02925577

Residual Deviance: 275.1406 
AIC: 287.1406
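For reference, the AIC is the residual deviance plus twice the number of estimated parameters: 275.1406 + 2 × 6 = 287.1406 (three non-reference categories × two coefficients each). summary() on a multinom fit reports coefficients and standard errors but no p-values. A common follow-up, sketched here on simulated data rather than the data above, is to exponentiate the coefficients into relative risk ratios against the reference level and to compute two-sided Wald p-values from z = coefficient / standard error:

```r
library(nnet)

set.seed(7)  # arbitrary seed for reproducible simulated data
n <- 100
dat <- data.frame(
  metro_type = sample(c("Full Metro", "No Metro", "Only Job", "Only Res"),
                      n, replace = TRUE),
  predictor  = rnorm(n, mean = 50, sd = 10)
)

# Character response: multinom() converts it to a factor, so the
# alphabetically first label ("Full Metro") is the reference category
fit <- multinom(metro_type ~ predictor, data = dat, trace = FALSE)
s <- summary(fit)

rrr <- exp(coef(fit))                    # relative risk ratios vs. "Full Metro"
z <- s$coefficients / s$standard.errors  # Wald z statistics
p <- 2 * pnorm(-abs(z))                  # two-sided p-values

round(rrr, 3)
round(p, 3)
```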