Using R to measure congruence between levels in variable

Time:04-06

I have two different variables in R. The first ("candimmi") represents political candidates' opinion on immigration. The second ("voterimmi") represents voters' opinion on immigration. Both variables have the same three levels: anti-immigration, intermediate, and pro-immigration.

My issue is that I want to create a new variable stating whether or not the voter and the political candidate are congruent. The levels of the new variable would be "both anti-immigrant", "both intermediate", "both pro-immigration" and "mismatch".

Can any of you give me some advice on how to do this?

Thanks in advance!

Best, Malte

I have already looked for solutions, but couldn't find an answer to my question online.

Answer:

You can use case_when() from dplyr, which works like a chain of vectorised if/else conditions (the equivalent of nested ifelse() calls):

set.seed(05062020)
library(dplyr)
responses <- c("Anti","Intermed","Pro")
df <- data.frame(candidate = sample(responses, 10, replace = TRUE),
                 voter = sample(responses, 10, replace = TRUE))

df2 <- df %>% mutate(result = case_when(candidate %in% "Anti" & voter %in% "Anti" ~ "Both Anti",
                        candidate %in% "Intermed" & voter %in% "Intermed" ~ "Both Intermed",
                        candidate %in% "Pro" & voter %in% "Pro" ~ "Both Pro",
                        candidate != voter ~ "Discordant"))

#    candidate    voter        result
# 1        Pro Intermed    Discordant
# 2       Anti     Anti     Both Anti
# 3        Pro      Pro      Both Pro
# 4        Pro     Anti    Discordant
# 5        Pro     Anti    Discordant
# 6        Pro      Pro      Both Pro
# 7        Pro Intermed    Discordant
# 8   Intermed      Pro    Discordant
# 9   Intermed Intermed Both Intermed
# 10      Anti      Pro    Discordant

A base R way to do it is using nested ifelse statements:

df$result <- ifelse(df$candidate %in% "Anti" & df$voter %in% "Anti", "Both Anti",
                    ifelse(df$candidate %in% "Intermed" & df$voter %in% "Intermed", "Both Intermed",
                           ifelse(df$candidate %in% "Pro" & df$voter %in% "Pro", "Both Pro",
                                  ifelse(df$candidate != df$voter, "Discordant", NA))))

# > df
#    candidate    voter        result
# 1        Pro Intermed    Discordant
# 2       Anti     Anti     Both Anti
# 3        Pro      Pro      Both Pro
# 4        Pro     Anti    Discordant
# 5        Pro     Anti    Discordant
# 6        Pro      Pro      Both Pro
# 7        Pro Intermed    Discordant
# 8   Intermed      Pro    Discordant
# 9   Intermed Intermed Both Intermed
# 10      Anti      Pro    Discordant
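If the raw values are stored exactly as the question describes them ("anti-immigration", "intermediate", "pro-immigration"; this is an assumption, the real coding may differ), a single ifelse() plus a named lookup vector can produce the exact labels that were asked for. A minimal sketch; the example vectors and the lookup name `both_labels` are hypothetical:

```r
# Hypothetical example data using the asker's variable names;
# the level spellings are assumed from the question text
candimmi  <- c("anti-immigration", "intermediate", "pro-immigration",
               "pro-immigration", NA)
voterimmi <- c("anti-immigration", "pro-immigration", "pro-immigration",
               "intermediate", "intermediate")

# Hypothetical lookup: shared level -> requested "both ..." label
both_labels <- c("anti-immigration" = "both anti-immigrant",
                 "intermediate"     = "both intermediate",
                 "pro-immigration"  = "both pro-immigration")

# Matching pairs get their "both ..." label, differing pairs get "mismatch";
# a missing opinion on either side makes the comparison NA, so the result is NA
congruence <- ifelse(candimmi == voterimmi, both_labels[candimmi], "mismatch")
```

Here `congruence` holds "both anti-immigrant", "mismatch", "both pro-immigration", "mismatch" and NA: rows where either opinion is missing come out as NA rather than being counted as a mismatch.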

Answer:

Here is a simple approach using the base R functions factor() and interaction() (with @jpsmith's example data.frame, but a different random seed). interaction() creates a new factor whose levels are every combination of the two input factors; you can then rename those levels to the categories you want, which may be handy when there are many factor levels.

set.seed(234) # fixed random seed for reproducibility
responses <- c("Anti", "Intermed", "Pro")
congruence <- c("both anti-immigrant", "both intermediate", "both pro-immigration", "mismatch")
df <- data.frame(candidate = sample(responses, 10, replace = TRUE),
                 voter = sample(responses, 10, replace = TRUE))
df$candidate <- factor(df$candidate, levels=responses)   # make sure you have all the levels
df$voter <- factor(df$voter, levels=responses)           # make sure you have all the levels
df$congruence <- with(df, interaction(candidate, voter)) # create new factor representing both levels
levels(df$congruence) <- congruence[c(1,4,4,4,2,4,4,4,3)] # rename levels: candidate varies fastest, so matches sit at positions 1, 5, 9; duplicate names merge levels
df
#>    candidate    voter           congruence
#> 1       Anti      Pro             mismatch
#> 2        Pro      Pro both pro-immigration
#> 3   Intermed Intermed    both intermediate
#> 4   Intermed      Pro             mismatch
#> 5   Intermed Intermed    both intermediate
#> 6   Intermed Intermed    both intermediate
#> 7       Anti     Anti  both anti-immigrant
#> 8       Anti     Anti  both anti-immigrant
#> 9        Pro Intermed             mismatch
#> 10  Intermed      Pro             mismatch

Created on 2022-04-05 by the reprex package (v2.0.1)
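The index vector c(1,4,4,4,2,4,4,4,3) relies on the order of interaction()'s levels: with the default lex.order = FALSE the first factor varies fastest, so the three matching pairs sit at positions 1, 5 and 9. The sketch below (not part of the original answer) builds that index with expand.grid(), which also varies its first column fastest, so it should keep working if the number of response levels changes. The names `grid` and `idx` are illustrative, and the full 3 x 3 grid is reused as test data:

```r
responses  <- c("Anti", "Intermed", "Pro")
congruence <- c("both anti-immigrant", "both intermediate",
                "both pro-immigration", "mismatch")

# expand.grid() lists combinations with the first column varying fastest,
# the same order interaction() uses for its levels (lex.order = FALSE)
grid <- expand.grid(candidate = responses, voter = responses,
                    stringsAsFactors = FALSE)

# Matching pairs get the index of their "both ..." label, all others "mismatch"
idx <- ifelse(grid$candidate == grid$voter,
              match(grid$candidate, responses),
              length(congruence))
idx
#> [1] 1 4 4 4 2 4 4 4 3

# Apply it to data containing every combination once
df <- data.frame(candidate = factor(grid$candidate, levels = responses),
                 voter     = factor(grid$voter, levels = responses))
df$congruence <- with(df, interaction(candidate, voter))
levels(df$congruence) <- congruence[idx]  # duplicated names merge into one level
df
```

Because assigning duplicated names to levels() merges those levels, the six mismatching combinations collapse into a single "mismatch" level.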

Answer:

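Both of the other answers work fine, but the simplest solution is to use a single ifelse(): if the two opinions differ, the result is "mismatch"; otherwise it is "both" followed by the shared level. Below I first create some sample data and then show how to use ifelse() in either the tidyverse or base R, whichever you prefer (the native pipe |> used below requires R 4.1.0 or later).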

library(tidyverse)

# Create data sample
d <- crossing(
  candimmi = c("anti", "inter", "pro"),
  voterimmi = candimmi
  )
d |>
  mutate(new_tidy = ifelse(candimmi != voterimmi,
                           "mismatch",
                           str_c("both ", candimmi)))
#> # A tibble: 9 × 3
#>   candimmi voterimmi new_tidy  
#>   <chr>    <chr>     <chr>     
#> 1 anti     anti      both anti 
#> 2 anti     inter     mismatch  
#> 3 anti     pro       mismatch  
#> 4 inter    anti      mismatch  
#> 5 inter    inter     both inter
#> 6 inter    pro       mismatch  
#> 7 pro      anti      mismatch  
#> 8 pro      inter     mismatch  
#> 9 pro      pro       both pro
d$new_base <- ifelse(d$candimmi != d$voterimmi,
                     "mismatch",
                     paste("both", d$candimmi))
d
#> # A tibble: 9 × 3
#>   candimmi voterimmi new_base  
#>   <chr>    <chr>     <chr>     
#> 1 anti     anti      both anti 
#> 2 anti     inter     mismatch  
#> 3 anti     pro       mismatch  
#> 4 inter    anti      mismatch  
#> 5 inter    inter     both inter
#> 6 inter    pro       mismatch  
#> 7 pro      anti      mismatch  
#> 8 pro      inter     mismatch  
#> 9 pro      pro       both pro

Created on 2022-04-05 by the reprex package (v2.0.1)
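A possible follow-up to the single-ifelse() result: convert it to a factor with an explicit level order, so that table() (and plots) keep all four categories even when a combination does not occur in a subset. A base R sketch, rebuilding the same nine combinations with expand.grid() instead of crossing() (so the row order differs from the tibble above):

```r
# Same nine candidate/voter combinations as the sample data above, in base R
d <- expand.grid(candimmi  = c("anti", "inter", "pro"),
                 voterimmi = c("anti", "inter", "pro"),
                 stringsAsFactors = FALSE)

# One ifelse(): "mismatch" when the answers differ, otherwise "both <level>"
d$new_base <- ifelse(d$candimmi != d$voterimmi,
                     "mismatch",
                     paste("both", d$candimmi))

# Fix the category order; levels with no rows are kept with a count of 0
d$new_base <- factor(d$new_base,
                     levels = c("both anti", "both inter", "both pro", "mismatch"))

table(d$new_base)                        # counts 1, 1, 1 and 6
table(d$new_base[d$candimmi == "anti"])  # counts 1, 0, 0 and 2
```

Subsetting a factor keeps its levels, which is why the second table still reports "both inter" and "both pro" with a count of 0.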
