Recoding in R and making between-group comparisons

I have a data frame with a column containing "Yes", "No" and "Maybe" values. Here is a sample of what the data frame looks like, for context (not the actual data I'm working with, as that is more sensitive):

State	City	Do you like the color Blue?	Yes	Maybe	No
Arizona	Phoenix	Yes	1	0	0
Arizona	Phoenix	Yes	1	0	0
Arizona	Phoenix	Maybe	0	1	0
Arizona	Phoenix	No	0	0	1
Arizona	Scottsdale	No	0	0	1
Arizona	Scottsdale	Yes	1	0	0
Arizona	Scottsdale	Maybe	0	1	0
California	San Francisco	Yes	1	0	0
California	San Francisco	No	0	0	1
California	San Francisco	Maybe	0	1	0
California	Los Angeles	Yes	1	0	0
California	Los Angeles	Yes	1	0	0
California	Los Angeles	No	0	0	1

This is a two-part question:

1. I would like to recode the "Do you like the color Blue?" column so that "Yes" and "Maybe" equal 1 and "No" equals 0 (so categorical to numeric), and add the result as a separate column.
2. I also want to make statistical comparisons between states (e.g. the proportion of those who said "No" in California versus in Arizona). I was thinking of subsetting the data set by state and then making the comparisons, but the data set I'm working with has about 15 states. Is there a faster/more efficient way to do so?

CodePudding user response：

We can do this easily with model.matrix from base R, which builds one 0/1 indicator column per answer level:

df1[c("Maybe", "No", "Yes")] <- model.matrix(~ df1$Do_you_like_the_color_Blue - 1)
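Note that model.matrix gives one indicator per level rather than the single collapsed 0/1 column the question asks for. A minimal base R sketch of both, on a four-row toy data frame (the object `ind` and the column name `likes_blue` are made up for illustration):

```r
# Toy data mirroring the sample in the question (not the real data).
df1 <- data.frame(
  State = c("Arizona", "Arizona", "California", "California"),
  Do_you_like_the_color_Blue = c("Yes", "No", "Maybe", "Yes")
)

# One indicator column per answer level. The levels are sorted
# alphabetically, so the columns come out in the order Maybe, No, Yes.
ind <- model.matrix(~ Do_you_like_the_color_Blue - 1, data = df1)

# Strip the variable-name prefix that model.matrix adds to each column name.
colnames(ind) <- sub("Do_you_like_the_color_Blue", "", colnames(ind), fixed = TRUE)

# Collapsed recode from the question: "Yes"/"Maybe" -> 1, "No" -> 0.
# likes_blue is a hypothetical column name.
df1$likes_blue <- as.integer(df1$Do_you_like_the_color_Blue %in% c("Yes", "Maybe"))

print(ind)
print(df1)
```

Using %in% rather than == keeps the recode explicit about which answers count as 1, which is easier to adjust if more answer levels appear in the real data.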

Or using tidyverse

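library(dplyr)
df1 %>%
    # "Yes"/"Maybe" -> 1, "No" -> 0, as asked in the question
    mutate(new_col = as.integer(Do_you_like_the_color_Blue != "No")) %>%
    group_by(State) %>%
    # proportion of "No" answers within each state
    mutate(Prop = mean(new_col == 0)) %>%
    ungroup()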
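For the second part of the question (comparing states statistically without subsetting each one by hand), one possible approach is to cross-tabulate state against the answer and test the whole table at once. The sketch below uses only base R on the sample data; the column name `answer` and the objects `tab`, `prop_no`, `overall` and `pairwise` are made up for illustration, and the choice of tests is an assumption rather than something the answers above prescribe:

```r
# Sample data from the question (state and answer columns only).
df1 <- data.frame(
  State = rep(c("Arizona", "California"), times = c(7, 6)),
  answer = c("Yes", "Yes", "Maybe", "No", "No", "Yes", "Maybe",
             "Yes", "No", "Maybe", "Yes", "Yes", "No")
)

# Cross-tabulate state against "said No" vs "did not say No".
said_no <- df1$answer == "No"
tab <- table(State = df1$State, SaidNo = said_no)

# Proportion of "No" answers within each state (row-wise proportions).
prop_no <- prop.table(tab, margin = 1)[, "TRUE"]

# One overall test across all states. Fisher's exact test is used here
# because the sample is tiny; with larger counts, chisq.test(tab) would be
# a common alternative.
overall <- fisher.test(tab)

# Pairwise state-vs-state comparisons with a multiple-testing correction.
# x = number of "No" answers per state, n = respondents per state.
# Warnings about the chi-squared approximation are expected on data this
# small and are suppressed only to keep the sketch quiet.
pairwise <- suppressWarnings(
  pairwise.prop.test(x = tab[, "TRUE"], n = rowSums(tab),
                     p.adjust.method = "holm")
)

print(prop_no)
print(overall$p.value)
print(pairwise)
```

With about 15 states the same code applies unchanged, since table() and pairwise.prop.test() handle any number of groups; the p-value adjustment matters more as the number of pairs grows.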

data

df1 <- structure(list(State = c("Arizona", "Arizona", "Arizona", "Arizona", 
"Arizona", "Arizona", "Arizona", "California", "California", 
"California", "California", "California", "California"), City = c("Phoenix", 
"Phoenix", "Phoenix", "Phoenix", "Scottsdale", "Scottsdale", 
"Scottsdale", "San Francisco", "San Francisco", "San Francisco", 
"Los Angeles", "Los Angeles", "Los Angeles"), Do_you_like_the_color_Blue = c("Yes", 
"Yes", "Maybe", "No", "No", "Yes", "Maybe", "Yes", "No", "Maybe", 
"Yes", "Yes", "No"), Yes = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 
0L, 0L, 1L, 1L, 0L), Maybe = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
0L, 1L, 0L, 0L, 0L), No = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, -13L
))

CodePudding user response：

To create the columns, we can use the fastDummies package, which is usually much quicker to write than recoding or pivoting wider:

library(fastDummies)
library(dplyr)

# select_columns takes the column name as a character string
df %>% dummy_cols(select_columns = "Do you like the color Blue?")