I'm cleaning up some survey data which appear to have allowed respondents to select multiple race categories. I'm wondering how I can recode these into a "multiracial" response for the purposes of analysis.
Right now I've been doing rather laborious hand-coding that hasn't panned out. Here's my attempt at using recode to turn each of the responses with multiple entries into a number that can then be recoded with case_when.
rawdat$race <- recode(rawdat$race, "White, non-Hispanic,Asian" = 1,
"White, non-Hispanic,American Indian or Alaska Native" = 2,
"White, non-Hispanic,Black or African American,Asian" = 3,
"Black or African American,American Indian or Alaska Native" = 4,
"White, non-Hispanic,Hispanic" = 5,
"Asian,Native Hawaiian or Pacific Islander" = 6,
"White, non-Hispanic,Black or African American" = 7,
"Black or African American,American Indian or Alaska Native,Asian,Hispanic" = 8,
"White, non-Hispanic,Black or African American,American Indian or Alaska Native,Asian,Native Hawaiian or Pacific Islander,Hispanic" = 9,
"Black or African American,Hispanic" = 10,
"Black or African American,Asian" = 11,
"White, non-Hispanic,Native Hawaiian or Pacific Islander" =12,
"White, non-Hispanic,Black or African American,American Indian or Alaska Native,Asian,Hispanic",
"American Indian or Alaska Native,Hispanic" = 13)
There are a number of problems with this approach (I only attempted it because I assumed it would work as a brute-force short-term fix; it didn't). I'd much prefer to initialize a vector containing each of the possible values presented to respondents for this question, and then recode any cell that contains more than one of those values to "multiracial". As far as I'm aware, though, the recode() function won't accept such a vector as an argument. Any ideas as to how I can accomplish this latter approach?
Answer:
It looks like the cells that you want to recode as "multiracial" contain a comma - is that correct? If so, you only need to identify cells with commas.
library(tidyverse)
race <- c("White", "White, Asian", "Black or African American", "White", "White, Black or African American")
df <- as.data.frame(race)
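# flag any response that contains a comma, i.e. more than one selection
df$multiracial <- ifelse(grepl(",", df$race), "Multiracial", "Not multiracial")
# collapse the flagged responses into a single "Multiracial" value
df$race <- ifelse(df$multiracial == "Multiracial", "Multiracial", df$race)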
head(df$race)
#> [1] "White" "Multiracial"
#> [3] "Black or African American" "White"
#> [5] "Multiracial"
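The question asked about starting from a vector of the possible response values. Here is a minimal sketch of that idea, assuming the six option labels are the ones that appear in the strings passed to recode() in the question, and that selections are separated by a comma. The example responses and the helper name count_selected are made up for illustration. Each label is matched only at the start of a cell or right after a comma, so "Hispanic" does not match inside "White, non-Hispanic".

```r
# Assumed option labels, taken from the strings in the question's recode() call
categories <- c("White, non-Hispanic",
                "Black or African American",
                "American Indian or Alaska Native",
                "Asian",
                "Native Hawaiian or Pacific Islander",
                "Hispanic")

# Made-up example responses in the question's format
race <- c("White, non-Hispanic",
          "White, non-Hispanic,Asian",
          "Black or African American,Hispanic",
          "Asian")

# Hypothetical helper: count how many labels occur in each response.
# A label only counts when it sits between separators (start, comma, or end).
count_selected <- function(x, labels) {
  hits <- sapply(labels, function(label) {
    grepl(paste0("(^|,)\\s*", label, "\\s*(,|$)"), x)
  })
  # force a matrix so this also works when x has length 1
  rowSums(matrix(hits, nrow = length(x)))
}

n_selected <- count_selected(race, categories)

# recode any response with more than one label as "Multiracial"
recoded <- ifelse(n_selected > 1, "Multiracial", race)
```

Counting labels rather than commas means the comma inside "White, non-Hispanic" is not mistaken for a second selection. The labels are pasted into the regular expression as-is, which is fine here because none of them contain regex metacharacters.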
EDIT
I created a separate Hispanic/non-Hispanic column. This may not work as cleanly on your original data; it depends on whether the spacing and commas between choices are consistent.
library(tidyverse)
library(stringr)
race <- c("White, non-Hispanic",
"White, Asian",
"Black or African American",
"White, Hispanic, Asian",
"White, Black or African American",
"White, non-Hispanic",
"Black or African American, Hispanic")
df <- as.data.frame(race)
# keep a copy of the original responses
df$original <- df$race
# create separate Hispanic/non-Hispanic column
df$hispanic <- ifelse(grepl("non-Hispanic", df$race), "non-Hispanic",
                      ifelse(grepl("Hispanic", df$race), "Hispanic", "Unknown"))
# remove Hispanic/non-Hispanic
df$race <- str_remove(df$race, ", non-Hispanic")
df$race <- str_remove(df$race, ", Hispanic")
# recode as multiracial
df$multiracial <- ifelse(grepl(",", df$race), "Multiracial", "Not multiracial")
df$race <- ifelse(df$multiracial == "Multiracial", "Multiracial", df$race)
head(df)
#> race original hispanic
#> 1 White White, non-Hispanic non-Hispanic
#> 2 Multiracial White, Asian Unknown
#> 3 Black or African American Black or African American Unknown
#> 4 Multiracial White, Hispanic, Asian Hispanic
#> 5 Multiracial White, Black or African American Unknown
#> 6 White White, non-Hispanic non-Hispanic
#> multiracial
#> 1 Not multiracial
#> 2 Multiracial
#> 3 Not multiracial
#> 4 Multiracial
#> 5 Multiracial
#> 6 Not multiracial
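If the spacing after the separating comma is not consistent (the strings in the question have no space there, e.g. "White, non-Hispanic,Asian"), the fixed patterns ", non-Hispanic" and ", Hispanic" will miss some cells. One possible variation is to allow any amount of whitespace after the comma. This is only a sketch on made-up responses in the question's format, and it assumes the Hispanic/non-Hispanic label never comes first in a cell:

```r
library(stringr)

# Made-up responses in the question's format: no space after the separating comma
race <- c("White, non-Hispanic",
          "White, non-Hispanic,Asian",
          "Black or African American,Hispanic",
          "Asian,Native Hawaiian or Pacific Islander")
df <- data.frame(original = race, stringsAsFactors = FALSE)

# test "non-Hispanic" before "Hispanic", since the former contains the latter
df$hispanic <- ifelse(str_detect(df$original, "non-Hispanic"), "non-Hispanic",
                      ifelse(str_detect(df$original, "Hispanic"), "Hispanic", "Unknown"))

# drop the ethnicity label together with its leading comma, whatever the spacing
df$race <- str_remove_all(df$original, ",\\s*(non-)?Hispanic")

# any comma that is left separates two race selections
df$race <- ifelse(str_detect(df$race, ","), "Multiracial", df$race)
```

A cell that contains only "Hispanic" has no leading comma, so it is left unchanged in df$race and would need its own rule.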