I have a dataset, where a couple of variables are dichotomised as yes/no.
> df[1:20,]
# A tibble: 20 × 2
black white
<fct> <fct>
1 No Yes
2 No Yes
3 No Yes
4 No Yes
5 No Yes
6 No Yes
7 No Yes
8 No Yes
9 No Yes
10 No Yes
11 No Yes
12 No Yes
13 No Yes
14 No Yes
15 No Yes
16 Yes No
17 No Yes
18 No Yes
19 No Yes
20 Yes No
This creates a lot of variables (my real data has more than one option for race) and doesn’t look very neat as it means a lot of variables unnecessarily. I want to create a new variable (e.g 'race'), where the now individual variables 'black', 'white' etc are levels to that variable. Like in this example
> df2[1:20,]
# A tibble: 20 × 1
race
<fct>
1 White
2 White
3 White
4 White
5 White
6 White
7 White
8 White
9 White
10 White
11 White
12 White
13 White
14 White
15 White
16 Black
17 White
18 White
19 White
20 Black
How would I go about this?
CodePudding user response:
To account for multiple races, use apply on the rows (MARGIN = 1
), and paste toString
the column names with "Yes"
:
df <- structure(list(asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
data.frame(race = apply(df == "Yes", 1, \(x) toString(colnames(df)[which(x)])))
race
1 white
2 white
3 white
4 white
5 white
6 white
7 white
8 white
9 white
10 white
11 white
12 white
13 white
14 white
15 white
16 black
17 white
18 white
19 white
20 asian, black
Using max.col
(works only for one value per individual):
data.frame(race = colnames(df)[max.col(df == "Yes")])
CodePudding user response:
Here is a solution that works with multiracial cases.
library(tidyverse)
# Sample data with multiracial case
df <- structure(list(respondent = 1:20, asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
select(asian:white) %>%
`==`("Yes") %>%
apply(1,
\(.row) colnames(.)[.row] %>%
str_c(collapse = "-"))
#> [1] "white" "white" "white" "white" "white"
#> [6] "white" "white" "white" "white" "white"
#> [11] "white" "white" "white" "white" "white"
#> [16] "black" "white" "white" "white" "asian-black"
Created on 2022-04-04 by the reprex package (v2.0.1)
CodePudding user response:
Using dplyr
(this assumes that in your dataset a person can only be of 1 race):
library(dplyr)
dat <- data.frame(id = 1:2,
black = c("No", "Yes"),
white = c("Yes", "No"))
dat |> mutate(
race = case_when(black == "Yes" ~ "black",
white == "Yes" ~ "white")
)
Output:
#> id black white race
#> 1 1 No Yes white
#> 2 2 Yes No black