How can I "merge" binary data from multiple columns into a new column with categorical dat-CodePudding

My data has the following columns regarding race:

Black   White   Asian   Hispanic
No      Yes     No      No
Yes     No      No      No
No      No      Yes     No
No      Yes     No      Yes
...

How can I combine this into one column that I would title "Race" and takes a form like this?

Race
White Non-Hispanic
Black
Asian
White Hispanic
...

I have hundreds of rows of data, so I am looking for a way to automate this in R.

CodePudding user response：

Ignoring that I'm not really clear on how you're getting your expected output from the input data (see my comment above), this might be a starting point in base R:

data.frame(
    Race = apply(df, 1, function(x) paste(names(df)[x == "Yes"], collapse = " ")))
#            Race
#1          White
#2          Black
#3          Asian
#4 White Hispanic

To replace "White" with "White Non-Hispanic", you can wrap the above inside a transform replace call.

transform(
    data.frame(
        Race = apply(df, 1, function(x) paste(names(df)[x == "Yes"], collapse = " "))),
    Race = replace(Race, Race == "White", "White Non-Hispanic"))
#                Race
#1 White Non-Hispanic
#2              Black
#3              Asian
#4     White Hispanic

Sample data

df <- read.table(text = "Black   White   Asian   Hispanic
No      Yes     No      No
Yes     No      No      No
No      No      Yes     No
No      Yes     No      Yes", header = T)

CodePudding user response：

You will probably not only have the different race names in one column but also have a table-like summary of how often the names have the "Yes" value. If that's correct then this should work (thanks to @Maurits for the reproducible data!):

library(dplyr)
library(tidyr)
df %>%
  pivot_longer(everything(), names_to = "Race") %>%
  group_by(Race) %>%
  summarise(N = sum(value == "Yes"))
# A tibble: 4 × 2
  Race         N
  <chr>    <int>
1 Asian        1
2 Black        1
3 Hispanic     1
4 White        2