My data has the following columns regarding race:
Black White Asian Hispanic
No Yes No No
Yes No No No
No No Yes No
No Yes No Yes
...
How can I combine this into one column that I would title "Race" and takes a form like this?
Race
White Non-Hispanic
Black
Asian
White Hispanic
...
I have hundreds of rows of data, so I am looking for a way to automate this in R.
CodePudding user response:
Ignoring that I'm not really clear on how you're getting your expected output from the input data (see my comment above), this might be a starting point in base R:
data.frame(
Race = apply(df, 1, function(x) paste(names(df)[x == "Yes"], collapse = " ")))
# Race
#1 White
#2 Black
#3 Asian
#4 White Hispanic
To replace "White"
with "White Non-Hispanic"
, you can wrap the above inside a transform
replace
call.
transform(
data.frame(
Race = apply(df, 1, function(x) paste(names(df)[x == "Yes"], collapse = " "))),
Race = replace(Race, Race == "White", "White Non-Hispanic"))
# Race
#1 White Non-Hispanic
#2 Black
#3 Asian
#4 White Hispanic
Sample data
df <- read.table(text = "Black White Asian Hispanic
No Yes No No
Yes No No No
No No Yes No
No Yes No Yes", header = T)
CodePudding user response:
You will probably not only have the different race names in one column but also have a table-like summary of how often the names have the "Yes" value. If that's correct then this should work (thanks to @Maurits for the reproducible data!):
library(dplyr)
library(tidyr)
df %>%
pivot_longer(everything(), names_to = "Race") %>%
group_by(Race) %>%
summarise(N = sum(value == "Yes"))
# A tibble: 4 × 2
Race N
<chr> <int>
1 Asian 1
2 Black 1
3 Hispanic 1
4 White 2