My dataset looks like this:
year sex age income white black hispanic other_race degree
1 1981 male 34 10000 white 0 0 0 1
2 1981 male 32 9095 0 black 0 0 0
3 2009 male 64 45200 0 black 0 0 0
4 1999 male 50 25000 white 0 0 0 0
5 1990 male 26 24500 white 0 0 0 0
6 2011 male 39 46500 white 0 0 0 2
7 2007 male 40 60000 white 0 0 0 0
8 1990 male 47 39200 white 0 0 0 0
9 1981 male 30 20500 white 0 0 0 0
10 2007 male 55 33000 white 0 0 0 0
I want to merge together columns white, black, hispanic, other_race into one column
CodePudding user response:
If your data is d
, then you can do this:
1. baseR
d$race = gsub("0", "", apply(d[,c("white", "black", "hispanic", "other_race")], 1, paste0, collapse=""))
2. tidyverse
d %>%
rowwise() %>%
mutate(race=paste(c_across(white:other_race),collapse=""),
race=str_remove_all(race,"0"))
3. data.table
d[, race:=gsub("0","",paste0(white, black, hispanic, other_race)), by=1:nrow(d)]
Output:
year sex age income white black hispanic other_race degree race
1 1981 male 34 10000 white 0 0 0 1 white
2 1981 male 32 9095 0 black 0 0 0 black
3 2009 male 64 45200 0 black 0 0 0 black
4 1999 male 50 25000 white 0 0 0 0 white
5 1990 male 26 24500 white 0 0 0 0 white
6 2011 male 39 46500 white 0 0 0 2 white
7 2007 male 40 60000 white 0 0 0 0 white
8 1990 male 47 39200 white 0 0 0 0 white
9 1981 male 30 20500 white 0 0 0 0 white
10 2007 male 55 33000 white 0 0 0 0 white
CodePudding user response:
In base R
, we may use max.col
on a logical matrix (df1[nm1] != '0'
) to find the column index of first max value (TRUE
-> 1 and FALSE
- > 0) per row and use that to subset the column names
nm1 <- c("white", "black", "hispanic", "other_race")
df1$race <- nm1[max.col(df1[nm1] != "0", "first")]
-output
> df1
year sex age income white black hispanic other_race degree race
1 1981 male 34 10000 white 0 0 0 1 white
2 1981 male 32 9095 0 black 0 0 0 black
3 2009 male 64 45200 0 black 0 0 0 black
4 1999 male 50 25000 white 0 0 0 0 white
5 1990 male 26 24500 white 0 0 0 0 white
6 2011 male 39 46500 white 0 0 0 2 white
7 2007 male 40 60000 white 0 0 0 0 white
8 1990 male 47 39200 white 0 0 0 0 white
9 1981 male 30 20500 white 0 0 0 0 white
10 2007 male 55 33000 white 0 0 0 0 white
data
df1 <- structure(list(year = c(1981L, 1981L, 2009L, 1999L, 1990L, 2011L,
2007L, 1990L, 1981L, 2007L), sex = c("male", "male", "male",
"male", "male", "male", "male", "male", "male", "male"), age = c(34L,
32L, 64L, 50L, 26L, 39L, 40L, 47L, 30L, 55L), income = c(10000L,
9095L, 45200L, 25000L, 24500L, 46500L, 60000L, 39200L, 20500L,
33000L), white = c("white", "0", "0", "white", "white", "white",
"white", "white", "white", "white"), black = c("0", "black",
"black", "0", "0", "0", "0", "0", "0", "0"), hispanic = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), other_race = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), degree = c(1L, 0L, 0L, 0L, 0L, 2L,
0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))