Home > OS >  Trying to merge columns together
Trying to merge columns together

Time:03-14

My dataset looks like this:

      year  sex age income white black hispanic other_race degree
  1   1981 male  34  10000 white     0        0          0      1
  2   1981 male  32   9095     0 black        0          0      0
  3   2009 male  64  45200     0 black        0          0      0
  4   1999 male  50  25000 white     0        0          0      0
  5   1990 male  26  24500 white     0        0          0      0
  6   2011 male  39  46500 white     0        0          0      2
  7   2007 male  40  60000 white     0        0          0      0
  8   1990 male  47  39200 white     0        0          0      0
  9   1981 male  30  20500 white     0        0          0      0
  10  2007 male  55  33000 white     0        0          0      0

I want to merge together columns white, black, hispanic, other_race into one column

CodePudding user response:

If your data is d, then you can do this:

1. baseR

d$race = gsub("0", "", apply(d[,c("white", "black", "hispanic", "other_race")], 1, paste0, collapse=""))

2. tidyverse

d %>% 
  rowwise() %>% 
  mutate(race=paste(c_across(white:other_race),collapse=""),
         race=str_remove_all(race,"0"))

3. data.table

d[, race:=gsub("0","",paste0(white, black, hispanic, other_race)), by=1:nrow(d)]

Output:

   year  sex age income white black hispanic other_race degree  race
1  1981 male  34  10000 white     0        0          0      1 white
2  1981 male  32   9095     0 black        0          0      0 black
3  2009 male  64  45200     0 black        0          0      0 black
4  1999 male  50  25000 white     0        0          0      0 white
5  1990 male  26  24500 white     0        0          0      0 white
6  2011 male  39  46500 white     0        0          0      2 white
7  2007 male  40  60000 white     0        0          0      0 white
8  1990 male  47  39200 white     0        0          0      0 white
9  1981 male  30  20500 white     0        0          0      0 white
10 2007 male  55  33000 white     0        0          0      0 white

CodePudding user response:

In base R, we may use max.col on a logical matrix (df1[nm1] != '0') to find the column index of first max value (TRUE -> 1 and FALSE - > 0) per row and use that to subset the column names

nm1 <- c("white", "black", "hispanic", "other_race")
df1$race <- nm1[max.col(df1[nm1] != "0", "first")]

-output

> df1
   year  sex age income white black hispanic other_race degree  race
1  1981 male  34  10000 white     0        0          0      1 white
2  1981 male  32   9095     0 black        0          0      0 black
3  2009 male  64  45200     0 black        0          0      0 black
4  1999 male  50  25000 white     0        0          0      0 white
5  1990 male  26  24500 white     0        0          0      0 white
6  2011 male  39  46500 white     0        0          0      2 white
7  2007 male  40  60000 white     0        0          0      0 white
8  1990 male  47  39200 white     0        0          0      0 white
9  1981 male  30  20500 white     0        0          0      0 white
10 2007 male  55  33000 white     0        0          0      0 white

data

df1 <- structure(list(year = c(1981L, 1981L, 2009L, 1999L, 1990L, 2011L, 
2007L, 1990L, 1981L, 2007L), sex = c("male", "male", "male", 
"male", "male", "male", "male", "male", "male", "male"), age = c(34L, 
32L, 64L, 50L, 26L, 39L, 40L, 47L, 30L, 55L), income = c(10000L, 
9095L, 45200L, 25000L, 24500L, 46500L, 60000L, 39200L, 20500L, 
33000L), white = c("white", "0", "0", "white", "white", "white", 
"white", "white", "white", "white"), black = c("0", "black", 
"black", "0", "0", "0", "0", "0", "0", "0"), hispanic = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), other_race = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), degree = c(1L, 0L, 0L, 0L, 0L, 2L, 
0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))
  • Related