Home > OS >  How do I create a new column from this data?
How do I create a new column from this data?

Time:10-28

I have df

ID   White   Black  Asian  Hisp   Other
1     1      0      0      0      0
2     0      1      0      0      0
3     1      0      0      0      0
4     0      0      1      0      0
5     1      0      0      0      0
6     0      0      1      0      0
7     1      0      0      0      0
8     0      1      0      0      0
9     0      0      0      0      1
10    0      0      0      1      0
  

I would like to re-classify these columns into a single columns named "Race" where white = 0, black = 1, asian = 2, hispanic = 3, and other = 4. The final column would look like:

ID   White   Black  Asian  Hisp   Other   Race
1     1      0      0      0      0       0
2     0      1      0      0      0       1
3     1      0      0      0      0       0
4     0      0      1      0      0       2 
5     1      0      0      0      0       0
6     0      0      1      0      0       2
7     1      0      0      0      0       0
8     0      1      0      0      0       1
9     0      0      0      0      1       4
10    0      0      0      1      0       3

CodePudding user response:

We can use max.col

df$Race <- max.col(df[-1], 'first')-1

-output

> df
   ID White Black Asian Hisp Other Race
1   1     1     0     0    0     0    0
2   2     0     1     0    0     0    1
3   3     1     0     0    0     0    0
4   4     0     0     1    0     0    2
5   5     1     0     0    0     0    0
6   6     0     0     1    0     0    2
7   7     1     0     0    0     0    0
8   8     0     1     0    0     0    1
9   9     0     0     0    0     1    4
10 10     0     0     0    1     0    3

data

df <- structure(list(ID = 1:10, White = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 
0L, 0L, 0L), Black = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L
), Asian = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L), Hisp = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Other = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))

CodePudding user response:

You can use the dplyr package and use case_when and mutate functions.

library(dplyr)


df_new <- df %>% mutate(
        Race = case_when(
        White == 1 ~ 0,
        Black == 1 ~ 1,
        Asian == 1 ~ 2,
        Hisp  == 1 ~ 3,
        Other == 1 ~ 4
  )
)

CodePudding user response:

Use apply function to perform the loop by row and which function to find the index of 1 in the row.

df$Race <- apply(df[,-1], 1, function(x) which(1 == x)-1)

Output is

   ID White Black Asian Hisp Other Race
1   1     1     0     0    0     0    0
2   2     0     1     0    0     0    1
3   3     1     0     0    0     0    0
4   4     0     0     1    0     0    2
5   5     1     0     0    0     0    0
6   6     0     0     1    0     0    2
7   7     1     0     0    0     0    0
8   8     0     1     0    0     0    1
9   9     0     0     0    0     1    4
10 10     0     0     0    1     0    3

CodePudding user response:

You could use tidyr's gather function for this.

library(tidyverse)
df %>%
  mutate(Race =
           df %>% 
             gather(Race, flag, White:Other, factor_key = T) %>% 
             filter(flag == 1) %>% 
             arrange(ID) %>% 
             summarize(Race = as.numeric(as.factor(Race))-1)
         )

Which produces:

   ID White Black Asian Hisp Other Race
1   1     1     0     0    0     0    0
2   2     0     1     0    0     0    1
3   3     1     0     0    0     0    0
4   4     0     0     1    0     0    2
5   5     1     0     0    0     0    0
6   6     0     0     1    0     0    2
7   7     1     0     0    0     0    0
8   8     0     1     0    0     0    1
9   9     0     0     0    0     1    4
10 10     0     0     0    1     0    3

Note: Without the as.numeric(as.factor()) it will create a character column with "White", "Black", etc instead of indices. Maybe you prefer that?

  • Related