I have df
ID White Black Asian Hisp Other
1 1 0 0 0 0
2 0 1 0 0 0
3 1 0 0 0 0
4 0 0 1 0 0
5 1 0 0 0 0
6 0 0 1 0 0
7 1 0 0 0 0
8 0 1 0 0 0
9 0 0 0 0 1
10 0 0 0 1 0
I would like to re-classify these columns into a single columns named "Race" where white = 0, black = 1, asian = 2, hispanic = 3, and other = 4. The final column would look like:
ID White Black Asian Hisp Other Race
1 1 0 0 0 0 0
2 0 1 0 0 0 1
3 1 0 0 0 0 0
4 0 0 1 0 0 2
5 1 0 0 0 0 0
6 0 0 1 0 0 2
7 1 0 0 0 0 0
8 0 1 0 0 0 1
9 0 0 0 0 1 4
10 0 0 0 1 0 3
CodePudding user response:
We can use max.col
df$Race <- max.col(df[-1], 'first')-1
-output
> df
ID White Black Asian Hisp Other Race
1 1 1 0 0 0 0 0
2 2 0 1 0 0 0 1
3 3 1 0 0 0 0 0
4 4 0 0 1 0 0 2
5 5 1 0 0 0 0 0
6 6 0 0 1 0 0 2
7 7 1 0 0 0 0 0
8 8 0 1 0 0 0 1
9 9 0 0 0 0 1 4
10 10 0 0 0 1 0 3
data
df <- structure(list(ID = 1:10, White = c(1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L), Black = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L
), Asian = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L), Hisp = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Other = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-10L))
CodePudding user response:
You can use the dplyr package and use case_when
and mutate
functions.
library(dplyr)
df_new <- df %>% mutate(
Race = case_when(
White == 1 ~ 0,
Black == 1 ~ 1,
Asian == 1 ~ 2,
Hisp == 1 ~ 3,
Other == 1 ~ 4
)
)
CodePudding user response:
Use apply function to perform the loop by row and which function to find the index of 1 in the row.
df$Race <- apply(df[,-1], 1, function(x) which(1 == x)-1)
Output is
ID White Black Asian Hisp Other Race
1 1 1 0 0 0 0 0
2 2 0 1 0 0 0 1
3 3 1 0 0 0 0 0
4 4 0 0 1 0 0 2
5 5 1 0 0 0 0 0
6 6 0 0 1 0 0 2
7 7 1 0 0 0 0 0
8 8 0 1 0 0 0 1
9 9 0 0 0 0 1 4
10 10 0 0 0 1 0 3
CodePudding user response:
You could use tidyr
's gather
function for this.
library(tidyverse)
df %>%
mutate(Race =
df %>%
gather(Race, flag, White:Other, factor_key = T) %>%
filter(flag == 1) %>%
arrange(ID) %>%
summarize(Race = as.numeric(as.factor(Race))-1)
)
Which produces:
ID White Black Asian Hisp Other Race
1 1 1 0 0 0 0 0
2 2 0 1 0 0 0 1
3 3 1 0 0 0 0 0
4 4 0 0 1 0 0 2
5 5 1 0 0 0 0 0
6 6 0 0 1 0 0 2
7 7 1 0 0 0 0 0
8 8 0 1 0 0 0 1
9 9 0 0 0 0 1 4
10 10 0 0 0 1 0 3
Note: Without the as.numeric(as.factor())
it will create a character column with "White", "Black", etc instead of indices. Maybe you prefer that?