Home > Enterprise >  Create binary columns based of presence of values in multiple columns
Create binary columns based of presence of values in multiple columns

Time:07-15

I have dataset with multiple timepoints for e.g. col1, col2, col3, col4 with each column having similar categories (A,B,C,D)

|ID  | col1 | col2| col3| col4| ColA
------------------------------
|1   |A     |A    |B    |B
------------------------------
|2   |A     |B    |B    |C
-------------------------------
|3   |B     |B    |C    |C
-------------------------------
|4   |B     |C    |C    |D
-------------------------------
|5   |C     |C    |D    |D
--------------------------------

Now I want to create a new variable ("colA") such that if A is found in any of the col1-col4 then it will be assigned value 1 else value 0. Similar for 1 for B new in "colB", C for "colC", and D for "colD". Here ColA will have 1,1,0,0,0 and ColB will have 1,1,1,1,0 values so on.

I tried multiple options which I found here For eg. (without loop)

df %>%
 mutate(colA=case_when(starts_with("col")=="A") ~ 1, TRUE~0)

or

(with a loop)

for (I in 1:4)  {
  df[,"colA"] <- ifelse(unlist(df[,paste0("col",i)]=="A",1,0)
}

but it did not work. in for loop, the data gets overwritten at the end of the loop, so colA reads as 0,0,0,0,0 as there are no A's in col4. I am using multiple datasets where range of i may vary from 1:2 to 1:4.

structure(list(id = 1:6, col1 = c(age11 = "A", age12 = "A", age13 = "B", 
age14 = "A", age15 = "C", age16 = "B"), col2 = c(age21 = "A", 
age22 = NA, age23 = "C", age24 = "A", age25 = "C", age26 = "B"
), col3 = c(age31 = "B", age32 = "B", age33 = NA, age34 = NA, 
age35 = "C", age36 = "B"), col4 = c(age41 = "B", age42 = NA, 
age43 = NA, age44 = NA, age45 = "D", age46 = "C")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

df %>%
   pivot_longer(-ID, names_to = 'value', values_to = 'name') %>%
   pivot_wider(values_fn = length, values_fill = 0, names_prefix = 'col')%>%
   mutate(across(-ID, ~ (.>0)))%>%
   right_join(df, 'ID')

# A tibble: 5 x 9
     ID  colA  colB  colC  colD col1  col2  col3  col4 
  <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr>
1     1     1     1     0     0 A     A     B     B    
2     2     1     1     1     0 A     B     B     C    
3     3     0     1     1     0 B     B     C     C    
4     4     0     1     1     1 B     C     C     D    
5     5     0     0     1     1 C     C     D     D 

CodePudding user response:

in Base R:

cbind(df,  (table(cbind(df['ID'], unlist(df[-1])))>1))

  ID col1 col2 col3 col4 A B C D
1  1    A    A    B    B 1 1 0 0
2  2    A    B    B    C 0 1 0 0
3  3    B    B    C    C 0 1 1 0
4  4    B    C    C    D 0 0 1 0
5  5    C    C    D    D 0 0 1 1

CodePudding user response:

library(dplyr, warn.conflicts = FALSE)

df %>% 
  mutate(colA = coalesce(if_any(col1:col4, `==`, 'A'), 0))
#> # A tibble: 6 × 6
#>      id col1  col2  col3  col4   colA
#>   <int> <chr> <chr> <chr> <chr> <dbl>
#> 1     1 A     A     B     B         1
#> 2     2 A     <NA>  B     <NA>      1
#> 3     3 B     C     <NA>  <NA>      0
#> 4     4 A     A     <NA>  <NA>      1
#> 5     5 C     C     C     D         0
#> 6     6 B     B     B     C         0

Created on 2022-07-14 by the reprex package (v2.0.1)

  •  Tags:  
  • r
  • Related