R Convert column of lists into binary columns-CodePudding

I have an R data frame with a column that looks like this

codes
111:222:333
222
111:222

I want to expand the codes column into individual binary columns like this:

111 222 333
1   1   1
0   1   0
1   1   0

I tried converting the codes column to a list of characters using strsplit. Then, I unnested the codes column and wanted to perform pivot_wider, but it seems like I cannot do that with columns that have duplicate identifying columns.

df <- df %>% 
  mutate(codes = strsplit(codes, ":", TRUE))
  unnest(codes) %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = codes,
                         values_from = value,
                         values_fill = 0)

Any recommendations for how to approach this would be greatly appreciated!

CodePudding user response：

We could use dummy_cols from fastDummies

library(fastDummies)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)

-output

  codes_111 codes_222 codes_333
1         1         1         1
2         0         1         0
3         1         1         0

NOTE: It may be better to have column names that start with alphabets. If we want to have only the values

library(dplyr)
library(stringr)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)  %>% 
  setNames(str_remove(names(.), "codes_"))
   111 222 333
1   1   1   1
2   0   1   0
3   1   1   0

data

df1 <- structure(list(codes = c("111:222:333", "222", "111:222")),
class = "data.frame", row.names = c(NA, 
-3L))

CodePudding user response：

Add an id column to your data.frame before unnest

library(dplyr)
library(tidyr)

df %>% 
  mutate(codes = strsplit(codes, ":", TRUE)) |>
  mutate(id = row_number()) |>
  unnest(codes) |>
  mutate(value = 1) %>%
  pivot_wider(names_from = codes,
                         values_from = value,
                         values_fill = 0)


##>   # A tibble: 3 × 4
##>      id `111` `222` `333`
##>   <int> <dbl> <dbl> <dbl>
##> 1     1     1     1     1
##> 2     2     0     1     0
##> 3     3     1     1     0