Home > other >  Convert a column of lists into binary columns
Convert a column of lists into binary columns

Time:02-04

I have an R data frame with a column that looks like this

codes
111:222:333
222
111:222

I want to expand the codes column into individual binary columns like this:

111 222 333
1   1   1
0   1   0
1   1   0

I tried converting the codes column to a list of characters using strsplit. Then, I unnested the codes column and wanted to perform pivot_wider, but it seems like I cannot do that with columns that have duplicate identifying columns.

df <- df %>% 
  mutate(codes = strsplit(codes, ":", TRUE))
  unnest(codes) %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = codes,
                         values_from = value,
                         values_fill = 0)

CodePudding user response:

We could use dummy_cols from fastDummies

library(fastDummies)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)

-output

  codes_111 codes_222 codes_333
1         1         1         1
2         0         1         0
3         1         1         0

NOTE: It may be better to have column names that start with alphabets. If we want to have only the values

library(dplyr)
library(stringr)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)  %>% 
  setNames(str_remove(names(.), "codes_"))
   111 222 333
1   1   1   1
2   0   1   0
3   1   1   0

data

df1 <- structure(list(codes = c("111:222:333", "222", "111:222")),
class = "data.frame", row.names = c(NA, 
-3L))

CodePudding user response:

Add an id column to your data.frame before unnest

library(dplyr)
library(tidyr)

df %>% 
  mutate(codes = strsplit(codes, ":", TRUE)) |>
  mutate(id = row_number()) |>
  unnest(codes) |>
  mutate(value = 1) %>%
  pivot_wider(names_from = codes,
                         values_from = value,
                         values_fill = 0)


##>   # A tibble: 3 × 4
##>      id `111` `222` `333`
##>   <int> <dbl> <dbl> <dbl>
##> 1     1     1     1     1
##> 2     2     0     1     0
##> 3     3     1     1     0

CodePudding user response:

Another approach using separate_rows:

library(tidyr)
library(dplyr)

df1 %>%
  mutate(r = 1:n()) %>%
  separate_rows(., codes, sep=":") %>% 
  table %>% 
  t

#   codes
# r   111 222 333
#   1   1   1   1
#   2   0   1   0
#   3   1   1   0

Although this will give us a table, so if we need a dataframe we should use pivot_wider instead of table.

df1 %>%
  mutate(r = 1:n(), val = 1) %>%
  separate_rows(., codes, sep=":")
  pivot_wider(names_from = "codes", values_from = "val", values_fill = 0) %>%
  select(-r)

# # A tibble: 3 x 4
#  `111` `222` `333`
#  <dbl> <dbl> <dbl>
# 1    1     1     1
# 2    0     1     0
# 3    1     1     0

Data:

df1 <- data.frame(codes = c("111:222:333","222", "111:222"))
  • Related