I have an R data frame with a column that looks like this
codes
111:222:333
222
111:222
I want to expand the codes
column into individual binary columns like this:
111 222 333
1 1 1
0 1 0
1 1 0
I tried converting the codes
column to a list of characters using strsplit
. Then, I unnested the codes
column and wanted to perform pivot_wider
, but it seems like I cannot do that with columns that have duplicate identifying columns.
df <- df %>%
mutate(codes = strsplit(codes, ":", TRUE))
unnest(codes) %>%
mutate(value = 1) %>%
pivot_wider(names_from = codes,
values_from = value,
values_fill = 0)
Any recommendations for how to approach this would be greatly appreciated!
CodePudding user response:
We could use dummy_cols
from fastDummies
library(fastDummies)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)
-output
codes_111 codes_222 codes_333
1 1 1 1
2 0 1 0
3 1 1 0
NOTE: It may be better to have column names that start with alphabets. If we want to have only the values
library(dplyr)
library(stringr)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE) %>%
setNames(str_remove(names(.), "codes_"))
111 222 333
1 1 1 1
2 0 1 0
3 1 1 0
data
df1 <- structure(list(codes = c("111:222:333", "222", "111:222")),
class = "data.frame", row.names = c(NA,
-3L))
CodePudding user response:
Add an id
column to your data.frame before unnest
library(dplyr)
library(tidyr)
df %>%
mutate(codes = strsplit(codes, ":", TRUE)) |>
mutate(id = row_number()) |>
unnest(codes) |>
mutate(value = 1) %>%
pivot_wider(names_from = codes,
values_from = value,
values_fill = 0)
##> # A tibble: 3 × 4
##> id `111` `222` `333`
##> <int> <dbl> <dbl> <dbl>
##> 1 1 1 1 1
##> 2 2 0 1 0
##> 3 3 1 1 0