I have an R data frame with a column that looks like this
codes
111:222:333
222
111:222
I want to expand the codes
column into individual binary columns like this:
111 222 333
1 1 1
0 1 0
1 1 0
I tried converting the codes
column to a list of characters using strsplit
. Then, I unnested the codes
column and wanted to perform pivot_wider
, but it seems like I cannot do that with columns that have duplicate identifying columns.
df <- df %>%
mutate(codes = strsplit(codes, ":", TRUE))
unnest(codes) %>%
mutate(value = 1) %>%
pivot_wider(names_from = codes,
values_from = value,
values_fill = 0)
CodePudding user response:
We could use dummy_cols
from fastDummies
library(fastDummies)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE)
-output
codes_111 codes_222 codes_333
1 1 1 1
2 0 1 0
3 1 1 0
NOTE: It may be better to have column names that start with alphabets. If we want to have only the values
library(dplyr)
library(stringr)
dummy_cols(df1, "codes", split = ":", remove_selected_columns = TRUE) %>%
setNames(str_remove(names(.), "codes_"))
111 222 333
1 1 1 1
2 0 1 0
3 1 1 0
data
df1 <- structure(list(codes = c("111:222:333", "222", "111:222")),
class = "data.frame", row.names = c(NA,
-3L))
CodePudding user response:
Add an id
column to your data.frame before unnest
library(dplyr)
library(tidyr)
df %>%
mutate(codes = strsplit(codes, ":", TRUE)) |>
mutate(id = row_number()) |>
unnest(codes) |>
mutate(value = 1) %>%
pivot_wider(names_from = codes,
values_from = value,
values_fill = 0)
##> # A tibble: 3 × 4
##> id `111` `222` `333`
##> <int> <dbl> <dbl> <dbl>
##> 1 1 1 1 1
##> 2 2 0 1 0
##> 3 3 1 1 0
CodePudding user response:
Another approach using separate_rows
:
library(tidyr)
library(dplyr)
df1 %>%
mutate(r = 1:n()) %>%
separate_rows(., codes, sep=":") %>%
table %>%
t
# codes
# r 111 222 333
# 1 1 1 1
# 2 0 1 0
# 3 1 1 0
Although this will give us a table, so if we need a dataframe we should use pivot_wider
instead of table
.
df1 %>%
mutate(r = 1:n(), val = 1) %>%
separate_rows(., codes, sep=":")
pivot_wider(names_from = "codes", values_from = "val", values_fill = 0) %>%
select(-r)
# # A tibble: 3 x 4
# `111` `222` `333`
# <dbl> <dbl> <dbl>
# 1 1 1 1
# 2 0 1 0
# 3 1 1 0
Data:
df1 <- data.frame(codes = c("111:222:333","222", "111:222"))