I have a column like so. The column begins and ends with a ',' and each value is separated by ',,'.
col1
,101,,9,,201,,200,
,201,,101,,102,
,9,,101,,102,,200,,201,
,101,,200,,9,,102,,102,
How can i transform this column into the following:
col1_9 col1_101 col1_102 col1_200 col1_201
1 1 0 1 1
0 1 1 0 1
1 1 1 1 1
1 1 2 1 0
CodePudding user response:
An option could be:
- First remove the "," at begin and end using
str_sub
fromstringr
- One Hot encode the column using
mtabulate
andstrsplit
with sep of ",," - Order the column names based on number
- Finally, give the columns the "col1_" names using
paste0
Which gives this as result:
df <- read.table(text = "col1
,101,,9,,201,,200,
,201,,101,,102,
,9,,101,,102,,200,,201,
,101,,200,,9,,102,,102,", header = TRUE)
library(stringr)
library(qdapTools)
df$col1 <- str_sub(df$col1, 2, -2)
df <- mtabulate(strsplit(df$col1, ",,"))
df <- df[, order(as.numeric(names(df)))]
names(df) <- paste0("col1_", names(df))
df
#> col1_9 col1_101 col1_102 col1_200 col1_201
#> 1 1 1 0 1 1
#> 2 0 1 1 0 1
#> 3 1 1 1 1 1
#> 4 1 1 2 1 0
Created on 2022-07-21 by the reprex package (v2.0.1)
CodePudding user response:
df%>%
mutate(rowid = row_number(), value = 1)%>%
separate_rows(col1)%>%
filter(nzchar(col1)) %>%
pivot_wider(rowid, names_from = col1,
values_fn = sum, names_prefix = 'col1_',
values_fill = 0)
# A tibble: 4 x 6
rowid col1_101 col1_9 col1_201 col1_200 col1_102
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 2 1 0 1 0 1
3 3 1 1 1 1 1
4 4 1 1 0 1 2
in Base R:
a <- setNames(strsplit(trimws(df$col1,white=','), ', '), seq(nrow(df)))
as.data.frame.matrix(t(table(stack(a))))
101 102 200 201 9
1 1 0 1 1 1
2 1 1 0 1 0
3 1 1 1 1 1
4 1 2 1 0 1