I am trying to split a column of comma separated characters into several columns (as many as different characters overall). I read similar questions such as this:
Split data frame string column into multiple columns
But the solutions were based on a small number of possible characters, so that the new columns could be named in advanced before the columns was split.
This is my example:
subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")
df1 <- data.frame(subject, letters)
df1
subject letters
1 1 a, b, f, g
2 2 b, g, m, l
3 3 g, m, z
The desired result would be:
subject a b f g m z
1 1 1 1 1 1 0 0
2 2 0 1 0 1 1 0
3 3 0 0 0 1 1 1
Thanks in advance for any help.
CodePudding user response:
One option using str_split
, unnest_longer
and table
subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")
df1 <- data.frame(subject, letters)
library(tidyverse)
df1 %>%
mutate(letters = str_split(letters, ', ')) %>%
unnest_longer(letters) %>%
table
#> letters
#> subject a b f g l m z
#> 1 1 1 1 1 0 0 0
#> 2 0 1 0 1 1 1 0
#> 3 0 0 0 1 0 1 1
Created on 2022-02-10 by the reprex package (v2.0.0)
Seeing some of the other answers, separate_rows
is a better solution here
df1 %>%
separate_rows(letters) %>%
table
CodePudding user response:
Another tidyverse
possibility:
library(tidyverse)
df1 %>%
separate_rows(letters) %>%
mutate(value = 1) %>%
pivot_wider(names_from = letters, values_fill = 0)
subject a b f g m l z
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0 0 0
2 2 0 1 0 1 1 1 0
3 3 0 0 0 1 1 0 1
CodePudding user response:
Another solution:
library(dplyr)
library(tidyr)
df1 %>%
mutate(row = row_number()) %>%
separate_rows(letters, sep = ', ') %>%
pivot_wider(names_from = letters, values_from = letters,
values_fn = function(x) 1, values_fill = 0) %>%
select(-row)
# A tibble: 3 × 8
subject a b f g m l z
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0 0 0
2 2 0 1 0 1 1 1 0
3 3 0 0 0 1 1 0 1
CodePudding user response:
A base R approach that grepl
s the unique
list of letters against each row of trimws
and strsplit
ed letters. Has the advantage of being able to easily interact with each step manually in case any adjustment needs to be done, e.g. order changes or leaving out letters etc.
list_letter <- sapply(strsplit(df1$letters, ","), trimws)
letter <- unique(unlist(list_letter))
letter
[1] "a" "b" "f" "g" "m" "l" "z"
data.frame( subject=df1$subject,
sapply(letter, function(x) sapply(list_letter, function(y)
any(grepl(x, y))))*1 )
subject a b f g m l z
1 1 1 1 1 1 0 0 0
2 2 0 1 0 1 1 1 0
3 3 0 0 0 1 1 0 1