I think this should be simple, but I can't find another example that works for my purposes. I have many DNA sequences in 1 column in R, but I would like to split them into many columns with 1 base pair per column. For example:
V$1
ggggcc
cccctt
tttttt
aaaaaa
I want it to look like
V$1 V$2 V$3 V$4 V$5 V$6
g g g g c c
c c c c t t
t t t t t t
a a a a a a
I have tried
paste(L1HS2, collapse = "")
unlist(strsplit(L1HS2, split = ""))
and
data.frame(str_split_fixed(L1HS2, "", max(nchar(L1HS2))))
But I lose the data frame structure and end up with 1 very long row with many columns. This has to be easy, right? TIA!
CodePudding user response:
You could use
data.frame(Reduce(rbind, strsplit(df$V1, "")))
This returns
X1 X2 X3 X4 X5 X6
init g g g g c c
X c c c c t t
X.1 t t t t t t
X.2 a a a a a a
or
data.frame(do.call(rbind, strsplit(df$V1, "")))
which returns
X1 X2 X3 X4 X5 X6
1 g g g g c c
2 c c c c t t
3 t t t t t t
4 a a a a a a
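For completeness, here is a minimal self-contained sketch of the do.call variant in base R. It assumes the column is called V1 and that every sequence has the same length (rbind recycles shorter vectors with a warning otherwise). The names bases, mat and out are only illustrative.

```r
# Sample data from the question; the column name V1 is an assumption
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per sequence
bases <- strsplit(df$V1, "")

# Stack those vectors row-wise into a character matrix
mat <- do.call(rbind, bases)

# Name the columns V1, V2, ... and convert to a data frame
colnames(mat) <- paste0("V", seq_len(ncol(mat)))
out <- as.data.frame(mat, stringsAsFactors = FALSE)
```

Calling rbind once through do.call keeps the default row names (1, 2, 3, ...), whereas Reduce builds the result pairwise and leaves the init, X, X.1 labels shown above.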
CodePudding user response:
You can use separate from tidyr.
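library(tidyr)  # provides separate() and re-exports %>%
# first the data:
'V1
ggggcc
cccctt
tttttt
aaaaaa' %>% data.table::fread(data.table = FALSE) -> df
# one split position per character of the first sequence
sl <- seq_len(nchar(df$V1[1]))
separate(df, V1, paste0('X', sl), sep = sl)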
X1 X2 X3 X4 X5 X6
1 g g g g c c
2 c c c c t t
3 t t t t t t
4 a a a a a a
Separating on the empty string ("") doesn't work very nicely with separate, so I separate on each numeric position instead.
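The same position-based idea can also be sketched in base R with substr, without tidyr. This is only an illustration, not the method used above; the names n, pos, cols and out are made up for the example, and the column name V1 is assumed.

```r
# Sample data from the question; the column name V1 is an assumption
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

# One position per character of the longest sequence
n <- max(nchar(df$V1))
pos <- seq_len(n)

# For each position, take that single character from every sequence
cols <- lapply(pos, function(i) substr(df$V1, i, i))
names(cols) <- paste0("X", pos)

# Each list element becomes one column
out <- as.data.frame(cols, stringsAsFactors = FALSE)
```

If the sequences differ in length, substr returns an empty string for positions past the end of a shorter sequence, so those cells come out as "" rather than raising an error.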
CodePudding user response:
Another possible solution:
library(tidyverse)
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"))
df %>%
mutate(map_df(V1, ~ str_split(.x, "") %>% map(~ set_names(., str_c("V", 1:6)))))
#> V1 V2 V3 V4 V5 V6
#> 1 g g g g c c
#> 2 c c c c t t
#> 3 t t t t t t
#> 4 a a a a a a