I have some data that looks like this:
X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F
I want to generate one column that holds the first element of each string ("A") and another column that holds all the remaining values ("B", "C", etc.):
X1 Col1 Col2
A,B,C,D,E A B,C,D,E
A,B A B
A,B,C,D A B,C,D
A,B,C,D,E,F A B,C,D,E,F
I have tried the following:
library(dplyr)
testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F")) %>%
  mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
         Col2 = sapply(strsplit(X1, ","), "[", -1))
However, I can't seem to get rid of the pesky vector brackets around the values in Col2. Is there any way of doing this?
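The brackets appear because the pieces left after dropping the first element have different lengths (4, 1, 3 and 5 here), so sapply() cannot simplify them into a character vector and returns a list instead; Col2 therefore becomes a list column. A minimal sketch of one fix, keeping the strsplit() approach from the question and assuming dplyr is installed, is to paste each remainder back into a single string:

```r
library(dplyr)

testdata <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

testdata <- testdata %>%
  mutate(
    # First piece of each split string
    Col1 = sapply(strsplit(X1, ","), `[`, 1),
    # Remaining pieces re-joined with commas, so sapply() returns one string
    # per row and Col2 is a plain character column, not a list column
    Col2 = sapply(strsplit(X1, ","), function(x) paste(x[-1], collapse = ","))
  )

testdata
```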
Answer:
You can use tidyr::separate with extra = "merge", which merges any extra pieces back into the last column:
testdata %>%
  tidyr::separate(X1, into = c("Col1", "Col2"), sep = ",", extra = "merge", remove = FALSE)
X1 Col1 Col2
1 A,B,C,D,E A B,C,D,E
2 A,B A B
3 A,B,C,D A B,C,D
4 A,B,C,D,E,F A B,C,D,E,F
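In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(). A sketch of the same split with it, assuming a tidyr version of at least 1.3.0 is installed, might look like this; too_many = "merge" plays the role of extra = "merge", and cols_remove = FALSE the role of remove = FALSE:

```r
library(tidyr)

testdata <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Split on "," into two columns; pieces beyond the second are merged back
# into Col2, and the original X1 column is kept
res <- separate_wider_delim(
  testdata,
  cols = X1,
  delim = ",",
  names = c("Col1", "Col2"),
  too_many = "merge",
  cols_remove = FALSE
)

res
```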
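Answer:
A possible solution, using tidyr::separate with a lookbehind regex so that only the comma preceded by exactly one character (i.e. the first comma) acts as the separator: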
library(tidyverse)
df <- data.frame(
  stringsAsFactors = FALSE,
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F")
)
df %>%
  separate(X1, into = str_c("col", 1:2), sep = "(?<=^.),", remove = FALSE)
#> X1 col1 col2
#> 1 A,B,C,D,E A B,C,D,E
#> 2 A,B A B
#> 3 A,B,C,D A B,C,D
#> 4 A,B,C,D,E,F A B,C,D,E,F
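Because (?<=^.) only matches a comma preceded by exactly one character, this relies on the first element being a single character. A base R sketch that splits at the first comma regardless of how long the first element is, using two sub() calls; the row "AB,C,D" is a made-up value added only to show that case:

```r
df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F",
         "AB,C,D"),  # hypothetical row with a two-character first element
  stringsAsFactors = FALSE
)

# col1: delete the first comma and everything after it
df$col1 <- sub(",.*$", "", df$X1)
# col2: delete everything up to and including the first comma
df$col2 <- sub("^[^,]*,", "", df$X1)

df
```

If a string contains no comma, neither pattern matches, so sub() returns it unchanged and both col1 and col2 get the whole string.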
Answer:
Try the base R code below, which uses sub to replace only the first comma with a space and then read.table to split each line on that whitespace:
cbind(
df,
read.table(
text = sub(",", " ", df$X1)
)
)
which gives
X1 V1 V2
1 A,B,C,D,E A B,C,D,E
2 A,B A B
3 A,B,C,D A B,C,D
4 A,B,C,D,E,F A B,C,D,E,F
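One caveat is that read.table() type-converts columns by default, so if every first element looked like "T" or "1", that column would come back as logical or integer. A slightly more defensive sketch of the same idea passes colClasses = "character" and names the columns; it still assumes the values contain no whitespace, since read.table() splits fields on whitespace:

```r
df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Replace only the first comma with a space, read each string as one line,
# keep every field as character, and name the two new columns
res <- cbind(
  df,
  read.table(
    text = sub(",", " ", df$X1),
    col.names = c("Col1", "Col2"),
    colClasses = "character"
  )
)

res
```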
Answer:
You can use the str_sub() function as follows (this assumes the first element is always a single character, so the rest starts at position 3):
> df
# A tibble: 4 x 1
X1
<chr>
1 A,B,C,D,E
2 A,B
3 A,B,C,D
4 A,B,C,D,E,F
> df %>% mutate(X2 = str_sub(X1, 1, 1), X3 = str_sub(X1, 3))
# A tibble: 4 x 3
X1 X2 X3
<chr> <chr> <chr>
1 A,B,C,D,E A B,C,D,E
2 A,B A B
3 A,B,C,D A B,C,D
4 A,B,C,D,E,F A B,C,D,E,F
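Fixed positions work here only because every first element is one character long. A sketch that splits at the first comma instead, using stringr::str_split_fixed() with n = 2 and keeping the same X2/X3 column names:

```r
library(stringr)

df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# n = 2 stops splitting after the first comma, giving a two-column
# character matrix: column 1 is the first element, column 2 the remainder
parts <- str_split_fixed(df$X1, ",", n = 2)
df$X2 <- parts[, 1]
df$X3 <- parts[, 2]

df
```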