I want to split a column in my dataset into multiple columns based on assigned separated sign: "|".
My dataset looks like this:
vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)
So, I want to split column: label in to multiple columns based on sign" "|". I used stringr::str_split to do that, and my code is like:
cd2<-df %>%
select(vname, everything())%>%
mutate(label=str_split(value, " \\| "))
However, the result returns a vector in column of label. It looks like this:
vname label
x1 c("1,Eng","2,Man")
x2 c("1,yes","2,no", "3,dont know")
....
My question is that how to get a expected result like this:
vname label1 label2 label3 label4
x1 1,Eng 2,Man
x2 1,yes 2,no 3, dont know
x3 1,never. 2,sometimes, 3,usually. 4,always
...
Thanks a lot for help~~~
CodePudding user response:
With the code used, it is returning a list
(perhaps we have to make sure that there are zero or more spaces as in the example there was no space), we can unnest_wider
to convert to new columns
library(dplyr)
library(stringr)
library(tidyr)
df %>%
select(vname, everything())%>%
mutate(label=str_split(label, "\\s*\\|\\s*")) %>%
unnest_wider(where(is.list), names_sep = "")
-output
# A tibble: 4 × 5
vname label1 label2 label3 label4
<chr> <chr> <chr> <chr> <chr>
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
This maybe also done with separate
library(tidyr)
df %>%
separate(label, into = str_c('label',
seq_len(max(str_count(label, fixed("|"))) 1)),
sep = "\\|", fill = "right")
-output
vname label1 label2 label3 label4
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
CodePudding user response:
you can do this simply by using separate()
from {tidyr}
library(tidyr)
library(dplyr)
vname <- c("x1", "x2", "x3", "x4")
label <-
c(
"1,Eng |2,Man",
"1,yes|2,no|3,dont know",
"1,never|2,sometimes|3,usually|4,always",
"1,yes|2,No|3,dont know"
)
df <- data.frame(vname, label)
df %>%
separate(label, sep = "\\s*\\|\\s*", into = paste0("label", 1:4))
#> Warning: Expected 4 pieces. Missing pieces filled with `NA` in 3 rows [1, 2, 4].
#> vname label1 label2 label3 label4
#> 1 x1 1,Eng 2,Man <NA> <NA>
#> 2 x2 1,yes 2,no 3,dont know <NA>
#> 3 x3 1,never 2,sometimes 3,usually 4,always
#> 4 x4 1,yes 2,No 3,dont know <NA>
Created on 2022-07-06 by the reprex package (v2.0.1)