Home > Enterprise >  Using R to split a string into multiple column instead of a vector in one column
Using R to split a string into multiple column instead of a vector in one column

Time:07-07

I want to split a column in my dataset into multiple columns based on assigned separated sign: "|".

My dataset looks like this:

vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)

So, I want to split column: label in to multiple columns based on sign" "|". I used stringr::str_split to do that, and my code is like:

cd2<-df %>%
  select(vname, everything())%>%
  mutate(label=str_split(value, " \\| "))

However, the result returns a vector in column of label. It looks like this:

vname  label
x1     c("1,Eng","2,Man")
x2     c("1,yes","2,no", "3,dont know")
....

My question is that how to get a expected result like this:

vname  label1   label2      label3          label4
x1     1,Eng    2,Man
x2     1,yes    2,no         3, dont know
x3     1,never. 2,sometimes, 3,usually.     4,always
...

Thanks a lot for help~~~

CodePudding user response:

With the code used, it is returning a list (perhaps we have to make sure that there are zero or more spaces as in the example there was no space), we can unnest_wider to convert to new columns

library(dplyr)
library(stringr)
library(tidyr)
df %>%
  select(vname, everything())%>%
  mutate(label=str_split(label, "\\s*\\|\\s*")) %>% 
  unnest_wider(where(is.list), names_sep = "")

-output

# A tibble: 4 × 5
  vname label1  label2      label3      label4  
  <chr> <chr>   <chr>       <chr>       <chr>   
1 x1    1,Eng   2,Man       <NA>        <NA>    
2 x2    1,yes   2,no        3,dont know <NA>    
3 x3    1,never 2,sometimes 3,usually   4,always
4 x4    1,yes   2,No        3,dont know <NA>    

This maybe also done with separate

library(tidyr)
 df %>%
   separate(label, into = str_c('label', 
   seq_len(max(str_count(label, fixed("|")))   1)), 
      sep = "\\|", fill = "right")

-output

 vname  label1      label2      label3   label4
1    x1  1,Eng        2,Man        <NA>     <NA>
2    x2   1,yes        2,no 3,dont know     <NA>
3    x3 1,never 2,sometimes   3,usually 4,always
4    x4   1,yes        2,No 3,dont know     <NA>

CodePudding user response:

you can do this simply by using separate() from {tidyr}


library(tidyr)
library(dplyr)

vname <- c("x1", "x2", "x3", "x4")
label <-
  c(
    "1,Eng |2,Man",
    "1,yes|2,no|3,dont know",
    "1,never|2,sometimes|3,usually|4,always",
    "1,yes|2,No|3,dont know"
  )

df <- data.frame(vname, label)

df %>% 
  separate(label, sep = "\\s*\\|\\s*", into = paste0("label", 1:4))

#> Warning: Expected 4 pieces. Missing pieces filled with `NA` in 3 rows [1, 2, 4].

#>   vname  label1      label2      label3   label4
#> 1    x1   1,Eng       2,Man        <NA>     <NA>
#> 2    x2   1,yes        2,no 3,dont know     <NA>
#> 3    x3 1,never 2,sometimes   3,usually 4,always
#> 4    x4   1,yes        2,No 3,dont know     <NA>


Created on 2022-07-06 by the reprex package (v2.0.1)

  •  Tags:  
  • r
  • Related