Split a string first by semicolon and then by space and save it as a dataframe

Time:11-16

I have a string:

a = c("112 271 [X];313 179 [X];125 162;123 131 [X];124 107")

I want to first split it by semicolons (;):

b = as.list(strsplit(a, ";")[[1]])

> b
[[1]]
[1] "112 271 [X]"

[[2]]
[1] "313 179 [X]"

[[3]]
[1] "125 162"

[[4]]
[1] "123 131 [X]"

[[5]]
[1] "124 107"

Then I want to split each element of b by spaces and save the result as a three-column data frame.

The result should look like:

    A   B   C
1 112 271 [X]
2 313 179 [X]
3 125 162    
4 123 131 [X]
5 124 107    

I don't know how to do it. Thanks for your help.
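For reference, the two-step strsplit() approach started in the question can also be finished in base R. The sketch below is an editor's illustration, not taken from any of the answers. It assumes fields are separated by single spaces and that no record has more than three fields; each record is padded to three fields by indexing past its end, which yields NA. All three columns come out as character.

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

# Step 1: split into records on ";" (fixed = TRUE treats ";" literally)
rows <- strsplit(a, ";", fixed = TRUE)[[1]]

# Step 2: split each record on single spaces
fields <- strsplit(rows, " ", fixed = TRUE)

# Pad every record to exactly 3 fields; x[1:3] on a length-2 vector appends NA
padded <- lapply(fields, function(x) x[1:3])

# Stack the records into a character matrix, then convert it to a data frame
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df) <- c("A", "B", "C")
df
```

Rows 3 and 5 get NA in column C, which matches the layout asked for above.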

Answer:

Replace each semicolon with a newline, then read the result with fread() using fill = TRUE (so the two-field rows are padded), and set the column names:

data.table::fread(gsub(";", "\n", a, fixed = TRUE),
                  fill = TRUE,
                  col.names = LETTERS[1:3])
#      A   B   C
# 1: 112 271 [X]
# 2: 313 179 [X]
# 3: 125 162    
# 4: 123 131 [X]
# 5: 124 107 
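The fill = TRUE argument matters because the records do not all have the same number of fields. A quick base R check with count.fields() shows the ragged rows. This is an editor's illustration: it reads the same newline-separated text through textConnection() rather than through fread().

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

# One record per line: the same text the fread()/read.table() answers parse
txt <- gsub(";", "\n", a, fixed = TRUE)

# Count the whitespace-separated fields on each line
con <- textConnection(txt)
n_fields <- count.fields(con)
close(con)
n_fields
```

Rows 3 and 5 have two fields while the others have three, so without fill = TRUE a reader expecting a rectangular table would complain about those lines.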

Answer:

A base R option using read.table(), similar to @zx8754's data.table solution:

> read.table(text = gsub(";", "\n", a), fill = TRUE, col.names = head(LETTERS, 3))
    A   B   C
1 112 271 [X]
2 313 179 [X]
3 125 162
4 123 131 [X]
5 124 107
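If integer columns and a real NA in column C are wanted, the same read.table() call can be given explicit column classes, with the blank filled fields turned into NA afterwards. This is a sketch that assumes A and B always hold whole numbers; the colClasses values are an editor's choice, not part of the answer above.

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

df <- read.table(
  text = gsub(";", "\n", a, fixed = TRUE),
  fill = TRUE,                                       # pad the two-field rows
  col.names = c("A", "B", "C"),
  colClasses = c("integer", "integer", "character")  # assumed column types
)

# fill = TRUE leaves "" in a character column; mark those cells as missing
df$C[df$C == ""] <- NA
str(df)
```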

Answer:

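A tidyverse solution. Both separate_rows() and separate() come from tidyr, which is part of the tidyverse. Setting fill = "right" pads the two-field rows with NA without the "Expected 3 pieces" warning that separate() otherwise emits:

library(tidyr)
data.frame(a) %>%
  separate_rows(a, sep = ";") %>%   # one row per ";"-separated record
  separate(a,
           into = c("A", "B", "C"),
           sep = "\\s",
           fill = "right")          # missing trailing pieces become NA, no warning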
# A tibble: 5 × 3
  A     B     C    
  <chr> <chr> <chr>
1 112   271   [X]  
2 313   179   [X]  
3 125   162   NA   
4 123   131   [X]  
5 124   107   NA   
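Since tidyr 1.3.0, separate() and separate_rows() are superseded by the separate_wider_*() and separate_longer_*() families. A sketch of the same pipeline with them, assuming tidyr >= 1.3.0 is installed; too_few = "align_start" pads the two-field rows with NA instead of raising an error:

```r
library(tidyr)  # needs tidyr >= 1.3.0 for the *_delim() functions

a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

df <- data.frame(a) %>%
  separate_longer_delim(a, delim = ";") %>%     # one row per ";"-separated record
  separate_wider_delim(a,
                       delim = " ",
                       names = c("A", "B", "C"),
                       too_few = "align_start")  # missing trailing fields become NA
df
```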

Answer:

You can also do this with stringr::str_split(). In the example below, I use two consecutive calls to str_split() with simplified outputs to create a character matrix that can then be converted into a data frame.

## Question data --------------------------------------------------
a <- c("112 271 [X];313 179 [X];125 162;123 131 [X];124 107")

library(stringr)  # the native pipe |> below needs R >= 4.1
## Split into character matrix ------------------------------------
str_split(a, ";", simplify = TRUE) |>
  str_split("[:space:]", simplify = TRUE) |> 

  ## convert to data frame ----------------------------------------
  as.data.frame() |> 
  setNames(c("A", "B", "C"))

#>     A   B   C
#> 1 112 271 [X]
#> 2 313 179 [X]
#> 3 125 162    
#> 4 123 131 [X]
#> 5 124 107

Created on 2022-11-16 with reprex v2.0.2
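The str_split() pipeline returns all-character columns, with "" where the third field is missing. If numeric columns are wanted, base R can finish the job. The data frame below is a hand-typed stand-in for the str_split() output shown above, so the snippet runs without stringr; it assumes that empty strings should be treated as missing.

```r
# Stand-in for the all-character data frame produced by the str_split() pipeline
df <- data.frame(
  A = c("112", "313", "125", "123", "124"),
  B = c("271", "179", "162", "131", "107"),
  C = c("[X]", "[X]", "", "[X]", ""),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing values
df[df == ""] <- NA

# Convert each column to the simplest suitable type; as.is = TRUE keeps
# non-numeric text as character rather than factor
df <- type.convert(df, as.is = TRUE)
str(df)
```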
