I have a string:
a = c("112 271 [X];313 179 [X];125 162;123 131 [X];124 107")
I want to first split it by semicolon (;):
b = as.list(strsplit(a, ";")[[1]])
> b
[[1]]
[1] "112 271 [X]"
[[2]]
[1] "313 179 [X]"
[[3]]
[1] "125 162"
[[4]]
[1] "123 131 [X]"
[[5]]
[1] "124 107"
Then I want to split each element of b by space and save the result as a 3-column data frame.
The result should look like:
A B C
1 112 271 [X]
2 313 179 [X]
3 125 162
4 123 131 [X]
5 124 107
I don't know how to do it. Thanks for your help.
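A minimal base R sketch that finishes the strsplit() route begun in the question. It assumes no record has more than three space-separated fields; the names records, fields and padded are illustrative only.

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

# Split the string into records on ";" (fixed = TRUE: no regex)
records <- strsplit(a, ";", fixed = TRUE)[[1]]

# Split each record into its space-separated fields
fields <- strsplit(records, " ", fixed = TRUE)

# Indexing past the end with x[1:3] pads short records with NA
# (assumes no record has more than three fields)
padded <- lapply(fields, function(x) x[1:3])

# Stack the rows into a 5 x 3 character matrix, then a data frame
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df) <- c("A", "B", "C")

# Convert columns that look numeric (A and B) to integer
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```

Column C keeps NA for the rows that only have two fields, and A and B come out as integers.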
Answer:
Replace semicolon with newline then fread with fill, and set the column names:
data.table::fread(gsub(";", "\n", a, fixed = TRUE),
                  fill = TRUE,
                  col.names = LETTERS[1:3])
# A B C
# 1: 112 271 [X]
# 2: 313 179 [X]
# 3: 125 162
# 4: 123 131 [X]
# 5: 124 107
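The trick works because fread() treats a single string that contains a newline as the data itself rather than as a file name. A small sketch of the intermediate text it receives:

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

# fixed = TRUE matches ";" literally instead of as a regular expression
txt <- gsub(";", "\n", a, fixed = TRUE)
cat(txt)
# 112 271 [X]
# 313 179 [X]
# 125 162
# 123 131 [X]
# 124 107
```

Rows 3 and 5 have only two fields, which is why fill = TRUE is needed.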
Answer:
A base R option using read.table (similar to @zx8754's data.table solution):
> read.table(text = gsub(";", "\n", a), fill = TRUE, col.names = head(LETTERS, 3))
A B C
1 112 271 [X]
2 313 179 [X]
3 125 162
4 123 131 [X]
5 124 107
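With read.table(), blank fields count as missing only in logical and numeric-type columns, so the padded third field in rows 3 and 5 of the character column C is an empty string "" rather than NA (the tidyverse separate() approach gives NA). A small sketch, assuming NA is what you want downstream:

```r
a <- "112 271 [X];313 179 [X];125 162;123 131 [X];124 107"

df <- read.table(text = gsub(";", "\n", a), fill = TRUE,
                 col.names = head(LETTERS, 3))

# fill = TRUE pads short rows, but a character column is padded with ""
# rather than NA, so recode those blanks as NA explicitly
df$C[df$C == ""] <- NA
df
```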
Answer:
A tidyverse solution; the two functions separate and separate_rows are from tidyr (which is part of the tidyverse):
library(tidyr)
data.frame(a) %>%
  separate_rows(a, sep = ";") %>%
  separate(a,
           into = c("A", "B", "C"),
           sep = "\\s",
           fill = "right")  # pad short rows with NA on the right without a warning
# A tibble: 5 × 3
A B C
<chr> <chr> <chr>
1 112 271 [X]
2 313 179 [X]
3 125 162 NA
4 123 131 [X]
5 124 107 NA
Answer:
You can also do this with stringr::str_split(). In the example below, I use two consecutive calls to str_split() with simplified outputs to create a character matrix that can then be converted into a data frame.
## Question data --------------------------------------------------
a <- c("112 271 [X];313 179 [X];125 162;123 131 [X];124 107")
library(stringr)
## Split into character matrix ------------------------------------
str_split(a, ";", simplify = TRUE) |>
  str_split("[:space:]", simplify = TRUE) |>
  ## convert to data frame ----------------------------------------
  as.data.frame() |>
  setNames(c("A", "B", "C"))
#> A B C
#> 1 112 271 [X]
#> 2 313 179 [X]
#> 3 125 162
#> 4 123 131 [X]
#> 5 124 107
Created on 2022-11-16 with reprex v2.0.2