How can I parse out text from a string that uses pipes to separate elements of a vector in R?

Time:05-18

In R, I have a string that contains information about three grades, separated by pipes. It looks like this:

"First Grade|Third Grade|Second Grade|Third Grade|First Grade"

I would like to convert this into a vector equivalent to the output of:

c("First Grade","Third Grade","Second Grade","Third Grade","First Grade")
> [1] "First Grade"  "Third Grade"  "Second Grade" "Third Grade"  "First Grade" 

Is there a way to do this in R? Thanks.

CodePudding user response:

With stringr, you can use str_split. With simplify = TRUE, the output is a character matrix, and c() flattens it into a vector. Note that the | sign must be escaped with a double backslash (\\|), because the pattern is treated as a regular expression, in which | means alternation.

library(stringr)

string <- "First Grade|Third Grade|Second Grade|Third Grade|First Grade"
c(str_split(string, "\\|", simplify = TRUE))

[1] "First Grade"  "Third Grade"  "Second Grade" "Third Grade" 
[5] "First Grade" 
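
To see why the escape matters, here is a small sketch using base R's strsplit, which also treats its pattern as a regular expression by default. The input "a|b" and the variable names are made up for illustration; this is a demonstration of the regex behaviour, not part of the stringr answer itself.

```r
# Base R only; "a|b" is a made-up example input.
unescaped <- strsplit("a|b", "|")[[1]]    # "|" is read as regex alternation of two empty patterns
escaped   <- strsplit("a|b", "\\|")[[1]]  # "\\|" matches a literal pipe character

print(unescaped)  # splits between every character, pipe included
print(escaped)    # splits only at the pipe
```

The unescaped pattern matches the empty string, so the input is broken into single characters rather than into the two fields.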

CodePudding user response:

1) scan. Assuming the input is x, defined in the Note at the end, we can use scan. The text= argument is the input, the what= argument tells it to treat the fields as character, the sep= argument gives the separator character, and the quiet= argument tells it not to print how many items were read. No packages are used.

scan(text = x, what = "", sep = "|", quiet = TRUE)
## [1] "First Grade"  "Third Grade"  "Second Grade" "Third Grade"  "First Grade"

2) strsplit/unlist. Another possibility is strsplit followed by unlist. The fixed = TRUE argument tells it to treat | as an ordinary character; otherwise | is interpreted as the regular-expression alternation operator, which we do not want here. strsplit produces a one-element list containing the required vector, so we unlist it to get just the vector. Again, no packages are used.

unlist(strsplit(x, "|", fixed = TRUE))
## [1] "First Grade"  "Third Grade"  "Second Grade" "Third Grade"  "First Grade"

This could also be expressed as a pipeline, using the native pipe |> (available in R 4.1 and later):

x |> strsplit("|", fixed = TRUE) |> unlist()
## [1] "First Grade"  "Third Grade"  "Second Grade" "Third Grade"  "First Grade"
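
As a quick sanity check, the following sketch compares both base R results with the vector written out in the question. The names target, via_scan and via_strsplit are introduced here only for illustration.

```r
x <- "First Grade|Third Grade|Second Grade|Third Grade|First Grade"

# The vector the question asks for, typed out by hand.
target <- c("First Grade", "Third Grade", "Second Grade", "Third Grade", "First Grade")

via_scan     <- scan(text = x, what = "", sep = "|", quiet = TRUE)
via_strsplit <- unlist(strsplit(x, "|", fixed = TRUE))

# identical() is TRUE only if type, length and every element agree.
print(identical(via_scan, target))
print(identical(via_strsplit, target))
```

Both comparisons should come out TRUE, since each approach returns a plain character vector of the five fields.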

If the input were actually a vector of character strings such as c(x, x), then we could omit the unlist part and we would get a list of character vectors as output, one per input string.
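
A minimal sketch of that vector case, using c(x, x) as the input; the name parts is made up for illustration.

```r
x <- "First Grade|Third Grade|Second Grade|Third Grade|First Grade"

# No unlist(): keep one list element per input string.
parts <- strsplit(c(x, x), "|", fixed = TRUE)

print(length(parts))   # number of input strings
print(lengths(parts))  # number of fields found in each string
print(parts[[2]])      # the fields of the second string
```

Keeping the list is useful when each input string has to stay separate; calling unlist on it would merge all the fields into one long vector.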

Note

x <- "First Grade|Third Grade|Second Grade|Third Grade|First Grade"