Split string column to multiple columns in R/Python

Time:01-25

I'm trying to split the following string

str = "A (B) C, D (E) F, G, H, a (b) c"

into 9 separate strings like: A, B, C, D, E, {F, G, H}, a, b, c

I've tried

library(stringi)

str = "A (B) C, D (E) F, G, H, a (b) c"
strr = stri_split_regex(str, "\\(.*?\\)")
strr

but that drops the bracketed text and returns strr as A, {C, D}, {F, G, H, a}, c

The actual string I'm working with looks like

str2 = "Independent Spirit Award  (Co-Nominee)  for Anomalisa, Academy Award  (Co-Nominee)  for Anomalisa, Independent Spirit Award  (Co-Winner)  for Synecdoche, New York, Independent Spirit Award  (Nominee)  for Synecdoche, New York"

and I want that to be separated into

Independent Spirit Award; Co-Nominee; for Anomalisa; Academy Award; Co-Nominee; for Anomalisa; Independent Spirit Award; Co-Winner; for Synecdoche, New York; Independent Spirit Award; Nominee; for Synecdoche, New York;

So I think I need to split the string at the brackets while keeping the text both inside and outside them. The tricky part is that the commas are placed irregularly: I only want to split at the comma closest to the next '(', so that the text right after that comma goes into a separate column.

CodePudding user response:

This pattern splits on an open or close paren, or on the last comma before an open paren, and also consumes any adjacent whitespace.

For str:

library(stringi)

stri_split_regex(str, "\\s*(\\(|\\)|,(?=[^,]+\\)))\\s*")
[[1]]
[1] "A"       "B"       "C"       "D"       "E"       "F, G, H" "a"      
[8] "b"       "c"

For str2:

stri_split_regex(str2, "\\s*(\\(|\\)|,(?=[^,]+\\)))\\s*")
[[1]]
 [1] "Independent Spirit Award" "Co-Nominee"              
 [3] "for Anomalisa"            "Academy Award"           
 [5] "Co-Nominee"               "for Anomalisa"           
 [7] "Independent Spirit Award" "Co-Winner"               
 [9] "for Synecdoche, New York" "Independent Spirit Award"
[11] "Nominee"                  "for Synecdoche, New York"
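
Since the title also mentions Python, here is a rough equivalent using the standard re module. It is only a sketch and not part of the answer above: the lookahead is written as [^,]+\), which is my reading of the pattern, the group is made non-capturing so that re.split() does not return the delimiters, and the names split_parts and to_rows are made up for illustration. Grouping the pieces in threes assumes every record has exactly an award, a status and a title.

```python
import re

# Split on "(" or ")", or on a comma that is followed, with no other comma
# in between, by a ")" (i.e. the last comma before a bracketed group).
# Surrounding whitespace is consumed as part of the delimiter.
# (?:...) is non-capturing; with a capturing group re.split() would also
# return the delimiters themselves.
PATTERN = re.compile(r"\s*(?:\(|\)|,(?=[^,]+\)))\s*")


def split_parts(text):
    """Return the pieces of text between delimiters (hypothetical helper)."""
    return PATTERN.split(text)


def to_rows(parts, width=3):
    """Group a flat list into rows of `width` items (assumes it divides evenly)."""
    return [parts[i:i + width] for i in range(0, len(parts), width)]


s = "A (B) C, D (E) F, G, H, a (b) c"
print(split_parts(s))

s2 = ("Independent Spirit Award  (Co-Nominee)  for Anomalisa, "
      "Academy Award  (Co-Nominee)  for Anomalisa")
for row in to_rows(split_parts(s2)):
    print(row)
```

If a table is needed, the rows could then be handed to something like pandas.DataFrame(rows, columns=[...]), with column names of your choosing.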