Splitting string and keeping delimiter ends up splitting delimiter as well

Time:08-20

I've looked around for answers, but I keep getting the regex wrong, and the other solutions I found haven't fixed it.

I am trying to split the following string at any word that ends with "ministeren", keeping that word (and the name in parentheses up to the closing ")") at the start of the next piece:

"og holder sig alene til den. \r\n Finansministeren (Scharling): For"

I want to get the following:

[1] "og holder sig alene til den. \r\n"
[2] "Finansministeren (Scharling): For"

But this is what I get:

[1] "holder sig alene til den. \r\n "
[2] "F"
[3] "i"
[4] "n"
[5] "a"
[6] "n"
[7] "s"
[8] "m"
[9] "inisteren (Scharling): For"

I use the following code in R:

strsplit(tekst_test, "(?<=.[a-zA-Z]*ministeren \\([a-zA-Z]*)", perl=T)

Any help would be hugely appreciated.
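For reference, here is a minimal sketch of one way to do this with strsplit(). It assumes the minister word is always preceded by exactly one space and followed by a name in parentheses. strsplit() handles zero-width matches specially: a match at the very start of the remaining text splits off a single character, which is a common source of one-letter pieces like the ones above. The sketch therefore lets the pattern consume the space and moves the rest of the condition into a lookahead. It also swaps [a-zA-Z]* for \S*, on the assumption that words may contain Danish letters such as æ, ø and å.

```r
x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"

# Split at the single space that precedes a word ending in "ministeren",
# which is itself followed by a name in parentheses, e.g. "(Scharling)".
# The space is consumed, so the match is never zero-width; the lookahead
# (?=...) keeps the minister word at the start of the second piece.
# \S* (instead of [a-zA-Z]*) also accepts non-ASCII letters like æ, ø, å.
parts <- strsplit(x, " (?=\\S*ministeren \\([^)]*\\))", perl = TRUE)[[1]]
print(parts)

# For comparison, a purely zero-width pattern: when the lookahead matches at
# the start of the remaining text, strsplit() peels off one character.
quirk <- strsplit("abc", "(?=b)", perl = TRUE)[[1]]
print(quirk)  # "a" "b" "c" rather than "a" "bc"
```

If the separator is not always a single space, the leading " " in the pattern would need adjusting. Keep it at least one character wide so the match is never empty.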

Answer:

With base R:

x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"


a <- sub(" Finansministeren.*", "", x)  
b <- sub(".*\r\n ", "", x)

c(a,b)

[1] "og holder sig alene til den. \r\n"
[2] "Finansministeren (Scharling): For"
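The sub() calls above hard-code both "Finansministeren" and the "\r\n " separator. Here is a sketch that instead finds any word ending in "ministeren" followed by " (". It assumes that only the first such word matters and that a single space before it should be dropped:

```r
x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"

# Start position of the first word ending in "ministeren" followed by " (";
# regexpr() returns -1 when there is no match.
pos <- regexpr("\\S*ministeren \\(", x, perl = TRUE)

if (pos > 0) {
  # Everything before the word, minus one trailing space.
  a <- sub(" $", "", substr(x, 1, pos - 1))
  # The word and everything after it.
  b <- substring(x, pos)
  result <- c(a, b)
} else {
  result <- x  # no minister word found: keep the string whole
}
print(result)
```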

Answer:

You can also use strsplit with lookbehind:

> x="og holder sig alene til den. \r\n Finansministeren (Scharling): For"
> strsplit(x,"(?<=\r\n)",perl=T)[[1]]
[1] "og holder sig alene til den. \r\n"  " Finansministeren (Scharling): For"
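Note that the second piece keeps its leading space. If that space should go, a small variant lets the pattern consume it while the lookbehind still keeps "\r\n" in the first piece:

```r
x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"

# Same lookbehind, but the pattern also consumes the one space after "\r\n",
# so the second piece no longer starts with a blank.
parts <- strsplit(x, "(?<=\r\n) ", perl = TRUE)[[1]]
print(parts)
```

Calling trimws(..., which = "left") on the original result would work too. Either way, this splits at every "\r\n " in the text, not only before minister names.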