I've been searching for answers to this, but I keep getting the regex wrong, and the solutions I've found haven't fixed it.
I am trying to split a string at any word that ends with "ministeren" and is followed by a name in parentheses, so that the speaker label (up to the ")" sign) starts a new element. For this string:
"og holder sig alene til den. \r\n Finansministeren (Scharling): For"
I want to get the following:
[1] "og holder sig alene til den. \r\n"
[2] "Finansministeren (Scharling): For"
But this is what I get:
[1] "holder sig alene til den. \r\n "
[2] "F"
[3] "i"
[4] "n"
[5] "a"
[6] "n"
[7] "s"
[8] "m"
[9] "inisteren (Scharling): For"
I use the following code in R:
strsplit(tekst_test, "(?<=.[a-zA-Z]*ministeren \\([a-zA-Z]*)", perl=T)
Any help would be hugely appreciated.
Answer:
With base R:
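x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"
# a: drop everything from " Finansministeren" to the end
a <- sub(" Finansministeren.*", "", x)
# b: drop everything up to and including "\r\n "
b <- sub(".*\r\n ", "", x)
c(a, b)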
[1] "og holder sig alene til den. \r\n"
[2] "Finansministeren (Scharling): For"
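The sub() calls above only work for "Finansministeren". Here is a minimal sketch of the same idea for any word ending in "ministeren" followed by " (". It assumes that only the first such speaker needs to be split off and that a single space precedes the title. The helper name split_at_minister is made up for this example.

```r
# Hypothetical helper: split x into the text before, and the text from,
# the first word ending in "ministeren" that is followed by " (".
split_at_minister <- function(x) {
  # regexpr() returns the 1-based start of the leftmost match, or -1
  start <- as.integer(regexpr("\\w*ministeren \\(", x, perl = TRUE))
  if (start == -1L) return(x)  # no minister title found: leave x unsplit
  # stop two characters early to drop the single space before the title
  c(substr(x, 1, start - 2), substr(x, start, nchar(x)))
}

x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"
parts <- split_at_minister(x)
parts
```

Because \w* is greedy and regexpr() returns the leftmost match, the match starts at the beginning of the title word rather than partway through it.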
Answer:
You can also use strsplit() with a lookbehind:
> x="og holder sig alene til den. \r\n Finansministeren (Scharling): For"
> strsplit(x,"(?<=\r\n)",perl=T)[[1]]
[1] "og holder sig alene til den. \r\n" " Finansministeren (Scharling): For"
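The split above happens at every "\r\n" and leaves a leading space on the second element. A sketch that combines the lookbehind with a lookahead for a word ending in "ministeren" followed by " (", and consumes the single space in between. Because the match is one character wide rather than zero-width, strsplit() should not fall into the character-by-character splitting seen in the question. The second test string, including the speaker "Justitsministeren (Hansen)", is made up for illustration.

```r
# Split on a single space that is preceded by "\r\n" and followed by
# "<word>ministeren (". The space itself is consumed, so the match is never empty.
pattern <- "(?<=\\r\\n) (?=\\w*ministeren \\()"

x <- "og holder sig alene til den. \r\n Finansministeren (Scharling): For"
parts <- strsplit(x, pattern, perl = TRUE)[[1]]
parts

# Made-up text with two speakers: each speech becomes its own element.
y <- "Tak. \r\n Finansministeren (Scharling): For det. \r\n Justitsministeren (Hansen): Ja"
speeches <- strsplit(y, pattern, perl = TRUE)[[1]]
speeches
```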