Remove first occurrence of special characters until the first word or word character in R using regex

For my project, I am looking into removing parts of a text based on a pattern of special characters. I have a long .txt file with the following structure:

mycharobj=c("---------Some text is here.---------More text is here - [3548]----- Even more text is here.-----------More text is here - [408]--------- Even more text is here again.")

The string continues in the same pattern.

My goal is to remove the parts that start with - and end with - [number], such as:

"---------More text is here - [3548]"
"-----------More text is here - [408]"

I am planning to remove these parts with the code below (it will be put in a loop later):

library(stringr)
library(qdapRegex)

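# All runs of digits in the text, e.g. "3548" and "408"
temp=unlist(regmatches(mycharobj, gregexpr("[[:digit:]]+", mycharobj)))
# Remove the text between the markers "-" and "<first number>]"
mycharobj=rm_between(mycharobj, "-", paste(temp[1],"]", sep=""))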
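As a quick check of the first step of that plan, here is a base R sketch of what temp holds for the sample string. It assumes the digit pattern is meant to be "[[:digit:]]+" (one or more digits); right_delim is a hypothetical name for the right-hand marker that the plan passes to rm_between().

```r
# Sample text from the question
mycharobj <- "---------Some text is here.---------More text is here - [3548]----- Even more text is here.-----------More text is here - [408]--------- Even more text is here again."

# gregexpr() finds every run of one or more digits, regmatches() pulls out
# the matched substrings, and unlist() flattens the one-element list
temp <- unlist(regmatches(mycharobj, gregexpr("[[:digit:]]+", mycharobj)))
print(temp)

# Hypothetical name: the right-hand marker for the first segment
right_delim <- paste0(temp[1], "]")
print(right_delim)
```

Here temp is c("3548", "408") and right_delim is "3548]". The problem is that the left marker "-" also matches the very first hyphen in the text, which is why the first hyphen run has to be handled separately.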

But for this to work, I need a regex that removes the first occurrence of "-----------" in the text, up to the first word or word character. If the string starts with text (a word or word characters), the regex should skip over it and find the first occurrence of "-----------", so that my planned loop works.

Can this be done with regular expressions? Any help is appreciated. I do have a working but computationally expensive solution: split the string on the special character "-" and then pick out the parts I need with a set of conditionals. Because it takes much more processing time, it does not scale well to processing a large number of such .txt files.
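For reference, the literal request (drop only the first hyphen run, up to the next word character, wherever it appears) can be sketched with base R's sub(), which replaces only the first match. This sketch assumes a "run" means two or more hyphens, so that the single hyphen in " - [" is never picked up; that threshold is a guess, not something stated in the question. strip_first_run is a hypothetical helper name.

```r
mycharobj <- "---------Some text is here.---------More text is here - [3548]----- Even more text is here.-----------More text is here - [408]--------- Even more text is here again."

# Hypothetical helper. sub() replaces only the FIRST match: -{2,} matches the
# first run of two or more hyphens, and \W* also consumes any non-word
# characters (such as a space) up to the next word character.
strip_first_run <- function(x) sub("-{2,}\\W*", "", x, perl = TRUE)

out1 <- strip_first_run(mycharobj)                              # starts with hyphens
out2 <- strip_first_run("Intro text.--------- More text - [1]")  # starts with text
print(out1)
print(out2)
```

For the second string, the leading "Intro text." is left alone and the result is "Intro text.More text - [1]".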

Answer:

You can use

gsub("-{9,}(?:(?!-{9}).)*?- \\[\\d+]", "", mycharobj, perl=TRUE)
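Applied to the sample string from the question, with the pattern written as "\\d+" (one or more digits), the call removes both "More text is here - [number]" segments in a single pass, so a loop over the numbers is not needed. A minimal sketch:

```r
mycharobj <- "---------Some text is here.---------More text is here - [3548]----- Even more text is here.-----------More text is here - [408]--------- Even more text is here again."

# perl = TRUE is needed because R's default regex engine does not
# support the (?!...) lookahead
result <- gsub("-{9,}(?:(?!-{9}).)*?- \\[\\d+]", "", mycharobj, perl = TRUE)
print(result)
```

The leading "---------Some text is here." is kept, as are the "----- Even more text is here." and "--------- Even more text is here again." parts.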


Details:

-{9,} - nine or more hyphens
(?:(?!-{9}).)*? - any char other than a line break char, repeated zero or more times but as few times as possible, where each char does not start a sequence of nine hyphens
- \[ - the literal string "- [" (a hyphen, a space and an opening bracket)
\d+ - one or more digits
] - a literal ] char.
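To see why the (?:(?!-{9}).)*? part matters, this sketch lists what each pattern would remove, using regmatches() with gregexpr(). Without the lookahead, a plain lazy .*? can still start at the very first hyphen run, so the first match also swallows "Some text is here.". The names tempered and naive are only illustrative.

```r
mycharobj <- "---------Some text is here.---------More text is here - [3548]----- Even more text is here.-----------More text is here - [408]--------- Even more text is here again."

tempered <- "-{9,}(?:(?!-{9}).)*?- \\[\\d+]"  # the answer's pattern
naive    <- "-{9,}.*?- \\[\\d+]"             # same, without the lookahead

# Each call returns the substrings that the pattern would remove
m_tempered <- regmatches(mycharobj, gregexpr(tempered, mycharobj, perl = TRUE))[[1]]
m_naive    <- regmatches(mycharobj, gregexpr(naive, mycharobj, perl = TRUE))[[1]]
print(m_tempered)
print(m_naive)
```

The lookahead stops the match from crossing into the next nine-hyphen run, so the engine has to start the first match at "---------More" instead of at the beginning of the string.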