I'm trying to match multiple lines that do not begin with an HTML <p> tag, using just the g modifier with the Golang flavor of RegEx.
Here's an example:
Lorem ipsum
<p><span >INNEN. Wohnung, Erdgeschoss – Tag</span></p><br>
Dolor sit amet
1234
<p><span >INNEN. Wieslers Wohnung, Fahrstuhl – Tag</span></p><br>
Et respice finem
<p><span >AUSSEN. Wohnung - Nacht</span></p><br>
<p><span >INNEN. Wohnung, Erdgeschoss – Tag</span></p><br>
<p><span >Maik</span><span >(leise) Hallo.</span></p>
Quod erat demonstrandum
The regex should match the lines and paragraphs that begin with:
- Lorem ipsum
- Dolor sit amet
- 1234
- Et respice finem
- Quod erat demonstrandum
It's easy with the mg modifiers of the Golang flavor: ^([^<\n\r]|<([^p]|$)).*
But I'm looking for a regular expression that works without the m modifier. I can't make it work with just the g modifier.
CodePudding user response:
Instead of matching what you want to keep, you could match what you don't want and use that as the pattern to split the string.
If your text is in a variable s, you could continue like this:
// Separator: one or more line breaks, then any number of lines starting with <p>.
para := regexp.MustCompile("[\n\r]+(<p>.*[\n\r]*)*")
lines := para.Split(s, -1)
for _, line := range lines {
	fmt.Println(line)
}
This will then output:
Lorem ipsum
Dolor sit amet
1234
Et respice finem
Quod erat demonstrandum