Match multiple lines that do NOT start with "<p>" without using the "m" modifier

I'm trying to match multiple lines that do not begin with an HTML <p> tag, using only the g modifier with the Golang flavor of regex.

Here's an example:

Lorem ipsum

<p><span >INNEN. Wohnung, Erdgeschoss – Tag</span></p><br>

Dolor sit amet

1234

<p><span >INNEN. Wieslers Wohnung, Fahrstuhl – Tag</span></p><br>

Et respice finem

<p><span >AUSSEN. Wohnung - Nacht</span></p><br>

<p><span >INNEN. Wohnung, Erdgeschoss – Tag</span></p><br>

<p><span >Maik</span><span >(leise) Hallo.</span></p>

Quod erat demonstrandum

The regex should match the lines and paragraphs that begin with:

Lorem ipsum
Dolor sit amet
1234
Et respice finem
Quod erat demonstrandum

It's easy with the mg modifiers of the Golang flavor: ^([^<\n\r]|<([^p]|$)).*

But I'm looking for a regular expression that works without the m modifier. I can't make it work with just the g modifier.
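
For reference, the multi-line baseline can be reproduced in Go roughly as follows. This is only a sketch: it assumes the m modifier is written as the inline flag (?m) and that the g modifier corresponds to asking FindAllString for every match (n = -1). The helper name matchNonParagraphLines and the shortened sample text are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?m) is the inline form of the "m" modifier: ^ and $ then match at the
// start and end of every line instead of only at the ends of the input.
var nonParagraph = regexp.MustCompile(`(?m)^([^<\n\r]|<([^p]|$)).*`)

// matchNonParagraphLines is a hypothetical helper for this sketch. Passing
// -1 to FindAllString returns all matches, which plays the role of "g".
func matchNonParagraphLines(s string) []string {
	return nonParagraph.FindAllString(s, -1)
}

func main() {
	// Shortened sample input, assumed to use "\n" line endings.
	s := "Lorem ipsum\n<p><span >Maik</span></p>\n\nDolor sit amet\n1234"
	for _, line := range matchNonParagraphLines(s) {
		fmt.Println(line)
	}
}
```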

CodePudding user response：

Instead of matching what you want to keep, you could match what you don't want and use that as the pattern to split the string.

If your text sits in variable s, you could continue like this:

    // Separator: one or more line breaks, plus any run of lines starting with "<p>".
    para := regexp.MustCompile("[\n\r]+(<p>.*[\n\r]*)*")
    lines := para.Split(s, -1)
    for _, line := range lines {
        fmt.Println(line)
    }

This will then output:

Lorem ipsum
Dolor sit amet
1234
Et respice finem
Quod erat demonstrandum
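
For completeness, here is a self-contained sketch of the same split approach. It assumes the separator pattern is `[\n\r]+(<p>.*[\n\r]*)*` (one or more line breaks, then any run of lines that start with `<p>`); the helper name splitOutParagraphs and the filtering of empty pieces are additions for illustration, not part of the snippet above.

```go
package main

import (
	"fmt"
	"regexp"
)

// para matches one or more line breaks followed by any run of lines that
// start with "<p>". It needs no multi-line flag because it anchors on the
// line-break characters themselves rather than on ^.
var para = regexp.MustCompile(`[\n\r]+(<p>.*[\n\r]*)*`)

// splitOutParagraphs is a hypothetical helper: it splits s on the separator
// and drops empty pieces, which appear when s starts or ends with a match.
func splitOutParagraphs(s string) []string {
	var kept []string
	for _, piece := range para.Split(s, -1) {
		if piece != "" {
			kept = append(kept, piece)
		}
	}
	return kept
}

func main() {
	// Shortened version of the sample text from the question.
	s := "Lorem ipsum\n\n" +
		"<p><span >INNEN. Wohnung, Erdgeschoss – Tag</span></p><br>\n\n" +
		"Dolor sit amet\n\n" +
		"1234\n\n" +
		"<p><span >AUSSEN. Wohnung - Nacht</span></p><br>\n\n" +
		"<p><span >Maik</span><span >(leise) Hallo.</span></p>\n\n" +
		"Quod erat demonstrandum\n"
	for _, line := range splitOutParagraphs(s) {
		fmt.Println(line)
	}
}
```

One caveat with this pattern: it requires a line break before the `<p>` lines, so a `<p>` line at the very start of the input is not removed. Prepending "\n" to s before splitting is one possible workaround.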