Match all the characters in a regular expression until a specific sequence of characters in order

Time:10-29

I want to match this text:

<SERIES>
<OWNER-CIK>0000003521
<SERIES-ID>S000020958
<SERIES-NAME>Alger Small Cap Focus Fund
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000059340
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class I
<CLASS-CONTRACT-TICKER-SYMBOL>AOFIX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000095961
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class Z
<CLASS-CONTRACT-TICKER-SYMBOL>AGOZX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000179520
<CLASS-CONTRACT-NAME>Class Y
<CLASS-CONTRACT-TICKER-SYMBOL>AOFYX
</CLASS-CONTRACT>
</SERIES>
<SERIES>

From:

<SERIES>

Until:

</SERIES>

I'm trying with:

<SERIES>[^/] 

but it fails at the line with:

</CLASS-CONTRACT>


If I add S to the character class, the match ends even earlier, because it stops as soon as either / or S appears. I need both characters to appear together, /S, in that specific order.


CodePudding user response:

Just use .*? between the opening and closing tags. You'll need re.S so that . also matches newlines. The ? makes the match non-greedy, so it stops at the first closing tag in case the closing tag appears multiple times.

So the full pattern would be

r"<SERIES>.*?</SERIES>"
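
As a minimal sketch of how that pattern could be applied, the snippet below runs it over a shortened sample in the same shape as the data in the question. The `sample` string, the second series ID and the variable names are assumptions made up for illustration; only the pattern and the re.S flag come from the answer above.

```python
import re

# Shortened, partly invented sample in the same shape as the question's data.
sample = """<SERIES>
<SERIES-ID>S000020958
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000059340
</CLASS-CONTRACT>
</SERIES>
<SERIES>
<SERIES-ID>S000000001
</SERIES>
"""

# re.S (DOTALL) lets "." match newlines; ".*?" stops at the first "</SERIES>".
series_pattern = re.compile(r"<SERIES>.*?</SERIES>", re.S)
blocks = series_pattern.findall(sample)

for block in blocks:
    print(block)
    print("---")
```

Because the quantifier is non-greedy, each `<SERIES>` block comes back as its own list item, and the inner `</CLASS-CONTRACT>` lines no longer end the match early.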

CodePudding user response:

This should work. It uses a lookahead to stop just before the next <SERIES> tag, so it relies on another <SERIES> following the block.

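import re

# `text` holds the input shown in the question.
# The lazy .*? stops at the first newline that is followed by the next <SERIES> tag.
pattern = re.compile(r'<SERIES>.*?(?=\n<SERIES>)', re.S)
print(pattern.findall(text)[0])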

Output:

<SERIES>
<OWNER-CIK>0000003521
<SERIES-ID>S000020958
<SERIES-NAME>Alger Small Cap Focus Fund
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000059340
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class I
<CLASS-CONTRACT-TICKER-SYMBOL>AOFIX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000095961
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class Z
<CLASS-CONTRACT-TICKER-SYMBOL>AGOZX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000179520
<CLASS-CONTRACT-NAME>Class Y
<CLASS-CONTRACT-TICKER-SYMBOL>AOFYX
</CLASS-CONTRACT>
</SERIES>
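
When the input holds more than one block, it matters whether the quantifier in a lookahead pattern of this kind is greedy. The sketch below compares the two forms on a small invented sample; the `sample` string, the second series ID and the variable names are assumptions for illustration only.

```python
import re

# Two complete blocks followed by the start of a third (invented sample).
sample = (
    "<SERIES>\n<SERIES-ID>S000020958\n</SERIES>\n"
    "<SERIES>\n<SERIES-ID>S000000001\n</SERIES>\n"
    "<SERIES>\n"
)

# Greedy: .* runs to the LAST newline that is followed by "<SERIES>".
greedy = re.compile(r"<SERIES>.*(?=\n<SERIES>)", re.S)
# Lazy: .*? stops at the FIRST newline that is followed by "<SERIES>".
lazy = re.compile(r"<SERIES>.*?(?=\n<SERIES>)", re.S)

greedy_matches = greedy.findall(sample)
lazy_matches = lazy.findall(sample)

print(len(greedy_matches))  # one match spanning both complete blocks
print(len(lazy_matches))    # one match per complete block
```

Either form only matches a block that is followed by another `<SERIES>` line, so a final block at the very end of the input would need a pattern that ends on `</SERIES>` instead.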