Regular expression lookbehind: how to deal with names with and without a trailing part?

I tested this on https://regexr.com/. There are two sample file names:

BOND_aa_SB1_66-1.pdf

BOND_bb_SB2.pdf

I want to extract SB1 and SB2 from the samples, but my regular expression is not quite right.

This part works:

(?<=BOND_.*_).*

But it is difficult to write the part that follows it.

I tried:

(?<=BOND_.*_).*(?=(_|\.))

But the result for the first sample is 'SB1_66-1'.

I just want to extract SB1.

The part following SB1 may or may not exist. If it does, it is separated from SB1 by a leading _.

How should I fix it?

CodePudding user response：

To extract the third underscore-separated term, we can use re.search as follows:

import re

inp = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf"]
# [^_]+ matches the second term; the group captures up to the next _ or .
output = [re.search(r'^BOND_[^_]+_([^_.]+)', x).group(1) for x in inp]
print(output)  # ['SB1', 'SB2']
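
If some inputs might not follow the BOND_x_y naming scheme, re.search returns None and calling .group(1) on it raises AttributeError. A minimal sketch of a guarded version follows; the helper name third_field and the extra sample "README.txt" are made up for illustration, and the pattern assumes the first two terms contain no underscore.

```python
import re

# Same idea as the answer above: skip "BOND_" and the second term,
# then capture everything up to the next "_" or ".".
THIRD_FIELD = re.compile(r'^BOND_[^_]+_([^_.]+)')

def third_field(name):
    """Return the third underscore-separated term, or None if the name does not match."""
    m = THIRD_FIELD.search(name)
    return m.group(1) if m else None

# "README.txt" is a hypothetical non-matching name, added only to show the None case.
names = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf", "README.txt"]
print([third_field(n) for n in names])  # ['SB1', 'SB2', None]
```

The negated classes [^_]+ and [^_.]+ are what keep the match from running past SB1. In the original attempt, .* is free to cross underscores, which is why 'SB1_66-1' came back.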

CodePudding user response：

import re

s = "BOND_aa_SB1_66-1.pdf BOND_bb_SB2.pdf"

print(re.findall(r'(SB\d+)', s))

['SB1', 'SB2']
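
Note that SB\d+ only works if the wanted term always starts with SB followed by digits. If that cannot be assumed, one possible alternative is to skip regular expressions and split on the underscore instead. This is only a sketch: the function name third_field_by_split is made up for illustration, and it assumes the file extension is whatever follows the last dot.

```python
def third_field_by_split(name):
    """Return the third underscore-separated term of a file name, or None."""
    # Drop the extension: everything after the last "." (assumed to be the extension).
    stem = name.rsplit(".", 1)[0]
    parts = stem.split("_")
    # parts[0] is "BOND", parts[1] the second term, parts[2] the wanted term.
    return parts[2] if len(parts) >= 3 else None

for name in ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf"]:
    print(third_field_by_split(name))
# SB1
# SB2
```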