I tested on https://regexr.com/ with two sample file names:
BOND_aa_SB1_66-1.pdf
BOND_bb_SB2.pdf
I want to extract SB1 and SB2 from these samples, but my regular expression is not quite right.
This one works as a starting point:
(?<=BOND_.*_).*
But I am having trouble with the next step.
I tried:
(?<=BOND_.*_).*(?=(_|\.))
But for the first sample the result is 'SB1_66-1'.
I just want to extract SB1.
The part after SB1 may or may not exist. If it exists, it is separated from SB1 by a leading _.
How should I fix it?
CodePudding user response:
To extract the third underscore-separated term, we can use re.search as follows:
import re

inp = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf"]
# [^_]+ skips the second term; ([^_.]+) captures the third, stopping at _ or .
output = [re.search(r'^BOND_[^_]+_([^_.]+)', x).group(1) for x in inp]
print(output) # ['SB1', 'SB2']
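If some file names might not follow the BOND_xx_yy pattern, re.search returns None and calling .group(1) on it raises AttributeError. A minimal sketch of one way to guard against that, assuming you want None for non-matching names (the helper name third_term and the README.txt sample are made up for illustration):

```python
import re

# Same idea as above: skip "BOND_" and the second term, capture the third.
PATTERN = re.compile(r'^BOND_[^_]+_([^_.]+)')

def third_term(name):
    """Return the third underscore-separated term, or None if there is no match."""
    m = PATTERN.search(name)
    return m.group(1) if m else None

names = ["BOND_aa_SB1_66-1.pdf", "BOND_bb_SB2.pdf", "README.txt"]
print([third_term(n) for n in names])  # ['SB1', 'SB2', None]
```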
CodePudding user response:
import re

s = "BOND_aa_SB1_66-1.pdf BOND_bb_SB2.pdf"
# This assumes the wanted term always looks like "SB" followed by digits
print(re.findall(r'(SB\d+)', s)) # ['SB1', 'SB2']
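Note that a bare SB-plus-digits pattern matches anywhere in the string, including inside other words. A possible tightening, assuming the term is always preceded by _ and followed by _ or . (the third file name here is invented to show the difference):

```python
import re

s = "BOND_aa_SB1_66-1.pdf BOND_bb_SB2.pdf JOBSB7.pdf"

# Unanchored: also picks up the "SB7" inside "JOBSB7"
loose = re.findall(r'SB\d+', s)

# Require a preceding "_" and a following "_" or "." around the term.
# Python's re only allows fixed-width lookbehind, which (?<=_) satisfies.
strict = re.findall(r'(?<=_)SB\d+(?=[_.])', s)

print(loose)   # ['SB1', 'SB2', 'SB7']
print(strict)  # ['SB1', 'SB2']
```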