Regex: finding a number after the first occurrence of a substring

Time:03-29

I have a sentence:

"Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"

and would like to get the number $1.11 that follows the first occurrence of the substring "adjusted EPS".

The best regex I could come up with is:

re.search("^.*Adjusted EPS.*?(\$\d+.\d+).*", text, re.IGNORECASE).group(1)

but this gives me the number $1.49, which follows the second occurrence of "adjusted EPS".

How can I modify the search so I get the number $1.11?

CodePudding user response:

The problem here is the greedy quantifier you use right at the beginning:

^.*Adj ...

^ means the start of the string. Being greedy, .* "eats" as many characters as possible, so the match only settles on the last "adjusted EPS".

There are two solutions here: either make it non-greedy (i.e. lazy), ^.*?Adj ..., or remove ^.* completely, since re.search already scans the string from left to right and I see no use for the prefix here.
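To make the difference concrete, here is a minimal sketch comparing the three variants on the sentence from the question. The variable names are my own, the trailing .* is dropped because it does not affect the captured group, and the dot is escaped as \. so it only matches a literal decimal point (the original pattern leaves it unescaped).

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

# Greedy prefix: ^.* runs to the end of the string, then backtracks
# only as far as the LAST "adjusted EPS".
greedy = re.search(r"^.*adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

# Lazy prefix: ^.*? expands one character at a time and stops at the
# FIRST "adjusted EPS".
lazy = re.search(r"^.*?adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

# No prefix: re.search scans left to right, so the first occurrence wins.
no_prefix = re.search(r"adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

print(greedy)     # $1.49
print(lazy)       # $1.11
print(no_prefix)  # $1.11
```

The prefix-free version is probably the easiest one to read, and it gives the same result as the lazy one.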

CodePudding user response:

This regex string should work: /adjusted EPS of ?(\$\d+.\d+)/g

Input:

Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 
EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent 
compared with 2020 adjusted EPS of $1.49

Output: adjusted EPS of $1.11, adjusted EPS of $1.49

Edit: Remove the g at the end of the Regex string to only find one match.
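The /.../g notation above is a JavaScript-style regex literal. In Python, which the question uses, a rough equivalent is to iterate over all matches with finditer (like the g flag) or to take only the first one with search (like dropping the g flag). This is only a sketch: the variable names are my own, and I have escaped the dot as \. and added re.IGNORECASE, neither of which is in the pattern above.

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

pattern = re.compile(r"adjusted EPS of ?(\$\d+\.\d+)", re.IGNORECASE)

# All matches, comparable to the /g flag.
all_matches = [m.group(0) for m in pattern.finditer(text)]

# Only the first match, comparable to dropping the /g flag.
first = pattern.search(text)
first_amount = first.group(1) if first else None

print(all_matches)   # ['adjusted EPS of $1.11', 'adjusted EPS of $1.49']
print(first_amount)  # $1.11
```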

CodePudding user response:

You could use this pattern which looks for "adjusted EPS" and only allows one "$" between it and the end of the line.

/adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$/gm

The pattern without the delimiters and flags is

adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$
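Here is a small Python sketch of how this pattern behaves, assuming the input is split over three lines as in the previous answer and that the m flag is translated to re.MULTILINE (the variable names and re.IGNORECASE are my own additions). One thing to watch for: the trailing [^\$]+ needs at least one character after the amount, so when the amount is the very last thing on a line, the final digit is given up to satisfy it.

```python
import re

pattern = re.compile(
    r"adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$",
    re.IGNORECASE | re.MULTILINE,
)

multi_line = "\n".join([
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 ",
    "EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent ",
    "compared with 2020 adjusted EPS of $1.49",
])
single_line = multi_line.replace("\n", "")

# Multi-line input: on the second line only one "$" sits between
# "adjusted EPS" and the end of the line, so the first match is $1.11.
first_multi = pattern.search(multi_line).group(1)

# Single-line input: the first "adjusted EPS" is followed by two "$",
# so only the last one can match, and its final digit is lost to [^\$]+.
first_single = pattern.search(single_line).group(1)

print(first_multi)   # $1.11
print(first_single)  # $1.4
```

So this approach depends on how the text is split into lines. For the one-line sentence in the question, the prefix-free or lazy patterns are the safer choice.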