I have a sentence:
"Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
and would like to get the number $1.11 that follows the first occurrence of "adjusted EPS".
The best regex I could come up with is:
re.search("^.*Adjusted EPS.*?(\$\d+.\d+).*", text, re.IGNORECASE).group(1)
but this gives me the number $1.49, which follows the second occurrence of "adjusted EPS".
How can I modify the search so that I get the number $1.11?
CodePudding user response:
The problem here is the greedy quantifier at the very beginning of your pattern:
^.*Adj ...
^ means the start of the string. Being greedy, .* "eats" as many characters as possible, so it runs all the way up to the last "adjusted EPS".
There are two solutions here: either make it non-greedy (i.e. lazy), ^.*?Adj ..., or remove ^.* completely. I see no use for it here, since re.search already scans the whole string for a match.
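As a minimal sketch in Python (the question's language), the three variants can be compared side by side. The variable names are only illustrative, and the dot is written as \. so that it matches a literal decimal point, which is a small change from the pattern in the question.

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

# Greedy prefix: ^.* runs to the LAST "adjusted EPS" before the group matches.
greedy = re.search(r"^.*adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

# Lazy prefix: ^.*? stops at the FIRST "adjusted EPS".
lazy = re.search(r"^.*?adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

# No prefix at all: re.search finds the leftmost match on its own.
unanchored = re.search(r"adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE).group(1)

print(greedy, lazy, unanchored)  # $1.49 $1.11 $1.11
```

The unanchored form is probably the easiest to read, because the leading ^.* contributes nothing once re.search is used.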
CodePudding user response:
This regex should work:
/adjusted EPS of ?(\$\d+\.\d+)/g
Input:
Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020
EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent
compared with 2020 adjusted EPS of $1.49
Output: adjusted EPS of $1.11, adjusted EPS of $1.49
Edit: Remove the g flag at the end of the regex to find only one match.
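Python has no g flag, so here is a rough equivalent as a sketch: re.finditer plays the role of the global search and re.search returns only the first match. It assumes the quantifiers in the pattern are + (one or more digits) and escapes the dot; the variable names are illustrative.

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

pattern = re.compile(r"adjusted EPS of ?(\$\d+\.\d+)")

# All matches, like the /g version: group(0) is the full matched text.
all_matches = [m.group(0) for m in pattern.finditer(text)]

# First match only, like the version without /g: group(1) is the amount.
first_amount = pattern.search(text).group(1)

print(all_matches)   # ['adjusted EPS of $1.11', 'adjusted EPS of $1.49']
print(first_amount)  # $1.11
```

Note that re.findall would return only the captured amounts here, because the pattern contains a capturing group.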
CodePudding user response:
You could use this pattern, which looks for "adjusted EPS" and allows only one "$" between it and the end of the line:
/adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$/gm
The pattern without the delimiters and flags is:
adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$
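Because this pattern is tied to the end of the line, its result depends on where the line breaks fall. The sketch below assumes the quantifiers are + and uses re.MULTILINE in place of the m flag; it tries the pattern on the text wrapped as in the previous answer's input and on the same text joined into one line. The variable names are illustrative.

```python
import re

pattern = re.compile(r"adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$", re.MULTILINE)

# Text wrapped over three lines: $1.11 is the only "$" after the first
# "adjusted EPS" on its line.
wrapped = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020\n"
    "EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent\n"
    "compared with 2020 adjusted EPS of $1.49"
)
# The same text as a single line, as in the question.
single_line = wrapped.replace("\n", " ")

first_wrapped = pattern.search(wrapped).group(1)

m = pattern.search(single_line)
first_single = m.group(1) if m else None

print(first_wrapped)            # $1.11
print(first_single == "$1.11")  # False
```

On the single-line sentence another "$" follows $1.11 before the end of the line, so the first occurrence cannot match; for that input one of the simpler patterns above is likely the safer choice.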