I have this example string taken from an HTML email:
Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>
I'm using the following expression to find the company name, in this case Musterfirma GmbH:
(?<=Abholstellenname \(Firmenname, Details\): ).*
But I need to exclude the <br> tag following the company name.
How can I achieve this?
I have read through the tutorials and still don't get it.
CodePudding user response:
You can use
(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)
The main idea is to turn the .* part into a .*?(?=<br>|$) pattern that matches zero or more chars other than line break chars, as few as possible, followed by either <br> or the end of the string.
If the spaces can be any whitespace chars, replace the literal spaces in the pattern with \s.
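As a rough illustration, here is a minimal Python sketch of the lazy pattern with the lookahead. The question does not say which regex engine is used, so Python's re module, the helper name extract_company and the re.MULTILINE flag (added so that $ also matches at the end of each line of a multi-line mail body) are all assumptions.

```python
import re

# Lazy match after the label, stopping before <br> or at the end of the line.
# re.MULTILINE is an assumption: it lets $ match at every line end, which
# helps when the mail body spans several lines.
PATTERN = re.compile(
    r"(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)",
    re.MULTILINE,
)


def extract_company(text):
    """Return the text after the label, or None if the label is missing."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


sample = "Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>"
print(extract_company(sample))
```

Because .*? is lazy, the match stops at the first <br> even if more text follows it on the same line.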
CodePudding user response:
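You can match the spaces with \s (a literal space also works, so this part is optional), and you need to escape the parentheses with \( and \).
[^<br>] is a character class, not a tag: it matches any single char other than <, >, b and r. This happens to work for your <br> example, but a company name ending in b or r loses its last char, and if anything follows the <br>, it will be captured as well.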
(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]
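To make the limits of the character-class approach concrete, here is a small Python sketch. The engine (Python's re module), the helper name match_with_class and the extra sample strings are assumptions made for illustration; only the first sample comes from the question.

```python
import re

# The pattern from this answer: greedy .* followed by one char that is
# not '<', '>', 'b' or 'r'.
CLASS_PATTERN = re.compile(
    r"(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]"
)

PREFIX = "Abholstellenname (Firmenname, Details): "


def match_with_class(text):
    """Return the matched text, or None if the label is missing."""
    match = CLASS_PATTERN.search(text)
    return match.group(0) if match else None


# Works for the example from the question.
print(match_with_class(PREFIX + "Musterfirma GmbH<br>"))
# A hypothetical name ending in 'r' loses its last char.
print(match_with_class(PREFIX + "Weber<br>"))
# Hypothetical text after the tag is captured along with the tag.
print(match_with_class(PREFIX + "Musterfirma GmbH<br>Tel: 123"))
```

The second and third calls show why a lookahead for the literal <br> is the safer choice here.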