Regex - select first html tag in line with string

Time:11-25

I am trying to write a regex that matches the first tag on a line containing the string "Apple".

Example:

<div class="tn-atom" field="tn_text_1584898828640" target="_blank">Apple</div>

I need to select this tag:

<div class="tn-atom" field="tn_text_1584898828640" target="_blank">

I can select all lines with regex ^(?!.*(Apple)).*$

But how can I select only the first tag?

CodePudding user response:

^.*(?=(Apple)) will select everything from the start of the line up to the last instance of "Apple" on that line. The match is case-sensitive unless the case-insensitive flag is enabled.
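
A small Python check can show what this lookahead pattern selects. This is only a sketch: the helper name text_before_last_apple and the two extra test lines are made up for illustration, and Python is used here only because the question names no language.

```python
import re

# Pattern from the answer above: greedy .* followed by a lookahead for "Apple".
pattern = re.compile(r'^.*(?=(Apple))')

def text_before_last_apple(line):
    """Return the matched prefix, or None when the line has no "Apple"."""
    m = pattern.search(line)
    return m.group(0) if m else None

sample = '<div class="tn-atom" field="tn_text_1584898828640" target="_blank">Apple</div>'
print(text_before_last_apple(sample))                      # the opening <div ...> tag
print(text_before_last_apple('<p>Apple</p><b>Apple</b>'))  # stops before the last "Apple"
print(text_before_last_apple('<p>apple</p>'))              # None: no re.IGNORECASE flag
```

On the sample line this happens to return exactly the opening tag, but only because nothing else precedes "Apple" on that line.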

CodePudding user response:

This pattern ^(?!.*(Apple)).*$ will match a line that does not contain Apple.
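
To see the negative lookahead in action, here is a minimal Python sketch. The two-line input is invented for the example, and re.MULTILINE is assumed so that ^ and $ anchor at every line.

```python
import re

# Two hypothetical lines: one with "Apple", one without.
text = "\n".join([
    '<div class="tn-atom">Apple</div>',
    '<div class="tn-atom">Pear</div>',
])

# (?!.*(Apple)) rejects any line in which "Apple" occurs after the line start.
no_apple = re.compile(r'^(?!.*(Apple)).*$', re.MULTILINE)
matched_lines = [m.group(0) for m in no_apple.finditer(text)]
print(matched_lines)  # only the "Pear" line is matched
```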

If the tag contains no other angle brackets, you can use a capture group to capture the opening tag from < to > and then match Apple followed by the closing tag with the same name.

^\s*(<([^\s<>]+)[^<>]*>)Apple<\/\2>

See a regex demo.

The value is in capture group 1.
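
As a rough illustration, the pattern can be tried in Python like this. The function name first_tag_before_apple is hypothetical, and the sketch assumes the quantifier after the first character class is +, so that group 2 holds the whole tag name.

```python
import re

# Group 1: the whole opening tag. Group 2: the tag name, reused by \2
# to require a closing tag with the same name right after "Apple".
TAG_BEFORE_APPLE = re.compile(r'^\s*(<([^\s<>]+)[^<>]*>)Apple<\/\2>')

def first_tag_before_apple(line):
    """Return the opening tag that directly wraps "Apple", or None."""
    m = TAG_BEFORE_APPLE.search(line)
    return m.group(1) if m else None

sample = '<div class="tn-atom" field="tn_text_1584898828640" target="_blank">Apple</div>'
print(first_tag_before_apple(sample))
# <div class="tn-atom" field="tn_text_1584898828640" target="_blank">

# A nested element at the start of the line is not matched by this pattern.
print(first_tag_before_apple('<div><span>Apple</span></div>'))  # None
```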

Note that this can be an error-prone solution when the HTML contains nested elements or stray < and > characters. If a DOM parser is available, I would advise using that instead.
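
For comparison, here is one possible parser-based sketch using Python's standard html.parser module. It is an event-based HTML parser rather than a full DOM, and the class name AppleTagFinder, the helper find_tag and the VOID_TAGS set are assumptions made for this example, not part of the original answer.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class AppleTagFinder(HTMLParser):
    def __init__(self, needle="Apple"):
        super().__init__()
        self.needle = needle
        self.open_tags = []  # stack of (tag name, raw start-tag text)
        self.found = None    # raw text of the first tag wrapping the needle

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, self.get_starttag_text()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        if self.found is None and self.open_tags and data.strip() == self.needle:
            self.found = self.open_tags[-1][1]

def find_tag(html, needle="Apple"):
    """Return the raw opening tag of the first element whose text is needle."""
    parser = AppleTagFinder(needle)
    parser.feed(html)
    parser.close()
    return parser.found

print(find_tag('<div class="tn-atom" target="_blank">Apple</div>'))
print(find_tag('<div class="outer"><span>Apple</span></div>'))  # <span>
```

Unlike the regex, this handles the nested case and returns the innermost element that directly contains the text.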
