I'd like to process the following string with a regex so that I get groups of matches with and/or without HTML tags:
zero<pre>one</pre>two three<span>four</span>five
- I also care about the content of the html tag.
Expected result (I denote the group numbering with x, because I'm not sure what the resulting group number would be):
match 1: zero
match 2: <pre>one</pre>, group x: one -- and other groups having pre tags
match 3: two three
match 4: <span>four</span>, group x: four -- and other groups having span tags
match 5: five
What I tried (live demo):
((<(.*?)>)?(.*?)(<\/(.*?)>))?
Or differently (live demo):
(<(.*?)>)?(?:.?)
Neither works.
I think I should just control for the beginning and ending (zero and five in the above example), but I can't get it right.
CodePudding user response:
I would replace the .*? everywhere with what you are really looking for.
- When finding the tagname:
[^>]+
- When finding text in tags:
[^<]+
The regular expression could be this:
((<([^>]+)>)?([^<]+)(<\/([^>]+)>)?)?
Regex101 playground:
https://regex101.com/r/eXT7YR/1
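To see how the groups come out, here is a minimal Python sketch that runs the answer's pattern over the sample string from the question. It assumes each character class is followed by a `+` quantifier, and the helper name `split_segments` is made up for illustration. Because the whole pattern is optional, it can also match the empty string, so the sketch skips empty matches.

```python
import re

# The answer's pattern, assuming a `+` quantifier after each character class.
# Groups: 1 = whole segment, 2 = opening tag, 3 = opening tag name,
#         4 = text, 5 = closing tag, 6 = closing tag name.
PATTERN = re.compile(r"((<([^>]+)>)?([^<]+)(<\/([^>]+)>)?)?")


def split_segments(text):
    """Return (full_match, tag_name, inner_text) for every non-empty match.

    `split_segments` is a hypothetical helper, not part of any library.
    """
    segments = []
    for m in PATTERN.finditer(text):
        if not m.group(0):
            # The outer optional group lets the pattern match "" as well.
            continue
        segments.append((m.group(0), m.group(3), m.group(4)))
    return segments


sample = "zero<pre>one</pre>two three<span>four</span>five"
for full, tag, inner in split_segments(sample):
    print(repr(full), tag, repr(inner))
```

With this pattern the tag name ends up in group 3 and the text in group 4; for segments without a tag, group 3 is None. Note that this only handles flat, non-nested tags like the ones in the example. For real HTML a proper parser would be more robust.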