I am not an expert with regex, and I am having trouble converting an HTML string into an array of HTML elements. For example, given:
Sample String:
<p>Welcome to my awesome website for more info <a href="www.myanotherawesomewebsite.com" target="_blank">click here</a></p>
(which actually can be any possible combination)
I want to get something like:
'<p>', 'Welcome to my awesome website for more info', '<a href="www.myanotherawesomewebsite.com" target="_blank">', 'click here', '</a>', '</p>'
This can be achieved with the following regex:
/(<[^>]+>|[a-zè A-Z0-9]+)?/g
Using the match function for testing:
'<p>Welcome to my awesome website for more info <a href="www.myanotherawesomewebsite.com" target="_blank">click here</a></p>'.match(/(<[^>]+>|[a-zè A-Z0-9]+)?/g)
This works, but there is a problem with other languages: English text is matched fine, but as soon as the text contains French or German characters, it no longer works.
The workaround was to do something like:
/(<[^>]+>|[a-zàâäèéêëîïôœùûüÿçäöüÄÖÜÀÂÄÈÉÊËÎÏÔŒÙÛÜŸÇß!#.?”“«» A-Z0-9\-\u00A0]+)?/g
This works, but not 100%, and it does not work at all with things like 'sup' or 'sub', etc.
So my question is: is there a way to improve this? Any help or advice is very welcome. Thank you in advance for reading.
CodePudding user response:
You can simply use [^<]+
for the non-tag nodes instead of enumerating characters.
Also, I don't think you need the question mark at the end. It would only help if the input were an empty string.
So the resulting regexp is /(<[^>]+>|[^<]+)/g
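
As a rough sketch of how that regexp behaves on non-English text and on tags like 'sup', here is a small wrapper. The function name tokenizeHtml and the sample string are made up for illustration. The capture group is dropped because match with the g flag does not need it, and the `|| []` handles the empty-string case, where match returns null.

```javascript
// Hypothetical helper (name chosen for illustration): splits an HTML string
// into an array of tag tokens and text tokens.
function tokenizeHtml(html) {
  // <[^>]+>  matches a single tag, opening or closing
  // [^<]+    matches a run of text up to the next '<', whatever the language
  // match() returns null when nothing matches (e.g. empty input), hence `|| []`
  return html.match(/<[^>]+>|[^<]+/g) || [];
}

// Illustrative input mixing accented characters and a <sup> tag
const sample = '<p>Willkommen für <sup>2</sup> café <a href="www.myanotherawesomewebsite.com" target="_blank">click here</a></p>';
console.log(tokenizeHtml(sample));
```

Note that text tokens keep their surrounding whitespace (for example 'Willkommen für ' with the trailing space), so trim them if needed. This approach also assumes that no attribute value contains a '>' character; for arbitrary HTML, a real parser is likely the safer choice.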