Regex problem with three dots (...) in text: the search pattern fails after the first dot

The pattern stops matching after the first dot of "...".

Test of the expression on regex101: https://regex101.com/r/7fJG8W/1

Search pattern:

"(<textarea [a-zA-Z=\s\d\w\"]*>)([|~a-zA-Z\s\w\d.:,\ #\-()/[\]?=!*%$@&`\{}\^;:'\"  ]*)"gm

Sample text:

<textarea id="source">
```markdown
1. First ordered list item
2. Another item
* Unordered sub-list. 
1. Actual numbers don't matter, just that it's a number
1. Ordered sub-list
4. And another item.

You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).

To have a line break without a paragraph, you will need to use two trailing spaces...
...Note that this line is separate but within the same paragraph.⋅⋅
⋅⋅⋅(This is contrary to the typical GFM line break behavior, where trailing spaces are not required.)

Answer:

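The reason is simple: the characters at the end of that line (⋅⋅) are not in your character class, so the match stops right there. The ASCII dots are not the problem: inside [...] a . is a literal dot, and your class already allows it. More generally, a textarea can contain far more characters than any hand-written whitelist can anticipate.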
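To see this concretely, here is a small Python sketch (Python is an assumption; the question doesn't name a language) that runs the question's pattern, copied verbatim, against a shortened copy of the sample text. The names OPENING_TAG, WHITELIST and sample are illustrative only.

```python
import re

# The question's pattern, copied verbatim and split in two for readability.
OPENING_TAG = r'(<textarea [a-zA-Z=\s\d\w\"]*>)'
WHITELIST = r"""([|~a-zA-Z\s\w\d.:,\ #\-()/[\]?=!*%$@&`\{}\^;:'\"  ]*)"""

# Shortened, illustrative copy of the sample text.
sample = (
    '<textarea id="source">\n'
    "you will need to use two trailing spaces...\n"
    "...Note that this line is separate but within the same paragraph.⋅⋅\n"
    "⋅⋅⋅(This is contrary to the typical GFM line break behavior.)\n"
    "</textarea>"
)

m = re.search(OPENING_TAG + WHITELIST, sample)
content = m.group(2)

# Inside [...] a "." is a literal dot, so "..." is matched without trouble.
print(content.endswith("same paragraph."))  # True
# The match ends exactly at the first "⋅", which the class does not list.
print(m.end() == sample.index("⋅"))  # True
```

So the whitelist is not failing on the dots; it fails on the first character it was never told about.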

Parsing HTML with regular expressions is generally discouraged; an HTML (DOM) parser is the more robust choice.
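A minimal sketch of the parser approach, assuming Python. The standard library has no full HTML DOM, so this uses the event-driven html.parser.HTMLParser as a stand-in; TextareaExtractor is a hypothetical helper name, and the input is an illustrative excerpt.

```python
from html.parser import HTMLParser


class TextareaExtractor(HTMLParser):
    """Hypothetical helper: collects the text of every <textarea> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &lt; etc. are decoded
        self.textareas = []  # one string per <textarea> found
        self._buf = None     # None while we are outside a textarea

    def handle_starttag(self, tag, attrs):
        if tag == "textarea":  # tag names arrive lowercased
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "textarea" and self._buf is not None:
            self.textareas.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        # Any character is accepted here; no whitelist is involved.
        if self._buf is not None:
            self._buf.append(data)


html = (
    '<p>before</p><textarea id="source">\n'
    "two trailing spaces...\n"
    "...Note that this line is separate.⋅⋅\n"
    "⋅⋅⋅(This is contrary to the typical GFM line break behavior.)\n"
    "</textarea><p>after</p>"
)

parser = TextareaExtractor()
parser.feed(html)
parser.close()
print(parser.textareas[0])
```

Caveat: depending on the Python version, HTMLParser may not treat textarea content as raw text, so a literal "<b>" inside the textarea could be reported as a tag rather than as data. A full DOM library handles such cases the way a browser does.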

That said, a quick fix for the problem you ran into is to let the match run up to </textarea> and stop only there:

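(<textarea\b[^>]*>)((?:(?!</textarea>).)*)

The repeated part is wrapped in a non-capturing group (?:...) so that group 2 captures the entire textarea content; with ((?!</textarea>).)* the capturing group would hold only its last repetition, a single character.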

This pattern needs the s (dotall) flag so that . also matches newline characters; without it, the content match stops at the first line break.

See regex101.com
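A short Python sketch of the fix (Python is an assumption; the question doesn't say which language it uses). re.DOTALL is Python's name for the s flag. The repetition is wrapped in a non-capturing group so that group 2 holds the whole body; the last lines compare it with the form where the capturing group itself is repeated. The sample string is an illustrative excerpt.

```python
import re

sample = (
    '<textarea id="source">\n'
    "two trailing spaces...\n"
    "...Note that this line is separate but within the same paragraph.⋅⋅\n"
    "⋅⋅⋅(This is contrary to the typical GFM line break behavior.)\n"
    "</textarea>"
)

# Tempered dot: consume any character as long as "</textarea>" does not start here.
PATTERN = r"(<textarea\b[^>]*>)((?:(?!</textarea>).)*)"

# With DOTALL, "." also matches "\n", so the whole multi-line body is captured.
m = re.search(PATTERN, sample, re.DOTALL)
opening_tag, body = m.group(1), m.group(2)
print(opening_tag)  # <textarea id="source">
print(repr(body))

# Without DOTALL, "." cannot match the "\n" right after the opening tag,
# so group 2 matches zero characters.
m_no_s = re.search(PATTERN, sample)
print(repr(m_no_s.group(2)))  # ''

# Repeating the capturing group itself keeps only the last iteration:
# the single "\n" just before "</textarea>".
m_last = re.search(r"(<textarea\b[^>]*>)((?!</textarea>).)*", sample, re.DOTALL)
print(repr(m_last.group(2)))  # '\n'
```

The ⋅ characters no longer matter, because the pattern never lists allowed characters; it only looks for where the element ends.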