RegEx to match all lines that have a specific quantity of delimiters or less

Time:10-02

I am looking for a RegEx to match all lines that have a specific quantity of delimiters or less.

For example, I have a large file with - as the delimiter:

IWant-id-name-email-tel
this-919-yoda-yoda@republic.com- 10107327863876
this-350-mando-mando@fuckeverything.com-null
this-838-vader-vaderules@empire.com- 83389389083
oops-111-c-3po-c3po@nopenis.tatooine- 190012904829

As you can see, entry 111 is corrupted by an extra - (inside c-3po).
And I don't know how many extra dashes a corrupted line can contain.

/^(.*?-){4,}.*$/ - I can match the lines that exceed the limit with this.

/^(.*?-){,4}.*$/ - But I cannot reverse it to match entire lines that have that many delimiters or fewer.

/^(.*?-){4}[^-]+$/ - Nor does specifying the exact quantity work: to match the entire line, it ends up matching the incorrect ones as well.

I need this so that I can leave only the corrupted lines in a large-file editor and export them for analysis.
This is the way

CodePudding user response:

.* can also match a -, and {4,} matches 4 or more occurrences.

In your last pattern ^(.*?-){4}[^-]+$ you match exactly 4 repetitions of the group, but because .*? can itself match a -, that does not limit the line to 4 dashes.
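To see why the lazy .*? does not help, here is a small check in Python (chosen only for illustration; the question names no engine, and the last pattern is assumed to end in [^-]+$). The two sample lines are simplified versions of the ones in the question.

```python
import re

# Simplified sample lines modeled on the question's data.
good = "this-919-yoda-yoda@example.com-10107327863876"  # 4 dashes
bad = "oops-111-c-3po-c3po@example.com-190012904829"    # 5 dashes

# A lazy .*? still expands across a dash when backtracking forces it to,
# so "exactly 4 groups" does not mean "exactly 4 dashes".
exactly_four = re.compile(r"^(.*?-){4}[^-]+$")
four_or_more = re.compile(r"^(.*?-){4,}.*$")

results = {
    "exactly_four": (bool(exactly_four.search(good)), bool(exactly_four.search(bad))),
    "four_or_more": (bool(four_or_more.search(good)), bool(four_or_more.search(bad))),
}
print(results)
```

Both patterns accept both lines, so neither separates the well-formed line from the corrupted one.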


You could use the quantifier {1,4} instead to match 1 to 4 dashes.

[^\n-]* matches any character except -, and the \n in the character class keeps the match from crossing a newline.

^(?:[^\n-]*-){1,4}[^\n-]*$
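As a rough sketch of how this pattern behaves, the following Python snippet runs it over a few lines. Python and the simplified sample lines are assumptions for illustration; other engines enable multiline mode differently.

```python
import re

# Simplified sample lines modeled on the question's data.
text = "\n".join([
    "IWant-id-name-email-tel",
    "this-919-yoda-yoda@example.com-10107327863876",
    "this-350-mando-mando@example.com-null",
    "oops-111-c-3po-c3po@example.com-190012904829",  # 5 dashes: corrupted
])

# re.MULTILINE lets ^ and $ anchor at every line, not just the whole string.
one_to_four = re.compile(r"^(?:[^\n-]*-){1,4}[^\n-]*$", re.MULTILINE)

# The group is non-capturing, so findall returns the full matched lines.
well_formed = one_to_four.findall(text)
for line in well_formed:
    print(line)
```

Only the first three lines are printed; the line with five dashes is skipped.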


CodePudding user response:

To match a line with four or fewer - characters, try:

/^([^-]*-){0,4}[^-]*$/

This assumes ^ and $ to match the beginning and end of a line as opposed to the beginning and end of a string. Depending on your regex engine you may have to enable this mode first.
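One way to sidestep the line-versus-string anchoring question is to test each line on its own. A minimal sketch in Python, assuming that language is acceptable here; the function name split_by_dash_count is hypothetical.

```python
import re

# fullmatch anchors at both ends of the string it is given, so no ^, $,
# or multiline flag is needed when checking one line at a time.
at_most_four = re.compile(r"([^-]*-){0,4}[^-]*")

def split_by_dash_count(text):
    """Return (well_formed, corrupted) lists of lines (hypothetical helper)."""
    well_formed, corrupted = [], []
    for line in text.splitlines():
        if at_most_four.fullmatch(line):
            well_formed.append(line)
        else:
            corrupted.append(line)
    return well_formed, corrupted

# Simplified sample lines modeled on the question's data.
sample = "IWant-id-name-email-tel\noops-111-c-3po-c3po@example.com-190012904829"
ok, broken = split_by_dash_count(sample)
print(ok)
print(broken)
```

In Python, [^-] also matches a newline, so running this pattern over the whole text with re.MULTILINE could let a single match span several lines; checking line by line, or adding \n to the character class, avoids that.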

CodePudding user response:

Another way is to use a negative lookahead to reject lines that have at least five -'s:

^(?!(?:.*\-){5}).*

with the multiline option enabled, so that ^ and $ match the beginning and end of each line.
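A short Python sketch of the lookahead approach follows. The second pattern, which swaps the negative lookahead for a positive one to select the corrupted lines instead, is a variation added here for illustration and is not part of the answer; the sample lines are simplified.

```python
import re

# Simplified sample lines modeled on the question's data.
text = "\n".join([
    "IWant-id-name-email-tel",
    "this-919-yoda-yoda@example.com-10107327863876",
    "oops-111-c-3po-c3po@example.com-190012904829",  # 5 dashes: corrupted
])

# Pattern from the answer: reject any line that contains five dashes.
at_most_four = re.compile(r"^(?!(?:.*-){5}).*", re.MULTILINE)

# Variation (assumption, not from the answer): keep only lines with five or more.
five_or_more = re.compile(r"^(?=(?:.*-){5}).*", re.MULTILINE)

valid = at_most_four.findall(text)
corrupted = five_or_more.findall(text)
print(valid)
print(corrupted)
```

Because . does not match a newline by default in Python, each .* stays within its own line.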
