Regex: Exclude matches containing specific strings

I'm not very experienced with regex, especially multi-line regex, so I hope someone can help me out here.

Based on the following example, I'm trying to find all the field definitions of type Code that don't have the "TableRelation"-property set.

In this example, that would be the field "Holding Name".

table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;

            trigger OnValidate()
            var
               [...]
            begin
               [...]
            end;

        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}

I managed to match the fields of type "Code" properly with:

field\(\d+;.+; ?(?:C|c)ode\[\d+\]\)\n?\s*\{(?:\n|.)*?\}

But I don't know how to correctly exclude matches containing "TableRelation". At least, this attempt doesn't work the way I hoped; I get one HUGE match with it :-(

field\(\d+;.+; ?(?:C|c)ode\[\d+\]\)\n?\s*\{((?!(T|t)able(R|r)elation)\n*.*)*?\}

P.S. If you're wondering: the sample I'm parsing is written in AL Language, a proprietary language for Microsoft Business Central.

CodePudding user response：

You can match from field( up to the digits between the square brackets, before the closing parenthesis, using a negated character class (a class starting with [^).

The same negated character class approach can also be used to assert that TableRelation does not occur between the curly braces.

Note that you can write (?:C|c) as [Cc], using a character class instead of an alternation with |

Assuming the block after field contains no nested curly braces, so the first closing curly ends it:

field\([^()]+; ?[Cc]ode\[\d+\]\)\s*{(?![^{}]*[Tt]able[Rr]elation)[^{}]*}

The pattern matches:

field\([^()]+ Match field( and 1+ chars other than ( or ) (which can also match a newline)
; ?[Cc]ode Match ;, an optional space and Code/code
\[\d+\]\) Match [, 1+ digits and ])
\s*{ Match optional whitespace chars (which can also match a newline) and {
(?![^{}]*[Tt]able[Rr]elation) Negative lookahead, assert that TableRelation does not occur after the opening curly and before the next { or }
[^{}]* Match optional repetitions of any character except { and }
} Match the closing }
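
As a rough illustration, the pattern above can be tried in JavaScript on a shortened version of the sample. This is only a sketch: the g flag, the escaped braces, and the names pattern, sample and fieldsWithoutRelation are assumptions made for the example, not part of the answer.

```javascript
// Sketch: apply the pattern to a shortened version of the sample table.
// Braces are escaped here, which is harmless and keeps the literal unambiguous.
const pattern = /field\([^()]+; ?[Cc]ode\[\d+\]\)\s*\{(?![^{}]*[Tt]able[Rr]elation)[^{}]*\}/g;

const sample = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(3000; Resigned; Boolean)
{
    Caption = 'Resigned';
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
    DataClassification = CustomerContent;
}
`;

// Keep only the first line (the field header) of every match.
const fieldsWithoutRelation = [...sample.matchAll(pattern)].map(m => m[0].split("\n")[0]);
console.log(fieldsWithoutRelation);
// → [ 'field(4010; "Holding Name"; Code[100])' ]
```

The Code field with a TableRelation is rejected by the lookahead, and the Boolean field never gets past the type check.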

CodePudding user response：

The following regex can capture the Code[...] value of areas not having 'TableRelation'.

/field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs

It uses the g (global) and s (dotAll) flags.

A notable part of this regexp is the ((?!TableRelation).)+? expression.

(?!TableRelation) : negative lookahead (TableRelation must not start at this position)
((?!TableRelation).)+? : one or more characters, none of which begins 'TableRelation', matched as few times as possible
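
To see the effect of that expression in isolation, here is a small sketch on two made-up one-line blocks. The names tempered, plain, withoutRelation and withRelation are invented for the illustration.

```javascript
// Sketch: the tempered dot only advances over characters that do not start "TableRelation".
const tempered = /\{((?!TableRelation).)+?\}/s;
// For comparison: a plain lazy dot happily runs across "TableRelation".
const plain = /\{.+?\}/s;

const withoutRelation = "{ Caption = 'A'; }";
const withRelation = "{ Caption = 'B'; TableRelation = User; }";

console.log(tempered.test(withoutRelation)); // true
console.log(tempered.test(withRelation));    // false
console.log(plain.test(withRelation));       // true
```

In the second string the tempered dot stops in front of TableRelation, the closing } cannot be reached, and the match fails.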

I created a simple JS snippet. The code extracts the values in two steps.

const regexp = /field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs;

const target = `
table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;

            trigger OnValidate()
            var
               [...]
            begin
               [...]
            end;

        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(4050; "Holding Name"; Code[80])
        {
            Caption = 'Holding Name 2';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}
`;

// step 1: extract field(...){...} chunks that do not contain "TableRelation"
// (match() returns null when nothing matches, so fall back to an empty array)
const matchedBlocks = target.match(regexp) ?? [];

// step 2: extract the length inside Code[...] from each chunk
const codes = matchedBlocks.map(m => m.match(/; Code\[(\d+)\]/)[1]);
console.log(codes);

CodePudding user response：

With a case-insensitive search and a regex engine that supports atomic groups and possessive quantifiers, you can write:

\bfield\((?>[^);]*;\s*)*code\b[^)]*\)\s*{(?>[^\w}]*+(?!tablerelation\s*=)\w+)*[^\w}]*}

This pattern relies on negated character classes to stop the greedy quantifiers, as in The fourth bird's answer. Atomic groups (?>...) and possessive quantifiers (*+) are used to reduce backtracking. In particular, the presence of tablerelation is only tested with a negative lookahead after a run of non-word characters. Note that the code part can be anywhere between the parentheses after field.
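
JavaScript regular expressions have no atomic groups or possessive quantifiers, so the pattern cannot be used there unchanged. One possible workaround, shown here purely as an assumption and not as part of the answer, is to use plain non-capturing groups and put a word boundary \b in front of the lookahead. Without it, \w+ could backtrack and let the lookahead be tested in the middle of a word. The names portable, alSource and headers are made up for the sketch.

```javascript
// Sketch of a JavaScript-compatible variant (an assumption, not the answer's pattern):
// non-capturing groups replace the atomic groups, and \b pins the lookahead to the
// start of each word so backtracking cannot slip past "TableRelation".
const portable = /\bfield\((?:[^);]*;\s*)*code\b[^)]*\)\s*\{(?:[^\w}]*\b(?!tablerelation\s*=)\w+)*[^\w}]*\}/gi;

const alSource = `
field(4000; "Holding No."; Code[20])
{
    Caption = 'Holding No.';
    TableRelation = Contact."No.";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
    DataClassification = CustomerContent;
}
field(5000; "Geocoding Entry No."; Integer)
{
    Caption = 'Geocoding Entry No.';
}
`;

// Keep only the first line (the field header) of every match.
const headers = [...alSource.matchAll(portable)].map(m => m[0].split("\n")[0]);
console.log(headers);
// → [ 'field(4010; "Holding Name"; Code[100])' ]
```

Note that "Geocoding Entry No." is not mistaken for a Code field, because code has to follow directly after field( or after a semicolon and optional whitespace.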