I'm not very experienced with regex, especially multi-line patterns, so I hope someone can help me out here.
Based on the following example, I'm trying to find all field definitions of type Code that don't have the "TableRelation" property set.
In this example, that would be the field "Holding Name".
table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;
            trigger OnValidate()
            var
                [...]
            begin
                [...]
            end;
        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}
I managed to match the fields of type "Code" properly:
field\(\d+;.+; ?(?:C|c)ode\[\d+\]\)\n?\s*\{(?:\n|.)*?\}
But I don't know how to correctly exclude matches containing "TableRelation". At least the following doesn't work the way I hoped; I get one HUGE match with it :-(
field\(\d+;.+; ?(?:C|c)ode\[\d+\]\)\n?\s*\{((?!(T|t)able(R|r)elation)\n*.*)*?\}
P.S. If you're wondering: the sample I'm parsing is written in AL, a proprietary language for Microsoft Business Central.
CodePudding user response:
You can match field( up to the closing parenthesis using a negated character class, which starts with [^, and then match the digits between the square brackets.
The same negated character class approach can also be used to assert that TableRelation does not occur between the curly braces.
Note that you can write (?:C|c) as [Cc], using a character class instead of an alternation with |.
Assuming the opening curly brace after field is closed by a single, non-nested closing curly brace:
field\([^()]+; ?[Cc]ode\[\d+\]\)\s*{(?![^{}]*[Tt]able[Rr]elation)[^{}]*}
The pattern matches:
field\([^()]+ : match field( and 1+ chars other than ( and ) (which can also match a newline)
; ?[Cc]ode : match ; followed by an optional space and Code/code
\[\d+\]\) : match [ then 1+ digits then ])
\s*{ : match optional whitespace chars (which can also match a newline) and {
(?![^{}]*[Tt]able[Rr]elation) : negative lookahead, assert that TableRelation does not occur after the opening curly brace
[^{}]* : match optional repetitions of any character except { and }
} : match the closing }
See a regex demo.
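As a rough illustration, here is how the pattern could be applied in JavaScript. This is only a sketch: the shortened sample, the variable names (source, codeWithoutRelation, fieldNames) and the second small regex that pulls out the field name are my own assumptions, and it relies on the same "no nested braces" assumption as the pattern itself.

```javascript
// Shortened sample based on the question (assumption: no nested braces inside a field block).
const source = `
table 123 "MyTable"
{
    fields
    {
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
    }
}
`;

// The pattern from this answer; the curly braces are escaped so it stays valid under the u flag too.
const codeWithoutRelation =
  /field\([^()]+; ?[Cc]ode\[\d+\]\)\s*\{(?![^{}]*[Tt]able[Rr]elation)[^{}]*\}/g;

// Hypothetical helper step: take the second argument of field(...) as the field name.
// It assumes the field name itself contains no semicolon.
const fieldNames = [...source.matchAll(codeWithoutRelation)].map(
  (m) => m[0].match(/field\([^;]+;\s*([^;]+);/)[1].trim()
);

console.log(fieldNames); // → [ '"Holding Name"' ]
```

The "Created by" block is rejected by the negative lookahead, and the Boolean field never gets past the Code[...] part, so only "Holding Name" remains.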
CodePudding user response:
The following regex captures the Code[...] length of the field blocks that do not contain 'TableRelation'.
/field\([^)]+; Code\[(\d+)\]\)\s*{((?!TableRelation).)+?}\n/gs
It uses the g (global) and s (dotall) flags.
A notable part of this regexp is the ((?!TableRelation).)+? expression.
(?!TableRelation) : negative lookahead (should not appear)
((?!TableRelation).)+? : not having 'TableRelation', match as few as possible
I created a simple JS snippet. The code uses two steps to extract.
const regexp = /field\([^)]+; Code\[(\d+)\]\)\s*{((?!TableRelation).)+?}\n/gs;
const target = `
table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;
            trigger OnValidate()
            var
                [...]
            begin
                [...]
            end;
        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(4050; "Holding Name"; Code[80])
        {
            Caption = 'Holding Name 2';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}
`;
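// Step 1: extract the field(...){...} chunks that do not contain "TableRelation"
// (match() returns null when nothing matches, so fall back to an empty array)
const matchedBlocks = target.match(regexp) || [];
// Step 2: extract the Code[...] lengths from the matched chunks
const codes = matchedBlocks.map(m => m.match(/; Code\[(\d+)\]/)[1]);
console.log(codes);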
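The two steps could probably be folded into one, because the pattern already captures the digits in group 1. Below is a minimal sketch using String.prototype.matchAll, which keeps the capture groups for every match. The shortened sample and the names sample, pattern and lengths are assumptions made for this illustration only.

```javascript
// Shortened sample (assumption: each closing brace of a field block is followed by a newline).
const sample = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
}
field(4050; "Holding Name 2"; Code[80])
{
    Caption = 'Holding Name 2';
}
`;

// Same idea as above: a dot that is only allowed where "TableRelation" does not start.
const pattern = /field\([^)]+; Code\[(\d+)\]\)\s*\{((?!TableRelation).)+?\}\n/gs;

// matchAll yields one match object per block; m[1] is the captured Code length.
const lengths = Array.from(sample.matchAll(pattern), (m) => Number(m[1]));

console.log(lengths); // → [ 100, 80 ]
```

Note that matchAll requires the g flag on the regex, which this pattern already has.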
CodePudding user response:
With a case-insensitive search and a regex engine that supports atomic groups and possessive quantifiers, you can write:
\bfield\((?>[^);]*;\s*)*code\b[^)]*\)\s*{(?>[^\w}]*+(?!tablerelation\s*=)\w+)*[^\w}]*}
This pattern relies on negated character classes to stop the greedy quantifiers, as in The fourth bird's answer.
Atomic groups (?>...) and possessive quantifiers *+ are used to reduce backtracking. In particular, the presence of tablerelation is only tested with a negative lookahead after a run of non-word characters.
Note that the code part can be anywhere between the parentheses after field.
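JavaScript's built-in RegExp has neither atomic groups nor possessive quantifiers, so the pattern above will not compile there as written. One common workaround is to imitate an atomic group with a lookahead plus a backreference, (?=(X))\1. The sketch below applies that trick to the body of the field block only. It is an adaptation rather than the exact pattern above, and the sample and the names alSource, pattern and ids are assumptions for illustration.

```javascript
// Shortened sample (assumption: no nested braces inside a field block).
const alSource = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
    DataClassification = CustomerContent;
}
`;

// (?=([^\w}]*))\1 captures a run of non-word characters inside a lookahead and then
// consumes exactly that capture, so the run cannot be shortened by backtracking.
// (?=(\w+))\2 does the same for the following word, after the negative lookahead
// has checked that the word is not the start of "tablerelation =".
const pattern =
  /\bfield\((?:[^);]*;\s*)*code\b[^)]*\)\s*\{(?:(?=([^\w}]*))\1(?!tablerelation\s*=)(?=(\w+))\2)*[^\w}]*\}/gi;

// Take the field number (first argument) from each matched block.
const ids = Array.from(alSource.matchAll(pattern), (m) => m[0].match(/field\((\d+)/i)[1]);

console.log(ids); // → [ '4010' ]
```

In an engine with native support (PCRE, Java, or Python 3.11 and later, for example), the original pattern with (?>...) and *+ is the simpler choice.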