RegEx to capture a Pascal procedure's body

Time:09-23

I'm trying to write a RegEx to capture a Pascal procedure's body. My biggest problem so far is capturing a procedure which has a nested procedure inside.

Test string:

Test

procedure A;
   procedure B; 
   begin
   end;
begin
   if True then
   begin
   end;
end;

procedure C;
begin
   if True then
   begin
   end;
end;

The following RegEx captures the body of the procedure A successfully:
/procedure A(?:(?!\nbegin)[\s\S])*\n(begin(?:(?!begin|end;)[\s\S]|(?1))*end;)/g

It avoids the inner procedure by matching everything until it finds a "begin" with no indentation, then it uses recursion to find the matching "end". The problem is that it works on the premise that the code will be properly formatted, which is not something I can count on (and if I could, then I wouldn't even need recursion, just match until it finds an "end" with no indentation as well).
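
To make the indentation dependence concrete, here is a small Python sketch that uses only the first half of the pattern above (the part that looks for a begin at the start of a line). The recursive (?1) half is left out because Python's built-in re module does not support subroutine calls. PREFIX, INDENTED, FLAT and begin_line are hypothetical names chosen for this illustration.

```python
import re

# First half of the question's pattern: advance to the first "begin" that
# starts a line. The recursive half is omitted (no (?1) in Python's re).
PREFIX = re.compile(r"procedure A(?:(?!\nbegin)[\s\S])*\nbegin")

INDENTED = (
    "procedure A;\n"
    "   procedure B;\n"
    "   begin\n"
    "   end;\n"
    "begin\n"
    "   if True then\n"
    "   begin\n"
    "   end;\n"
    "end;\n"
)
# Same code with every line's indentation stripped.
FLAT = "\n".join(line.strip() for line in INDENTED.splitlines()) + "\n"


def begin_line(text):
    """1-based line number of the 'begin' that the prefix settles on."""
    match = PREFIX.search(text)
    return match.group(0).count("\n") + 1


print(begin_line(INDENTED))  # 5: the begin that opens A's body
print(begin_line(FLAT))      # 3: the begin that belongs to B
```

With the indentation gone, the prefix stops at B's begin, so the recursion that follows would pair it with B's end; rather than A's.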

String it should work on too:

Test

procedure a;
procedure b; 
begin
end;
begin
if True then
begin
end;
end;

procedure c;
begin
if True then
begin
end;
end;

Desired match:

procedure a;
procedure b; 
begin
end;
begin
if True then
begin
end;
end;

After hours of trying to figure out a solution, I couldn't come up with one that works with an arbitrary number of inner procedures and inner begins/ends. Do you guys have any idea how to make it work?

CodePudding user response:

You can use

(?=procedure A;)(procedure \w+;\s*(?:(?!procedure|begin|end;)[\s\S]|(?1))*(begin(?:(?!begin|end;)[\s\S]|(?2))*end;))

See the regex demo.

Details:

  • (?=procedure A;) - the current position must be immediately followed by the text procedure A; (add the case-insensitive flag if the name may also be written as procedure a;)
  • ( - Group 1 start:
    • procedure \w+; - the literal text procedure and a space, then one or more word chars, then ;
    • \s* - zero or more whitespaces
    • (?:(?!procedure|begin|end;)[\s\S]|(?1))* - zero or more repetitions of
      • (?!procedure|begin|end;)[\s\S] - any char that does not start a procedure, begin or end; char sequence
      • | - or
      • (?1) - regex subroutine recursing Group 1 pattern
    • (begin(?:(?!begin|end;)[\s\S]|(?2))*end;) - Group 2:
      • begin - begin string
      • (?:(?!begin|end;)[\s\S]|(?2))* - zero or more repetitions of any char that does not start the begin or end; char sequence, or Group 2
      • end; - end; string
  • ) - end of Group 1.
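
The recursion described above relies on subroutine calls such as (?1), which PCRE-style engines support but Python's built-in re module does not. As a rough cross-check of the same nesting logic, here is a minimal Python sketch that swaps the recursive regex for a small hand-written scanner. TOKEN, skip_block, skip_procedure, capture_procedure and SAMPLE are hypothetical names, not part of the answer, and the sketch assumes the keywords never occur inside comments or string literals.

```python
import re

# One token per alternative: a procedure header, a "begin", or an "end;".
# Assumption: keywords are matched case-insensitively (as Pascal does) and
# never appear inside comments or string literals.
TOKEN = re.compile(
    r"\b(?P<proc>procedure)\s+(?P<name>\w+)\s*;"
    r"|\b(?P<begin>begin)\b"
    r"|\b(?P<end>end)\s*;",
    re.IGNORECASE,
)


def skip_block(text, pos):
    """pos is just past a 'begin'; return the index just past its 'end;'."""
    while True:
        m = TOKEN.search(text, pos)
        if m is None:
            raise ValueError("unbalanced begin/end")
        if m.group("end"):
            return m.end()
        if m.group("begin"):
            pos = skip_block(text, m.end())  # nested block, like (?2)
        else:
            raise ValueError("procedure header inside a begin/end block")


def skip_procedure(text, pos):
    """pos is just past 'procedure NAME;'; return the index past the body."""
    while True:
        m = TOKEN.search(text, pos)
        if m is None:
            raise ValueError("procedure without a body")
        if m.group("proc"):
            pos = skip_procedure(text, m.end())  # inner procedure, like (?1)
        elif m.group("begin"):
            return skip_block(text, m.end())  # this procedure's own body
        else:
            raise ValueError("'end;' before 'begin'")


def capture_procedure(text, name):
    """Return the full text of procedure `name`, or None if it is absent."""
    for m in TOKEN.finditer(text):
        if m.group("proc") and m.group("name").lower() == name.lower():
            return text[m.start():skip_procedure(text, m.end())]
    return None


SAMPLE = """Test

procedure a;
procedure b;
begin
end;
begin
if True then
begin
end;
end;

procedure c;
begin
if True then
begin
end;
end;
"""

print(capture_procedure(SAMPLE, "a"))
```

On the unindented sample this prints everything from procedure a; through its final end;, the same span as the desired match in the question. Because it counts tokens rather than looking at indentation, it does not depend on how the code is formatted.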