Regular expression for equations with a variable number of nested parentheses

Time:01-13

I'm trying to write a regex for the case where I have a series of equations, for example:

a = 2 / (1 + exp(-2*n)) - 1
a = 2 / (1 + e) - 1
a = 2 / (3*(1 + exp(-2*n))) - 1

In each case I need to capture the content of the outer parentheses, so 1 + exp(-2*n), 1 + e and 3*(1 + exp(-2*n)) respectively.

I can write an expression that catches any one of them, like:

\(([\w\W]*?\))\) will catch 1 + exp(-2*n)

\(([\w\W]*?)\) will catch 1 + e

\(([\w\W]*?\))\)\) will catch 3*(1 + exp(-2*n))

But it seems silly to use three separate patterns for something so simple. How can I combine them? Please note that I will be processing the text line by line (in a loop) anyway, so you don't need to worry about stopping a greedy operator from running into the next line.

Edit: Un-nested brackets are also allowed: a = 2 / (1 + exp(-2*n)) - (2-5)

CodePudding user response:

The commented code below does not use regular expressions, but does parse char arrays in MATLAB and output the terms which contain top-level brackets.

For the three examples in your question, each with a single set of nested brackets, it returns the outermost bracketed term.

In the example from your comment where there are two or more (possibly nested) terms within brackets at the "top level", it returns both terms.

The logic is as follows; see the code comments for more details:

  • Find the left (opening) and right (closing) brackets
  • Generate the "nest level" according to how many un-closed brackets there are at each point in the equation char
  • Find the indices where the nesting level changes. We're interested in opening brackets where the nest level increases to 1 and closing brackets where it decreases from 1.
  • Extract the terms from these indices
e = { 'a = 2 / (1 + exp(-2*n)) - 1'
      'a = 2 / (1 + e) - 1'
      'a = 2 / (3*(1 + exp(-2*n))) - 1'
      'a = 2 / (1 + exp(-2*n)) - (2-5)' };
  
str = cell(size(e)); % preallocate output
for ii = 1:numel(e)
    str{ii} = parseBrackets_(e{ii});
end


function str = parseBrackets_( equation )
    bracketL = ( equation == '(' ); % logical mask of opening brackets
    bracketR = ( equation == ')' ); % logical mask of closing brackets
    str = {}; % initialise empty output
    if nnz(bracketL) ~= nnz(bracketR)
        % Validate the input
        warning( 'Could not match bracket pairs, count mismatch!' )
        return
    end
    
    nL = cumsum( bracketL ); % cumulative open bracket count
    nR = cumsum( bracketR ); % cumulative close bracket count
    nestLevel = nL - nR;     % nest level is number of open brackets not closed
    nestLevelChanged = diff(nestLevel); % Get the change points in nest level
    % get the points where the nest level changed to/from 1
    level1L = find( nestLevel == 1 & [true,nestLevelChanged==1] ) + 1;
    level1R = find( nestLevel == 1 & [nestLevelChanged==-1,true] ); 
    
    % Compile cell array of terms within nest level 1 brackets
    str = arrayfun( @(x) equation(level1L(x):level1R(x)), 1:numel(level1L), 'uni', 0 );
end

Outputs:

str = 
    {'1 + exp(-2*n)'}
    {'1 + e'}
    {'3*(1 + exp(-2*n))'}
    {'1 + exp(-2*n)'}    {'2-5'}
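
As a cross-check on the nest-level idea, here is a shorter sketch of the same technique. It is a variant, not the function above: it skips diff and the input validation, and assumes the brackets in each line are balanced. The names openIdx, closeIdx and terms are made up for this illustration.

```matlab
% Two of the equations from the question, used as sample input.
e = { 'a = 2 / (3*(1 + exp(-2*n))) - 1'
      'a = 2 / (1 + exp(-2*n)) - (2-5)' };

terms = cell(size(e)); % preallocate output
for ii = 1:numel(e)
    equation  = e{ii};
    % Nest level at each character: brackets opened so far minus brackets closed
    nestLevel = cumsum( equation == '(' ) - cumsum( equation == ')' );

    % A top-level opening bracket raises the level to 1;
    % its matching closing bracket brings the level back to 0.
    % (Assumption: brackets are balanced, so the two lists pair up.)
    openIdx  = find( equation == '(' & nestLevel == 1 );
    closeIdx = find( equation == ')' & nestLevel == 0 );

    % Take the text strictly between each top-level pair
    terms{ii} = arrayfun( @(k) equation(openIdx(k)+1 : closeIdx(k)-1), ...
                          1:numel(openIdx), 'UniformOutput', false );
end
```

Here terms{1} holds the single term '3*(1 + exp(-2*n))' and terms{2} holds '1 + exp(-2*n)' and '2-5'. Because cumsum already counts the bracket at the current position, testing the level at the bracket itself is enough to identify top-level pairs.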