How to extract substrings that contain parenthesis with regexp()

Time:06-20

I am trying to use regexp() to extract the substrings that come before either a left parenthesis or a dot. For example:

example1='qwer(1).asdf; qwer(1).zxcv;';

example2='qwer.asdf; qwer.zxcv;';

I tried

expression='(?<varname>.*?)(\(\d+\)){?}.';

expression='(?<varname>.*?)(?<others>\(\d+\)){?}.';

expression='(?<varname>.*?)\(';

expression='(?<varname>.*?)(';

expression='(?<varname>.*?)/(';

with

parts=regexp(example1,expression,'names');

None worked.

How exactly does matching with parenthesis work in regexp()?

The official documentation doesn't mention how to match characters that form operators, quantifiers, etc.

CodePudding user response:

You can use

example1 = 'qwer(1).asdf; qwer(1).zxcv;';
expression = '(?<varname>\w+)[(.]';
parts = regexp(example1, expression, 'names');

See the regex demo. Details:

  • (?<varname>\w+) - Group "varname": one or more word chars
  • [(.] - a ( or . char.
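
As a minimal sketch of how this pattern behaves on both inputs from the question (the variable names parts1, parts2, names1 and names2 are only illustrative and do not come from the original posts):

```matlab
% Inputs taken from the question
example1 = 'qwer(1).asdf; qwer(1).zxcv;';
example2 = 'qwer.asdf; qwer.zxcv;';

% \w+ captures a run of word characters; [(.] then requires a literal
% "(" or "." immediately after it. Inside a character class, neither
% character needs a backslash.
expression = '(?<varname>\w+)[(.]';

% With 'names', regexp returns a struct array with one element per match
parts1 = regexp(example1, expression, 'names');
parts2 = regexp(example2, expression, 'names');

% Collect the captured names into cell arrays for easier inspection
names1 = {parts1.varname};   % {'qwer', 'qwer'}
names2 = {parts2.varname};   % {'qwer', 'qwer'}
```

The trailing asdf and zxcv are not captured because they are followed by a semicolon rather than by ( or . in these strings.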

CodePudding user response:

/(?<varname>(\S*?)(?:\(.*?;)|(\S*?)(?:\..*?;))/g

Below is an illustration of how parentheses work in a regex. I added the pattern above to actually answer your question.

 /\(.*?\)/g

This matches each pair of parentheses and everything between them.

The \ escapes the parenthesis so that it is matched literally. The lazy *? matches as few characters as possible, so the match does not run from the first opening parenthesis to the last closing one. The global flag repeats the match for each occurrence.
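
The /…/g notation above is the regex-literal style used by sites such as regex101. In MATLAB the pattern is passed as a plain character vector, and regexp returns every match by default, so no global flag is needed. A rough sketch comparing lazy and greedy matching (the input string and the variable names lazy and greedy are assumptions made for illustration):

```matlab
% Hypothetical input, modelled on the question's example
str = 'qwer(1).asdf; qwer(22).zxcv;';

% \( and \) are literal parentheses; the lazy .*? stops at the first ")"
lazy = regexp(str, '\(.*?\)', 'match');    % {'(1)', '(22)'}

% For comparison, the greedy .* runs on to the last ")" in the string
greedy = regexp(str, '\(.*\)', 'match');   % {'(1).asdf; qwer(22)'}
```

An unescaped ( opens a group instead of matching a literal character, so it has to be written as \( or placed inside a character class such as [(].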

https://regex101.com/ is a great resource
