Match all commas that are outside parentheses and square brackets in a Perl regex

Time:09-21

I'm trying to use a regex to match every comma followed by a space (`, `) that is outside any parentheses or square brackets, i.e. the comma must not be enclosed in parentheses or square brackets.

The target string is `A, An(hi, world[hello, (hi , world) world]); This, These`. In this case it should match only the first and the last comma (the ones between `A` and `An`, and between `This` and `These`).

That way I could split `A, An(hi, world[hello, (hi , world) world]); This, These` into `A`, `An(hi, world[hello, (hi , world) world]); This` and `These`, without leaving any parentheses or brackets unbalanced.

It seems hard to do this with a regex alone. Is there another approach to this problem?

The regex I'm using is `, (?![^()\[\]]*[\)\]])`.

But it also matches two extra commas (the second and the third), which shouldn't be matched.

It does match the right comma (the first one in each case) for strings without nesting, such as `A, An(hi, world)` and `A, An[hi, world]`.
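A quick way to see which commas the lookahead accepts is to print the offset of every match. This is a small illustrative sketch in core Perl; `match_offsets` is a hypothetical helper name. On the nested string it should report offsets 1, 8, 21 and 49 (the commas after `A`, `hi`, `hello` and `This`), while on the unnested strings it reports only offset 1. The reason is that `[^()\[\]]*` stops at the first bracket character of any kind, so the lookahead only sees as far as the next `(` or `[` and cannot tell that it is still inside an outer group.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The question's regex: a comma plus space that is NOT followed by a
# closing bracket before any other bracket character.
my $question_re = qr/, (?![^()\[\]]*[\)\]])/;

# Hypothetical helper: return the start offset of every match of $re.
sub match_offsets {
    my ($str, $re) = @_;
    my @offsets;
    while ($str =~ /$re/g) {
        push @offsets, $-[0];    # $-[0] = start offset of the whole match
    }
    return @offsets;
}

for my $str ("A, An(hi, world[hello, (hi , world) world]); This, These",
             "A, An(hi, world)",
             "A, An[hi, world]") {
    print join(' ', match_offsets($str, $question_re)), "\n";
}
```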

But once parentheses and brackets are nested inside each other, it breaks down.
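Nesting is exactly what a single lookahead cannot track, but a counter can. As a minimal sketch of a non-regex approach (not taken from the original thread), the hypothetical `split_top_level` below walks the string one character at a time, increments a depth counter on `(` or `[`, decrements it on `)` or `]`, and splits only on commas seen at depth 0. It assumes the brackets are balanced and does not check that each opener is closed by the matching type.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split $str on commas at nesting depth 0, i.e. outside every (...) and
# [...] group. Assumes balanced brackets; "(" closed by "]" is not detected.
sub split_top_level {
    my ($str) = @_;
    my @parts = ('');
    my $depth = 0;
    for my $ch (split //, $str) {
        if    ($ch eq '(' || $ch eq '[') { $depth++ }
        elsif ($ch eq ')' || $ch eq ']') { $depth-- }
        elsif ($ch eq ',' && $depth == 0) {
            push @parts, '';    # start a new piece; the comma is dropped
            next;
        }
        $parts[-1] .= $ch;
    }
    s/^\s+|\s+$//g for @parts;    # trim the space that followed each comma
    return @parts;
}

my $str = "A, An(hi, world[hello, (hi , world) world]); This, These";
print "$_\n" for split_top_level($str);
```

For the question's string this should print `A`, `An(hi, world[hello, (hi , world) world]); This` and `These` on separate lines. Because of the trim step, a top-level comma does not need to be followed by a space to count as a separator.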


For the following string, the regex matches the parts marked with carets (each top-level comma, and each bracket/parenthesis group as a whole):

```
A, An(hi, world[hello, (hi , world) world]) and this, is that, for [the, one (in, here, [is not,])] and last,here!
 ^   ^------------------------------------^         ^        ^     ^------------------------------^         ^
```
So it doesn't capture any commas inside those bracket/parenthesis groups, because it consumes each group as a whole. What is left are the commas at the outer level.
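The regex behind the marked matches is not shown here. A minimal sketch of the same idea, assuming Perl 5.10 or later for regex recursion and backtracking control verbs, is to let a recursive group match a whole `(...)` or `[...]` group (nested to any depth) and then discard it with `(*SKIP)(*FAIL)`, so that only commas outside every group can match. This is a reconstruction of the technique, not necessarily the answerer's exact pattern; `$top_level_comma` and `split_top_level_re` are hypothetical names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Group 1 matches one balanced (...) or [...] group; (?1) recurses into
# group 1 so groups can be nested to any depth. (*SKIP)(*FAIL) rejects
# that match and resumes the search after the group, so only commas
# outside every group can match the second alternative.
my $top_level_comma = qr{
      (                                      # group 1: one balanced group
          \( (?: [^()\[\]]++ | (?1) )* \)    #   (...) with nested groups
        | \[ (?: [^()\[\]]++ | (?1) )* \]    #   [...] with nested groups
      )
      (*SKIP)(*FAIL)                         # skip past the whole group
    | ,\s*                                   # a top-level comma plus spaces
}x;

sub split_top_level_re {
    my ($str) = @_;
    # split also returns group 1's capture for every separator (undef,
    # because a separator is always a comma), so filter those out.
    return grep { defined } split /$top_level_comma/, $str;
}

my $str = "A, An(hi, world[hello, (hi , world) world]) and this, "
        . "is that, for [the, one (in, here, [is not,])] and last,here!";
print "$_\n" for split_top_level_re($str);
```

On the string above it should return `A`, `An(hi, world[hello, (hi , world) world]) and this`, `is that`, `for [the, one (in, here, [is not,])] and last` and `here!`, which matches the caret positions.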

CodePudding user response:

As zdim mentioned, one approach is to use the core module Text::Balanced. A demonstration:

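```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Text::Balanced qw/extract_bracketed/;

my $str = "A, An(hi, world[hello, (hi , world) world]); This, These";

# In list context extract_bracketed returns the first balanced group, the
# text after it, and the skipped prefix (everything up to the first "(" or "[").
# Note: only the first group is extracted here.
my ($inside, $after, $before) = extract_bracketed $str, '()[]', qr/[^([]*/;

# Only the text outside the group is split on commas.
my @tokens = (split(/,/, $before//""), $inside, split(/,/, $after//""));

# Displays
# A  An (hi, world[hello, (hi , world) world]) ; This  These
say join(' ', @tokens);
```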
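The demonstration extracts only the first bracketed group, so `An`, the group and `; This` end up as separate tokens rather than the three pieces the question asks for. A sketch that keeps calling `extract_bracketed` until no group is left, and splits only the text between groups, could look like this; `split_top_level_tb` is a hypothetical name and the separator pattern `,\s*` is an assumption.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_bracketed/;

# Repeatedly extract the next balanced (...) or [...] group, split only the
# plain text before it on commas, and append the group whole.
sub split_top_level_tb {
    my ($str) = @_;
    my @parts = ('');
    my $rest  = $str;
    while (length $rest) {
        my ($group, $remainder, $prefix) =
            extract_bracketed($rest, '()[]', qr/[^([]*/);
        # On failure the group is undef: no group is left, so the rest of
        # the text is plain text.
        ($group, $remainder, $prefix) = ('', '', $rest) unless defined $group;

        # Commas in the prefix are top-level; -1 keeps empty trailing fields.
        my @pieces = split /,\s*/, $prefix, -1;
        $parts[-1] .= shift(@pieces) // '';
        push @parts, @pieces;
        $parts[-1] .= $group;    # a group is never split
        $rest = $remainder;
    }
    return @parts;
}

my $str = "A, An(hi, world[hello, (hi , world) world]); This, These";
print "$_\n" for split_top_level_tb($str);
```

Unlike a plain depth counter, `extract_bracketed` also checks that each opening bracket is closed by the matching type.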