I'm trying to match, using regex, all commas (each followed by a space) that are outside any parentheses or square brackets, i.e. the comma should not be contained in parentheses or square brackets.
The target string is:
A, An(hi, world[hello, (hi , world) world]); This, These
In this case, it should match the first comma and the last comma (the one between A and An, and the one between This and These).
That way I could split the target string into three parts:
A
An(hi, world[hello, (hi , world) world]); This
These
without leaving parens/brackets unbalanced as a result.
To that end, it seems hard to use regex alone. Is there any other approach to this problem?
The regex expression I'm using:
, (?![^()\[\]]*[\)\]])
But this expression also matches two extra commas (the second and the third), which shouldn't be matched.
It does work on simpler strings such as A, An(hi, world) and A, An[hi, world], where it matches only the correct comma (the first one in each).
But when parentheses and brackets are nested inside each other, it breaks down.
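To see concretely where the lookahead goes wrong, here is a small sketch that records the offset of every match on the target string. The helper name match_offsets is made up for illustration; the pattern is the one quoted above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the offsets at which the lookahead regex matches.
sub match_offsets {
    my ($str) = @_;
    my @offsets;
    while ($str =~ /, (?![^()\[\]]*[\)\]])/g) {
        push @offsets, $-[0];    # $-[0] is the start offset of the last match
    }
    return @offsets;
}

my $target = "A, An(hi, world[hello, (hi , world) world]); This, These";
print join(' ', match_offsets($target)), "\n";    # prints: 1 8 21 49
```

Offsets 8 and 21 are the unwanted matches. The lookahead only checks whether the next bracket character after the comma is a closing one, so a comma that is followed by an opening ( or [ is treated as if it were outside, even when it is already nested.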
For this string, the matches are as pointed out:
A, An(hi, world[hello, (hi , world) world]) and this, is that, for [the, one (in, here, [is not,])] and last,here!
 ^   ^------------------------------------^         ^        ^     ^------------------------------^         ^
So it doesn't match any of the commas inside those bracket/parenthesis groups, because each group is matched as a whole. What remains are the commas at the outer level.
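One way to express "match each group as a whole, then look at what is left" in a single Perl pattern is a recursive subpattern combined with the backtracking verbs (*SKIP)(*FAIL). The following is only a sketch under a few assumptions: Perl 5.10 or later, balanced input, and a separator of comma plus space as in the question. It is not necessarily the pattern that produced the matches marked above, and the names $outer_comma and split_outer_commas are invented for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# First alternative: one balanced (...) or [...] group, recursing into itself
# for nested groups, then (*SKIP)(*FAIL) so the whole group is stepped over.
# Second alternative: a ", " that was not swallowed by a group.
my $outer_comma = qr/
    (?<grp>
        \( (?: [^()\[\]]++ | (?&grp) )* \)
      | \[ (?: [^()\[\]]++ | (?&grp) )* \]
    )
    (*SKIP)(*FAIL)
  |
    ,[ ]
/x;

# Hypothetical helper: cut $str at every top-level ", ".
sub split_outer_commas {
    my ($str) = @_;
    my @parts;
    my $last = 0;
    while ($str =~ /$outer_comma/g) {
        push @parts, substr($str, $last, $-[0] - $last);
        $last = $+[0];    # resume right after the separator
    }
    push @parts, substr($str, $last);
    return @parts;
}

my $target = "A, An(hi, world[hello, (hi , world) world]); This, These";
print "$_\n" for split_outer_commas($target);
```

The helper collects match offsets instead of calling split directly, because split also returns a field for every capture group in the pattern, and the named group here would add unwanted undefined entries. If a bracket is left unclosed, the first alternative simply fails at that point and commas after it are treated as top-level.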
CodePudding user response:
As zdim mentioned, one approach is to use the core Text::Balanced module. Demonstration:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Text::Balanced qw/extract_bracketed/;
my $str = "A, An(hi, world[hello, (hi , world) world]); This, These";
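# In list context extract_bracketed returns (extracted, remainder, skipped prefix);
# the prefix pattern skips everything before the first ( or [
my ($inside, $after, $before) = extract_bracketed $str, '()[]', qr/[^([]*/;
# Split only the text outside the bracketed group, dropping the space after each comma
my @tokens = (split(/,\s*/, $before//""), $inside, split(/,\s*/, $after//""));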
# Displays
# A An (hi, world[hello, (hi , world) world]) ; This These
say join(' ', @tokens);
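If the goal is exactly the split described in the question (keeping An(...) together with its bracketed part), another option that avoids regex altogether is a single pass over the characters with a nesting counter. This is a minimal sketch, assuming the separator is a comma followed by one space and that any kind of opening bracket may be closed by any kind of closing bracket; split_top_level is a hypothetical name, not part of any module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split on ", " only where the bracket depth is zero.
sub split_top_level {
    my ($str) = @_;
    my @parts = ('');
    my $depth = 0;
    my @chars = split //, $str;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '(' or $c eq '[') {
            $depth++;
        }
        elsif ($c eq ')' or $c eq ']') {
            $depth-- if $depth > 0;    # ignore stray closing brackets
        }
        if ($c eq ',' and $depth == 0 and ($chars[$i + 1] // '') eq ' ') {
            push @parts, '';           # start a new part
            $i++;                      # skip the space after the comma
            next;
        }
        $parts[-1] .= $c;
    }
    return @parts;
}

my $str = "A, An(hi, world[hello, (hi , world) world]); This, These";
print "$_\n" for split_top_level($str);
```

Unlike the Text::Balanced demonstration, this does not check that ( pairs with ) and [ with ]; it only counts depth, which is enough for well-formed input but will not report mismatched brackets.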