How can I regexp capture the string between 2 specific sets of double underscores?

Time:09-21

I want to regexp capture the string between 2 specific sets of double underscores. The captured string may itself contain single underscores. Here's the test Perl script I've been working with:

#!/usr/bin/env perl
use strict;

my $str = "DFD_20220913_121409_strix1a0__z1_erx_adm__CL1695331__RTL_Dfdsg4__regression__df_umc_nbio_hubs_gfx__220913_150718";
(my $grp) = $str =~ /CL\d+\_\_(\w+)\_\_/;
print "grp = $grp\n";

exit;

This returns...

grp = RTL_Dfdsg4__regression__df_umc_nbio_hubs_gfx

I want...

grp = RTL_Dfdsg4

As you can see, I know something about where the first set of double underscores is (right after the CL\d+). But for some reason, the regexp reads past the next occurrence of the double underscores and only stops at the last set.

CodePudding user response:

You need to use the non-greedy quantifier: add ? after the +, so that \w+? matches as few characters as possible instead of as many as possible.

(my $grp) = $str =~ /CL\d+__(\w+?)__/;

I removed the unnecessary backslashes from before the underscores.
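
To make the difference concrete, here is a small sketch that runs both quantifiers against the string from the question. The variable names $greedy and $lazy are only illustrative and are not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = "DFD_20220913_121409_strix1a0__z1_erx_adm__CL1695331__RTL_Dfdsg4__regression__df_umc_nbio_hubs_gfx__220913_150718";

# Greedy: \w+ first grabs everything it can (underscores included),
# then backtracks until a "__" follows, which is the LAST "__".
my ($greedy) = $str =~ /CL\d+__(\w+)__/;

# Non-greedy: \w+? grows one character at a time and stops
# as soon as a "__" follows, which is the FIRST "__".
my ($lazy) = $str =~ /CL\d+__(\w+?)__/;

print "greedy = $greedy\n";   # greedy = RTL_Dfdsg4__regression__df_umc_nbio_hubs_gfx
print "lazy   = $lazy\n";     # lazy   = RTL_Dfdsg4
```

The greedy version reads past the intermediate double underscores because \w also matches _, so nothing stops \w+ until the end of the string.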

CodePudding user response:

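Note that the non-greedy modifier is fragile and can easily behave differently than intended: it only makes the engine prefer the shortest match, it does not forbid the capture from containing a double underscore. This is the robust alternative: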

my ( $grp ) = $str =~ /
   CL \d+
   __
   ( [^\W_]+ (?: _ [^\W_]+ )* )     # `[^\W_]` is `\w` minus `_`
   __
/x;
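
One way to see the fragility is to require more text after the closing delimiter. The sketch below uses a made-up input string and a made-up `__regression` suffix (neither comes from the question) and assumes Perl 5.10 or later for the `//` defined-or operator.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: the field after the CL number is "a", then comes "b".
my $str = "CL1__a__b__regression";

# Non-greedy: \w+? prefers a short match, but it is still allowed to grow
# across "__" until the rest of the pattern ("__regression") can match.
my ($lazy) = $str =~ /CL\d+__(\w+?)__regression/;

# Robust: the capture can never contain "__", so the match fails here
# instead of silently returning two fields glued together.
my ($strict) = $str =~ /
   CL \d+
   __
   ( [^\W_]+ (?: _ [^\W_]+ )* )
   __regression
/x;

print "lazy   = ", $lazy   // "(no match)", "\n";   # lazy   = a__b
print "strict = ", $strict // "(no match)", "\n";   # strict = (no match)
```

With the pattern from the question, where nothing follows the closing `__`, both approaches return RTL_Dfdsg4; the difference only shows up once the pattern or the input changes.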