Home > Software design >  Perl: iterate through multiline match
Perl: iterate through multiline match

Time:06-18

I would like iterate through a multiline pattern in Perl, but I'm struggling with the syntax.

My input string is:


       STAR-WARS        2020-01-01 00:00:00 00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

My desired resulting hash would be:

$VAR1 = {
          'In-universe information' => {
                                         'Gender' => 'Male',
                                         'Species' => 'Human',
                                         'results' => '1',
                                         'television series of num' => 'whatever'
                                       },
          'Other attribute' => {
                                 'Affiliation' => [
                                                    'Jedi Order',
                                                    'Galactic Republic',
                                                    'Rebel Alliance'
                                                  ],
                                 'Occupation' => 'Jedi',
                                 'Significant other' => 'Satine Kryze',
                                 'results' => '1'
                               },
          'Personal Details' => {
                                  'Alias' => [
                                               'Padawan',
                                               'Jedi Knight',
                                               'Jedi General',
                                               'Jedi Master'
                                             ],
                                  'First Name' => 'Obi-Wan',
                                  'Last Name' => 'Kenobi',
                                  'Points to other set of information' => 'whatever',
                                  'results' => '1'
                                },
          'code' => '0',
          'description' => 'Operation success'
        };

What I have come up with works well for a "single block" (e.g. Personal Details above). However, if the data contains multiple blocks, I can't figure out how to iterate through every matching block. (e.g. use while with /g)

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

local $/;
my $output = <DATA>;

my %hash;
($hash{'code'}, $hash{'description'}) = $output =~ /^RETCODE = (\d )\s (.*)\n/m;

if ($hash{'code'} eq "0") {
    my ($type,$data, $results) = $output =~ /([^\n] )\n- \n(.*)\n\n\(Number of results = (\d )\)\n\n/sm;
    my $previousKey = "";
    while ($data =~ /(. )$/mg) {
    my $line = $1;
    $line =~ s/(?:^  )//g;
    my ($key, $value);
    if ($line =~ /^\s*=  /) {
        ($value) = $line =~ /^\s*=  (.*)$/;
        $hash{$type}{$previousKey} = [ $hash{$type}{$previousKey} ] unless ref($hash{$type}{$previousKey});
        push (@{$hash{$type}{$previousKey}}, $value);
    } else {
        ($key, $value) = split(/  =  /, $line);
        $hash{$type}{$key} = $value;
        $previousKey = $key;
    }
    }
    say STDERR Dumper(\%hash);
}

__DATA__

       STAR-WARS        2020-01-01 00:00:00 00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

Few facts:

  • every "block" always contains a header, followed by newline and dashes equal to the length of the header.
  • every "block" always ends with \n, followed by (Number of results = \d ), followed by \n.
  • each key/value pair always have two spaces before and after the equal sign. i.e. / = /
  • when no key exists, assume it's an [array], and append the value to the previous key. e.g. Alias in my example above.
  • the string will always ends with --- END followed by a \n

CodePudding user response:

According your description the section is starting with ... and ending with --- END.

Based on this information the input can be devided with regex into blocks of interest which then processed individually in a loop with a parser to build a hash.

NOTE: the parser was slightly modified and put into subroutine

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my @shows;

my $data   = do { local $/; <DATA> };
my @blocks = $data =~ /^(\ \ \  .*?^---    END)/msg;

push @shows, parse($_) for @blocks;

say Dumper(\@shows);

exit 0;

sub parse {
    my $data = shift;

    my(@sections,$re,$r);
    
#   Alternative block to extract show info section
#   $re = qr/^\ \ \ \s (\S )\s (\S )\s (\S )\s \S \s (\S )\s %%[^:] ?:\s ([^;] ?);%%\s RETCODE = (\d )\s ([^\n] )/;
#   $r->{info}->@{qw/show day time sw show_name code description/} = $data =~ /$re/;

    $re = qr/RETCODE = (\d )\s ([^\n] )/;
    $r->@{qw/code description/} = $data =~ /$re/;
    
    @sections = $data =~ /\n\n(. ?\n- .*?\(Number of results = \d \))/gs;

    for my $block ( @sections ) {
        my($section,@lines,$key,$value);
        @lines   = split("\n",$block);
        $section = $lines[0];
        for my $line (@lines[2..$#lines-2] ) {
            $line =~ s/^\s //;
            if( $line =~ /^=\s (. )/ ) {
                $r->{$section}{$key} = [ $r->{$section}{$key} ] unless ref $r->{$section}{$key} eq 'ARRAY';
                push @{$r->{$section}{$key}}, $1;
            } else {
                ($key,$value) = split(/  =  /,$line);
                $r->{$section}{$key} = $value; 
            }
        }
        $r->{$section}{results} = $block =~ /\(Number of results = (\d )\)/gs;
    }
    
    return $r;
}

__DATA__

       STAR-WARS        2020-01-01 00:00:00 00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

       STAR-WARS        2020-01-01 00:00:00 00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

Output

$VAR1 = [
          {
            'Other attribute' => {
                                   'Significant other' => 'Satine Kryze',
                                   'Occupation' => 'Jedi',
                                   'results' => 1,
                                   'Affiliation' => [
                                                      'Jedi Order',
                                                      'Galactic Republic',
                                                      'Rebel Alliance'
                                                    ]
                                 },
            'Personal Details' => {
                                    'results' => 1,
                                    'First Name' => 'Obi-Wan',
                                    'Alias' => [
                                                 'Padawan',
                                                 'Jedi Knight',
                                                 'Jedi General',
                                                 'Jedi Master'
                                               ],
                                    'Points to other set of information' => 'whatever',
                                    'Last Name' => 'Kenobi'
                                  },
            'code' => '0',
            'description' => 'Operation success',
            'In-universe information' => {
                                           'television series of num' => 'whatever',
                                           'Gender' => 'Male',
                                           'results' => 1,
                                           'Species' => 'Human'
                                         }
          },
          {
            'Other attribute' => {
                                   'Affiliation' => [
                                                      'Jedi Order',
                                                      'Galactic Republic',
                                                      'Rebel Alliance'
                                                    ],
                                   'results' => 1,
                                   'Significant other' => 'Satine Kryze',
                                   'Occupation' => 'Jedi'
                                 },
            'Personal Details' => {
                                    'First Name' => 'Obi-Wan',
                                    'results' => 1,
                                    'Last Name' => 'Kenobi',
                                    'Alias' => [
                                                 'Padawan',
                                                 'Jedi Knight',
                                                 'Jedi General',
                                                 'Jedi Master'
                                               ],
                                    'Points to other set of information' => 'whatever'
                                  },
            'code' => '0',
            'description' => 'Operation success',
            'In-universe information' => {
                                           'television series of num' => 'whatever',
                                           'results' => 1,
                                           'Gender' => 'Male',
                                           'Species' => 'Human'
                                         }
          }
        ];
  • Related