I would like iterate through a multiline pattern in Perl, but I'm struggling with the syntax.
My input string is:
STAR-WARS 2020-01-01 00:00:00 00:00
S&W #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0 Operation success
In-universe information
-----------------------
Species = Human
Gender = Male
television series of num = whatever
(Number of results = 1)
Personal Details
----------------
First Name = Obi-Wan
Last Name = Kenobi
Alias = Padawan
= Jedi Knight
= Jedi General
= Jedi Master
Points to other set of information = whatever
(Number of results = 1)
Other attribute
---------------
Significant other = Satine Kryze
Affiliation = Jedi Order
= Galactic Republic
= Rebel Alliance
Occupation = Jedi
(Number of results = 1)
--- END
My desired resulting hash would be:
$VAR1 = {
'In-universe information' => {
'Gender' => 'Male',
'Species' => 'Human',
'results' => '1',
'television series of num' => 'whatever'
},
'Other attribute' => {
'Affiliation' => [
'Jedi Order',
'Galactic Republic',
'Rebel Alliance'
],
'Occupation' => 'Jedi',
'Significant other' => 'Satine Kryze',
'results' => '1'
},
'Personal Details' => {
'Alias' => [
'Padawan',
'Jedi Knight',
'Jedi General',
'Jedi Master'
],
'First Name' => 'Obi-Wan',
'Last Name' => 'Kenobi',
'Points to other set of information' => 'whatever',
'results' => '1'
},
'code' => '0',
'description' => 'Operation success'
};
What I have come up with works well for a "single block" (e.g. Personal Details
above). However, if the data contains multiple blocks, I can't figure out how to iterate through every matching block. (e.g. use while
with /g
)
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
local $/;
my $output = <DATA>;
my %hash;
($hash{'code'}, $hash{'description'}) = $output =~ /^RETCODE = (\d )\s (.*)\n/m;
if ($hash{'code'} eq "0") {
my ($type,$data, $results) = $output =~ /([^\n] )\n- \n(.*)\n\n\(Number of results = (\d )\)\n\n/sm;
my $previousKey = "";
while ($data =~ /(. )$/mg) {
my $line = $1;
$line =~ s/(?:^ )//g;
my ($key, $value);
if ($line =~ /^\s*= /) {
($value) = $line =~ /^\s*= (.*)$/;
$hash{$type}{$previousKey} = [ $hash{$type}{$previousKey} ] unless ref($hash{$type}{$previousKey});
push (@{$hash{$type}{$previousKey}}, $value);
} else {
($key, $value) = split(/ = /, $line);
$hash{$type}{$key} = $value;
$previousKey = $key;
}
}
say STDERR Dumper(\%hash);
}
__DATA__
STAR-WARS 2020-01-01 00:00:00 00:00
S&W #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0 Operation success
In-universe information
-----------------------
Species = Human
Gender = Male
television series of num = whatever
(Number of results = 1)
Personal Details
----------------
First Name = Obi-Wan
Last Name = Kenobi
Alias = Padawan
= Jedi Knight
= Jedi General
= Jedi Master
Points to other set of information = whatever
(Number of results = 1)
Other attribute
---------------
Significant other = Satine Kryze
Affiliation = Jedi Order
= Galactic Republic
= Rebel Alliance
Occupation = Jedi
(Number of results = 1)
--- END
Few facts:
- every "block" always contains a header, followed by newline and dashes equal to the length of the header.
- every "block" always ends with
\n
, followed by(Number of results = \d )
, followed by\n
. - each key/value pair always have two spaces before and after the equal sign. i.e.
/ = /
- when no key exists, assume it's an [array], and append the value to the previous key. e.g.
Alias
in my example above. - the string will always ends with
--- END
followed by a\n
CodePudding user response:
According your description the section is starting with ...
and ending with --- END
.
Based on this information the input can be devided with regex into blocks of interest which then processed individually in a loop with a parser to build a hash.
NOTE: the parser was slightly modified and put into subroutine
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @shows;
my $data = do { local $/; <DATA> };
my @blocks = $data =~ /^(\ \ \ .*?^--- END)/msg;
push @shows, parse($_) for @blocks;
say Dumper(\@shows);
exit 0;
sub parse {
my $data = shift;
my(@sections,$re,$r);
# Alternative block to extract show info section
# $re = qr/^\ \ \ \s (\S )\s (\S )\s (\S )\s \S \s (\S )\s %%[^:] ?:\s ([^;] ?);%%\s RETCODE = (\d )\s ([^\n] )/;
# $r->{info}->@{qw/show day time sw show_name code description/} = $data =~ /$re/;
$re = qr/RETCODE = (\d )\s ([^\n] )/;
$r->@{qw/code description/} = $data =~ /$re/;
@sections = $data =~ /\n\n(. ?\n- .*?\(Number of results = \d \))/gs;
for my $block ( @sections ) {
my($section,@lines,$key,$value);
@lines = split("\n",$block);
$section = $lines[0];
for my $line (@lines[2..$#lines-2] ) {
$line =~ s/^\s //;
if( $line =~ /^=\s (. )/ ) {
$r->{$section}{$key} = [ $r->{$section}{$key} ] unless ref $r->{$section}{$key} eq 'ARRAY';
push @{$r->{$section}{$key}}, $1;
} else {
($key,$value) = split(/ = /,$line);
$r->{$section}{$key} = $value;
}
}
$r->{$section}{results} = $block =~ /\(Number of results = (\d )\)/gs;
}
return $r;
}
__DATA__
STAR-WARS 2020-01-01 00:00:00 00:00
S&W #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0 Operation success
In-universe information
-----------------------
Species = Human
Gender = Male
television series of num = whatever
(Number of results = 1)
Personal Details
----------------
First Name = Obi-Wan
Last Name = Kenobi
Alias = Padawan
= Jedi Knight
= Jedi General
= Jedi Master
Points to other set of information = whatever
(Number of results = 1)
Other attribute
---------------
Significant other = Satine Kryze
Affiliation = Jedi Order
= Galactic Republic
= Rebel Alliance
Occupation = Jedi
(Number of results = 1)
--- END
STAR-WARS 2020-01-01 00:00:00 00:00
S&W #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0 Operation success
In-universe information
-----------------------
Species = Human
Gender = Male
television series of num = whatever
(Number of results = 1)
Personal Details
----------------
First Name = Obi-Wan
Last Name = Kenobi
Alias = Padawan
= Jedi Knight
= Jedi General
= Jedi Master
Points to other set of information = whatever
(Number of results = 1)
Other attribute
---------------
Significant other = Satine Kryze
Affiliation = Jedi Order
= Galactic Republic
= Rebel Alliance
Occupation = Jedi
(Number of results = 1)
--- END
Output
$VAR1 = [
{
'Other attribute' => {
'Significant other' => 'Satine Kryze',
'Occupation' => 'Jedi',
'results' => 1,
'Affiliation' => [
'Jedi Order',
'Galactic Republic',
'Rebel Alliance'
]
},
'Personal Details' => {
'results' => 1,
'First Name' => 'Obi-Wan',
'Alias' => [
'Padawan',
'Jedi Knight',
'Jedi General',
'Jedi Master'
],
'Points to other set of information' => 'whatever',
'Last Name' => 'Kenobi'
},
'code' => '0',
'description' => 'Operation success',
'In-universe information' => {
'television series of num' => 'whatever',
'Gender' => 'Male',
'results' => 1,
'Species' => 'Human'
}
},
{
'Other attribute' => {
'Affiliation' => [
'Jedi Order',
'Galactic Republic',
'Rebel Alliance'
],
'results' => 1,
'Significant other' => 'Satine Kryze',
'Occupation' => 'Jedi'
},
'Personal Details' => {
'First Name' => 'Obi-Wan',
'results' => 1,
'Last Name' => 'Kenobi',
'Alias' => [
'Padawan',
'Jedi Knight',
'Jedi General',
'Jedi Master'
],
'Points to other set of information' => 'whatever'
},
'code' => '0',
'description' => 'Operation success',
'In-universe information' => {
'television series of num' => 'whatever',
'results' => 1,
'Gender' => 'Male',
'Species' => 'Human'
}
}
];