output [15:0] pin;
output [1:0] en;
input [6:0] dddr;
input [6:0] dbg;
replace this with
16 : pin : output;
2 : en : output;
7 : dddr : input;
7 : dbg : input;
I tried this code after opening the file and storing its contents in $var, but I am not able to filter it like above.
if ($var =~ /(\w+)\[(\d+)\:/) {
print "word=$1 number=$2\n";
}
I am also trying to add : between the columns.
CodePudding user response:
You are missing the whitespace after the word characters in your pattern.
(\w+) \[(\d+):
VVVVVVVV
output [15:0] pin;
This is easily fixed. Add it into the pattern in between, like so:
use strict;
use warnings;
use feature 'say';
while (my $line = <DATA>) {
if ($line =~ /(\w+)\s+\[(\d+)\:/) {
say "word=$1 number=$2";
}
}
__DATA__
output [15:0] pin;
output [1:0] en;
input [6:0] dddr;
input [6:0] dbg;
This produces:
word=output number=15
word=output number=1
word=input number=6
word=input number=6
To get to your desired output, you'll have to refine the pattern and probably do some incrementing too.
CodePudding user response:
You are not taking account of the whitespace between the (\w+) and the (\d+) parts of your regex.
while (<DATA>)
{
if ( /(\w+)\s+\[(\d+)\:/) {
print "word=$1 number=$2\n";
}
}
__DATA__
output [15:0] pin;
output [1:0] en;
input [6:0] dddr;
input [6:0] dbg;
That outputs this
word=output number=15
word=output number=1
word=input number=6
word=input number=6
To get close to your final requirement, the regex can be expanded to match the other parts you need, as follows:
while (<DATA>)
{
if ( /(\w+)\s+\[(\d+)\:\d+\]\s+(.*);/) {
print "$2 : $3 : $1\n";
}
}
__DATA__
output [15:0] pin;
output [1:0] en;
input [6:0] dddr;
input [6:0] dbg;
which outputs this
15 : pin : output
1 : en : output
6 : dddr : input
6 : dbg : input
Not sure how you calculate the value for the first column. It appears to be the number field + 1. Is that correct?
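If the first column is meant to be the bus width, one plausible reading of a range like [15:0] is upper bound minus lower bound plus one, which gives 16, 2, 7 and 7 for the sample data. The sketch below assumes that interpretation, which the question does not confirm. The helper name format_port is made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: turn "output [15:0] pin;" into "16 : pin : output;".
# Assumption: the first column is the width, i.e. msb - lsb + 1.
sub format_port {
    my ($line) = @_;
    # Capture the direction, both range bounds, and the signal name.
    if ( $line =~ /(\w+)\s+\[(\d+):(\d+)\]\s+(\w+)\s*;/ ) {
        my ($dir, $msb, $lsb, $name) = ($1, $2, $3, $4);
        my $width = $msb - $lsb + 1;
        return "$width : $name : $dir;";
    }
    return;    # no match: undef in scalar context
}

for my $line ('output [15:0] pin;', 'input [6:0] dddr;') {
    my $out = format_port($line);
    say $out if defined $out;
}
```

If the lower bound is always 0, this gives the same result as simply adding 1 to the first number.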
CodePudding user response:
One way to parse the shown data
use warnings;
use strict;
use feature 'say';
while (<>) {
if ( /(\S+) \s+ \[ ([0-9]+):[0-9]+ \] \s+ (\S+) \s*;/x ) {
say $2+1, ' : ', $3, ' : ', $1, ';';
}
}
Some comments follow.
In most regex patterns a lot depends on details of the input data format, and on how much flexibility there is in what data to expect and allow.
That \S+ matches a string of non-whitespace characters; this assumes that there is a single word at the beginning, which may contain any non-space characters. If there may be multiple words then use .+? instead, which matches everything up to the first instance of the following pattern (here ; so, better yet, one can use [^;]+).
I use the rather permissive \S as nothing is said about the data. But if only "word" characters ([a-zA-Z0-9_]) are expected and allowed, and you want or need to enforce that, then use the far more restrictive \w.
No spaces are allowed inside [], only numbers with a : between them. If it is OK for the data to have spaces there, then use \[\s* and \s*\].
At the end, again one word is matched with \S+, with any non-space characters in it. If more than one word can be expected then again use .+? instead. If that part may contain semicolons then you'd need .+ instead, which takes everything up to the very last semicolon.
In all of this the + quantifier requires at least one character to match. If a field may be empty, use the * quantifier instead, like .*
So it is important to understand what the data is like exactly, as much as possible, and to thoughtfully articulate the requirements, in what precisely to restrict/allow.
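A small sketch can make these trade-offs concrete. The sample lines with a hyphen in the name, padded brackets, and an embedded semicolon are made up for illustration and do not come from the question. The helper name first_capture is also hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the first capture group, or 'no match'.
sub first_capture {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : 'no match';
}

# Permissive \S+ versus restrictive \w+ for the final field.
my $permissive  = qr/\]\s+(\S+)\s*;/;
my $restrictive = qr/\]\s+(\w+)\s*;/;
say first_capture($permissive,  'output [15:0] pin-a;');   # pin-a
say first_capture($restrictive, 'output [15:0] pin-a;');   # no match

# Optional whitespace just inside the brackets.
my $tight = qr/\[([0-9]+):[0-9]+\]/;
my $loose = qr/\[\s*([0-9]+):[0-9]+\s*\]/;
say first_capture($tight, 'output [ 15:0 ] pin;');         # no match
say first_capture($loose, 'output [ 15:0 ] pin;');         # 15

# Non-greedy .+? stops at the first semicolon, greedy .+ at the last.
my $lazy   = qr/\]\s+(.+?)\s*;/;
my $greedy = qr/\]\s+(.+)\s*;/;
say first_capture($lazy,   'output [15:0] a;b;');          # a
say first_capture($greedy, 'output [15:0] a;b;');          # a;b
```

Which variant is right depends entirely on what the real input may contain, so it is worth trying the candidates against representative lines before settling on one.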