Perl code to find and substitute a pattern

output        [15:0] pin;                
output         [1:0] en;                
input          [6:0] dddr;            
input          [6:0] dbg;

replace this with

16 : pin : output;                         
2 : en : output;                
7 : dddr : input;            
7 : dbg :input;

I tried this code after opening the file and storing its contents in $var, but I am not able to transform it as shown above.

if ($var =~ /(\w+)\[(\d+)\:/) {
    print "word=$1 number=$2\n";
}

I am also trying to add a : between the columns.

CodePudding user response：

You are missing the whitespace after the word characters in your pattern.

(\w+ )       \[(\d+):
      VVVVVVVV
output        [15:0] pin;

This is easily fixed. Add it into the pattern in between, like so:

use strict;
use warnings;
use feature 'say';

while (my $line = <DATA>) {
    if ($line =~ /(\w+)\s+\[(\d+)\:/) {
        say "word=$1 number=$2";
    }
}

__DATA__
output        [15:0] pin;
output         [1:0] en;
input          [6:0] dddr;
input          [6:0] dbg;

This produces:

word=output number=15
word=output number=1
word=input number=6
word=input number=6

To get to your desired output, you'll have to refine the pattern and probably do some incrementing too.

CodePudding user response：

You are not taking account of the whitespace between the (\w+) and the \[(\d+) parts of your regex.

while (<DATA>)
{
    if ( /(\w+)\s+\[(\d+)\:/) {
        print "word=$1 number=$2\n";
    }
}

__DATA__
output        [15:0] pin;                
output         [1:0] en;                
input          [6:0] dddr;            
input          [6:0] dbg;

That outputs this

word=output number=15
word=output number=1
word=input number=6
word=input number=6

To get closer to your final requirement, the regex can be expanded to match the other parts you need, as follows:

while (<DATA>)
{
    if ( /(\w+)\s+\[(\d+)\:\d+\]\s+(.*);/) {
        print "$2 : $3 : $1\n";
    }
}

__DATA__
output        [15:0] pin;                
output         [1:0] en;                
input          [6:0] dddr;            
input          [6:0] dbg;

which outputs this

15 : pin : output
1 : en : output
6 : dddr : input
6 : dbg : input

I'm not sure how you calculate the value for the first column. It appears to be the number field + 1. Is that correct?
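
The sample in the question suggests that the first column is the width of the range rather than its upper index, since [15:0] becomes 16 and [6:0] becomes 7. Here is a minimal sketch under that assumption, computing upper index - lower index + 1. The helper name convert_decl is hypothetical, and the trailing ; is copied from the desired output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one declaration line into "width : name : direction;".
# Assumes the width is (upper index - lower index + 1), which the question
# implies but does not state.
sub convert_decl {
    my ($line) = @_;
    if ($line =~ /^\s*(\w+)\s+\[(\d+):(\d+)\]\s+(\w+)\s*;/) {
        my ($dir, $msb, $lsb, $name) = ($1, $2, $3, $4);
        my $width = $msb - $lsb + 1;
        return "$width : $name : $dir;";
    }
    return;    # no match: undef in scalar context
}

for my $line ('output        [15:0] pin;', 'input          [6:0] dddr;') {
    my $converted = convert_decl($line);
    print "$converted\n" if defined $converted;
}
```

For the two sample lines this prints "16 : pin : output;" and "7 : dddr : input;". If the lower index is always 0, capturing only the upper index and adding 1 gives the same result.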

CodePudding user response：

One way to parse the shown data

use warnings;
use strict;
use feature 'say';

while (<>) {             
    if ( /(\S+) \s+ \[ ([0-9]+):[0-9]+ \] \s+ (\S+) \s*;/x ) {
        say $2+1, ' : ', $3, ' : ', $1, ';';
    }
}

Some comments follow.

In most regex patterns a lot depends on details of the input data format, and on how much flexibility there is in what data to expect and allow.

The \S+ matches a string of non-whitespace characters; this assumes that there is a single word at the beginning, which may contain any non-space characters. If there may be multiple words then use .+? instead, which matches everything up to the first instance of the following pattern (here ;, so better yet, use [^;]+)
I use the rather permissive \S+ because nothing is said about the data. But if only "word characters" ([a-zA-Z0-9_]) are expected and allowed, and you want or need to enforce that, then use the far more restrictive \w+
No spaces are allowed inside [], only numbers with a : between them. If it is OK for the data to have spaces there then use \[\s* and \s*\]
At the end, again one word is matched with \S+, with any non-space characters in it. If more than one word can be expected then again use .+?. If that part may contain semicolons then you'd need .+ which takes everything up to the very last ;
In all of this the + quantifier requires that there be at least one occurrence of the previous pattern. If it is acceptable that there is nothing in that place in the data (that last word just missing, for example) then use the * quantifier instead, like .*
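
The looser variants described in these notes can be combined into one pattern. The following is only a sketch under assumed requirements: spaces are tolerated inside the brackets, the trailing name may contain spaces but no semicolon, and the lower index is 0 so that the width is the upper index plus 1. The sub name parse_loose and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the more permissive variants.
# Returns (width, name, direction), or nothing if the line does not match.
sub parse_loose {
    my ($line) = @_;
    return unless $line =~ /
        (\S+) \s+                                  # first word: any run of non-whitespace
        \[ \s* ([0-9]+) \s* : \s* [0-9]+ \s* \]    # spaces tolerated inside the brackets
        \s+ ([^;]+?) \s* ;                         # name: up to the first semicolon, may hold spaces
    /x;
    return ($2 + 1, $3, $1);
}

my @fields = parse_loose('output [ 15 : 0 ] data bus ;');
print join(' : ', @fields), ";\n" if @fields;
```

The lazy [^;]+? followed by \s* keeps trailing spaces out of the captured name. Whether this much flexibility is wanted depends on the real data, as noted above.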

So it is important to understand what the data is like exactly, as much as possible, and to thoughtfully articulate the requirements, in what precisely to restrict/allow.