I have a Perl program that:
- Reads a C source file (SRC)
- Uses a regex match on the SRC file to find specifically formatted data to use as the destination filename
- Opens a new destination file
- Performs another regex match to find all C-style comments (/* */) that contain the keyword abcd. Note: these comments can span one line or several, so the regex looks for the opening /*, then the keyword abcd, then any amount of text and whitespace up to the closing */
- Writes the regex matches to the destination file
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
while(my $row = <SRC_FH>){
    if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
        print DES_FH "$1\n";
    }
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
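A side note on the filename regex in the first loop: in an alternation, Perl takes the first alternative that matches at a given position, so for an input like 12.34.567.abc the {2} branch wins and the {3} branch is never used. A quantifier range avoids that. The sketch below assumes the suffix is two or three lowercase letters or digits, and that the comma inside [a-z,0-9] was not meant to match a literal comma; the sample line is made up.

```perl
use strict;
use warnings;

# Made-up line from a C file containing the ID used as the output filename
my $line = 'version 12.34.567.abc';

# Pattern from the question: the {2} branch matches first, so the third character is lost
my ($alternation) =
    $line =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/;

# A greedy {2,3} takes three characters when they are available
my ($ranged) = $line =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z0-9]{2,3})/;

print "alternation: $alternation\n";   # alternation: 12.34.567.ab
print "ranged:      $ranged\n";        # ranged:      12.34.567.abc
```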
I think my problem is due to the way Perl executes the code: although my regex is correct, my destination file only gets the one-line comments written to it. Any C-style comments that span more than one line are not written to the destination file. What am I missing in my 2nd if statement?
I checked my 2nd if statement's regex at https://regexr.com/ and it works as it's supposed to, capturing multiline C-style comments as well as single-line comments that contain the keyword abcd.
So I tried the 1st suggestion below by zdim. Here is what I used:
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
my @comments;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
#seek SRC_FH, 0, 0;
my $content = do {
    #read whole file at once
    local $/;
    open (SRC_FH,'<', $src) or die $!;
    <SRC_FH>;
};
#if($content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg){
# my @comments = $content;
# }
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg;
foreach (@comments){
    print DES_FH "$1\n";
}
#while(my $row = <SRC_FH>){
# if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
# print DES_FH "$1\n";
# }
#}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
The result is that all the content from sample.c is copied to the destination file, a full 1:1 copy. What I want is to pull only the comments, single-line and multiline, out of the C file.
Example 1:
/* abcd */
Example 2:
/* some text
 * some more comments abcd and some more comments */
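The 1:1 copy in this attempt most likely comes from the /s modifier combined with the greedy .* right after the opening /\*: with /s, .* can cross newlines, so the match runs from the first /* in the file to the last abcd, then ends at the first */ after it, swallowing all the code in between. (Separately, the foreach loop prints "$1\n" rather than "$_\n".) The sketch below shows the effect on a made-up three-line C string; it is an illustration, not the asker's actual file.

```perl
use strict;
use warnings;

# Made-up C source: two abcd comments with a line of code in between
my $content = "/* abcd first */\nint x = 1;\n/* second abcd */\n";

# Same regex as the attempt above, with /s so that . also matches "\n"
my @greedy = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg;

# One match that starts at the first /* and ends at the last */,
# so the code line in between is captured too
print scalar(@greedy), " match(es):\n$greedy[0]\n";
```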
Final Solution
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
my $content = do{local $/; <SRC_FH>};
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/g;
for(@comments){
    print DES_FH "$_\n";
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
Answer:
What am I missing in my 2nd if statement?
Well, nothing. It's just that in a multiline C comment no single line contains both /* and */. Thus that regex cannot match a multiline comment when the file is read line by line.
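To see this concretely, the sketch below runs the question's regex once per line and once over the whole text. The input is a made-up two-line comment modelled on Example 2 from the question.

```perl
use strict;
use warnings;

# A multiline comment, as two lines of a C file
my $text = "/* some text\n * some more comments abcd and some more comments */\n";

# Line by line (what the while loop does): no single line holds /*, abcd and */
my $per_line = grep { /(\/\*.*abcd.[\s\S]*?\*\/)/ } split /^/, $text;

# Whole string with /s: . may now cross the newline between /* and abcd
my $whole = $text =~ /(\/\*.*abcd.[\s\S]*?\*\/)/s ? 1 : 0;

print "matching lines:      $per_line\n";   # 0
print "whole-string match:  $whole\n";      # 1
```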
To catch such comments, either:

Read the whole file into a string ("slurp" it), and then add the /s modifier on the regex so that the . metacharacter matches a newline as well. Also use the /g modifier to capture all such patterns in the string. One way:

my $content = do {
    local $/;   # undef record separator so the whole file is read at once
    open my $src_fh, '<', $src_file or die $!;   # have to re-open
    <$src_fh>;  # reads it all
};  # lexical filehandle gets closed as we leave scope

# NOTE: there may be difficulties in capturing comments in a C source file
my @comments = $content =~ /.../sg;   # your regex
Or use a library to slurp a file, like Path::Tiny:

use Path::Tiny;
my $content = path($src_file)->slurp;
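A self-contained sketch of the slurp route. To stay runnable without D:\Scripts\sample.c, it reads a made-up C snippet through an in-memory filehandle (open on a scalar reference) instead of a real file. It also swaps in a regex different from the question's: m{(/\*.*?\*/)}sg grabs each comment on its own (the lazy .*? stops at the first */), and grep then keeps only the comments mentioning abcd. Unlike the question's pattern, abcd does not have to sit on the same line as the opening /*, and a match cannot run past the end of one comment into the next.

```perl
use strict;
use warnings;

# Made-up C source; in the real script this would come from opening $src
my $c_source = <<'END_C';
/* abcd */
int x = 1; /* unrelated comment */
/* some text
 * some more comments abcd and some more comments */
int y = 2;
END_C

# Slurp: with $/ undefined, one readline returns the whole "file"
my $content = do {
    local $/;                                # undef record separator
    open my $fh, '<', \$c_source or die $!;  # in-memory filehandle
    <$fh>;
};

# /s lets . cross newlines; .*? stops at the FIRST */, so each match is
# exactly one comment. Then keep only the comments that mention abcd.
my @comments = grep { /abcd/ } $content =~ m{(/\*.*?\*/)}sg;

print "$_\n" for @comments;
```

In the question's script, the print loop would then write each element of @comments to DES_FH as before.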
Or, set a flag when you see /*, copy all lines until you hit the closing */, and then unset the flag. Here is a rudimentary version of that:

use feature 'say';   # say() needs this (or "use v5.10;")

my $inside_comment = 0;
while (<$src_fh>) {
    if (m{(/\*.*)}) {                          # opening line for the comment
        say $des_fh $1;
        # a one-line /* ... */ comment closes right away
        $inside_comment = m{/\*.*\*/} ? 0 : 1;
    }
    elsif ($inside_comment and m{(.*\*/)}) {   # closing line for the comment
        say $des_fh $1;
        $inside_comment = 0;
    }
    elsif ($inside_comment) { print $des_fh $_ }   # $_ already ends in "\n"
}
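The flag version prints every comment, while the question only wants the ones containing abcd. One possible variation, sketched here with hypothetical names (@found, $buffer) and a made-up input read through an in-memory filehandle, collects each comment's text and decides whether to keep it once the comment closes. It assumes at most one comment starts on any given line.

```perl
use strict;
use warnings;

# Made-up C source, read line by line
my $c_source = "int a;\n/* abcd */\nint b;\n/* some text\n * more abcd */\n/* skip me */\n";
open my $src_fh, '<', \$c_source or die $!;

my @found;    # finished comments that mention abcd
my $buffer;   # text of the comment being read; undef when outside a comment

while (my $line = <$src_fh>) {
    if (!defined $buffer) {
        next unless $line =~ m{(/\*.*)}s;  # no comment starts on this line
        $buffer = $1;                      # start collecting at the /*
    }
    else {
        $buffer .= $line;                  # continuation line of an open comment
    }
    # Complete once a */ follows the opening /*; drop whatever comes after it
    if ($buffer =~ s{\A(/\*.*?\*/).*}{$1}s) {
        push @found, $buffer if $buffer =~ /abcd/;
        undef $buffer;
    }
}
close $src_fh;

print "$_\n" for @found;
```

Because the opening line is captured from the /* onward, text before a comment on the same line is not copied.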
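I tested all this, but please check and improve it. For one, this plays funny with leading spaces: the opening line is printed starting at the /*, while the lines after it keep their indentation.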
Note: Getting all comments out of a C program in general may be rather tricky.
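For example, a /* that appears inside a string literal looks exactly like a real comment to a regex-based approach. The snippet below (made-up input) runs the per-comment regex over such a line; both matches come back, including the one inside the string. Handling this properly would need something that tracks string and character literals (and // line comments) rather than a single regex.

```perl
use strict;
use warnings;

# Made-up C: the first "comment" is really the contents of a string literal
my $content = qq{char *s = "/* abcd */";\nint x; /* real abcd comment */\n};

my @comments = grep { /abcd/ } $content =~ m{(/\*.*?\*/)}sg;

print "$_\n" for @comments;   # includes the false positive from the string
```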
Here is a one-line version of slurping:
my $file_content = do { local (@ARGV, $/) = $file_name; <> };
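As a quick check of how that one-liner works: in the list assignment local (@ARGV, $/) = $file_name, @ARGV takes all the values, leaving $/ undefined, so <> opens the named file and returns its contents in a single read. The sketch writes a made-up two-line file with the core File::Temp module just so there is something to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up C file to a temporary location (deleted at exit)
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "/* abcd */\nint x;\n";
close $tmp_fh;

# @ARGV becomes ($file_name) and $/ becomes undef, only inside the do block
my $file_content = do { local (@ARGV, $/) = $file_name; <> };

print $file_content;
```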