Perl - How to remove multiline numbers with a Regex

Time:06-20

I have a data file with the following.

Some random text here
1
2
3
13
Show:
120
items per page

I want to remove the numbers, "Show:" and the number below. So the result becomes

Some random text here
items per page

I have the following code:

my $Showing = "((\\d{1,}\\n))*Show:\\n\\d{1,}\\n";
$FileContents =~ s/$Showing//ig;

which results in the following:

Some random text here
1
2
3
items per page

It only removes one number above "Show:". I have tried a number of variations of the $Showing variable. How can I get this to work?

I have another data file with the following:

Showing 1 - 46 of 46 products
20
50
per page

For this file, the following code works.

my $Showing = 'Showing.*\n((\\d{1,}\\n)*)';
$FileContents =~ s/$Showing//ig;

The difference is the numbers are below "Showing", whereas for the one that does not work the numbers are above.

CodePudding user response:

The attempted regex appears OK, although I'd avoid the double quotes (and thus the need to escape things). Better yet, use the qr operator to build the regex pattern first:

my $re = qr/(?:[0-9]+\s*\n\s*)+Show:\s*\n\s*[0-9]+\s*\n/;

Then

$text =~ s/$re//;

results in the wanted two lines. The whole file is in the string $text.
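
A minimal, self-contained sketch of the substitution above, assuming the whole file is already in a string and uses plain \n line endings. The sample data is copied from the question; the sub name remove_show_block is hypothetical and only there to make the example easy to check.

```perl
use strict;
use warnings;

# One or more number-only lines, then "Show:", then one more number-only line.
my $re = qr/(?:[0-9]+\s*\n\s*)+Show:\s*\n\s*[0-9]+\s*\n/;

# Hypothetical helper: returns a copy of the text with the block removed.
sub remove_show_block {
    my ($text) = @_;
    $text =~ s/$re//;
    return $text;
}

# Sample data from the question, joined with "\n" (trailing '' adds a final newline).
my $text = join "\n",
    'Some random text here', 1, 2, 3, 13, 'Show:', 120, 'items per page', '';

print remove_show_block($text);
```

The final \n in the pattern is not followed by \s*, so any leading whitespace on the "items per page" line is left alone.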

I've sprinkled that pattern with possible spaces everywhere, but since \s also matches pretty much all manner of newlines, you can probably leave only the \s+

my $re = qr/(?:[0-9]+\s+)+Show:\s+[0-9]+\s+/;

I left the explicit \n's in the first pattern to avoid confusion.

It is possible that something's "wrong" with the newlines in your file, like having a carriage return and linefeed pair (instead of just a newline character). So if this isn't working, try tweaking the \n in the pattern.

Options are to use [\n\r]+ (for either or both), or \R, or even \v (vertical space).
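
To illustrate the line-ending point, here is a sketch that swaps \n for \R (available since Perl 5.10), which matches \n, \r\n and other line breaks. The CRLF sample string and the sub name remove_show_block_any_eol are assumptions for illustration, and \h* is used to tolerate trailing spaces or tabs on a line.

```perl
use strict;
use warnings;

# \R matches any linebreak sequence, treating "\r\n" as a single unit.
# \h* allows trailing horizontal whitespace before each line break.
my $re = qr/(?:[0-9]+\h*\R)+Show:\h*\R[0-9]+\h*\R/;

# Hypothetical helper: returns a copy of the text with the block removed.
sub remove_show_block_any_eol {
    my ($text) = @_;
    $text =~ s/$re//;
    return $text;
}

# Sample data from the question, but with Windows-style CRLF line endings.
my $crlf = join "\r\n",
    'Some random text here', 1, 2, 3, 13, 'Show:', 120, 'items per page', '';

my $out = remove_show_block_any_eol($crlf);
print $out eq "Some random text here\r\nitems per page\r\n" ? "ok\n" : "not ok\n";
```

The same pattern also handles plain \n files, so it can be a reasonable default when the origin of the data file is unknown.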

CodePudding user response:

I would solve this by just using multiple regexes. For example:

#!/usr/bin/env perl
use strict;
use warnings;
use v5.32;

while (my $line = <>) {
    next if $line =~ m/\A\d+ \s*\z/xms;
    next if $line =~ m/\AShow:\s*\z/xms;
    
    print $line;
}

In the shell it works like this:

$ ./remover.pl data.txt 
Some random text here
items per page