Regex for removing line X when immediately followed by line Y

Time:09-14

I'm looking for a regex that removes line X only when it is immediately followed by line Y. I've tried the following inside a foreach loop over previously gathered files:

foreach my $file (@files) {
    say "Processing $file in $dir";

    open( my $fh, "<", "$file" )
      or die "Can't open < $file: $!";

    my $data = {};

    my $start = "X:";
    my $end   = "Y:";

    my $contents = do { local $/; <$fh> };

    my $count = 1;

    my $transformed = $contents;

    while ( $contents =~ /$start(.*?)$end/sg ) {
        say $1 if $1;
        my $formatted = $1;
        $formatted =~ s/\s //g                  if $formatted;
        $data->{$file}->{$formatted} = $count   if $formatted;
        $transformed =~ /($start.*?$end)/sg;
        my $removed = quotemeta($1)        if $1;
        $transformed =~ s/$removed/$end/sg if $removed;
    }

    push @results, $data if ($data);

    path($file)->spew_utf8($transformed);

}

but I cannot get it to leave the "text to ignore" example below untouched.

Text to process (Y immediately after X):

{
X: John Smith,
Y: 1234 Main Street
}

becomes

{
Y: 1234 Main Street
}

Text to ignore (Y not immediately after X):

{
X: John Smith,
X2: 1234567890,
Y: 1234 Main Street
}

CodePudding user response:

You can use the following:

s/^X:.*\n(?=Y:)//m;

Without the s flag, . does not match a newline, so this is equivalent to

s/^X:[^\n]*\n(?=Y:)//m;

Add the g flag (s/^X:.*\n(?=Y:)//mg) to remove every occurrence instead of only the first.
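
To check the substitution against both samples from the question, here is a minimal sketch. The helper name remove_x_before_y and the two test strings are assumptions made for illustration; they are not part of the original post. It also assumes that X: and Y: start at the very beginning of their lines, as in the samples.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this example): remove every line
# starting with "X:" that is immediately followed by a line starting with "Y:".
sub remove_x_before_y {
    my ($text) = @_;
    # /m lets ^ match at the start of each line; .* stops at the newline;
    # (?=Y:) requires the next line to start with "Y:" without consuming it;
    # /g repeats the substitution for every such pair.
    $text =~ s/^X:.*\n(?=Y:)//mg;
    return $text;
}

# The two samples from the question, written as plain strings.
my $adjacent  = "{\nX: John Smith,\nY: 1234 Main Street\n}\n";
my $separated = "{\nX: John Smith,\nX2: 1234567890,\nY: 1234 Main Street\n}\n";

print remove_x_before_y($adjacent);     # X line is removed
print remove_x_before_y($separated);    # text is left unchanged
```

In the loop from the question, the same substitution could be applied directly to the slurped file contents before they are written back, which would replace the while loop that rewrites $transformed. If the real files indent these lines, the pattern would need to allow leading spaces or tabs after ^.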

Tags: perl