Perl Script to remove a specific html block from all files in a folder

Time:03-23

I have "help texts" as pop-up macros on some of our Confluence sites, and I have to export the sites as HTML (so attachments are included) without these help pop-ups. My idea is to run a Perl script after the export to remove all help texts, which are rendered with the HTML tag <aui-inline-dialog></aui-inline-dialog>. So I need to remove all these blocks, including the content between the tags, from all HTML files in a folder.

So far I have put together these few lines of code:

$dirname = '.';
opendir(DIR, $dirname) or die "Could not open $dirname\n";

while ($filename = readdir(DIR)) {
  $perl -ne "print unless /^<aui-inline-dialog>/ .. /^<aui-inline-dialog>$/" "$filename\n";
}

closedir(DIR);

Sadly, I get the following error:

C:\Confluence-space-export-081756-489.html\DATS>perl Perl_Script_for_Removal.pl
syntax error at Perl_Script_for_Removal.pl line 5, near "-ne"
Execution of Perl_Script_for_Removal.pl aborted due to compilation errors.

I have never used Perl before and just can't figure out how to proceed. I read something about an -i switch for perl that edits files in place, but I don't know how to use it correctly. I'm on Windows 10 with the Strawberry Perl command line tool. Any suggestions?

Thanks and greetings from Germany


So I took the advice and solved this problem with a Python script instead of Perl. Here is the Python script, and thank you all for the input:

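import os
from bs4 import BeautifulSoup

directory = './'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        # Read and parse first: opening with 'w' up front would truncate the
        # file before it can be read. UTF-8 is assumed for the export.
        with open(fname, encoding='utf-8') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
        # Remove every help pop-up together with its contents.
        for tag in soup.find_all('aui-inline-dialog'):
            tag.decompose()
        # Write the cleaned document back in place.
        with open(fname, 'w', encoding='utf-8') as f:
            f.write(str(soup))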

CodePudding user response:

You cannot switch to shell in the middle of a Perl script. Keep writing Perl:

my $dirname = '.';
opendir my $dir, $dirname or die "Could not open $dirname: $!\n";
while (my $filename = readdir $dir) {
    next unless -f "$dirname/$filename";    # skip '.', '..' and subdirectories
    open my $file, '<', "$dirname/$filename" or die "Could not open $filename: $!\n";
    while (<$file>) {
        print unless /^<aui-inline-dialog>/ .. /^<aui-inline-dialog>$/;
    }
}
closedir $dir;

Also, doesn't the final tag start with </?
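To see why the closing pattern matters, here is a small Python sketch that emulates Perl's two-dot range operator on a list of lines. The function name `flip_flop_filter` and the sample lines are made up for illustration; the one Perl detail it relies on is that `..` tests the end pattern on the same line that opened the range.

```python
import re

def flip_flop_filter(lines, start, end):
    """Drop the lines inside a start..end range, like `print unless /start/ .. /end/`."""
    inside = False
    kept = []
    for line in lines:
        if not inside:
            if start.search(line):
                # Perl's two-dot range checks the end pattern on the opening line too,
                # so the range can close on the very line that opened it.
                inside = not end.search(line)
                continue
            kept.append(line)
        else:
            # Lines inside the range are dropped, including the line that ends it.
            if end.search(line):
                inside = False
    return kept

# Hypothetical sample input, one tag per line.
lines = ["<p>a</p>", "<aui-inline-dialog>", "help", "</aui-inline-dialog>", "<p>b</p>"]
start = re.compile(r"^<aui-inline-dialog>")

as_posted = flip_flop_filter(lines, start, re.compile(r"^<aui-inline-dialog>$"))
with_slash = flip_flop_filter(lines, start, re.compile(r"^</aui-inline-dialog>$"))
```

With the pattern as posted, only the opening tag line is dropped and the help text stays in the output; with `</` in the end pattern, the whole block goes away. Both variants still assume each tag sits alone on its own line.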

Moreover, using regexes to parse HTML/XML is a bad idea. Use a parser instead.
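As a rough illustration of the difference, the sketch below uses only Python's standard-library `html.parser`, not any of the libraries named in this thread. The class `DialogStripper`, the helper `strip_dialogs`, and the sample markup with its `id` and `alignment` attributes are all hypothetical; real Confluence exports may look different.

```python
import re
from html.parser import HTMLParser

class DialogStripper(HTMLParser):
    """Re-emit HTML while skipping <aui-inline-dialog> elements and their contents."""

    TAG = "aui-inline-dialog"

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # greater than 0 while inside a dialog element

    def handle_starttag(self, tag, attrs):
        if tag == self.TAG:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.TAG:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        if self.depth == 0 and tag != self.TAG:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self.depth == 0:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def strip_dialogs(html):
    """Return the markup with every dialog element removed."""
    parser = DialogStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

# Hypothetical markup: the tag has attributes and sits in the middle of a line.
sample = ('<p>Intro <aui-inline-dialog id="help-1" alignment="bottom">'
          'Help <b>text</b></aui-inline-dialog>rest</p>')

line_regex_hit = re.search(r"^<aui-inline-dialog>", sample, re.MULTILINE)
cleaned = strip_dialogs(sample)
```

The line-anchored regex finds nothing in this sample, because the tag carries attributes and does not start a line, while the parser-based version removes the element wherever it appears. A dedicated HTML library is still the more robust choice for real exports, since this sketch does not handle every construct, such as processing instructions.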

CodePudding user response:

Here's a safe way to do it in Perl, using modules (XML::Twig and Path::Tiny) included in the Strawberry Perl distribution. It's basically the equivalent of the Python version using BeautifulSoup.

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Path::Tiny;
use XML::Twig;

my $dir = path(".");
for my $file ($dir->children(qr/\.html$/)) {
    say "Processing $file";
    XML::Twig->new(
        twig_roots => { 'aui-inline-dialog' => sub { $_->delete } },
        twig_print_outside_roots => 1
        )->parsefile_html_inplace($file);
}