Finding HTML/XML nested tag levels using Perl

Time:10-14

Is there a simple way to find the nesting level of a tag, i.e. the number of enclosing elements with the same tag name, counting the element itself?

Note: I plan to write a Perl subroutine that takes the input below as a scalar and returns the output below, also as a scalar.

Input:

<sec>
  <sec></sec>
  <sec>
    <sec></sec>
  </sec>
</sec>

Output should be:

<sec level="1">
  <sec level="2"></sec>
  <sec level="2">
    <sec level="3"></sec>
  </sec>
</sec>

CodePudding user response:

One approach uses XML::LibXML to parse the XML into a DOM tree, then walks the tree and adds a level attribute to each matching tag, incrementing the level on the way into a matching element and decrementing it on the way out:

#!/usr/bin/env perl
use warnings;
use strict;
use XML::LibXML;

# Recursively walk a DOM tree, and invoke callbacks on elements
sub walk_elements {
    my ($node, $callbacks) = @_;
    $callbacks->{pre}->($node) if $node->nodeType == XML_ELEMENT_NODE;
    for my $child ($node->childNodes) {
        walk_elements($child, $callbacks);
    }
    $callbacks->{post}->($node) if $node->nodeType == XML_ELEMENT_NODE;
}

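# Add a level="N" attribute to every $tagname element, where N is the
# nesting depth counted over elements with that same tag name.
sub add_levels {
    my ($xml, $tagname) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    my $level = 1;
    walk_elements($dom->getDocumentElement,
                { pre => sub {
                    # Entering a matching element: record the level, then go one deeper
                    $_[0]->setAttribute('level', $level++)
                        if $_[0]->nodeName eq $tagname
                  },
                  # Leaving a matching element: come back up one level
                  post => sub { $level-- if $_[0]->nodeName eq $tagname }
                }
        );
    return $dom->toStringHTML; # Or toString for XML style tags
}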

my $xml = <<EOXML;
<sec>
  <sec></sec>
  <sec>
    <sec></sec>
  </sec>
</sec>
EOXML

print add_levels($xml, 'sec');

Running this script outputs:

<sec level="1">
  <sec level="2"></sec>
  <sec level="2">
    <sec level="3"></sec>
  </sec>
</sec>
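
The same levels can also be computed without a hand-written tree walk, by asking XPath how many same-named elements enclose each match. The following is only a minimal sketch of that idea, not part of the answer above: the subroutine name add_levels_xpath is hypothetical, and it assumes $tagname is a plain element name that is safe to interpolate into an XPath expression (no namespaces or special characters).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: set level="N" on every $tagname element, where N
# counts the element itself plus its ancestors with the same tag name.
# Assumes $tagname is a simple, trusted element name.
sub add_levels_xpath {
    my ($xml, $tagname) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    for my $node ($dom->findnodes("//$tagname")) {
        # count() yields an XPath number; int() normalises it for the attribute
        my $level = $node->findvalue("count(ancestor-or-self::$tagname)");
        $node->setAttribute('level', int($level));
    }
    # XML-style serialisation of the root element
    return $dom->documentElement->toString;
}

my $xml = '<sec><sec></sec><sec><sec></sec></sec></sec>';
print add_levels_xpath($xml, 'sec'), "\n";
```

Because this sketch serialises with toString rather than toStringHTML, the tags come out in XML style. The trade-off against the tree walk is that each matching element triggers its own ancestor count, which is simpler to read but probably does more work on deeply nested documents.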