How to find Self-Closing Tags with org.w3c.dom

Time:03-04


Does anybody know how to find self-closing tags in an XML document?
I am able to get all the elements of a specific type, but I am unable to find the elements that are self-closing. I also need to find elements with no attributes.
import java.nio.file.Paths;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

var dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
var db = dbf.newDocumentBuilder();

var urlToFile = MyClass.class.getClassLoader().getResource("file.xml");
var file = Paths.get(urlToFile.toURI()).toFile();
var doc = db.parse(file);

doc.getDocumentElement().normalize();

var list = doc.getElementsByTagName("myTag");

for (int i = 0; i < list.getLength(); i++) {

    var node = list.item(i);

    if (node.getNodeType() == Node.ELEMENT_NODE) {

        var bits = node.getChildNodes();

        for (int j = 0; j < bits.getLength(); j++) {

            if (bits.item(j).hasAttributes()) {
                // var parrentAttrName = bits.item(j).getNodeName();
                // getValueFromAttribute is my private method
                var nameAttrValue = getValueFromAttribute(bits, j, "name");
                var stateAttrValue = getValueFromAttribute(bits, j, "state");

                // myBits: a List<MyBit> declared elsewhere (hypothetical name;
                // NodeList itself has no method for adding elements)
                myBits.add(new MyBit(nameAttrValue, stateAttrValue));
            }

            if (!bits.item(j).hasAttributes()) {
                // not working
                System.out.println(bits.item(j));
            }
        }
    }
}
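One likely reason the `// not working` branch misbehaves: `getChildNodes()` also returns the whitespace text nodes between elements, and a text node never has attributes, so those nodes land in the `!hasAttributes()` branch. A minimal sketch, assuming the elements of interest are direct children of `myTag` (the class and helper names are hypothetical), that skips non-element nodes before checking attributes:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildAttributeSplit {

    // Hypothetical helper: parses an XML string with secure processing enabled.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collects the names of element children of every myTag that have no attributes.
    static List<String> childrenWithoutAttributes(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList tags = doc.getElementsByTagName("myTag");
        for (int i = 0; i < tags.getLength(); i++) {
            NodeList children = tags.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Whitespace text nodes (and comments) never have attributes; skip them.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                if (!child.hasAttributes()) {
                    names.add(child.getNodeName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<root><myTag>\n  <bit name=\"a\" state=\"on\"/>\n  <bit/>\n</myTag><myTag/></root>";
        System.out.println(childrenWithoutAttributes(parse(xml))); // prints [bit]
    }
}
```

With the node-type check in place, only the attribute-less `<bit/>` element is reported; the indentation text nodes are ignored.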

My XML file has two types of myTag tags:

  1. Paired tags that contain other nested child elements: <myTag><someElementHere /></myTag>
  2. Self-closing tags that specify some other behaviour: <myTag/>
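Once parsed, the two kinds in this list can be told apart by whether a `myTag` element has element children, not by how it was written. A minimal sketch (class and method names are hypothetical) that counts both kinds; note that `<myTag/>` and `<myTag></myTag>` land in the same bucket, and a `myTag` holding only text would also count as having no element children:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MyTagKinds {

    // True if the element has at least one element child (text and comments are ignored).
    static boolean hasElementChild(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Returns {number of myTag with element children, number without}.
    static int[] countKinds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList tags = doc.getElementsByTagName("myTag");
            int paired = 0;
            int empty = 0;
            for (int i = 0; i < tags.getLength(); i++) {
                // getElementsByTagName only returns Element nodes, so the cast is safe.
                if (hasElementChild((Element) tags.item(i))) {
                    paired++;
                } else {
                    empty++;
                }
            }
            return new int[] {paired, empty};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><myTag><someElementHere /></myTag><myTag/><myTag></myTag></root>";
        int[] kinds = countKinds(xml);
        System.out.println("with child elements: " + kinds[0] + ", without: " + kinds[1]);
    }
}
```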

Is there a mechanism to find these kinds of elements? One possibility would be to match self-closing tags with a regex, but I was wondering whether another solution is possible.

Any reasonable answer will be appreciated.

Thanks in advance.

Answer:

Once the document is parsed and its content loaded into a DOM, there are no tags; there are only nodes. You can tell that an element node is empty (by asking whether it has any child nodes), but you can't tell whether the empty element was originally written as <myTag/> or as <myTag></myTag>. That's the author's choice, and it should make no difference to the recipient.
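A small sketch illustrating this point (the class and helper names are hypothetical): both spellings of an empty element parse to elements with no child nodes, and the standard DOM `isEqualNode` method, which compares name, attributes and children rather than source syntax, reports them as equal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class EmptyFormsAreEqual {

    // Parses an XML string and returns its document (root) element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element selfClosing = parseRoot("<myTag/>");
        Element startEnd = parseRoot("<myTag></myTag>");

        // Neither element has child nodes: "empty" is all the DOM can report.
        System.out.println(selfClosing.hasChildNodes()); // false
        System.out.println(startEnd.hasChildNodes());    // false

        // isEqualNode compares structure (name, attributes, children), not syntax.
        System.out.println(selfClosing.isEqualNode(startEnd)); // true
    }
}
```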

Your question indicates that you are very confused about the difference between the lexical XML (the tags and angle brackets) and the tree model of the XML represented by the DOM.

Answer:

Self-closing tags have no children, but neither do empty elements written with separate start and end tags. That said, XPath can be used to find elements with no children, or elements with attributes.

Given test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <test/>
    <test a="a"/>
    <empty></empty>
    <test>
        <a>a</a>
    </test>
    <test>text</test>
    <deep>
        <some b="b" />
    </deep>
</root>

Find elements that have neither child elements nor text with the XPath expression //*[count(./descendant::*) = 0 and count(./text()) = 0]:

xmllint --shell test.xml
/ > cat //*[count(./descendant::*) = 0 and count(./text()) = 0]
<test/>
 -------
<test a="a"/>
 -------
<empty/>
 -------
<some b="b"/>
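The same query can be run from Java with the standard `javax.xml.xpath` API. A minimal sketch, assuming the sample document is inlined as a string (`SAMPLE`) rather than read from test.xml; the class and helper names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EmptyElementsXPath {

    // Same content as the test.xml sample.
    static final String SAMPLE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<root>\n"
            + "    <test/>\n"
            + "    <test a=\"a\"/>\n"
            + "    <empty></empty>\n"
            + "    <test>\n"
            + "        <a>a</a>\n"
            + "    </test>\n"
            + "    <test>text</test>\n"
            + "    <deep>\n"
            + "        <some b=\"b\" />\n"
            + "    </deep>\n"
            + "</root>\n";

    // Evaluates an XPath expression and returns the names of the matched nodes in document order.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String noChildren = "//*[count(./descendant::*) = 0 and count(./text()) = 0]";
        System.out.println(select(SAMPLE, noChildren)); // [test, test, empty, some]
    }
}
```

Because of the `text()` condition, `` is not matched, and neither is `<deep>`, whose indentation counts as a text child.
<test>
if (!EmptyElementsXPath.select(EmptyElementsXPath.SAMPLE, "//*[count(./descendant::*) = 0 and count(./text()) = 0]").equals(java.util.List.of("test", "test", "empty", "some"))) throw new AssertionError("unexpected matches");
</test>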

Find elements with attributes with the XPath expression //*[count(./@*) > 0]:

/ > cat //*[count(./@*)> 0]
 -------
<test a="a"/>
 -------
<some b="b"/>
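The question also asks for elements with no attributes; in XPath that is the negation, `not(@*)` (equivalent to `count(@*) = 0`), and it can be restricted to one element name such as `//test[not(@*)]`. A sketch using `XPath.compile` from the standard `javax.xml.xpath` API, with a hypothetical `count` helper and the sample above inlined without its indentation:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class AttributeXPath {

    // The test.xml sample with the indentation whitespace removed.
    static final String SAMPLE = "<root><test/><test a=\"a\"/><empty></empty>"
            + "<test><a>a</a></test><test>text</test><deep><some b=\"b\" /></deep></root>";

    // Returns how many nodes the XPath expression selects in the given XML.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.compile(expression)
                    .evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, "//*[count(./@*) > 0]")); // 2: <test a="a"/>, <some b="b"/>
        System.out.println(count(SAMPLE, "//*[not(@*)]"));         // 7, including <root>
        System.out.println(count(SAMPLE, "//test[not(@*)]"));      // 3
    }
}
```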

Note: XPath is language-agnostic, so these expressions should also work in Java, for example through the javax.xml.xpath API.
