I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<questions>
<question>
<name>First question</name>
<true>2</true>
<answers>
<answer>First answer</answer>
<answer>Second answer</answer>
<answer>Third answer</answer>
<answer>Fourth answer</answer>
</answers>
</question>
<question>
<name>Second question</name>
<true>3</true>
<answers>
<answer>First answer</answer>
<answer>Second answer</answer>
<answer>Third answer</answer>
<answer>Fourth answer</answer>
</answers>
</question>
</questions>
Why does the Java code below return 9 elements instead of 4? The 5 unexpected elements each contain one line feed and 3 tabs, which matches the whitespace between <answers> and <answer> (one), between each </answer> and the next <answer> (three), and between </answer> and </answers> (one) in the XML:
File file = new File(path);
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(file);
NodeList answers = doc.getElementsByTagName("answers").item(n).getChildNodes();
I then use this check to filter out the unwanted elements:
if (answers.item(i).getTextContent().trim().length() > 0)
I would be grateful if you could tell me a better way.
Answer:
It's not returning 9 elements - it's returning 9 nodes, which is correct. (After all, you're asking for the child nodes of the answers element.) Those whitespace-only text nodes are valid nodes. If you want only elements, ignore any node for which Node.getNodeType() doesn't return Node.ELEMENT_NODE.
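As a rough sketch of that node-type filter: the class name ElementChildren and both helper methods are made up for illustration, and the XML is parsed from a string rather than from the file in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildren {

    // Returns only the element children of parent, skipping text nodes,
    // comments and any other non-element node.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Hypothetical helper: parses xml and returns the text of each element
    // child of the n-th <answers> element.
    static List<String> answerTexts(String xml, int n) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Node answers = doc.getElementsByTagName("answers").item(n);
            List<String> texts = new ArrayList<>();
            for (Element answer : childElements(answers)) {
                texts.add(answer.getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<answers>\n\t<answer>First answer</answer>\n\t<answer>Second answer</answer>\n</answers>";
        System.out.println(answerTexts(xml, 0)); // [First answer, Second answer]
    }
}
```

Compared with the trim().length() check, this keeps an element even when its text happens to be empty, and it never mistakes a non-empty text node for an answer.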
Alternatively, just call getElementsByTagName("answer") on the answers element to get just the elements. That's assuming you're happy to ignore any non-answer elements though. For example:
Element answersElement = (Element) doc.getElementsByTagName("answers").item(n);
NodeList answerElements = answersElement.getElementsByTagName("answer");
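To see the difference between the two node lists on a small input, something like the following could be run. It is only a sketch: the class name AnswerNodeCount and the counts method are invented here, and the XML is inlined as a string with one tab of indentation. Keep in mind that getElementsByTagName matches descendants at any depth, not only direct children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AnswerNodeCount {

    // Hypothetical helper: for the n-th <answers> element, returns
    // { number of child nodes, number of <answer> descendant elements }.
    static int[] counts(String xml, int n) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element answers = (Element) doc.getElementsByTagName("answers").item(n);
            int allNodes = answers.getChildNodes().getLength();
            int answerElements = answers.getElementsByTagName("answer").getLength();
            return new int[] { allNodes, answerElements };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<answers>\n"
                + "\t<answer>First answer</answer>\n"
                + "\t<answer>Second answer</answer>\n"
                + "\t<answer>Third answer</answer>\n"
                + "\t<answer>Fourth answer</answer>\n"
                + "</answers>";
        int[] result = counts(xml, 0);
        // 5 whitespace-only text nodes plus 4 elements, versus the 4 elements alone.
        System.out.println(result[0] + " nodes, " + result[1] + " answer elements"); // 9 nodes, 4 answer elements
    }
}
```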