I have an XML document that I need to parse using XMLInputFactory (javax.xml.stream). The XML looks like this:
<SACL>
<Criteria>Dinner</Criteria>
<Value> Rice &amp; Beverage </Value>
</SACL>
I am parsing this with an XMLEventReader in Java, and my code is:
if (xmlEvent.asStartElement().getName().getLocalPart().equals("Value")) {
    xmlEvent = xmlEventReader.nextEvent();
    value = xmlEvent.asCharacters().getData().trim(); // the issue is only inside this if block
}
The reader is created like this:
xmlEventReader = XMLInputFactory.newInstance().createXMLEventReader(new FileInputStream(file.getPath())); // javax.xml.stream.XMLEventReader
But it only returns "Rice" (the "& Beverage" part is missing). Expected output: Rice & Beverage
Can someone explain what the issue is with "&amp;" and how it can be fixed?
CodePudding user response:
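It looks like the issue you are experiencing is related to the way the text is delivered by the parser. &amp; is the predefined XML entity reference for &, and a StAX event reader that is not coalescing may report the text around it as several separate Characters events. Your code reads only the first of those events, which is why you get "Rice" and nothing after it.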
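As a rough illustration of that point, the sketch below joins every Characters event between the opening and closing Value tags instead of reading only the first one. The class name ValueTextCollector and the method readValue are made up for this example, and it reads from a String rather than a file to stay self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of the original code.
class ValueTextCollector {

    // Returns the trimmed text of the first <Value> element, or null if none exists.
    static String readValue(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("Value")) {
                    StringBuilder text = new StringBuilder();
                    // Keep appending until the closing tag; the text may arrive in pieces.
                    while (reader.hasNext()) {
                        XMLEvent next = reader.nextEvent();
                        if (next.isEndElement()) {
                            break;
                        }
                        if (next.isCharacters()) {
                            text.append(next.asCharacters().getData());
                        }
                    }
                    return text.toString().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml));
    }
}
```

This assumes Value contains only text. For text-only elements, XMLEventReader.getElementText() is a shorter way to get the same result.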
To fix this issue, you can try the following:
*Keep the &amp; in the XML as it is. A bare & is not well-formed XML, so replacing &amp; with & would make the parser reject the document.
*Parse the XML with a DOM parser instead. For example, you can use the javax.xml.parsers.DocumentBuilder class to parse the XML, and then call getTextContent() on the Value element. It returns the element's whole text with the entity already resolved.
*With the DOM approach, the parsed document is an org.w3c.dom.Document, and you can use its getElementsByTagName() method to locate the Value element.
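The DOM route mentioned above could look roughly like this. It is only a sketch: the class name DomValueReader and the method readValue are invented for the example, and the XML is passed in as a String instead of being read from a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original code.
class DomValueReader {

    // Returns the trimmed text of the first <Value> element, or null if none exists.
    static String readValue(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList values = doc.getElementsByTagName("Value");
        if (values.getLength() == 0) {
            return null;
        }
        // getTextContent() returns the whole text of the element in one string.
        return values.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml));
    }
}
```

Keep in mind that DOM loads the whole document into memory, so for very large files the streaming reader is usually the better fit.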
CodePudding user response:
I've worked on a project that did XML parsing recently, so I know almost exactly what's happening here: the parser sees &amp; as a separate event (XMLStreamConstants.ENTITY_REFERENCE).
Try setting the property XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES to true in your XML parser's options. If the parser is properly implemented, the entity is replaced and made part of the text.
Keep in mind that the parser is allowed to split the text into multiple characters events, especially if you have large pieces of text. Setting the property XMLInputFactory.IS_COALESCING to true should prevent that.
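A minimal sketch of that configuration, assuming the same XML as in the question. The class name CoalescingReaderDemo and the method readValue are made up for illustration; the two property constants are the ones named above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of the original code.
class CoalescingReaderDemo {

    // Returns the trimmed text of the first <Value> element, or null if none exists.
    static String readValue(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Replace entity references such as &amp; with their text.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Merge adjacent character data into a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("Value")) {
                    // With coalescing on, one Characters event should hold the full text.
                    XMLEvent next = reader.nextEvent();
                    return next.isCharacters() ? next.asCharacters().getData().trim() : "";
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<SACL><Criteria>Dinner</Criteria>"
                + "<Value> Rice &amp; Beverage </Value></SACL>";
        System.out.println(readValue(xml)); // prints "Rice & Beverage"
    }
}
```

With both properties set, the original one-event read inside the if block can stay as it is; only the factory setup changes.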