I am trying to parse an XML string into a Java object using com.fasterxml.jackson.dataformat.xml.XmlMapper.
The problem is that the XML string contains the character '&'.
The following exception is thrown:
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Unexpected character '&' in prolog; expected '<'.
Code
import java.util.Map;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class MyProblem {
    public static void main(String[] args) throws Exception {
        XmlMapper xmlMapper = new XmlMapper();
        // The unescaped '&' in the element text is what triggers the exception.
        String myXML = "<cookies>Chocolate&Butter cocunut</cookies>";
        Map<String, String> myTester = xmlMapper.reader().readValue(myXML, Map.class);
        System.out.println(myTester);
    }
}
I expected System.out.println(myTester) to print the parsed map.
After reading XmlMapper's documentation, I believe there is a property I can set to override the deserialization behavior.
If I need to escape these special characters, how do I do that?
CodePudding user response:
Because the ampersand has a special role in XML (it starts an entity reference), a literal '&' in element text must be
- either enclosed in a CDATA section
"<cookies><![CDATA[Chocolate&Butter cocunut]]></cookies>"
- or written as the predefined entity reference &amp;
"<cookies>Chocolate&amp;Butter cocunut</cookies>"
Both would be valid XML strings that Jackson and the underlying Woodstox can parse.
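To check the two forms above without pulling in any dependency, here is a minimal sketch that uses the JDK's built-in DOM parser as a stand-in for Jackson/Woodstox. Well-formedness rules are the same for any conforming XML parser, so the same strings should be accepted or rejected by XmlMapper as well. The class name AmpersandForms and the helpers rootText and isWellFormed are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class AmpersandForms {

    /** Returns the root element's text, or throws if the XML is not well-formed. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    /** True if the string parses as XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            rootText(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "<cookies>Chocolate&Butter cocunut</cookies>";
        String cdata = "<cookies><![CDATA[Chocolate&Butter cocunut]]></cookies>";
        String entity = "<cookies>Chocolate&amp;Butter cocunut</cookies>";

        System.out.println("raw well-formed?    " + isWellFormed(raw));
        System.out.println("CDATA form text:    " + rootText(cdata));
        System.out.println("entity form text:   " + rootText(entity));
    }
}
```

Both accepted forms give the parser the same text, with the literal '&' restored, while the raw string is rejected before any mapping to a Java object could happen.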
See also XML Spec, 2.4 Character Data and Markup:
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; ", and MUST, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section.
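If you build the XML string yourself, the rule quoted above can be applied to the text before it goes into the element. A minimal sketch, assuming the input is raw text that has not been escaped yet (running it on already-escaped text would escape it twice); XmlTextEscaper, escapeText and element are hypothetical names for this illustration, not part of Jackson:

```java
final class XmlTextEscaper {

    /** Escapes the characters that may not appear literally in XML character data. */
    static String escapeText(String text) {
        // '&' must be replaced first, otherwise the '&' of "&lt;" / "&gt;" would be escaped again.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;"); // also covers the "]]>" case from the spec
    }

    /** Wraps escaped text in an element; the element name is assumed to be a valid XML name. */
    static String element(String name, String text) {
        return "<" + name + ">" + escapeText(text) + "</" + name + ">";
    }

    public static void main(String[] args) {
        // Produces a string that an XML parser such as the one behind XmlMapper can read.
        System.out.println(element("cookies", "Chocolate&Butter cocunut"));
    }
}
```

This only helps when you control how the XML is produced. If the malformed string comes from somewhere else, fixing the producer is safer than trying to patch ampersands in a finished document, because a blind replace would also hit entity references that are already correct.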
Related questions:
- How to solve Ampersand (&) conversion issue in XML?
- "Content is not allowed in prolog" error yet nothing before XML declaration
- WstxUnexpectedCharException: Unexpected character '"' (code 34) in DOCTYPE declaration; expected a space between public and system identifiers
CodePudding user response:
Jackson provides parser features that relax how special characters are handled, but the ones I know of target JSON input. JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER lets the JSON parser accept a backslash escape in front of any character, and JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS lets it accept unescaped control characters inside JSON strings. Neither should be expected to change how XmlMapper treats a bare '&', so the XML input itself still has to be escaped.