What will happen if we use CDATA for integers?

CDATA is used to handle special characters. So what happens if we wrap an integer value in CDATA? Will it convert the integer datatype to a String/char datatype?

For example, I have <itemCount><![CDATA[123]]></itemCount>, and the corresponding database column itemCount is an integer. If I send the XML like this, will the database treat the value as a String/character datatype or as an integer datatype?

CodePudding user response：

In theory, <itemCount><![CDATA[123]]></itemCount> and <itemCount>123</itemCount> are 100% equivalent. The only purpose of putting stuff in CDATA is that it avoids the need to escape special characters, and when there aren't any special characters, it makes no difference.
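As a quick check of that equivalence, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The variable names are only illustrative, and the conversion to an integer is done by the application, not by the XML layer:

```python
import xml.etree.ElementTree as ET

# The same value, once wrapped in a CDATA section and once as plain text.
with_cdata = ET.fromstring("<itemCount><![CDATA[123]]></itemCount>")
plain = ET.fromstring("<itemCount>123</itemCount>")

# The parser hands both to the application as the same character data.
assert with_cdata.text == plain.text

# XML itself carries no datatype; the application decides how to interpret
# the text, for example by converting it before a database insert.
item_count = int(with_cdata.text)
```

Whether the database sees an integer therefore depends on what the consuming code does with the text, not on the presence of CDATA.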

However, practice isn't always the same as theory. As regular readers of StackOverflow know, there are people who (against all the best advice) try to process XML using regular expressions, and sprinkling a few CDATA tags in your XML is a sure way to reveal the bugs in their code. Even if they use a proper XML parser, there are often parsing options that make CDATA tags visible to applications (for example, there's a CDATA node type in DOM), which means that people can write applications that work one way if the CDATA tags are there, and a different way if they aren't.
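To illustrate how CDATA can become visible to an application, here is a small sketch using Python's xml.dom.minidom, which by default keeps CDATA sections as their own node type. The helper names first_child_type and element_text are hypothetical, made up for this example:

```python
from xml.dom import Node, minidom


def first_child_type(xml_text):
    """Return the DOM node type of the root element's first child."""
    return minidom.parseString(xml_text).documentElement.firstChild.nodeType


def element_text(xml_text):
    """Collect the root element's character data, ignoring how it was written."""
    root = minidom.parseString(xml_text).documentElement
    return "".join(
        node.data
        for node in root.childNodes
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


# Code that only looks for TEXT_NODE children would miss the CDATA variant.
cdata_type = first_child_type("<itemCount><![CDATA[123]]></itemCount>")
plain_type = first_child_type("<itemCount>123</itemCount>")
```

An application that reads values through something like element_text behaves the same for both inputs, while one that branches on the node type does not.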

So the answer is that an application COULD treat CDATA specially (in any way it chooses, including the way you describe) but a well-designed application WON'T do so.

CodePudding user response：

There shouldn't be any difference between putting a string without entity references into CDATA section markers or not, but actual behavior is determined by your XML-consuming app. The lowest-level APIs for XML processing, such as SAX, just send additional startCDATA()/endCDATA() events to mark the boundaries of a CDATA section, while the actual character data is delivered via characters() events whether or not CDATA sections are used. Apps can, and probably should, ignore the CDATA boundary events unless they specifically need to reproduce the input in the output.

However, there's potential trouble in the presence of multiple consecutive CDATA sections, or more generally, when character content is split into multiple fragments, one or more of which is encoded as a CDATA section, as in:

<itemCount>1<![CDATA[23]]></itemCount>

While logically this is no different from plain 123, an XML parser delivering SAX events can't coalesce the fragments 1 and 23 into a single characters() event while also reporting the startCDATA()/endCDATA() events precisely. The app must therefore be prepared to coalesce the sequence of characters() events itself, or to report an error pointing out the problem, which I'd expect few apps to actually do.
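One way an app might do that coalescing is sketched below with Python's xml.sax. The class ItemCountHandler and the function parse_item_count are hypothetical names for this example. The handler buffers every characters() call between the start and end of itemCount, so it does not matter how many fragments the parser delivers:

```python
import xml.sax


class ItemCountHandler(xml.sax.ContentHandler):
    """Buffers all character data inside <itemCount> before converting it."""

    def __init__(self):
        super().__init__()
        self._chunks = None      # None means "not inside <itemCount>"
        self.item_count = None

    def startElement(self, name, attrs):
        if name == "itemCount":
            self._chunks = []

    def characters(self, content):
        # May be called several times for one element; just accumulate.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "itemCount":
            self.item_count = int("".join(self._chunks))
            self._chunks = None


def parse_item_count(xml_text):
    handler = ItemCountHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.item_count
```

Converting the value only in endElement, rather than inside characters(), is what keeps the split input from being read as two separate numbers.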

CDATA sections are really just an encoding construct to stop XML parsers from interpreting markup delimiters (i.e. < and >) and what may look like entity references (&something;), but have no bearing on the logical content delivered to applications. SGML, of which XML is a subset, has additional types of marked sections such as INCLUDE/IGNORE for conditional inclusion of content or markup declarations, TEMP for editorial content, and RCDATA as a CDATA variant honoring entity references but not markup delimiters; the latter is available in HTML as well.
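The "encoding construct" point can be seen in a short sketch, again with Python's xml.etree.ElementTree. The note element and its content are invented for illustration:

```python
import xml.etree.ElementTree as ET

# The same logical content, written with a CDATA section and with escapes.
via_cdata = ET.fromstring("<note><![CDATA[a < b && c]]></note>")
via_escapes = ET.fromstring("<note>a &lt; b &amp;&amp; c</note>")
assert via_cdata.text == via_escapes.text

# Inside CDATA, something that looks like an entity reference is not expanded.
literal = ET.fromstring("<note><![CDATA[&amp;]]></note>")
literal_text = literal.text
```

Here literal_text holds the five characters of the reference as written, because the parser does not interpret them inside a CDATA section.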