I have a large XML document (250 MB) in which one of the tags contains another XML document that I need to process.
The problem is that this inner XML is wrapped in CDATA,
and if I try to do a replace/replaceAll:
String xml= fileContent.replace("<![CDATA[", " ");
String replace = xml.replace("]]>", " ");
I'm getting
java.lang.OutOfMemoryError: Java heap space
A simple example of the structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a>
<b>
<c>
<![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><bigXML>]]>
</c>
</b>
</a>
Even using an XML parser like VDT or SAX does not help, because I still need to remove the <![CDATA[ wrapper, and what is inside it is the biggest portion of the file.
Allocating more heap memory is not an option, since this runs on a machine where I don't have any control over the JVM.
Any idea how to extract the XML from the c tag and also strip the <![CDATA[ wrapper?
CodePudding user response:
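The OutOfMemoryError occurs because the whole file is read into memory as a single String, and each replace call that finds a match then builds another full-size copy. Instead, read the file chunk by chunk, apply your operation to each chunk, and write the modified chunk to another file, so that only a small buffer is in memory at any time. Take care with markers such as ]]> that may be split across two chunks.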
You can try using a BufferedReader to read chunk by chunk:
// file is the input File (or path String); size is the buffer size in chars
BufferedReader buffer = new BufferedReader(new FileReader(file), size);
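As a rough sketch of the streaming idea, the following copies only the content between the first <![CDATA[ and the next ]]> into a separate output, one character at a time through buffered streams, so the inner XML never has to fit in memory as a String. The class name CdataExtractor, the method names, and the command-line arguments are made up for illustration, and it assumes the file is UTF-8 and that the inner XML sits in a single CDATA section.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: names and structure are illustrative only.
class CdataExtractor {
    private static final String START = "<![CDATA[";

    // Copies the content of the first CDATA section from in to out.
    // Returns true only if both the opening marker and the closing ]]> were found.
    static boolean extract(Reader in, Writer out) throws IOException {
        if (!skipPast(in, START)) {
            return false; // no CDATA section in the input
        }
        int pending = 0; // number of ']' held back (0..2), possibly part of "]]>"
        int c;
        while ((c = in.read()) != -1) {
            if (c == ']') {
                if (pending == 2) {
                    out.write(']'); // oldest held-back ']' is real content
                } else {
                    pending++;
                }
            } else if (c == '>' && pending == 2) {
                return true; // reached "]]>", the held-back chars were the marker
            } else {
                for (; pending > 0; pending--) {
                    out.write(']'); // false alarm, flush held-back brackets
                }
                out.write(c);
            }
        }
        for (; pending > 0; pending--) {
            out.write(']'); // unterminated section: keep what was read
        }
        return false;
    }

    // Consumes input up to and including marker; returns false on end of input.
    // The simple restart is sufficient here because '<' occurs only at the
    // start of "<![CDATA[".
    static boolean skipPast(Reader in, String marker) throws IOException {
        int matched = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == marker.charAt(matched)) {
                if (++matched == marker.length()) {
                    return true;
                }
            } else {
                matched = (c == marker.charAt(0)) ? 1 : 0;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CdataExtractor <input.xml> <output.xml>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            if (!extract(in, out)) {
                System.err.println("No complete CDATA section found");
            }
        }
    }
}
```

The output file then holds just the inner document (including its own XML declaration), which can be fed to a streaming parser such as SAX. Writing the inner XML to its own file, rather than deleting the markers in place, also avoids ending up with a second XML declaration in the middle of the outer document.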