Replace on big files: Java heap space out of memory

I have a big XML document (250 MB) in which one of the tags contains another XML document that I need to process.

But the problem is that this inner XML is wrapped in CDATA, and if I try to do a replace/replaceAll

String xml= fileContent.replace("<![CDATA[", "  ");
String replace = xml.replace("]]>", " ");

I'm getting

java.lang.OutOfMemoryError: Java heap space

A simple example of the structure.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a>
    <b>
        <c>
            <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="yes"?><bigXML>]]>
        </c>
    </b>
</a>

Even using an XML parser like VTD or SAX does not help, because I still need to remove the <![CDATA[ wrapper, and what is inside it is the biggest portion of the file.

Allocating more heap memory is not an option, since this runs on a machine where I don't have any control over the JVM.

Any idea how to extract the XML from the c tag and also strip the <![CDATA[ wrapper?

CodePudding user response：

The OutOfMemoryError occurs because the whole file is read into memory as a single String. Instead, read the file chunk by chunk, apply your operations to each chunk, and write the modified chunk to another file. That way only one chunk is in memory at a time.

You can try using a BufferedReader to read chunk by chunk:

BufferedReader buffer = new BufferedReader(new FileReader(file), size); // size = buffer size in chars
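
One thing to watch with the chunked approach: a marker such as <![CDATA[ can be split across two chunks, so a plain replace on each chunk may miss it. Below is a minimal sketch of one way to handle that, not a definitive implementation. The class name CdataStripper, the method names, and the 8192-char chunk size are all made up for illustration, and it assumes you want the markers dropped rather than replaced by spaces as in the question. It holds back the last few characters of each chunk until the next chunk arrives.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
public class CdataStripper {
    private static final String[] MARKERS = {"<![CDATA[", "]]>"};
    private static final int MAX_MARKER_LENGTH = 9; // length of "<![CDATA["

    /** Copies in to out without the CDATA markers, buffering about one chunk. */
    public static void strip(Reader in, Writer out, int chunkSize) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder pending = new StringBuilder();
        char[] chunk = new char[chunkSize];
        int read;
        while ((read = in.read(chunk)) != -1) {
            pending.append(chunk, 0, read);
            drain(pending, out, false);
        }
        drain(pending, out, true); // flush the tail that was held back
    }

    private static void drain(StringBuilder pending, Writer out, boolean endOfInput)
            throws IOException {
        int i = 0;
        while (i < pending.length()) {
            // Hold back a short tail: a marker may continue in the next chunk.
            if (!endOfInput && pending.length() - i < MAX_MARKER_LENGTH) {
                break;
            }
            int markerLength = markerLengthAt(pending, i);
            if (markerLength > 0) {
                i += markerLength;              // skip the marker
            } else {
                out.write(pending.charAt(i));   // copy ordinary character
                i++;
            }
        }
        pending.delete(0, i); // keep only the unprocessed tail
    }

    // Returns the length of the marker starting at pos, or 0 if there is none.
    private static int markerLengthAt(CharSequence text, int pos) {
        for (String marker : MARKERS) {
            if (startsWithAt(text, pos, marker)) {
                return marker.length();
            }
        }
        return 0;
    }

    private static boolean startsWithAt(CharSequence text, int pos, String marker) {
        if (pos + marker.length() > text.length()) {
            return false;
        }
        for (int k = 0; k < marker.length(); k++) {
            if (text.charAt(pos + k) != marker.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    // Usage: java CdataStripper <input file> <output file>
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            strip(in, out, 8192);
        }
    }
}
```

With this, memory use stays at roughly one chunk plus a few held-back characters, regardless of file size. Note that the output still contains the inner <?xml ...?> declaration in the middle of the document, exactly as the replace calls in the question would leave it, so you may need to handle that before parsing the result.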