I have an XML file and I want to manipulate its tags using the Java DOM, but the file is 25 GB, so parsing fails with this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
public Frwiki() {
filePath = "D:\\compressed\\frwiki-latest-pages-articles.xml";
}
public void deletingTag() throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document doc = factory.newDocumentBuilder().parse(filePath);
NodeList nodes = doc.getElementsByTagName("*");
for (int j = 0; j < 3; j++) {
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (!node.getNodeName().equals("id") && !node.getNodeName().equals("title")
&& !node.getNodeName().equals("text") && !node.getNodeName().equals("mediawiki")
&& !node.getNodeName().equals("revision") && !node.getNodeName().equals("page"))
node.getParentNode().removeChild(node);
}
}
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(doc), new StreamResult(filePath));
}
CodePudding user response:
You can split a large file into smaller files using XSLT 3.0 streaming, like this:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="xsl:initial-template">
<xsl:source-document streamable="yes" href="frwiki-latest-pages-articles.xml">
<xsl:for-each-group ....>
<xsl:result-document href="......">
<part><xsl:copy-of select="current-group()"/></part>
</xsl:result-document>
</xsl:for-each-group>
</xsl:source-document>
</xsl:template>
</xsl:transform>
The "..." parts depend on how you want to split the document and name the result files.
Although XSLT 3.0 streaming is a W3C specification, the only implementation available at the moment is my company's Saxon-EE processor.
CodePudding user response:
Split the large XML file into smaller chunks and process them separately.
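As a rough illustration of the splitting idea, here is a minimal sketch that uses the JDK's StAX API (javax.xml.stream), which reads the file as a stream of events instead of building a DOM tree. The class name XmlSplitter, the split method, the sink callback, and the <part> wrapper element are all hypothetical names chosen for this example; it also assumes the elements to group are the <page> elements from the question and that they are not nested inside each other. It does not carry over namespace declarations made on the root element, which a real splitter would probably need to handle.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class XmlSplitter {

    /**
     * Copies every <elementName> element from the input into chunk documents,
     * at most perChunk elements per chunk, each wrapped in a <part> root.
     * The sink supplies a Writer for a given chunk index (hypothetical callback).
     * Returns the number of chunks written.
     */
    public static int split(Reader in, String elementName, int perChunk,
                            IntFunction<Writer> sink)
            throws XMLStreamException, IOException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLEventReader reader = inputFactory.createXMLEventReader(in);

        int chunks = 0;      // chunks opened so far
        int inChunk = 0;     // matched elements written to the current chunk
        int depth = 0;       // nesting depth inside the current matched element
        Writer target = null;
        XMLEventWriter writer = null;

        while (reader.hasNext()) {
            XMLEvent ev = reader.nextEvent();
            if (depth == 0) {
                boolean match = ev.isStartElement()
                        && ev.asStartElement().getName().getLocalPart().equals(elementName);
                if (!match) {
                    continue; // skip everything outside the elements we keep
                }
                if (writer == null) { // start a new chunk lazily
                    target = sink.apply(chunks++);
                    writer = outputFactory.createXMLEventWriter(target);
                    writer.add(events.createStartDocument());
                    writer.add(events.createStartElement("", "", "part"));
                }
            }
            writer.add(ev);
            if (ev.isStartElement()) {
                depth++;
            } else if (ev.isEndElement() && --depth == 0 && ++inChunk == perChunk) {
                closeChunk(writer, target, events);
                writer = null;
                inChunk = 0;
            }
        }
        if (writer != null) {
            closeChunk(writer, target, events); // last, partially filled chunk
        }
        reader.close();
        return chunks;
    }

    private static void closeChunk(XMLEventWriter writer, Writer target, XMLEventFactory events)
            throws XMLStreamException, IOException {
        writer.add(events.createEndElement("", "", "part"));
        writer.add(events.createEndDocument());
        writer.close();  // does not close the underlying Writer
        target.close();
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory demo: 3 pages, 2 per chunk, so 2 chunks are produced.
        String xml = "<mediawiki><page><title>A</title></page>"
                + "<page><title>B</title></page><page><title>C</title></page></mediawiki>";
        List<StringWriter> parts = new ArrayList<>();
        int n = split(new StringReader(xml), "page", 2, i -> {
            StringWriter w = new StringWriter();
            parts.add(w);
            return w;
        });
        System.out.println(n + " chunks");
        parts.forEach(p -> System.out.println(p));
    }
}
```

For the real 25 GB file, the Reader would come from the file on disk and the sink would open one output file per chunk index, so memory use stays roughly constant regardless of input size. Each resulting part should then be small enough to load with the DOM code from the question. On inputs this large, the JDK's built-in XML processing limits may also need to be raised.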