Issue loading .XML files with docx4j in Java

Time:09-30

I cannot load even the sample word2003xml.xml that docx4j provides for tests in docx4j-samples-docx4j-8.3.1.zip, found at https://www.docx4java.org/downloads.html

I tried loading the file using two different overloads of WordprocessingMLPackage.load, but the result is the same.

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new FileInputStream(new File("C:\\Mine\\project4tests\\word2003xml.xml")));
WordprocessingMLPackage wordMLPackage2 = WordprocessingMLPackage.load(new java.io.File("C:\\Mine\\project4tests\\word2003xml.xml"));

Here is the exception that I am getting:

Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't load xml from stream 
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:641)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:418)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:376)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:341)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:182)
    at Main.main(Main.java:13)
Caused by: javax.xml.bind.UnmarshalException
  with linked exception:
[com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:468)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:402)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:371)
    at org.docx4j.convert.in.FlatOpcXmlImporter.<init>(FlatOpcXmlImporter.java:132)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:638)
    ... 5 more
Caused by: com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:726)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:247)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:242)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:109)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1131)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:556)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:538)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:400)
    ... 8 more
Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
    ... 19 more

There is no issue loading a .DOCX file. However, what I need the docx4j library for is converting an old .DOC file (WordprocessingML, more like an .XML file) into a .DOCX, similar to what is done here: https://coderanch.com/t/721499/java/Word-XML-DOCX

Does anybody know why I cannot load the file properly?

CodePudding user response:

See https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/convert/in/word2003xml/Word2003XmlConverter.java for 2003 XML files.
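
The stack trace in the question suggests why the plain load call fails: the XML branch of the loader expects a root element in the xmlPackage namespace (Flat OPC), while the sample's root is wordDocument in the 2003 wordml namespace. As a rough sketch, you could check the root element yourself before choosing a code path. This uses only the JDK's StAX API; the class name RootElementCheck, its method names, and the returned labels are made up for illustration and are not part of docx4j.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of docx4j.
public class RootElementCheck {
    // Both namespace URIs are copied from the exception message in the question.
    public static final String FLAT_OPC_NS =
            "http://schemas.microsoft.com/office/2006/xmlPackage";
    public static final String WORD_2003_NS =
            "http://schemas.microsoft.com/office/word/2003/wordml";

    // Returns the qualified name of the first element in the stream.
    // Processing instructions and comments before the root are skipped.
    public static QName rootElement(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    // Maps the root element to an illustrative label.
    public static String describe(QName root) {
        if (FLAT_OPC_NS.equals(root.getNamespaceURI())
                && "package".equals(root.getLocalPart())) {
            return "flat-opc";
        }
        if (WORD_2003_NS.equals(root.getNamespaceURI())
                && "wordDocument".equals(root.getLocalPart())) {
            return "word-2003-xml";
        }
        return "unknown";
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a Word 2003 XML file.
        String sample = "<?xml version=\"1.0\"?>"
                + "<w:wordDocument xmlns:w=\"" + WORD_2003_NS + "\"/>";
        QName root = rootElement(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root + " -> " + describe(root));
    }
}
```

A file reported as "word-2003-xml" is the kind the converter linked above is meant for; one reported as "flat-opc" is the kind the failing code path was expecting.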

Note that .doc is the old binary format; it is not XML, but something different again.
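
If you are unsure which kind of file you actually have, the first few bytes usually tell you. The following is a minimal sketch, assuming the usual signatures: binary .doc files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, .docx files are ZIP archives starting with "PK", and XML files start with "<" after an optional UTF-8 byte order mark and whitespace. The class WordFormatSniffer and its labels are hypothetical, and UTF-16 encoded XML is not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of docx4j.
public class WordFormatSniffer {
    // Signature of an OLE2 compound file (used by binary .doc).
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Classifies a file from its leading bytes.
    public static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "ole2-binary";   // e.g. a binary .doc
        }
        if (head.length >= 2 && head[0] == 'P' && head[1] == 'K') {
            return "zip";           // e.g. a .docx package
        }
        int i = 0;
        // Skip a UTF-8 byte order mark if present.
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading ASCII whitespace.
        while (i < head.length
                && (head[i] == ' ' || head[i] == '\t' || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        if (i < head.length && head[i] == '<') {
            return "xml";           // Word 2003 XML, Flat OPC, or any other XML
        }
        return "unknown";
    }

    // Reads up to 16 bytes from the file and classifies them.
    public static String sniff(Path file) throws IOException {
        byte[] buf = new byte[16];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) != -1) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(buf, total));
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling WordFormatSniffer.sniff with the path to your file and getting "xml" back means the file is some XML dialect rather than a binary .doc, regardless of its extension.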

CodePudding user response:

I tried the code that you have specified, but I could not get it to work. I get the following exception no matter what file I use:

Exception in thread "main" javax.xml.bind.JAXBException: Provider com.sun.xml.internal.bind.v2.ContextFactory could not be instantiated: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 counts of IllegalAnnotationExceptions
There's no ObjectFactory with an @XmlElementDecl for the element {http://schemas.microsoft.com/office/word/2012/wordml}dataBinding.
    this problem is related to the following location:
        at protected java.util.List org.docx4j.wml.SdtPr.rPrOrAliasOrLock
        at org.docx4j.wml.SdtPr
        at protected org.docx4j.wml.SdtPr org.docx4j.wml.SdtBlock.sdtPr
        at org.docx4j.wml.SdtBlock
        at protected java.util.List org.docx4j.wml.Body.content
        at org.docx4j.wml.Body
        at protected org.docx4j.wml.Body org.docx4j.wml.Document.body
        at org.docx4j.wml.Document
        at public org.docx4j.wml.Document org.docx4j.wml.ObjectFactory.createDocument()
        at org.docx4j.wml.ObjectFactory
        at protected java.util.List org.docx4j.wml.CTPictureBase.anyAndAny
        at org.docx4j.wml.CTPictureBase
        at org.docx4j.wml.Pict
        at protected org.docx4j.wml.Pict org.docx4j.wml.Numbering$NumPicBullet.pict
        at org.docx4j.wml.Numbering$NumPicBullet
        at protected java.util.List org.docx4j.wml.Numbering.numPicBullet
        at org.docx4j.wml.Numbering
        at protected org.docx4j.wml.Numbering org.docx4j.convert.in.word2003xml.Transition03To06.numbering
        at org.docx4j.convert.in.word2003xml.Transition03To06
        at public org.docx4j.convert.in.word2003xml.Transition03To06 org.docx4j.convert.in.word2003xml.ObjectFactory.createTransition03To06()
        at org.docx4j.convert.in.word2003xml.ObjectFactory

I also cloned the repository you linked (https://github.com/plutext/docx4j) to make sure the code is the same.

Any ideas on how to work around this?
