I can't load even the sample word2003xml.xml that docx4j provides for tests in docx4j-samples-docx4j-8.3.1.zip, available here: https://www.docx4java.org/downloads.html
I tried loading the file with two different load methods, but the result is the same:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new FileInputStream(new File("C:\\Mine\\project4tests\\word2003xml.xml")));
WordprocessingMLPackage wordMLPackage2 = WordprocessingMLPackage.load(new java.io.File("C:\\Mine\\project4tests\\word2003xml.xml"));
Here is the exception that I am getting:
Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't load xml from stream
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:641)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:418)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:376)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:341)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:182)
at Main.main(Main.java:13)
Caused by: javax.xml.bind.UnmarshalException
with linked exception:
[com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:468)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:402)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:371)
at org.docx4j.convert.in.FlatOpcXmlImporter.<init>(FlatOpcXmlImporter.java:132)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:638)
... 5 more
Caused by: com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:726)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:247)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:242)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:109)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1131)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:556)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:538)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:400)
... 8 more
Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
... 19 more
Loading a .docx file works fine. What I actually need docx4j for, however, is converting an old Word document saved as WordprocessingML (Word 2003 XML, so really an .xml file rather than a .doc) into .docx, similar to what is done here: https://coderanch.com/t/721499/java/Word-XML-DOCX
Does anybody know why I cannot load the file properly?
Answer:
See https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/convert/in/word2003xml/Word2003XmlConverter.java for 2003 XML files.
Note that .doc is the old binary format; it's not XML, it's something different again.
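The first stack trace shows WordprocessingMLPackage.load handing the .xml file to FlatOpcXmlImporter, which only accepts a Flat OPC root (package, part or xmlData in the http://schemas.microsoft.com/office/2006/xmlPackage namespace), while the sample's root is wordDocument in the Word 2003 namespace. Below is a minimal sketch for telling these formats apart before choosing a loader. It uses plain JDK StAX, not docx4j: it checks the OLE2 signature used by binary .doc files, the ZIP signature of .docx, and otherwise the namespace of the root element. The class name WordFormatSniffer and the returned labels are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WordFormatSniffer {

    // Root namespaces quoted in the UnmarshalException above.
    static final String FLAT_OPC_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";
    static final String WORD2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // OLE2 compound file signature (binary .doc) and ZIP local file header (.docx).
    static final byte[] OLE2_MAGIC = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                      (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Namespace URI of the first element, or null if the bytes are not parseable XML.
    static String rootNamespace(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // don't process DTDs while sniffing
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(data));
            try {
                while (reader.hasNext()) {
                    // Skip everything before the first element, e.g. <?mso-application ...?>
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getNamespaceURI();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: fall through and report no namespace
        }
        return null;
    }

    public static String sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) return "binary .doc (OLE2)";
        if (startsWith(data, ZIP_MAGIC)) return "zip package (.docx)";
        String ns = rootNamespace(data);
        if (FLAT_OPC_NS.equals(ns)) return "Flat OPC XML";
        if (WORD2003_NS.equals(ns)) return "Word 2003 XML";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + sniff(Files.readAllBytes(Paths.get(path))));
        }
    }
}
```

Run it with the file path as an argument (for example the word2003xml.xml sample). Given the root element named in the exception, it should report Word 2003 XML, i.e. a file for Word2003XmlConverter rather than for WordprocessingMLPackage.load.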
Reply:
I tried the code you pointed to, but I couldn't get it to work. I get the following exception no matter which file I use:
Exception in thread "main" javax.xml.bind.JAXBException: Provider com.sun.xml.internal.bind.v2.ContextFactory could not be instantiated: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 1 counts of IllegalAnnotationExceptions
There's no ObjectFactory with an @XmlElementDecl for the element {http://schemas.microsoft.com/office/word/2012/wordml}dataBinding.
this problem is related to the following location:
at protected java.util.List org.docx4j.wml.SdtPr.rPrOrAliasOrLock
at org.docx4j.wml.SdtPr
at protected org.docx4j.wml.SdtPr org.docx4j.wml.SdtBlock.sdtPr
at org.docx4j.wml.SdtBlock
at protected java.util.List org.docx4j.wml.Body.content
at org.docx4j.wml.Body
at protected org.docx4j.wml.Body org.docx4j.wml.Document.body
at org.docx4j.wml.Document
at public org.docx4j.wml.Document org.docx4j.wml.ObjectFactory.createDocument()
at org.docx4j.wml.ObjectFactory
at protected java.util.List org.docx4j.wml.CTPictureBase.anyAndAny
at org.docx4j.wml.CTPictureBase
at org.docx4j.wml.Pict
at protected org.docx4j.wml.Pict org.docx4j.wml.Numbering$NumPicBullet.pict
at org.docx4j.wml.Numbering$NumPicBullet
at protected java.util.List org.docx4j.wml.Numbering.numPicBullet
at org.docx4j.wml.Numbering
at protected org.docx4j.wml.Numbering org.docx4j.convert.in.word2003xml.Transition03To06.numbering
at org.docx4j.convert.in.word2003xml.Transition03To06
at public org.docx4j.convert.in.word2003xml.Transition03To06 org.docx4j.convert.in.word2003xml.ObjectFactory.createTransition03To06()
at org.docx4j.convert.in.word2003xml.ObjectFactory
I also cloned the repo you linked (https://github.com/plutext/docx4j) to make sure I am running the same code.
Any ideas on how to work around this?
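The provider that failed to instantiate is com.sun.xml.internal.bind.v2.ContextFactory, the copy of JAXB bundled inside the JDK itself. As a hedged first diagnostic (not a fix), the sketch below prints the Java version and which JAXB context factory classes are visible on the classpath, which may help when comparing your setup against the JAXB implementation docx4j is meant to run with. The class name JaxbProviderCheck is hypothetical, and the candidate list is an assumption covering the JDK-internal copy, the standalone JAXB reference implementation and EclipseLink MOXy.

```java
public class JaxbProviderCheck {

    // Candidate JAXBContext factory classes (assumed list, not exhaustive).
    static final String[] CANDIDATES = {
        "com.sun.xml.internal.bind.v2.ContextFactory",    // JAXB copy bundled in the JDK (provider named in the trace)
        "com.sun.xml.bind.v2.ContextFactory",             // standalone JAXB reference implementation
        "org.eclipse.persistence.jaxb.JAXBContextFactory" // EclipseLink MOXy
    };

    // True if the class can be located, without initializing it.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JaxbProviderCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String name : CANDIDATES) {
            System.out.println(name + (isPresent(name) ? ": present" : ": not found"));
        }
    }
}
```

Running it with the same classpath as the failing program shows whether only the JDK-internal JAXB is available or whether another implementation is also present.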