I'm using Java and Apache POI to read a Word document template and generate a new document from it. The original document has line breaks entered with "shift-enter"; I thought that would allow a line break while continuing the paragraph. But as I iterate through the runs, I seem to get an empty string at that point. There are 'flags' on the run; do they indicate the line break somehow? I want to keep it in the resulting document; I think what's happening is that I detect it as an empty string and leave it out. How can I detect its presence so I can keep it in the resulting document after I've processed the template?
As a side note, are those flags documented anywhere?
CodePudding user response:
I suspect you are talking about XWPF of apache poi, which is the apache poi part to handle the Office Open XML file format *.docx.
All Office Open XML file formats are ZIP archives containing XML files and other files in a special directory structure. So one can simply unzip a *.docx file and have a look into it.
For an explicit line break (Shift+Enter) you will find the following XML in /word/document.xml in that ZIP archive:
...
<w:r ...>
<w:br/>
</w:r>
...
So it is a run element (w:r) containing one or more break elements (w:br).
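To make the "unzip and look" step concrete, here is a minimal sketch that uses only the JDK (no apache poi) to open a *.docx as a ZIP stream, parse word/document.xml, and render each run as a string in which a w:br becomes "\n". The class name DocxBreakScanner and its method names are made up for illustration, and treating a w:br with no w:type (or with w:type="textWrapping") as a plain line break is my reading of the format rather than something stated above. It needs Java 9 or later because of InputStream.readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of apache poi.
final class DocxBreakScanner {
    // Namespace bound to the "w:" prefix in word/document.xml.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per w:r element; each line break is rendered as "\n". */
    static List<String> runTexts(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return parseRuns(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    static List<String> parseRuns(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(xml));
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < runs.getLength(); i++) {
            StringBuilder sb = new StringBuilder();
            // Walk the children in document order so text and breaks stay in sequence.
            for (Node child = runs.item(i).getFirstChild();
                 child != null; child = child.getNextSibling()) {
                if (!(child instanceof Element)
                        || !W_NS.equals(child.getNamespaceURI())) {
                    continue;
                }
                String name = child.getLocalName();
                if ("t".equals(name)) {
                    sb.append(child.getTextContent());
                } else if ("br".equals(name)) {
                    // getAttributeNS returns "" when the attribute is absent.
                    String type = ((Element) child).getAttributeNS(W_NS, "type");
                    if (type.isEmpty() || "textWrapping".equals(type)) {
                        sb.append('\n');
                    }
                }
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Calling DocxBreakScanner.runTexts on a FileInputStream for the template gives one entry per run. A run that holds only the Shift+Enter break comes back as "\n" instead of an empty string, which is the case the question describes.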
The run element (w:r) is the low-level source for an XWPFRun in apache poi. It is represented by an org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR, which can be obtained via XWPFRun.getCTR.
So if you have an XWPFRun run, you can detect the explicit line breaks like so:
...
// Prints one marker for every w:br element contained in this run.
for (int i = 0; i < run.getCTR().getBrList().size(); i++) {
    System.out.println("<BR />");
}
...
Is this documented anywhere?
There is ECMA-376 for Office Open XML.
The org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes are auto-generated from this specification. Unfortunately, no API documentation is publicly available. So one needs to download the sources of ooxml-schemas (up to apache poi 4) or poi-ooxml-full (from apache poi 5 on) and run javadoc on them.