How is shift-enter represented in a Word doc?

Time:03-21

I'm using Java and Apache POI to read a Word document template and generate a new document from it. The original document has line breaks entered with Shift+Enter; I thought this would allow a line break while continuing the paragraph. But as I sequence through the runs, I seem to get an empty string at that point. There are 'flags' on the run; do they indicate the line break somehow? I want to keep the break in the resulting document; I think what's happening is that I detect it as an empty string and leave it out. How can I detect its presence so that I can keep it in the resulting document after I've processed the template?

As a side note, are those flags documented anywhere?

Answer:

I suspect you are talking about XWPF, the part of Apache POI that handles the Office Open XML file format *.docx.

All Office Open XML file formats are ZIP archives containing XML files and other files in a special directory structure. So you can simply unzip a *.docx file and look inside it.
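If you would rather inspect the XML from Java than unzip by hand, the JDK's own java.util.zip is enough. The following is a minimal sketch, assuming Java 9+ (for InputStream.readAllBytes) and that the part is UTF-8 encoded; the class name DocxPeek is hypothetical and this does not use Apache POI at all.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: dumps the main document part of a .docx (which is just a ZIP archive).
public class DocxPeek {

    /** Returns the raw XML of word/document.xml from the given .docx file. */
    public static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Assumption: the part is UTF-8, which is what Word normally writes
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxPeek.java template.docx
        System.out.println(readDocumentXml(args[0]));
    }
}
```

Searching the printed XML for `<w:br` shows where the Shift+Enter breaks are.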

For an explicit line break (Shift+Enter) you will find the following XML in /word/document.xml in that ZIP archive:

...
<w:r ...>
 <w:br/>
</w:r>
...

So it is a run element (w:r) containing one or more break elements (w:br).
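This also explains the empty string: a run whose only child is w:br has no w:t text element, so its text is empty even though the run is not. As a sketch of that structure, the following uses only the JDK's namespace-aware DOM parser (not Apache POI) to count the direct w:br children of every w:r. The class name BreakCounter is hypothetical, and the input is assumed to be the content of word/document.xml as a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, plain JDK DOM: reports how many w:br elements each w:r contains.
public class BreakCounter {
    // WordprocessingML main namespace, bound to the "w:" prefix in document.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** For each w:r in document order, returns the number of its direct w:br children. */
    public static List<Integer> breaksPerRun(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // match on namespace URI, not on the "w:" prefix
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        List<Integer> counts = new ArrayList<>();
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            int brCount = 0;
            NodeList children = runs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Note: page and column breaks are also w:br (with a w:type attribute)
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && W_NS.equals(child.getNamespaceURI())
                        && "br".equals(child.getLocalName())) {
                    brCount++;
                }
            }
            counts.add(brCount);
        }
        return counts;
    }
}
```

For a paragraph "first line", Shift+Enter, "second line", this would typically report [0, 1, 0]: the middle run is the one that looked empty.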

The run element (w:r) is the low-level source of an XWPFRun in Apache POI. It is represented by an org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR, which you can get via XWPFRun.getCTR().

So if you have an XWPFRun run, you can get its explicit line breaks like this:

...
for (int i = 0; i < run.getCTR().getBrList().size(); i++) {
 System.out.println("<BR />");
}
...
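Counting breaks per run loses their position relative to the text when a run mixes w:t and w:br. To keep the break in the output, one option is to walk each run's children in document order and emit "\n" wherever a line break occurs. The sketch below does this with the JDK DOM parser rather than Apache POI; the class and method names are hypothetical, it assumes a simple document (no text boxes or other nested paragraphs), and it treats a w:br with no w:type, or w:type="textWrapping", as an ordinary line break while skipping page and column breaks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: paragraph text with Shift+Enter breaks preserved as "\n".
public class ParagraphText {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each w:p, with "\n" where a line-break w:br occurs. */
    public static List<String> paragraphs(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int p = 0; p < paras.getLength(); p++) {
            StringBuilder sb = new StringBuilder();
            // Assumes no nested paragraphs, so descendant runs all belong to this w:p
            NodeList runs = ((Element) paras.item(p)).getElementsByTagNameNS(W_NS, "r");
            for (int r = 0; r < runs.getLength(); r++) {
                appendRun(runs.item(r), sb);
            }
            result.add(sb.toString());
        }
        return result;
    }

    /** Appends a run's children in order: w:t as its text, a line-break w:br as "\n". */
    private static void appendRun(Node run, StringBuilder sb) {
        NodeList children = run.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE || !W_NS.equals(child.getNamespaceURI())) {
                continue;
            }
            if ("t".equals(child.getLocalName())) {
                sb.append(child.getTextContent());
            } else if ("br".equals(child.getLocalName())) {
                // getAttributeNS returns "" when the attribute is absent
                String type = ((Element) child).getAttributeNS(W_NS, "type");
                if (type.isEmpty() || "textWrapping".equals(type)) {
                    sb.append('\n'); // ordinary line break; page/column breaks are skipped here
                }
            }
        }
    }
}
```

The same ordering idea carries over to regenerating the document: when copying a source run, write its text pieces and add one break on the new run for each w:br, in the order they appear, instead of dropping runs whose text is empty.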

Is this documented anywhere?

Office Open XML is specified in ECMA-376.

The org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes are auto-generated from this specification. Unfortunately, there is no publicly available API documentation for them. So you need to download the sources from ooxml-schemas (up to Apache POI 4) or poi-ooxml-full (from Apache POI 5 on) and then generate the Javadoc from them yourself.
