I need to get the word count of a .docx file that matches the statistics shown in MS Word (Review -> Word Count).
So far I have tried Aspose.Words and OpenXml in .NET. Both tools give me the word count through BuiltInProperties or ExtendedProperties.
OpenXML
var appPart = wordprocessingDocument.ExtendedFilePropertiesPart;
var wordCountText = appPart?.Properties?.Words?.Text; // stored count as a string, null if absent
Aspose
var wordCountValue = document.BuiltInDocumentProperties.FirstOrDefault(x => x.Name == "Words")?.Value;
However, if the "Include textboxes, footnotes and endnotes" checkbox in that dialog is unchecked, neither framework counts the words in footnotes/endnotes.
I need to make sure that footnotes/endnotes are always counted. I can't count the words myself from the extracted text, because it is hard to get a close match with the MS Word count. Is there any way to get this count? Or is there any way to find out whether the checkbox in the statistics dialog is checked?
CodePudding user response:
BuiltInDocumentProperties.Words simply returns the value stored in the document. See docProps/app.xml in the DOCX package:
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal.dotm</Template>
<TotalTime>1</TotalTime>
<Pages>1</Pages>
<Words>1</Words>
<Characters>9</Characters>
<Application>Microsoft Office Word</Application>
<DocSecurity>0</DocSecurity>
<Lines>1</Lines>
<Paragraphs>1</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<Company></Company>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>9</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>16.0000</AppVersion>
</Properties>
If the document was generated by an external tool, the number of words might not have been calculated at all. Aspose.Words does not update the word count unless you explicitly call the Document.UpdateWordCount method.
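As a minimal sketch of that recalculation, assuming the Aspose.Words NuGet package is referenced and that "input.docx" is a placeholder path (the class name below is hypothetical too):

```csharp
using System;
using Aspose.Words;

static class RecountWords
{
    static void Main()
    {
        // "input.docx" is a placeholder; point it at a real document.
        var document = new Document("input.docx");

        // Recalculate the statistics instead of trusting the value stored in app.xml.
        document.UpdateWordCount();

        // After the update, the built-in property holds the recalculated count.
        Console.WriteLine(document.BuiltInDocumentProperties.Words);
    }
}
```

Whether footnotes and endnotes end up in this number still depends on the statistics flag stored in the document's settings.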
The value of that checkbox is stored in settings.xml as the <w:doNotIncludeSubdocsInStats/> element, which is present when the box is unchecked. Aspose.Words takes this flag into account when Document.UpdateWordCount is called, but unfortunately there is no public API to get or set it.
You can post a feature request on the Aspose.Words support forum to add an API for the DoNotIncludeSubdocsInStats flag.
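Until such an API exists, the flag can probably be read with the Open XML SDK instead of Aspose.Words. The following is only a sketch under a few assumptions: the DocumentFormat.OpenXml package is referenced, the file path is passed as the first command-line argument, and the WordCountInfo class and its Read method are hypothetical names made up for this example.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class WordCountInfo
{
    // Returns the word count stored in app.xml (null if missing) and whether
    // footnotes/endnotes were included, judged from the flag in settings.xml.
    public static (int? Words, bool IncludesNotes) Read(string path)
    {
        using var doc = WordprocessingDocument.Open(path, false); // open read-only

        // docProps/app.xml: the count that was stored when the file was last saved.
        var text = doc.ExtendedFilePropertiesPart?.Properties?.Words?.Text;
        int? words = int.TryParse(text, out int n) ? n : (int?)null;

        // word/settings.xml: <w:doNotIncludeSubdocsInStats/> is present
        // when the checkbox in Word's statistics dialog is unchecked.
        var flag = doc.MainDocumentPart?.DocumentSettingsPart?.Settings
            ?.GetFirstChild<DoNotIncludeSubdocsInStats>();

        // An on/off element without a w:val attribute counts as "on".
        bool excluded = flag != null && (flag.Val?.Value ?? true);

        return (words, !excluded);
    }

    static void Main(string[] args)
    {
        var (words, includesNotes) = Read(args[0]);
        string shown = words.HasValue ? words.Value.ToString() : "n/a";
        Console.WriteLine($"Words: {shown}, footnotes/endnotes included: {includesNotes}");
    }
}
```

If IncludesNotes comes back false, the stored count does not cover footnotes and endnotes, so the value in app.xml cannot be used as-is for that requirement.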