I'm looking for fast code to delete bookmarks from a docx file that is open in MS Word.
Currently I am using a simple VBA macro with some experience-based improvements:
Public Sub RemoveBookmarks(ByRef doc As Document)
    Dim b As Bookmark
    Dim i As Long
    For Each b In doc.Bookmarks
        b.Delete
        'Some documents froze Word after deleting only 4 bookmarks
        i = i + 1
        If i Mod 4 = 0 Then
            doc.UndoClear
        End If
        'to handle a possible Ctrl+Break
        If i Mod 100 = 0 Then
            DoEvents
        End If
    Next b
    Set b = Nothing
End Sub
Very often my colleagues have large documents (over 1.2k pages) with 25k or more bookmarks. Deleting these bookmarks takes a lot of time.
Deleting bookmarks with the Open XML SDK (DocumentFormat.OpenXml) by manipulating a WordprocessingDocument is very fast:
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static void RemoveAllBookmarks(string fileName)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(fileName, true))
    {
        var d = wordDoc.MainDocumentPart.Document;
        // Copy to a list first so elements can be removed while iterating
        var bstart = d.Descendants<BookmarkStart>().ToList();
        foreach (var s in bstart)
            s.Remove();
        var bend = d.Descendants<BookmarkEnd>().ToList();
        foreach (var e in bend)
            e.Remove();
        d.Save();
        wordDoc.Save();
    }
}
However, I want to avoid closing and reopening the document in Word, because adding and removing bookmarks is part of a larger process. I can't predict (and don't want to have to) which will be faster for a given document: deleting the bookmarks with VBA, or closing the huge file, removing them and reopening it, possibly several times.
Maybe there is a way to manipulate the WordprocessingDocument underneath the open document and insert the XML.
CodePudding user response:
Whenever you delete items from a collection you need to start from the end and work backwards.
Public Sub RemoveBookmarks(ByRef doc As Document)
    Dim b As Long
    For b = doc.Bookmarks.Count To 1 Step -1
        doc.Bookmarks(b).Delete
    Next b
End Sub
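To illustrate why the direction matters, here is a small C# sketch that uses a plain List<string> as a stand-in for an index-based collection. It is only an analogy: Word's Bookmarks collection is a COM object and may behave differently in detail, and the class and method names below are made up for the demo.

```csharp
using System;
using System.Collections.Generic;

public static class DeleteOrderDemo
{
    // Deletes by ascending index. Each removal shifts the remaining items
    // one position to the left, so the loop skips every other item.
    public static List<string> DeleteForward(List<string> items)
    {
        var copy = new List<string>(items);
        for (int i = 0; i < copy.Count; i++)
        {
            copy.RemoveAt(i);
        }
        return copy;
    }

    // Deletes by descending index. A removal only shifts items that were
    // already visited, so nothing is skipped.
    public static List<string> DeleteBackward(List<string> items)
    {
        var copy = new List<string>(items);
        for (int i = copy.Count - 1; i >= 0; i--)
        {
            copy.RemoveAt(i);
        }
        return copy;
    }

    public static void Main()
    {
        var names = new List<string> { "bm1", "bm2", "bm3", "bm4" };
        Console.WriteLine(DeleteForward(names).Count);  // prints 2 (bm2 and bm4 survive)
        Console.WriteLine(DeleteBackward(names).Count); // prints 0
    }
}
```

The backward loop is the same pattern as the VBA macro above, just zero-based instead of one-based.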
CodePudding user response:
If you want to transform the already opened Word document, you can use the WordOpenXML property of a Document or Range instance to get the document's or range's Open XML markup in the Flat OPC format. You can then work with that XML string in the following ways:
- Using the DocumentFormat.OpenXml NuGet package (Open XML SDK), you can turn the Flat OPC string into a WordprocessingDocument and transform it as you described.
- Using LINQ to XML (System.Xml.Linq) and the DocumentFormat.OpenXml.Linq NuGet package, you can turn it into an XElement and transform it without having to turn it into a WordprocessingDocument.
Once you have transformed the markup, you can turn it back into a Flat OPC string and insert it back into the Word document, using the Range.InsertXML() method.
Transforming Open XML markup is one or two orders of magnitude faster than using the COM APIs, if you need many COM calls to create the desired result. Note, though, that retrieving the Open XML markup from an opened document and inserting it back into the document is also not free.
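As a rough sketch of the LINQ to XML route, the transformation step could look like the following. It uses only System.Xml.Linq from the base class library, not the DocumentFormat.OpenXml.Linq package mentioned above. The class name FlatOpcBookmarks, the helper RemoveAllBookmarks, and the sample string are made up for illustration, and the sample is a stripped-down stand-in for the much larger markup Word returns.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcBookmarks
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the given Flat OPC markup without any
    // w:bookmarkStart / w:bookmarkEnd elements, in whichever part they occur.
    public static string RemoveAllBookmarks(string flatOpc)
    {
        // Preserve whitespace so text runs come back unchanged.
        XDocument package = XDocument.Parse(flatOpc, LoadOptions.PreserveWhitespace);

        // Materialize the matches first, then remove them, so the tree is
        // not modified while the query is still being enumerated.
        var markers = package
            .Descendants()
            .Where(e => e.Name == W + "bookmarkStart" || e.Name == W + "bookmarkEnd")
            .ToList();

        foreach (XElement marker in markers)
        {
            marker.Remove();
        }

        // Note: XDocument.ToString() does not emit the XML declaration.
        return package.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Assumption: a minimal stand-in for the string returned by WordOpenXML.
        const string sample =
            "<pkg:package xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'>" +
            "<pkg:part pkg:name='/word/document.xml'><pkg:xmlData>" +
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body><w:p><w:bookmarkStart w:id='0' w:name='bm1'/>" +
            "<w:r><w:t>Hello</w:t></w:r><w:bookmarkEnd w:id='0'/></w:p></w:body>" +
            "</w:document></pkg:xmlData></pkg:part></pkg:package>";

        Console.WriteLine(RemoveAllBookmarks(sample));
    }
}
```

On the Word side, the input string would come from the WordOpenXML property and the result would go back in through Range.InsertXML(), as described above. Whether replacing the whole document content this way keeps everything else intact (sections, headers, fields) is something to verify on a real document before relying on it.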