I am trying to accomplish the following task using VBA:
- Rename the word document(.docx) to Zip(.zip) so that I can make changes in xml
- extract, Zip content
- goto the word folder
- Open and edit the document.xml file
- Search using the following regular expression: w:date="[\d\W]\w[\d\W]\w"
- Replace it with nothing.
- This regular expression will match all XML timestamp attributes and remove them from the document.xml file.
- save the changes of document.xml file
- reZip files in the folder
- rename zip back to docx so that it becomes a Word document again
So far, I have gone to 5th step and stuck, as I don't know the correct way to open and manipulate xml.
- I don't want to open it in plan text
- also don't want to spoil the xml.
therefore I am looking for a way to open xml and remove specific text using vba without hurting xml.
the VBA code is as follows:
Sub Rename_Zip_Unzip()
Dim FSO As FileSystemObject
Dim oApp As Shell
Dim oDocPath, myZip
Dim oDocName, oDocTemp, oDocEx, AP, oDocZip
Dim oDocXML As String
Dim oDocUnzip As String
With Application.FileDialog(msoFileDialogFilePicker)
.AllowMultiSelect = False
.Filters.Clear
.InitialFileName = ActiveDocument.Path
.Filters.Add "Word files", "*.doc*", 1
If .Show = True Then
If .SelectedItems.Count > 0 Then
oDocPath = .SelectedItems(1)
Else
MsgBox "no valid selection"
Exit Sub
End If
End If
End With
If oDocPath = "" Then
Beep
Exit Sub
End If
Set FSO = CreateObject("Scripting.FileSystemObject")
AP = Application.PathSeparator
oDocName = FSO.GetFileName(oDocPath)
oDocEx = FSO.GetExtensionName(oDocPath)
oDocZip = Replace(oDocPath, AP & oDocName, AP & "oDocZip.zip")
'copy and rename
FSO.CopyFile oDocPath, oDocZip
Set oApp = CreateObject("Shell.Application")
oDocUnzip = Replace(oDocPath, AP & oDocName, AP & "oDocUnzip")
FileCompress.UnZip oDocZip, oDocUnzip, True
oDocXML = Replace(oDocPath, AP & oDocName, AP & "oDocUnzip\word\document.xml")
Set xDoc = Nothing
Application.StatusBar = " Loading xml!!! " & oDocXML
If Dir(oDocXML) = "" Then
MsgBox "File or folder path is not correct." & vbNewLine & oDocXML, vbOKOnly vbCritical
Exit Sub
End If
Load_xml oDocXML
' I tried to open it in word to replace text using wildcard, but it fails to open.
On Error Resume Next
Set xDoc = Documents.Open(oDocXML, Visible:=True)
If xDoc Is Nothing Then
MsgBox "Cannot open:" & vbCr & oDocXML & vbNewLine & "Please check the Name of Folder and File.", vbExclamation
Exit Sub
End If
Beep
End Sub
I need to open and make changes in xml using following function:
Function Load_xml(xml As String)
' Get file name ...
Dim oDoc As New MSXML2.DOMDocument60
Dim xMetricNames As IXMLDOMNodeList
Dim xMetricName As IXMLDOMElement
Dim xMetrics As IXMLDOMNode
Dim xMetric As IXMLDOMElement
Dim mtID As String, mtName As String, mtValue As String
' Load from file
oDoc.Load xml
' Select needed nodes
Set xMetrics =
oDoc.SelectSingleNode("//project/checkpoints/checkpoint/files/file/metrics")
Set xMetricNames = oDoc.SelectNodes("//project/metric_names/metric_name")
For Each xMetricName In xMetricNames
mtName = xMetricName.TEXT
mtID = xMetricName.getAttribute("id")
mtValue = xMetrics.SelectSingleNode("metric[@id='" & mtID & "']").TEXT
'here to delete the part of specific xml
Next
Set oDoc = Nothing
End Function
let say if I use open xml as simple text, what should be the correct pattern to replace the date and time?
CodePudding user response:
You can try to Load the text into a string and use the Replace() function for all the dates, You can skip all the xml/zip stuff. Word has also a find function you can use.
CodePudding user response:
You're on the right track about using the MSMXL DOM methods to ensure that you won't corrupt the XML.
To remove the elements that you find with
Set xMetricNames = oDoc.SelectNodes("//project/metric_names/metric_name")
Just call .removeAll()
xMetricNames.removeAll()
See https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms767535(v=vs.85)