Home > Enterprise >  Open xml and remove specific text (track changes date/time) using VBA
Open xml and remove specific text (track changes date/time) using VBA

Time:12-03

I am trying to accomplish the following task using VBA:

  1. Rename the word document(.docx) to Zip(.zip) so that I can make changes in xml
  2. extract, Zip content
  3. goto the word folder
  4. Open and edit the document.xml file
  5. Search using the following regular expression: w:date="[\d\W]\w[\d\W]\w"
  • Replace it with nothing.
  • This regular expression will match all XML timestamp attributes and remove them from the document.xml file.
  1. save the changes of document.xml file
  2. reZip files in the folder
  3. rename zip back to docx so that it becomes a Word document again

So far, I have gone to 5th step and stuck, as I don't know the correct way to open and manipulate xml.

  • I don't want to open it in plan text
  • also don't want to spoil the xml.

therefore I am looking for a way to open xml and remove specific text using vba without hurting xml.

the VBA code is as follows:

Sub Rename_Zip_Unzip()
    Dim FSO As FileSystemObject
    Dim oApp As Shell
    
    Dim oDocPath, myZip
    Dim oDocName, oDocTemp, oDocEx, AP, oDocZip
    Dim oDocXML As String
    Dim oDocUnzip As String

    With Application.FileDialog(msoFileDialogFilePicker)
        .AllowMultiSelect = False
        .Filters.Clear
        .InitialFileName = ActiveDocument.Path
        .Filters.Add "Word files", "*.doc*", 1
    If .Show = True Then
        If .SelectedItems.Count > 0 Then
        
            oDocPath = .SelectedItems(1)
            Else
            MsgBox "no valid selection"
        Exit Sub
        End If
        
    End If
    End With
        
    If oDocPath = "" Then
        Beep
        Exit Sub
    End If
    
    Set FSO = CreateObject("Scripting.FileSystemObject")
    
    AP = Application.PathSeparator
        
    oDocName = FSO.GetFileName(oDocPath)
    oDocEx = FSO.GetExtensionName(oDocPath)
    
    oDocZip = Replace(oDocPath, AP & oDocName, AP & "oDocZip.zip")
    
    'copy and rename
    FSO.CopyFile oDocPath, oDocZip
    
    Set oApp = CreateObject("Shell.Application")
    
    oDocUnzip = Replace(oDocPath, AP & oDocName, AP & "oDocUnzip")
    
    
    FileCompress.UnZip oDocZip, oDocUnzip, True
    
    oDocXML = Replace(oDocPath, AP & oDocName, AP & "oDocUnzip\word\document.xml")
    
    Set xDoc = Nothing
    
    Application.StatusBar = "    Loading xml!!! " & oDocXML
    If Dir(oDocXML) = "" Then
        MsgBox "File or folder path is not correct." & vbNewLine & oDocXML, vbOKOnly   vbCritical
        Exit Sub
    End If
    
    
    Load_xml oDocXML
    
' I tried to open it in word to replace text using wildcard, but it fails to open. 

    On Error Resume Next
        Set xDoc = Documents.Open(oDocXML, Visible:=True)
        If xDoc Is Nothing Then
            MsgBox "Cannot open:" & vbCr & oDocXML & vbNewLine & "Please check the Name of Folder and File.", vbExclamation
            Exit Sub
        End If


    Beep

End Sub

I need to open and make changes in xml using following function:

Function Load_xml(xml As String)

    ' Get file name ...

    Dim oDoc As New MSXML2.DOMDocument60
    Dim xMetricNames As IXMLDOMNodeList
    Dim xMetricName As IXMLDOMElement
    Dim xMetrics As IXMLDOMNode
    Dim xMetric As IXMLDOMElement
    Dim mtID As String, mtName As String, mtValue As String

    ' Load from file
    oDoc.Load xml

    ' Select needed nodes
    Set xMetrics = 
    oDoc.SelectSingleNode("//project/checkpoints/checkpoint/files/file/metrics")
    Set xMetricNames = oDoc.SelectNodes("//project/metric_names/metric_name")
    For Each xMetricName In xMetricNames
        mtName = xMetricName.TEXT
        mtID = xMetricName.getAttribute("id")
        mtValue = xMetrics.SelectSingleNode("metric[@id='" & mtID & "']").TEXT

        'here to delete the part of specific xml

    Next

    Set oDoc = Nothing

End Function

let say if I use open xml as simple text, what should be the correct pattern to replace the date and time?

CodePudding user response:

You can try to Load the text into a string and use the Replace() function for all the dates, You can skip all the xml/zip stuff. Word has also a find function you can use.

CodePudding user response:

You're on the right track about using the MSMXL DOM methods to ensure that you won't corrupt the XML.

To remove the elements that you find with

Set xMetricNames = oDoc.SelectNodes("//project/metric_names/metric_name")

Just call .removeAll()

xMetricNames.removeAll()

See https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms767535(v=vs.85)

  • Related