Automate .doc to .htm process in Word

Time:06-24

Question

We inherited an older project from another company, and it has a "help" index made up of .htm files that were converted from .doc files. The problem is that their team exported all of these files in a very outdated, unsupported encoding, so they are full of garbled special characters.

Eventually we will replace this system with one that is MUCH easier to use and develop, but since the product came with a large user base, we need to fix this in the meantime. Is there an automation tool for this that still works today (I've tried a couple of older VB scripts), or am I going to need to manually re-export a few hundred documents today? (It's not necessarily a huge issue, but I think my time would be better spent on other things today.)

To be very clear: I have a folder full of .doc files that need to be re-saved as .htm files with UTF-8 encoding.

What I've tried

I've been digging through several SO posts trying various solutions. My current code is as follows:

Sub ChangeDocsToTxtOrRTFOrHTML()
    Dim locFolder As String
    Dim fileType As String
    Dim oFolder As Object
    Dim tFolder As Object
    Dim fs As Object
    
    locFolder = "C:\Users\ColeD\Desktop\Help Files Angular"
    fileType = ".htm"
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    
    For Each oFile In oFolder.Files
        MsgBox ("hrtr!")
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        strDocName = strDocName & fileType
        ChangeFileOpenDirectory tFolder
        
        ActiveDocument.SaveAs2 FileName:=strDocName & fileType, _
                               FileFormat:=wdFormatHTML, _
                               Encoding:=msoEncodingUTF8

        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    MsgBox ("Done!")
End Sub

The issue is that it only opens one file and then stops.

Answer

It looks like you are using code copied from "Convert multiple Word documents to HTML files using VBA".

However, you need to adapt that code to your scenario, which only needs HTML output, not the other file types. The example below focuses on converting Word documents (.doc and .docx) to HTML.

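Sub test()

    Dim wordapp As Object   ' late-bound Word.Application, so this can also run from Excel etc.
    Dim doc As Object       ' the document currently being converted
    Dim fpath As String
    Dim StrFile As String
    Dim baseName As String
    Dim outputFileName As String

    On Error Resume Next
    Set wordapp = CreateObject("Word.Application")
    On Error GoTo 0
    If wordapp Is Nothing Then Exit Sub   ' Word is not available
    wordapp.Visible = True
    wordapp.DisplayAlerts = 0             ' wdAlertsNone: suppress prompts in this Word instance

    fpath = "C:\Users\user\"              ' must end with a backslash
    StrFile = Dir(fpath & "*.doc*")       ' matches both .doc and .docx

    Do While Len(StrFile) > 0
        Set doc = wordapp.Documents.Open(fpath & StrFile)
        baseName = CreateObject("Scripting.FileSystemObject").GetBaseName(StrFile)
        outputFileName = fpath & baseName & ".htm"   ' .htm, as asked in the question
        Debug.Print doc.Name & " -> " & outputFileName

        ' FileFormat 10 = wdFormatFilteredHTML (8 would be wdFormatHTML)
        ' Encoding 65001 = msoEncodingUTF8; numeric values are used because of late binding
        doc.SaveAs FileName:=outputFileName, FileFormat:=10, Encoding:=65001
        doc.Close SaveChanges:=False
        StrFile = Dir
    Loop

    wordapp.DisplayAlerts = -1            ' wdAlertsAll: restore prompts
    wordapp.Quit
    Set wordapp = Nothing

End Sub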
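If you would rather keep the FileSystemObject loop and the `SaveAs2 ... Encoding:=msoEncodingUTF8` call from the question and run everything from inside Word, here is a minimal sketch of how that might look. It is untested against the actual help files. It also works around a few problems in the question's macro that may be why it stops early, although that is not confirmed: the loop visits every file in the folder (including any hidden `~$` owner files Word creates next to open documents), `.htm` is appended twice, and `locFolder & "Converted"` is missing a backslash. The `Converted` subfolder location and the names `ConvertDocsToUtf8Htm`, `IsWordSource`, and `HtmPathFor` are hypothetical; adjust them to your setup.

```vba
' Sketch: re-save every .doc/.docx in SRC_FOLDER as UTF-8 .htm in OUT_FOLDER.
' Runs inside Word (early-bound Document, wd*/mso* constants available).
' Folder paths and all procedure names here are hypothetical placeholders.
Option Explicit

Private Const SRC_FOLDER As String = "C:\Users\ColeD\Desktop\Help Files Angular"
Private Const OUT_FOLDER As String = "C:\Users\ColeD\Desktop\Help Files Angular\Converted" ' assumed location

' True for .doc/.docx files; False for Word's hidden "~$" owner files and anything else.
Function IsWordSource(ByVal fileName As String) As Boolean
    Dim dotPos As Long
    dotPos = InStrRev(fileName, ".")
    If dotPos = 0 Or Left$(fileName, 2) = "~$" Then Exit Function ' returns False
    Select Case LCase$(Mid$(fileName, dotPos + 1))
        Case "doc", "docx": IsWordSource = True
    End Select
End Function

' "Help.doc", "C:\out" -> "C:\out\Help.htm" (adds a missing backslash, one extension only)
Function HtmPathFor(ByVal sourceName As String, ByVal outFolder As String) As String
    If Right$(outFolder, 1) <> "\" Then outFolder = outFolder & "\"
    HtmPathFor = outFolder & Left$(sourceName, InStrRev(sourceName, ".") - 1) & ".htm"
End Function

Sub ConvertDocsToUtf8Htm()
    Dim fs As Object, oFile As Object
    Dim sourceNames As New Collection
    Dim item As Variant
    Dim d As Document

    Set fs = CreateObject("Scripting.FileSystemObject")
    If Not fs.FolderExists(OUT_FOLDER) Then fs.CreateFolder OUT_FOLDER

    ' Collect the names first so the list cannot change while documents are open.
    For Each oFile In fs.GetFolder(SRC_FOLDER).Files
        If IsWordSource(oFile.Name) Then sourceNames.Add oFile.Name
    Next oFile

    For Each item In sourceNames
        Set d = Documents.Open(FileName:=SRC_FOLDER & "\" & item, _
                               ConfirmConversions:=False, AddToRecentFiles:=False)
        ' Swap in wdFormatHTML if Word's round-trip markup should be kept.
        d.SaveAs2 FileName:=HtmPathFor(CStr(item), OUT_FOLDER), _
                  FileFormat:=wdFormatFilteredHTML, _
                  Encoding:=msoEncodingUTF8
        d.Close SaveChanges:=wdDoNotSaveChanges
    Next item

    MsgBox sourceNames.Count & " file(s) converted."
End Sub
```

To try it, paste it into a module in the Word VBA editor (Alt+F11), adjust the two folder constants, and run `ConvertDocsToUtf8Htm`. Collecting the file names before opening anything means the loop does not depend on the folder's contents staying the same while Word works on it.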