Question
We inherited an older project from another company, and it has a "help" index made up of .htm files that were converted from .doc files. The problem is that their team exported all of these files in a very outdated, unsupported encoding, so they are full of garbled special characters.
Eventually we will replace this system with a MUCH easier one to use and develop, but since the product came with a large user base, we need to fix this in the meantime. Is there an automation tool for this that still works today (I've tried a couple of older VB scripts), or am I going to have to re-export a few hundred docs manually today? (It's not necessarily a huge issue, but I think my time would be better spent on other things today.)
To be very clear: I have a folder full of .doc files that need to be re-saved as .htm files with UTF-8 encoding.
What I've tried
I've been digging through several SO posts trying various solutions. My current code is as follows:
Sub ChangeDocsToTxtOrRTFOrHTML()
    Dim locFolder As String
    Dim fileType As String
    Dim oFolder As Object
    Dim tFolder As Object
    Dim fs As Object
    locFolder = "C:\Users\ColeD\Desktop\Help Files Angular"
    fileType = ".htm"
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    For Each oFile In oFolder.Files
        MsgBox ("hrtr!")
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        strDocName = strDocName & fileType
        ChangeFileOpenDirectory tFolder
        ActiveDocument.SaveAs2 FileName:=strDocName & fileType, _
            FileFormat:=wdFormatHTML, _
            Encoding:=msoEncodingUTF8
        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    MsgBox ("Done!")
End Sub
The issue is that it only opens one file and then stops.
Answer
It looks like you are using code copied from "Convert multiple Word documents to HTML files using VBA".
You need to adapt that code to your scenario, which only needs HTML output, not the other file types. The example below converts the .doc and .docx files in one folder to HTML.
Sub test()
    Dim fpath As String
    Dim StrFile As String
    Dim outputFileName As String
    Dim wordapp As Object
    Dim doc As Object
    On Error Resume Next
    Set wordapp = CreateObject("word.Application")
    On Error GoTo 0
    If wordapp Is Nothing Then MsgBox "Could not start Word": Exit Sub
    wordapp.Visible = True
    wordapp.DisplayAlerts = 0 'wdAlertsNone: set on Word, not on the host Application
    fpath = "C:\Users\user\"
    StrFile = Dir(fpath & "*.doc*")
    Do While Len(StrFile) > 0
        'Keep a reference to the opened document instead of relying on ActiveDocument
        Set doc = wordapp.Documents.Open(fpath & StrFile)
        outputFileName = fpath & CreateObject("Scripting.FileSystemObject").GetBaseName(StrFile) & ".html"
        Debug.Print outputFileName
        doc.WebOptions.Encoding = 65001 'msoEncodingUTF8
        doc.SaveAs Filename:=outputFileName, FileFormat:=8 'wdFormatHTML (10 = wdFormatFilteredHTML)
        doc.Close
        Debug.Print StrFile
        StrFile = Dir
    Loop
    wordapp.DisplayAlerts = -1 'wdAlertsAll
    wordapp.Quit
End Sub
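If you would rather keep the macro running inside Word itself, as in the question, the following is a minimal sketch of that approach. It assumes the source path from the question (adjust it for your machine), that it is fine to create the "Converted" subfolder if it is missing, and that Word takes the HTML output encoding from the document's WebOptions.Encoding setting (the Encoding argument of SaveAs2 is passed as well). Compared with the question's macro, it adds the missing backslash before "Converted" (the original builds "...Help Files AngularConverted") and appends ".htm" only once (the original appends fileType twice, producing "name.htm.htm"). As a precaution, it also collects the file names before converting anything and skips Word's "~$" lock files. Treat it as a starting point rather than a tested tool.

```vba
Option Explicit

' Sketch: paste into a standard module in Word's VBA editor (Alt+F11) and run.
' Assumption: the source path below matches your machine; adjust as needed.
Sub ConvertDocsToUtf8Htm()
    Dim srcFolder As String
    Dim outFolder As String
    Dim found As String
    Dim docName As Variant
    Dim pending As Collection
    Dim doc As Document
    Dim fso As Object
    Dim converted As Long

    ' Trailing backslashes so the concatenations below produce valid paths.
    srcFolder = "C:\Users\ColeD\Desktop\Help Files Angular\"
    outFolder = srcFolder & "Converted\"

    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FolderExists(outFolder) Then fso.CreateFolder outFolder

    ' Collect all matching names first, so the conversion loop works on a
    ' fixed list. Skip Word's "~$" owner (lock) files.
    Set pending = New Collection
    found = Dir(srcFolder & "*.doc*")
    Do While Len(found) > 0
        If Left$(found, 2) <> "~$" Then pending.Add found
        found = Dir()
    Loop

    ' Suppress prompts and redraws while converting; restored in CleanUp.
    Application.DisplayAlerts = wdAlertsNone
    Application.ScreenUpdating = False
    On Error GoTo CleanUp

    For Each docName In pending
        Set doc = Documents.Open(FileName:=srcFolder & docName, _
            ConfirmConversions:=False, ReadOnly:=True, AddToRecentFiles:=False)

        ' Ask Word to encode the saved HTML as UTF-8 (msoEncodingUTF8 = 65001).
        doc.WebOptions.Encoding = msoEncodingUTF8

        ' GetBaseName drops the old extension, so ".htm" is appended only once.
        doc.SaveAs2 FileName:=outFolder & fso.GetBaseName(docName) & ".htm", _
            FileFormat:=wdFormatHTML, _
            Encoding:=msoEncodingUTF8

        doc.Close SaveChanges:=wdDoNotSaveChanges
        Set doc = Nothing
        converted = converted + 1
    Next docName

CleanUp:
    ' Runs both on normal completion and when an error jumps here.
    Application.ScreenUpdating = True
    Application.DisplayAlerts = wdAlertsAll
    If Err.Number <> 0 Then
        MsgBox "Stopped on " & docName & ": " & Err.Description
    Else
        MsgBox converted & " file(s) written to " & outFolder
    End If
End Sub
```

If you want Word to leave out its Office-specific markup, which generally gives cleaner pages for a help index, swap wdFormatHTML for wdFormatFilteredHTML.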