I have the following part of a script that converts a Word document (Previously converted from a PDF) to a text file. This is usually a function as part of a larger script but for the purposes of this question this is fine.
Sub GetTextFromWord()
Dim fso As FileSystemObject
Dim oWd As Object, oDoc As Object
Set fso = New FileSystemObject
Set oWd = CreateObject("word.application")
Set oDoc = oWd.Documents.Open("C:\temp\PDFs\XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.doc")
filePath = "C:\temp\PDFs\" & "TEST" & ".txt" 'filename
Debug.Print filePath
'open text stream as unicode
Set fileStream = fso.CreateTextFile(filePath, overwrite:=True, Unicode:=True)
fileStream.Write oDoc.Range.Text
fileStream.Close
oDoc.Close
oWd.Quit
End Sub
The TEST file generated is okay however lacks the subsection numbers that would normally be present.
When I generate the text file manually open the word doc. (File Export > Change file type > plain text (save). With options Windows Default selected, Insert line breaks unticked and allows for character substitution.
The generated text file is as desired.
When I record a macro in word for the same steps, I get the following script:
Sub Macro2()
' Macro2 Macro
'
'
ActiveDocument.SaveAs2 FileName:= _
"XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.txt", _
FileFormat:=wdFormatText, LockComments:=False, Password:="", _
AddToRecentFiles:=True, WritePassword:="", ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData _
:=False, SaveAsAOCELetter:=False, Encoding:=1252, InsertLineBreaks:=False _
, AllowSubstitutions:=True, LineEnding:=wdCRLF, CompatibilityMode:=0
End Sub
I would like to modify the first script to incorporate these parameters (mainly InsertLineBreaks:=False, AllowSubstitutions:=True
- unsure if the others are required to generate the text file as exact). Ideally, I can incorporate as many as feasible to play around with and see the effect of the file generated. Things like LockComments:=False, Password:=""
are not required.
How can I incorporate the script to achieve this?
fso.CreateTextFile doesn't appear to give such options so I wonder if I need to rethink this.
Link to Doc file:
CodePudding user response:
Please, try the next updated code. It replaces the VBScript object method with the one you tested:
Sub GetTextFromWord()
Dim fso As FileSystemObject
Dim oWd As Object, oDoc As Object
Const wdFormatText as Long = 2, wdCRLF as Long = 0
Set fso = New FileSystemObject
Set oWd = CreateObject("word.application")
Set oDoc = oWd.Documents.Open("C:\temp\PDFs\XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.doc")
Dim filePath As String: filePath = "C:\temp\PDFs\" & "TEST" & ".txt" 'filename
Debug.Print filePath
oDoc.SaveAs2 fileName:=filePath, _
FileFormat:=wdFormatText, LockComments:=False, Password:="", _
AddToRecentFiles:=True, WritePassword:="", ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData _
:=False, SaveAsAOCELetter:=False, Encoding:=1252, InsertLineBreaks:=False _
, AllowSubstitutions:=True, LineEnding:=wdCRLF, CompatibilityMode:=0
oDoc.Close False
oWd.Quit
End Sub