Run-time error '5' VBA when running against specific PDF

Time:08-22

I have the following VBA code, based on an answer to an earlier question of mine:

[Screenshot: VBA code]

URLs:

https://hpvchemicals.oecd.org/ui/handler.axd?id=e19d2799-0c16-496d-a607-b09330dd28a7
https://hpvchemicals.oecd.org/ui/handler.axd?id=40da06b1-a855-4c0c-bc21-bbc856dca725
https://hpvchemicals.oecd.org/ui/handler.axd?id=c4967546-1f5e-472a-b629-a2998323735b
https://hpvchemicals.oecd.org/ui/handler.axd?id=bde5e625-83ee-423d-aa70-eb0e453088e4
https://hpvchemicals.oecd.org/ui/handler.axd?id=621c4f55-ef3c-4b99-bb98-e6aaf3f436dd
https://hpvchemicals.oecd.org/ui/handler.axd?id=26e1420d-f9b7-4768-b6fa-d345f54e7683
https://hpvchemicals.oecd.org/ui/handler.axd?id=263f3491-90c7-4c3a-b43e-4c4e9395bcea
https://hpvchemicals.oecd.org/ui/handler.axd?id=b78d39a9-26c2-48ff-aadc-cb056a89f08b
https://hpvchemicals.oecd.org/ui/handler.axd?id=97a7b56f-ebaf-4416-8b4b-88b19ca3bd16
https://hpvchemicals.oecd.org/ui/handler.axd?id=c6c3b7c1-9239-40d9-b51a-85a15e2411d6

First, I think the issue with the last URL can probably be solved. However, I think I also need some error handling that generates a blank text file and moves on to the next URL, but I'm not sure how to achieve this.

I'm not very experienced with VBA. I have set the loop to run over rows B2:B200, but ideally it would work for however many URLs there are.

I'm also not sure whether the logic in my code is robust, or whether there is a better way to extract text from a URL.

The expected output is:

[Screenshot: expected output]

And here is an example of the generated Text file.

[Screenshot: example of a generated text file]

Answer:

It seems the document at the problem URL contains some characters that can't be written to a non-Unicode text file. Opening the text stream as Unicode avoids the error.
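To see the mechanism in isolation, here is a small sketch (the DemoAnsiVsUnicode name and the temp file paths are made up for illustration). A TextStream created in ASCII mode rejects a character such as U+2264, which is not in the Windows-1252 code page, while one created with the unicode argument set to True accepts it. On a Western-locale machine the first write should fail with a non-zero error number; which characters fail depends on the system code page.

```vba
' Hypothetical demo: writing a character outside the ANSI code page to a
' TextStream opened in ASCII mode raises a run-time error; Unicode mode does not.
Public Sub DemoAnsiVsUnicode()
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' CreateTextFile(filename, overwrite, unicode): unicode False = ANSI file
    Set ts = fso.CreateTextFile(Environ("TEMP") & "\demo_ansi.txt", True, False)
    On Error Resume Next
    ts.Write ChrW(&H2264)                 ' U+2264 LESS-THAN OR EQUAL TO
    Debug.Print "ANSI write, Err.Number = " & Err.Number
    Err.Clear
    On Error GoTo 0
    ts.Close

    ' The same character written with unicode True (UTF-16 file) succeeds
    Set ts = fso.CreateTextFile(Environ("TEMP") & "\demo_unicode.txt", True, True)
    ts.Write ChrW(&H2264)
    ts.Close
End Sub
```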

See comments inline:

Sub Tester()

    Dim filePath As String
    Dim fso As FileSystemObject, url As String 'early bound: needs a reference to Microsoft Scripting Runtime
    Dim fileStream As TextStream, ws As Worksheet
    Dim oWd As Object, oDoc As Object, c As Range, fileRoot As String
    
    Set fso = New FileSystemObject
    Set oWd = CreateObject("word.application")
    
    Set ws = Worksheets("Data")     'use a specific worksheet reference
    fileRoot = ws.Range("D2").Value 'read this once
    If Right(fileRoot, 1) <> "\" Then fileRoot = fileRoot & "\" 'ensure terminating \
    
    For Each c In ws.Range("B2:B" & ws.Cells(ws.Rows.Count, "B").End(xlUp).Row).Cells 'down to the last used row in ColB
        url = Trim(c.Value)
        If LCase(url) Like "http://*" Or LCase(url) Like "https://*" Then  'has a URL
            Set oDoc = Nothing
            On Error Resume Next 'ignore error if no document...
            Set oDoc = oWd.Documents.Open(url)
            On Error GoTo 0      'stop ignoring errors
            If Not oDoc Is Nothing Then
                filePath = fileRoot & c.Offset(0, -1).Value & ".txt" 'filename from ColA
                Debug.Print filePath
                'open text stream as unicode
                Set fileStream = fso.CreateTextFile(filePath, overwrite:=True, Unicode:=True)
                fileStream.Write oDoc.Range.Text
                fileStream.Close
                oDoc.Close SaveChanges:=False 'don't prompt to save the converted PDF
                c.Interior.Color = vbGreen 'flag OK
            Else
                c.Interior.Color = vbRed   'flag problem
            End If
        End If 'have url
    Next c
    
    oWd.Quit
End Sub
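The question also asked for a blank text file when a URL can't be opened, so that every row in column A still gets an output file. A minimal sketch, assuming you want that in addition to the red flag: a hypothetical helper WriteTextFile that always creates the file and writes nothing when the text is empty. It uses late binding, so it does not need the Scripting Runtime reference.

```vba
' Hypothetical helper (not part of the original answer): always creates filePath,
' overwriting any existing file, and writes txt only if it is non-empty.
Public Sub WriteTextFile(ByVal filePath As String, ByVal txt As String)
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject") ' late bound: no reference needed
    ' CreateTextFile(filename, overwrite, unicode): True, True = overwrite, UTF-16 file
    Set ts = fso.CreateTextFile(filePath, True, True)
    If Len(txt) > 0 Then ts.Write txt
    ts.Close
End Sub
```

In Tester, the Else branch could call WriteTextFile fileRoot & c.Offset(0, -1).Value & ".txt", "" before setting the red flag, and the success branch could use WriteTextFile filePath, oDoc.Range.Text in place of the three fileStream lines.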

Answer:

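Using Excel, Notepad, or any text tool you're comfortable with (for example, by exporting the columns as CSV), build a .cmd file. A batch file can include extra constructs, such as not showing a window while it runs, but personally I like to see progress confirmed. NOTE: when using call, the second parameter (the URL) must be "quoted", because cmd treats characters such as = in the URL as argument separators.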

download2txt.cmd

call URLpdf2txt Name1  "https://hpvchemicals.oecd.org/ui/handler.axd?id=e19d2799-0c16-496d-a607-b09330dd28a7"
call URLpdf2txt Name2  "https://hpvchemicals.oecd.org/ui/handler.axd?id=40da06b1-a855-4c0c-bc21-bbc856dca725"
call URLpdf2txt Name3  "https://hpvchemicals.oecd.org/ui/handler.axd?id=c4967546-1f5e-472a-b629-a2998323735b"
call URLpdf2txt Name4  "https://hpvchemicals.oecd.org/ui/handler.axd?id=bde5e625-83ee-423d-aa70-eb0e453088e4"
call URLpdf2txt Name5  "https://hpvchemicals.oecd.org/ui/handler.axd?id=621c4f55-ef3c-4b99-bb98-e6aaf3f436dd"
call URLpdf2txt Name6  "https://hpvchemicals.oecd.org/ui/handler.axd?id=26e1420d-f9b7-4768-b6fa-d345f54e7683"
call URLpdf2txt Name7  "https://hpvchemicals.oecd.org/ui/handler.axd?id=263f3491-90c7-4c3a-b43e-4c4e9395bcea"
call URLpdf2txt Name8  "https://hpvchemicals.oecd.org/ui/handler.axd?id=b78d39a9-26c2-48ff-aadc-cb056a89f08b"
call URLpdf2txt Name9  "https://hpvchemicals.oecd.org/ui/handler.axd?id=97a7b56f-ebaf-4416-8b4b-88b19ca3bd16"
call URLpdf2txt Name10  "https://hpvchemicals.oecd.org/ui/handler.axd?id=c6c3b7c1-9239-40d9-b51a-85a15e2411d6"
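Instead of exporting a CSV and editing it by hand, the same call lines can be generated from the worksheet. A minimal sketch, assuming names are in column A and URLs in column B of a sheet named "Data" (as in the first answer), that names contain no spaces, and that the workbook has been saved so the .cmd can be written next to it; the macro name BuildDownloadCmd is hypothetical.

```vba
' Hypothetical macro: writes one  call URLpdf2txt <name> "<url>"  line per row
' of the "Data" sheet (name in column A, URL in column B) to download2txt.cmd.
Public Sub BuildDownloadCmd()
    Dim ws As Worksheet, lastRow As Long, r As Long
    Dim f As Integer, cmdPath As String, url As String

    Set ws = ThisWorkbook.Worksheets("Data")
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row  ' any number of URLs
    cmdPath = ThisWorkbook.Path & "\download2txt.cmd"

    f = FreeFile
    Open cmdPath For Output As #f          ' plain ANSI text, which cmd.exe reads
    For r = 2 To lastRow
        url = Trim(ws.Cells(r, "B").Value)
        If Len(url) > 0 Then
            ' quote the URL: cmd treats "=" in it as an argument separator
            Print #f, "call URLpdf2txt " & ws.Cells(r, "A").Value & " """ & url & """"
        End If
    Next r
    Close #f
End Sub
```

Running download2txt.cmd from that folder then calls URLpdf2txt.cmd, which is assumed to sit in the same folder.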

In a comment you mentioned that pdftotext is not installed as a native command, so the first step is to make sure a local copy exists. The helper script URLpdf2txt, called above, would then be:

URLpdf2txt.cmd

@echo off
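rem Download and extract pdftotext.exe once, if no local copy exists yet
if not exist "xpdf-tools-win-4.04\bin32\pdftotext.exe" curl -o "%temp%\xpdftools.zip" https://dl.xpdfreader.com/xpdf-tools-win-4.04.zip && tar -m -xf "%temp%\xpdftools.zip" xpdf-tools-win-4.04/bin32/pdftotext.exe
rem %1 = output name, %2 = quoted URL: download the PDF as <name>.pdf
curl -o "%~dpn1.pdf" %2
rem Convert to UTF-8 text, without page-break characters, keeping the page layout
"xpdf-tools-win-4.04\bin32\pdftotext.exe" -nopgbrk -layout -enc UTF-8 "%~dpn1.pdf" "%~dpn1.txt"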
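If you would rather drive the helper from Excel than run download2txt.cmd by hand, a minimal sketch: loop over the rows and run URLpdf2txt.cmd through WScript.Shell's Run method, waiting for each call to finish. Assumptions: URLpdf2txt.cmd is in the workbook's folder, the output files should go there too (the helper uses paths relative to the current directory), the sheet is named "Data", and names in column A contain no spaces. The macro name RunUrlPdf2Txt is hypothetical, and using the exit code as a success flag is a heuristic.

```vba
' Hypothetical macro: runs URLpdf2txt.cmd once per row and waits for each run,
' so the .pdf/.txt files appear beside the workbook.
Public Sub RunUrlPdf2Txt()
    Dim ws As Worksheet, sh As Object
    Dim lastRow As Long, r As Long, rc As Long
    Dim url As String, cmdLine As String

    Set ws = ThisWorkbook.Worksheets("Data")
    Set sh = CreateObject("WScript.Shell")
    sh.CurrentDirectory = ThisWorkbook.Path   ' helper relies on relative paths

    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    For r = 2 To lastRow
        url = Trim(ws.Cells(r, "B").Value)
        If Len(url) > 0 Then
            ' builds: cmd /c "URLpdf2txt.cmd Name "url"" (cmd strips the outer quotes)
            cmdLine = "cmd /c ""URLpdf2txt.cmd " & ws.Cells(r, "A").Value & " """ & url & """"""
            rc = sh.Run(cmdLine, 1, True)    ' 1 = normal window (0 hides it), True = wait
            ' non-zero usually means the download or the conversion failed
            ws.Cells(r, "B").Interior.Color = IIf(rc = 0, vbGreen, vbRed)
        End If
    Next r
End Sub
```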