Home > Software design >  split pdf based on the text using vba acrobat api
split pdf based on the text using vba acrobat api

Time:01-23

I am trying to split pdf, based on the pages where it finds ".pdf" however when I try to save the pdf with a dynamic string variable, it do not save the pdf but when I write hardcode file path it output the pdf. do not know what is going on here.

the following code is not finished yet I am stuck in creating new pdf with deleted pages:

Function Extract_PDF()
    
    Dim aApp As Acrobat.CAcroApp
    Dim av_Doc As Acrobat.CAcroAVDoc
    Dim pdf_Doc As Acrobat.CAcroPDDoc '
    Dim newPDFdoc As Acrobat.CAcroPDDoc
    
    Dim Sel_Text As Acrobat.CAcroPDTextSelect
    Dim i As Long, j As Long
    Dim pageNum, Content
    Dim pageContent As Acrobat.CAcroHiliteList
    Dim found As Boolean
    Dim foundPage As Integer
    Dim PDF_Path As String
    Dim pdfName As String
    Dim folerPath As String
    
    Dim FileExplorer As FileDialog
    Set FileExplorer = Application.FileDialog(msoFileDialogFilePicker)
        
    With FileExplorer
        .AllowMultiSelect = False
        .InitialFileName = ActiveDocument.Path
        .Filters.Clear
        .Filters.Add "PDF File", "*.pdf"
        
        If .Show = -1 Then
            PDF_Path = .SelectedItems.Item(1)
            
        Else
            PagesLB = "Catch me Next Time ;)"
            PDF_Path = ""
            Exit Function
        End If
    End With
        
    Set aApp = CreateObject("AcroExch.App")
    Set av_Doc = CreateObject("AcroExch.AVDoc")
    
    If av_Doc.Open(PDF_Path, vbNull) <> True Then Exit Function
    
    
    While av_Doc Is Nothing
        Set av_Doc = aApp.GetActiveDoc
    Wend
                
        av_Doc.BringToFront
        aApp.Show
    
    Set pdf_Doc = av_Doc.GetPDDoc
    
    For i = pdf_Doc.GetNumPages - 1 To 0 Step -1
    
        Set pageNum = pdf_Doc.AcquirePage(i)
        Set pageContent = CreateObject("AcroExch.HiliteList")
        
        If pageContent.Add(0, 9000) <> True Then Exit Function
        Set Sel_Text = pageNum.CreatePageHilite(pageContent)
        
        Content = ""
        found = False
        
        For j = 0 To Sel_Text.GetNumText - 1
            Content = Content & Sel_Text.GetText(j)
            If InStr(1, Content, ".pdf") > 0 Then
                found = True
                foundPage = i
                pdfName = Content
                Exit For
            End If
        Next j

        If found Then
            
            PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "\")) & ValidWBName(pdfName)
                        
            Set newPDFdoc = CreateObject("AcroExch.PDDoc")
            Set newPDFdoc = av_Doc.GetPDDoc
            
                If newPDFdoc.DeletePages(0, i - 1) = False Then
                    Debug.Print "Failed"
                Else
                    Debug.Print "done"
                End If
                
                If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then
                
                    Debug.Print "Failed to save pdf "
                    Else
                    Debug.Print "Saved"
                End If
            
            newPDFdoc.Close
        
        End If
    
    Next i
        
        av_Doc.Close False
        aApp.Exit
            
        Set av_Doc = Nothing
        Set pdf_Doc = Nothing
        Set aApp = Nothing
    
End Function

ValidWBName:

Function ValidWBName(agr As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Pattern = "[\\/:\*\?""<>\|]"
        .Global = True
        ValidWBName = .Replace(agr, "")
    End With
End Function

in above function when it finds the word pdf it try to create a new instance of pdf and remove previous pages.

        If found Then
            
            PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "\")) & ValidWBName(pdfName)
                        
            Set newPDFdoc = CreateObject("AcroExch.PDDoc")
            Set newPDFdoc = av_Doc.GetPDDoc
            
                If newPDFdoc.DeletePages(0, i - 1) = False Then
                    Debug.Print "Failed"
                Else
                    Debug.Print "done"
                End If
                
                If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then
                
                    Debug.Print "Failed to save pdf "
                    Else
                    Debug.Print "Saved"
                End If
            
            newPDFdoc.Close
        
        End If

this line "Failed to save pdf"

        If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then

but when I write hardcode path it create the pdf

        If newPDFdoc.Save(PDSaveFull, "C:\Users\MBA\Desktop\PDF Project 2\Murdoch_Michael__Hilary_PIA_19.pdf") = False Then

CodePudding user response:

the culprit HAD to be in ValidWBName() function, which didn't handle all possible not allowed chars for a valid file name

since it came out vbCr char was one of them, you could change it as follows:

Function ValidWBName(agr As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Pattern = "[\\/:\*\?""<>\|" & Chr(13) & "]"  ' <-- added vbCr 
        .Global = True
        ValidWBName = .Replace(agr, "")
    End With
End Function
  • Related