Using PowerShell to merge PDFs in multiple subfolders with PDFtk and then delete the original PDF files

Time:09-17

I have a root folder that contains many subfolders, each with multiple PDFs. I also have a PowerShell script that walks the folder structure and creates a merged PDF file (using PDFtk) for each subfolder, as follows:

    $pdftk = "C:\Program Files (x86)\PDFtk\bin\pdftk.exe"
    $RootFolder = "path to root folder"
    Get-ChildItem -r -include *.pdf | group DirectoryName | % {& $PDFtk $_.group CAT OUTPUT "$($_.Name | Split-Path -Parent)\$($_.Name | Split-Path -Leaf)_merged.pdf"}

The script works as required. However, I will be working with a very large amount of data, so I need to delete the original PDFs from each folder once the merge has completed.

Basically, I need the script to look in the first folder (4830_2017), create the merged file 4830_2017_merged.pdf, and then delete the PDFs inside the 4830_2017 folder before moving on to the next folder and doing the same thing.

I am struggling to find the correct way of deleting the contents of each folder after the merge.

Thanks in advance for your help.

Answer:

In your ForEach-Object script block, $_.Group contains each group's, i.e. each directory's System.IO.FileInfo instances representing the *.pdf files, so you can pipe them to Remove-Item after a successful merge:

    (Get-ChildItem -Recurse -Filter *.pdf) |
      Group-Object DirectoryName |
        ForEach-Object {
          & $PDFtk $_.Group.FullName CAT OUTPUT "$($_.Name | Split-Path -Parent)\$($_.Name | Split-Path -Leaf)_merged.pdf"
          if (0 -eq $LASTEXITCODE) { # If the merge succeeded.
            $_.Group | Remove-Item   # Delete.
          }
        }
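
Because the originals are deleted for good, it may be worth rehearsing the run first. The following is a minimal sketch of the same merge-then-delete loop wrapped in a function that supports -WhatIf. The function name Merge-PdfFolders and its parameters are hypothetical, the default PDFtk path is simply the one from the question, and the extra Test-Path check is an added precaution rather than something PDFtk requires.

```powershell
# Sketch only: Merge-PdfFolders and its parameter names are hypothetical.
function Merge-PdfFolders {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $RootFolder,
        # Default taken from the question; adjust to the local install.
        [string] $PdfTkPath = 'C:\Program Files (x86)\PDFtk\bin\pdftk.exe'
    )

    # Collect the complete listing up front so that files created or deleted
    # further down cannot affect the recursive enumeration.
    $groups = @(Get-ChildItem -LiteralPath $RootFolder -Recurse -File -Filter *.pdf |
        Group-Object DirectoryName)

    foreach ($g in $groups) {
        # $g.Name is the folder path; the merged file is written next to that
        # folder (in its parent), so it is not among the files deleted below.
        $outFile = Join-Path (Split-Path $g.Name -Parent) ((Split-Path $g.Name -Leaf) + '_merged.pdf')

        # With -WhatIf this only reports what would happen and skips the folder.
        if (-not $PSCmdlet.ShouldProcess($g.Name, "Merge $($g.Count) PDF(s) into $outFile and delete the originals")) {
            continue
        }

        & $PdfTkPath $g.Group.FullName CAT OUTPUT $outFile

        # Delete only if PDFtk reported success and the merged file exists.
        if ($LASTEXITCODE -eq 0 -and (Test-Path -LiteralPath $outFile)) {
            $g.Group | Remove-Item
        }
        else {
            Write-Warning "Merge failed for '$($g.Name)'; originals kept."
        }
    }
}
```

Calling Merge-PdfFolders -RootFolder 'path to root folder' -WhatIf lists the folders that would be processed without invoking PDFtk or deleting anything; dropping -WhatIf performs the real run.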

Note:

  • The Get-ChildItem command is enclosed in (...) so as to ensure that its output is collected in full before further processing, to rule out side effects from new *.pdf files getting created or old ones getting deleted affecting the recursive enumeration.

    • -Filter *.pdf is used in lieu of -Include *.pdf. The two are functionally equivalent in this case, but -Filter performs much better because the filtering is delegated to the file-system APIs, at the source.
  • & $PDFtk $_.Group was changed to & $PDFtk $_.Group.FullName to ensure that full file paths are passed. This is no longer necessary in PowerShell (Core) 7, where System.IO.FileInfo and System.IO.DirectoryInfo instances consistently stringify to their full paths.

  • Group-Object outputs Microsoft.PowerShell.Commands.GroupInfo instances: .Name holds the grouping value (here, the directory path) and .Group holds the grouped objects.
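
To see what the grouping step hands to ForEach-Object, here is a small sketch that builds a throwaway folder tree under the temp directory and inspects the resulting groups. The folder and file names are illustrative only, and the empty .pdf files are placeholders rather than real PDFs.

```powershell
# Throwaway folder tree under the temp directory; all names are illustrative.
$root = Join-Path ([System.IO.Path]::GetTempPath()) "pdf-demo-$([guid]::NewGuid())"
foreach ($name in '4830_2017', '4831_2017') {
    $dir = New-Item -ItemType Directory -Path (Join-Path $root $name) -Force
    foreach ($n in 1, 2) {
        # Empty placeholder files; only their names and locations matter here.
        New-Item -ItemType File -Path (Join-Path $dir.FullName "$n.pdf") | Out-Null
    }
}

# (...) collects the full listing before grouping starts.
$groups = @((Get-ChildItem -LiteralPath $root -Recurse -Filter *.pdf) |
    Group-Object DirectoryName)

foreach ($g in $groups) {
    # .Name  : the grouping value, here the folder path (a string)
    # .Count : the number of files found in that folder
    # .Group : the FileInfo objects themselves, ready for Remove-Item
    '{0} -> {1} file(s): {2}' -f (Split-Path $g.Name -Leaf), $g.Count, ($g.Group.Name -join ', ')
}

# Clean up the demo tree.
Remove-Item -LiteralPath $root -Recurse -Force
```

Each group corresponds to one folder, which is why piping $_.Group to Remove-Item deletes exactly the files that were just merged.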
