I have a root folder that contains many subfolders, each with multiple PDFs. I have a PowerShell script that walks the folder structure and creates a merged PDF file (using PDFtk) for each subfolder, as follows:
$pdftk = "C:\Program Files (x86)\PDFtk\bin\pdftk.exe"
$RootFolder = "path to root folder"
Get-ChildItem -r -include *.pdf | group DirectoryName | % {& $PDFtk $_.group CAT OUTPUT "$($_.Name | Split-Path -Parent)\$($_.Name | Split-Path -Leaf)_merged.pdf"}
The script works as required. However, I will be working with a very large amount of data, so I need to delete the original PDFs from each folder once the merge has completed.
Basically, I need the script to look in the first folder, 4830_2017, create the merged file 4830_2017_merged.pdf, and then delete the PDFs inside the 4830_2017 folder before moving on to the next folder and doing the same thing.
I am struggling to find the correct way to delete the contents of each folder after the merge.
Thanks in advance for your help.
Answer:
In your ForEach-Object script block, $_.Group contains each group's (i.e. each directory's) System.IO.FileInfo instances representing the *.pdf files, so you can pipe them to Remove-Item after a successful merge:
(Get-ChildItem -Recurse -Filter *.pdf) |
  Group-Object DirectoryName |
  ForEach-Object {
    & $PDFtk $_.Group.FullName CAT OUTPUT "$($_.Name | Split-Path -Parent)\$($_.Name | Split-Path -Leaf)_merged.pdf"
    if (0 -eq $LASTEXITCODE) { # If the merge succeeded.
      $_.Group | Remove-Item # Delete.
    }
  }
Note:
The Get-ChildItem command is enclosed in (...) so as to ensure that its output is collected in full before further processing; this rules out side effects on the recursive enumeration from new *.pdf files being created or old ones being deleted.
-Filter *.pdf is used in lieu of -Include *.pdf; the two are functionally equivalent in this case, but -Filter performs much better, because it delegates the filtering to the file-system APIs, i.e. it filters at the source - see this answer.
& $PDFtk $_.Group was changed to & $PDFtk $_.Group.FullName to ensure that full file paths are passed; note that this is no longer necessary in PowerShell (Core) 7, where System.IO.FileInfo and System.IO.DirectoryInfo instances consistently stringify to their full paths - see this answer.
Group-Object outputs Microsoft.PowerShell.Commands.GroupInfo instances.
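To see what each GroupInfo object carries before deleting anything for real, here is a minimal sketch. It assumes a throwaway folder under the temp directory filled with empty placeholder files (not real PDFs), and it skips the PDFtk call entirely. The names $root, $groups, $target and $remaining are illustrative only and do not come from the question. Remove-Item's -WhatIf switch is used so that the deletions are only reported, not performed:

```powershell
# Build a throwaway folder tree with empty placeholder files (assumption: demo data only).
$root = Join-Path ([System.IO.Path]::GetTempPath()) "merge-demo-$([guid]::NewGuid())"
foreach ($dir in '4830_2017', '4831_2017') {
    $dirPath = Join-Path $root $dir
    $null = New-Item -ItemType Directory -Path $dirPath -Force
    foreach ($n in 1..2) {
        $null = New-Item -ItemType File -Path (Join-Path $dirPath "$n.pdf")
    }
}

# Collect the files up front, then group them by containing directory.
$groups = @(
    (Get-ChildItem -LiteralPath $root -Recurse -File -Filter *.pdf) |
        Group-Object DirectoryName
)

foreach ($g in $groups) {
    # .Name is the directory path, .Count the number of files, .Group the FileInfo objects.
    $target = Join-Path (Split-Path $g.Name -Parent) ((Split-Path $g.Name -Leaf) + '_merged.pdf')
    '{0} file(s) in {1} would be merged into {2}' -f $g.Count, $g.Name, $target

    # Preview only: -WhatIf reports each deletion without performing it.
    $g.Group | Remove-Item -WhatIf
}

# Confirm that the preview left every placeholder file in place, then clean up.
$remaining = @(Get-ChildItem -LiteralPath $root -Recurse -File -Filter *.pdf).Count
Remove-Item -LiteralPath $root -Recurse -Force
```

Once the reported paths look right, the same loop body can be swapped back for the PDFtk call guarded by the $LASTEXITCODE check, with -WhatIf removed.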