I have files in a directory, some of which share the same prefix:
- C001_200129.pdf
- C001_29292.pdf
- C001_ABCDF.pdf
- C041_29292.pdf
- C041_110101.pdf
- P121_AAAA.pdf
- P121_CCCC.pdf
- P121_DDDD.pdf
- P121_AKAKA.pdf
Files with the same prefix are to be merged. I do not know the prefixes beforehand; I only know that the first 4 characters are the same. The merge command looks like this:
pdftk.exe $a*.pdf CAT OUTPUT \Merged\$a-merged.pdf
How do I loop through the files, find all those that share a prefix, pass that prefix into the $a variable, merge them, and then move directly on to the next set of files to merge?
The goal is to be able to create:
- C001_Merged.pdf
- C041_Merged.pdf
- P121_Merged.pdf
CodePudding user response:
Santiago Squarzon has provided the basic steps in a comment; let me flesh them out:
Get-ChildItem -Filter *.pdf | # Get all input files.
Group-Object { ($_.Name -split '_')[0] } | # Group them by shared prefix.
ForEach-Object { # Process each group.
pdftk.exe $_.Group.FullName CAT OUTPUT \Merged\$($_.Name)_merged.pdf
}
- Group-Object is used to group all *.pdf files in the current directory (add a -LiteralPath argument to the Get-ChildItem call to target a different dir.) by their shared prefix (the start of the name up to, but not including, the _ character).
- The .Name property of each resulting [Microsoft.PowerShell.Commands.GroupInfo] instance contains that shared prefix, and the .Group property contains all file-info objects that make up the group.
- $_.Group.FullName uses member-access enumeration to get the full paths (.FullName) of all file-info objects in the group; passing the resulting array of file paths to an external program such as pdftk.exe results in the paths getting passed as individual arguments.
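Since the question states only that the first 4 characters are shared, a variant that groups on those 4 characters rather than on the text before the underscore may also be useful. The following is a minimal sketch, not a tested replacement for the pipeline above: it works on the sample names from the question as plain strings (so no files are touched and pdftk.exe is not invoked), it assumes every name is at least 4 characters long, and the $names, $groups and $g variables are illustrative only.

```powershell
# Sample names copied from the question; these are plain strings, not files.
$names = 'C001_200129.pdf', 'C001_29292.pdf', 'C001_ABCDF.pdf',
         'C041_29292.pdf', 'C041_110101.pdf',
         'P121_AAAA.pdf', 'P121_CCCC.pdf', 'P121_DDDD.pdf', 'P121_AKAKA.pdf'

# Group by the first four characters instead of splitting on '_'.
# Assumption: every name has at least 4 characters (Substring throws otherwise).
$groups = $names | Group-Object { $_.Substring(0, 4) }

foreach ($g in $groups) {
    # Only preview the command line that would be run for this group.
    "pdftk.exe $($g.Group -join ' ') CAT OUTPUT \Merged\$($g.Name)_Merged.pdf"
}
```

With real files, the same script block, { $_.Name.Substring(0, 4) }, could be passed to Group-Object in the pipeline above in place of the -split expression. Note that the \Merged directory presumably has to exist before pdftk.exe writes to it.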