Extracting attachments from PDF with PowerShell

Time:07-29

I've searched the web for this topic but didn't find anything that matches my case.

I have a PDF that includes a few attachments:

[Screenshot: the PDF's list of attachments]

From these attachments I need the content of a specific file (pdfDocument.xml).

Is there a way to extract the attachments with PowerShell?

Answer:

PowerShell has no native PDF handling, so the simplest route I can suggest is a short cmd sequence using a command-line PDF tool.

Assuming you don't already have a PDF utility, download and extract one at a cmd prompt:

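```bat
rem Create a work folder for the tools
md c:\pdfutils
cd /d c:\pdfutils
rem Download and unpack the xpdf command-line tools (curl.exe and tar.exe ship with recent Windows 10/11)
curl -O https://dl.xpdfreader.com/xpdf-tools-win-4.04.zip
tar -m -xf xpdf-tools-win-4.04.zip
rem Running pdfdetach without arguments prints its usage
xpdf-tools-win-4.04\bin32\pdfdetach
```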

This shows pdfdetach's help screen.

You can use bin64 in place of bin32 if your system is 64-bit. At this stage PowerShell could navigate to that folder and, presuming pdfDocument.xml is, say, attachment number 4 (`pdfdetach -list input.pdf` shows the numbers), run `pdfdetach -save 4 -o pdfDocument.xml input.pdf`. However, it is best to run from a work directory. Since saving a single file can sometimes be a problem, I tested `-saveall` against a fractured (damaged) PDF and did get the desired target, so I would recommend running it in a temporary work directory:

```console
c:\pdfutils\xpdf-tools-win-4.04\bin32\pdfdetach -saveall "C:\path to\factur-x\fractured.pdf"
Syntax Error: Couldn't read xref table
Syntax Warning: PDF file is damaged - attempting to reconstruct xref table...
```

Despite these messages, all attachments, including the one I was after, were saved in the current work directory.
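Both routes can be scripted. The following is a minimal sketch in Python, used here only as a portable illustration (none of it is a PowerShell API). It assumes pdfdetach was unpacked to c:\pdfutils as above, and it assumes that `pdfdetach -list` prints one `N: filename` line per attachment. The function names `find_attachment_index`, `save_by_name` and `read_via_saveall` are hypothetical. `save_by_name` follows the `-list` / `-save N -o` route; `read_via_saveall` follows the recommended `-saveall` route in a temporary work directory and only checks whether the wanted file appeared, because a damaged PDF may still print errors like the ones shown.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Assumed install location from the cmd steps above; switch to bin64 on a 64-bit system.
PDFDETACH = Path(r"c:\pdfutils\xpdf-tools-win-4.04\bin32\pdfdetach.exe")


def find_attachment_index(list_output: str, name: str) -> Optional[int]:
    """Return the 1-based number pdfdetach assigns to `name`, or None if absent.

    Assumes `pdfdetach -list` prints lines of the form 'N: filename'
    (the leading 'K embedded files' count line does not match the pattern).
    """
    for line in list_output.splitlines():
        match = re.match(r"\s*(\d+):\s*(.*?)\s*$", line)
        if match and match.group(2) == name:
            return int(match.group(1))
    return None


def save_by_name(pdf: str, name: str, out: str, exe: Path = PDFDETACH) -> None:
    """List the attachments, then save the one called `name` to `out` via -save N -o."""
    listing = subprocess.run([str(exe), "-list", pdf],
                             capture_output=True, text=True, check=True).stdout
    index = find_attachment_index(listing, name)
    if index is None:
        raise FileNotFoundError(f"{name} is not attached to {pdf}")
    subprocess.run([str(exe), "-save", str(index), "-o", out, pdf], check=True)


def read_via_saveall(pdf: str, name: str, exe: Path = PDFDETACH,
                     run=subprocess.run) -> bytes:
    """Run -saveall inside a fresh temporary directory and return one file's bytes.

    `run` is injectable so the logic can be exercised without pdfdetach installed.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Resolve the PDF path first, because the child process runs in `workdir`.
        # -saveall writes every attachment into the current working directory.
        # check=False: a damaged PDF may report errors yet still yield the files.
        run([str(exe), "-saveall", str(Path(pdf).resolve())], cwd=workdir, check=False)
        target = Path(workdir) / name
        if not target.is_file():
            raise FileNotFoundError(f"{name} was not among the saved attachments")
        return target.read_bytes()
```

For example, `read_via_saveall(r"C:\path to\factur-x\fractured.pdf", "pdfDocument.xml")` would return the XML content, and the temporary directory with the other attachments is deleted when the `with` block exits.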
