I need to implement, on the server side (.NET Core), a check for whether a PDF document contains JavaScript. If a script is present, I need to inform the user about it. Is it possible to do this without using paid libraries? I would be grateful for any ideas.
CodePudding user response:
The cross-platform Poppler utilities are the simplest to use, although there is no guarantee that any method will find deliberately obscured JavaScript.
pdfinfo -js filename.pdf
will output any standard embedded JavaScript as plain text, so if that text looks obscured you are forewarned of an oddity.
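The pdfinfo call above could be wrapped from server code roughly as follows. This is only a sketch in Python (the same process-launching approach should carry over to .NET), and the helper names extract_js and contains_js are made up for illustration. It assumes that pdfinfo prints nothing to standard output when it finds no script, which is worth verifying against your Poppler version.

```python
import shutil
import subprocess


def extract_js(pdf_path, pdfinfo="pdfinfo"):
    """Return whatever `pdfinfo -js` prints for the given file.

    `pdfinfo` is the Poppler command-line tool; it must be on PATH.
    """
    if shutil.which(pdfinfo) is None:
        raise FileNotFoundError(f"{pdfinfo} was not found on PATH")
    result = subprocess.run(
        [pdfinfo, "-js", pdf_path],
        capture_output=True,
        text=True,
        errors="replace",  # scripts are not guaranteed to be valid UTF-8
        check=False,
    )
    return result.stdout


def contains_js(js_dump):
    """Assumption: any non-blank output means a script was found."""
    return bool(js_dump.strip())
```

A caller would then warn the user whenever contains_js(extract_js(path)) is true, and could show the dumped text so that an obscured-looking script can be inspected by hand.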
A common PDF with a normal /JavaScript entry will show up in a simple plain-text search.
For suspect or compressed file objects, a simple extension is to run a PDF decompressor first (internal streams must be decompressed, where required, before their content shows up as plain text) and then text-search for the /JavaScript marker, as in this example:
<</S/JavaScript/JS(\n\r\n\r\n// T
However, an article about, say, PDF exploitation could legitimately contain this text:
Td [(/JavaScript)]TJ
It is also easy for JS to edit itself at run time, and the marker can be written in an obscured form such as /JavaScr##69pt, so neither case would be detected by such a simple search.
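The plain-text search described above might be sketched like this, again in Python and with hypothetical names (has_javascript_marker and the helpers are not from any library). It assumes that compressed streams use Flate (zlib) compression and that names are obscured only with the two-digit #xx hex escapes that PDF allows. It will still report the false positive from an article that merely prints /JavaScript as text, and it cannot see scripts hidden by encryption or other filters.

```python
import re
import zlib

# PDF names may spell any character as '#' plus two hex digits.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# Raw bytes between the 'stream' and 'endstream' keywords.
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# /JavaScript or /JS as a whole name, not a prefix of a longer one.
_JS_NAME = re.compile(rb"/(?:JavaScript|JS)(?![A-Za-z0-9])")


def _decode_names(data):
    """Replace every #xx escape with the byte it stands for."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def _inflated_streams(data):
    """Yield the content of each stream that zlib can decompress."""
    for match in _STREAM.finditer(data):
        try:
            # decompressobj tolerates the line break before 'endstream'.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, or damaged; skip it


def has_javascript_marker(data):
    """True if the marker appears in the file or in an inflated stream."""
    chunks = [data, *_inflated_streams(data)]
    return any(_JS_NAME.search(_decode_names(chunk)) for chunk in chunks)
```

Reading the upload with open(path, "rb").read() and passing the bytes to has_javascript_marker gives a cheap first check; a positive result is a reason to warn the user, not proof of malice.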
You may find of interest page 4 of https://web.archive.org/web/20150421225342if_/http://cs.gmu.edu:80/~astavrou/research/Daiping_dsn14.pdf
For a similar question (aimed at PHP) with a variety of answers, see "Find malicious PDF files using PHP validation?".
CodePudding user response:
Use PDFsharp and MigraDoc; they are free to use because they are open source (http://www.pdfsharp.net/Licensing.ashx).