How to find JavaScript in a PDF document in .NET Core?

Time:01-10

I need to implement, on the server side (.NET Core), the ability to check a PDF document for the presence of JavaScript. If there is a script, I need to inform the user about it. Is it possible to do this without using paid libraries? I would be grateful for any ideas.

Answer:

The cross-platform Poppler utilities are the simplest option, although no method is guaranteed to find deliberately obscured JavaScript.

pdfinfo -js filename.pdf

will print any standard embedded JavaScript as plain text, so if the script looks obfuscated you are forewarned that something is odd.
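On the server, one way to use this is to run pdfinfo as a child process and capture what it prints. The sketch below is in Python for brevity (in .NET the equivalent would go through System.Diagnostics.Process). It assumes pdfinfo is installed and on the PATH, the function names are hypothetical, and it deliberately returns the raw output instead of parsing it, because no particular output format is assumed here.

```python
import subprocess


def build_pdfinfo_js_command(pdf_path: str, pdfinfo: str = "pdfinfo") -> list:
    # Argument list for Poppler's pdfinfo with -js (print embedded JavaScript).
    return [pdfinfo, "-js", pdf_path]


def dump_pdf_javascript(pdf_path: str, pdfinfo: str = "pdfinfo", timeout: float = 30.0) -> str:
    # Runs pdfinfo and returns its standard output unchanged, for review or logging.
    # Raises FileNotFoundError if pdfinfo is missing, CalledProcessError on a
    # non-zero exit status, and TimeoutExpired if the file takes too long.
    result = subprocess.run(
        build_pdfinfo_js_command(pdf_path, pdfinfo),
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # untrusted files may produce undecodable bytes
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

Keeping the timeout is worthwhile because the server is processing untrusted uploads.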

In a typical PDF, a normal /JavaScript entry will show up in a simple plain-text search.
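A minimal sketch of that plain-text search, in Python for brevity (in C# the same check is a byte search over the result of File.ReadAllBytes). It assumes the whole file fits in memory, the function names are hypothetical, and it only catches uncompressed, unobfuscated entries.

```python
def contains_javascript_marker(pdf_bytes: bytes) -> bool:
    # Naive check: look for the /JavaScript name anywhere in the raw file bytes.
    # Names inside compressed streams or written with escapes are missed.
    return b"/JavaScript" in pdf_bytes


def file_contains_javascript_marker(path: str) -> bool:
    # Reads the file in binary mode; PDFs are not text files.
    with open(path, "rb") as f:
        return contains_javascript_marker(f.read())
```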

For suspect or compressed file objects, a simple extension is to run the file through a PDF decompressor (internal streams must be decompressed before their content shows up as plain text) and then search the text for the /JavaScript marker, as in this example:
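A rough sketch of the decompress-then-search idea, using Python's zlib in place of a separate decompressor tool (.NET 6+ offers ZLibStream for the same job). It assumes streams use FlateDecode, the common case, and locates them with a simple stream/endstream regex instead of a real PDF parser, so other filters, encrypted files and unusual layouts are silently skipped. The names are hypothetical. Objects stored inside compressed object streams are covered too, since those are ordinary Flate streams.

```python
import re
import zlib

# Bytes between a "stream" keyword (not the tail of "endstream") and the next "endstream".
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflated_streams(pdf_bytes: bytes):
    # Yields the decompressed contents of every stream that zlib can inflate.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj ignores trailing bytes such as the EOL before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded, or uses a filter this sketch does not handle


def contains_javascript_marker_deep(pdf_bytes: bytes) -> bool:
    # Search the raw bytes first, then every inflated stream.
    if b"/JavaScript" in pdf_bytes:
        return True
    return any(b"/JavaScript" in data for data in inflated_streams(pdf_bytes))
```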

<</S/JavaScript/JS(\n\r\n\r\n// T

However, an article about, say, PDF exploitation could legitimately contain this text:
Td [(/JavaScript)]TJ
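One way to reduce such false positives, sketched below as a heuristic of my own rather than something from the answer, is to look for the dictionary structure around the name instead of the bare text: /S /JavaScript marks a JavaScript action, and a /JavaScript key followed by a dictionary or an indirect reference (such as 12 0 R) marks the document-level JavaScript name tree. Page text like the TJ example above is not matched. It is meant to run on decompressed data, a crafted file can still defeat it, and the names are hypothetical.

```python
import re

_JS_PATTERNS = (
    # JavaScript action dictionary, e.g. << /S /JavaScript /JS (...) >>
    re.compile(rb"/S\s*/JavaScript"),
    # Name-dictionary entry, e.g. /JavaScript << ... >> or /JavaScript 12 0 R
    re.compile(rb"/JavaScript\s*(?:<<|\d+\s+\d+\s+R)"),
)


def has_javascript_structure(data: bytes) -> bool:
    # True if any structural pattern appears; a bare "(/JavaScript)" string does not count.
    return any(p.search(data) for p in _JS_PATTERNS)
```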

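Also, JavaScript can easily rewrite itself at run time, and the name itself can be obfuscated with PDF hex escapes such as /JavaScr#69pt (where #69 is the character code for "i"), so neither would be detected by such a simple search.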
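PDF name objects may contain #xx hex escapes (for example #69 encodes "i"), so one cheap counter-measure is to decode those escapes before searching. The sketch assumes that decoding every #xx sequence in the buffer, including ones that are not inside names, is acceptable for a yes/no check; it does nothing against JavaScript that rewrites itself at run time, and the names are hypothetical.

```python
import re

_HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name_escapes(data: bytes) -> bytes:
    # Replace each #xx sequence with the single byte it encodes.
    return _HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def contains_javascript_marker_normalized(data: bytes) -> bool:
    # Search for /JavaScript after undoing hex escapes.
    return b"/JavaScript" in decode_name_escapes(data)
```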

Page 4 of https://web.archive.org/web/20150421225342if_/http://cs.gmu.edu:80/~astavrou/research/Daiping_dsn14.pdf may also be of interest.

For a similar question (aimed at PHP) with varied answers, see "Find malicious PDF files using PHP validation?".

Answer:

Use PDFsharp and MigraDoc; they are free to use because they are open source (http://www.pdfsharp.net/Licensing.ashx).
