How can I get the size of the embedded fonts?

Time:07-19

I have a private PDF document that is about 0.6 MB, but when I watermark it with PyPDF2 it grows to 12 MB (the watermark document is < 0.4 MB). I think this is related to compression, but I don't understand how.

I am especially confused about why the original PDF is so huge when uncompressed:

  • No images
  • No embedded files
  • Just 15 pages and the extracted text has 1467 characters

I was thinking that it might be embedded fonts:

$ pdffonts example.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
AAAAAB ArialMT                       CID TrueType      Identity-H       yes yes yes      8  0
AAAAAC OpenSans-Regular              TrueType          MacRoman         yes yes no      13  0
AAAAAD MyriadPro-Regular             Type 1C           MacRoman         yes yes no      14  0
AAAAAE MyriadPro-Regular             Type 1C           MacRoman         yes yes no      15  0
AAAAAF OpenSans-Regular              TrueType          MacRoman         yes yes no      16  0
AAAAAG OpenSans-Regular              TrueType          MacRoman         yes yes no      17  0
AAAAAH OpenSans-Regular              TrueType          MacRoman         yes yes no      18  0
AAAAAI OpenSans-Bold                 TrueType          MacRoman         yes yes no      19  0
AAAAAJ OpenSans-Regular              TrueType          MacRoman         yes yes no      20  0
AAAAAK OpenSans-Italic               TrueType          MacRoman         yes yes no      21  0
AAAAAL OpenSans-Regular              TrueType          MacRoman         yes yes no      31  0
AAAAAM OpenSans-Regular              TrueType          MacRoman         yes yes no      35  0
AAAAAN MyriadPro-Regular             Type 1C           MacRoman         yes yes no      36  0
AAAAAO MyriadPro-Regular             Type 1C           MacRoman         yes yes no      37  0
AAAAAP OpenSans-Regular              TrueType          MacRoman         yes yes no      38  0
AAAAAQ OpenSans-Regular              TrueType          MacRoman         yes yes no      39  0
AAAAAR OpenSans-Regular              TrueType          MacRoman         yes yes no      40  0
AAAAAS OpenSans-Bold                 TrueType          MacRoman         yes yes no      41  0
AAAAAT OpenSans-Regular              TrueType          MacRoman         yes yes no      42  0
AAAAAU Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes     53  0
AAAAAV ArialMT                       CID TrueType      Identity-H       yes yes yes     54  0
AAAAAW Arial-ItalicMT                CID TrueType      Identity-H       yes yes yes     60  0

How can I check the (uncompressed) size of the embedded fonts?

Answer:

With

mutool extract example.pdf

you can extract all images / fonts.
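
To add up what was extracted, a small script can total the file sizes by kind. This is only a sketch: it assumes the extracted files sit in one directory and are named with a kind prefix before the first hyphen (for example `font-0008.ttf` or `image-0001.png`). The exact naming can differ between MuPDF versions, and the helper name `extracted_sizes` is made up for this example.

```python
from collections import defaultdict
from pathlib import Path


def extracted_sizes(directory="."):
    """Sum file sizes in `directory`, grouped by the name part before the first '-'.

    Assumes extracted files are named like 'font-0008.ttf' or 'image-0001.png';
    files without a hyphen in their name (such as example.pdf) are skipped.
    """
    totals = defaultdict(int)
    for path in Path(directory).iterdir():
        if path.is_file() and "-" in path.name:
            kind = path.name.split("-", 1)[0]
            totals[kind] += path.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    # Print one line per kind, e.g. the total for all 'font-*' files.
    for kind, size in sorted(extracted_sizes().items()):
        print(f"{kind}: {size / 1024:.1f} KiB")
```

Running the extraction in an otherwise empty directory keeps unrelated hyphenated file names out of the totals.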

In my case, the sum of all fonts (and two images I missed) was 0.3 kB... so my search continues.
