Home > database >  PDF file with two trailers?
PDF file with two trailers?

Time:11-13

If I analyse multiple PDF files with a hex editor, I see that all of them have two trailers. That's possible if an object has changed or renewed (https://blog.idrsolutions.com/multiple-trailers-in-a-pdf-file/), but in my case, the PDF files are not edited. Does anyone know why all of the analysed files have two trailers?

This is a PDF file that contains a lot of text and also two images (there are two trailers in this file, who are (almost) identical to each other: :

0001a30bh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; TRAILER..<</Size
0001a31bh: 20 34 37 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ;  47/Root 1 0 R/I
0001a32bh: 6E 66 6F 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 ; nfo 15 0 R/ID[<E
0001a33bh: 42 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 ; B3FF3A1E373C64E9
0001a34bh: 31 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E ; 10E3FBC4E78913C>
0001a35bh: 3C 45 42 33 46 46 33 41 31 45 33 37 33 43 36 34 ; <EB3FF3A1E373C64
0001a36bh: 45 39 31 30 45 33 46 42 43 34 45 37 38 39 31 33 ; E910E3FBC4E78913
0001a37bh: 43 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 ; C>] >>..startxre
0001a38bh: 66 0D 0A 31 30 36 33 32 33 0D 0A 25 25 45 4F 46 ; f..106323..%%EOF
0001a39bh: 0D 0A 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 ; ..xref..0 0..TRA
0001a3abh: 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 34 37 ; ILER..<</Size 47
0001a3bbh: 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F ; /Root 1 0 R/Info
0001a3cbh: 20 31 35 20 30 20 52 2F 49 44 5B 3C 45 42 33 46 ;  15 0 R/ID[<EB3F
0001a3dbh: 46 33 41 31 45 33 37 33 43 36 34 45 39 31 30 45 ; F3A1E373C64E910E
0001a3ebh: 33 46 42 43 34 45 37 38 39 31 33 43 3E 3C 45 42 ; 3FBC4E78913C><EB
0001a3fbh: 33 46 46 33 41 31 45 33 37 33 43 36 34 45 39 31 ; 3FF3A1E373C64E91
0001a40bh: 30 45 33 46 42 43 34 45 37 38 39 31 33 43 3E 5D ; 0E3FBC4E78913C>]
0001a41bh: 20 2F 50 72 65 76 20 31 30 36 33 32 33 2F 58 52 ;  /Prev 106323/XR
0001a42bh: 65 66 53 74 6D 20 31 30 35 39 37 32 3E 3E 0D 0A ; efStm 105972>>..
0001a43bh: 73 74 61 72 74 78 72 65 66 0D 0A 31 30 37 34 32 ; startxref..10742
0001a44bh: 31 0D 0A 25 25 45 4F 46                         ; 1..%%EOF

This is a PDF file that does only contain some random characters:

000071cbh: 74 72 61 69 6C 65 72 0D 0A 3C 3C 2F 53 69 7A 65 ; TRAILER..<</Size
000071dbh: 20 32 33 2F 52 6F 6F 74 20 31 20 30 20 52 2F 49 ;  23/Root 1 0 R/I
000071ebh: 6E 66 6F 20 39 20 30 20 52 2F 49 44 5B 3C 39 46 ; nfo 9 0 R/ID[<9F
000071fbh: 46 31 32 45 31 43 30 41 35 36 44 42 34 38 41 33 ; F12E1C0A56DB48A3
0000720bh: 41 31 43 37 32 30 33 38 32 33 30 32 45 32 3E 3C ; A1C720382302E2><
0000721bh: 39 46 46 31 32 45 31 43 30 41 35 36 44 42 34 38 ; 9FF12E1C0A56DB48
0000722bh: 41 33 41 31 43 37 32 30 33 38 32 33 30 32 45 32 ; A3A1C720382302E2
0000723bh: 3E 5D 20 3E 3E 0D 0A 73 74 61 72 74 78 72 65 66 ; >] >>..startxref
0000724bh: 0D 0A 32 38 36 35 39 0D 0A 25 25 45 4F 46 0D 0A ; ..28659..%%EOF..
0000725bh: 78 72 65 66 0D 0A 30 20 30 0D 0A 74 72 61 69 6C ; xref..0 0..TRAIL
0000726bh: 65 72 0D 0A 3C 3C 2F 53 69 7A 65 20 32 33 2F 52 ; ER..<</Size 23/R
0000727bh: 6F 6F 74 20 31 20 30 20 52 2F 49 6E 66 6F 20 39 ; oot 1 0 R/Info 9
0000728bh: 20 30 20 52 2F 49 44 5B 3C 39 46 46 31 32 45 31 ;  0 R/ID[<9FF12E1
0000729bh: 43 30 41 35 36 44 42 34 38 41 33 41 31 43 37 32 ; C0A56DB48A3A1C72
000072abh: 30 33 38 32 33 30 32 45 32 3E 3C 39 46 46 31 32 ; 0382302E2><9FF12
000072bbh: 45 31 43 30 41 35 36 44 42 34 38 41 33 41 31 43 ; E1C0A56DB48A3A1C
000072cbh: 37 32 30 33 38 32 33 30 32 45 32 3E 5D 20 2F 50 ; 720382302E2>] /P
000072dbh: 72 65 76 20 32 38 36 35 39 2F 58 52 65 66 53 74 ; rev 28659/XRefSt
000072ebh: 6D 20 32 38 33 37 34 3E 3E 0D 0A 73 74 61 72 74 ; m 28374>>..start
000072fbh: 78 72 65 66 0D 0A 32 39 32 37 35 0D 0A 25 25 45 ; xref..29275..%%E
0000730bh: 4F 46                                           ; OF

                                                                                      

CodePudding user response:

Those files are most likely created by MS Word. The excerpts you posted look like their interpretation of hybrid reference PDFs.

There are two special constructs in which the PDF specification uses the mechanisms it introduced for incremental updates for something else:

  • Linearized PDFs (see ISO 32000-2:2020 Annex F) and
  • Hybrid-reference PDFs (see ISO 32000-2:2020 Section 7.5.8.4).

Your excerpts look like the latter type of PDFs.

Some backgrounds:

With PDF 1.5 Adobe introduced the option to collect multiple non-stream indirect objects in a stream, a so called "object stream". The advantage of doing so is that data in streams can be compressed while otherwise those object cannot be compressed. At the same time they also introduced the option to put the cross reference table data into streams, the so called "cross-reference streams", also to allow compression.

Obviously a new type of cross reference entry was necessary to describe indirect objects in object streams, so they defined entries of that kind, but only for the cross-reference streams, not for the old cross reference tables.

PDFs stored using object and cross-reference streams often indeed are much smaller than the same PDFs stored as regular indirect objects with cross reference tables. On the other hand PDF processors that were not aware of these techniques couldn't open these PDFs at all.

Thus, Adobe came up with the idea of hybrid files: Files that contain the basic objects in a PDF required to view it at all in the old-fashioned way and the objects for newer or optional features in object and cross reference streams. The trailers of the cross reference tables contain an entry XRefStm pointing to the cross reference stream.

For some reason, though, it was specified that object lookup first had to be attempted in the cross reference table, and only if no entry was found there for the object number in question, the associated cross reference stream was to be searched.

As the first cross reference table is required to cover the complete range of object numbers used, this lookup strategy implied that hybrid-reference files needed a second cross reference table whose trailer could point to the cross reference stream that would be used for lookups before the innermost, first cross reference table.

This is what we see in your example:

trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] >>
startxref
106323
%%EOF
xref
0 0
trailer
<</Size 47/Root 1 0 R/Info 15 0 R/ID[<EB3FF3A1E373C64E910E3FBC4E78913C><EB3FF3A1E373C64E910E3FBC4E78913C>] /Prev 106323/XRefStm 105972>>
startxref
107421
%%EOF

Actually most PDF producers implemented hybrid-reference files (if they did at all) under the impression that the cross reference stream and probably also the object streams should go between the first trailer and the second cross reference table. But there is no requirement for that, and the PDF export of MS Office chose to put all the streams before the first cross reference table. As that's the case for your examples, too, I assume they were produced by MS Office.

  • Related