Home > Blockchain >  Generating PDF user password hash
Generating PDF user password hash

Time:03-21

Currently, I am attempting to generating a hash of a user password for PDF, given the encrypted PDF file and the plain password. I follow the instruction of this article. However, the hash I've computed is different from the hash stored in the PDF file.

The hashed user password (/U entry) is simply the 32-byte padding string above, encrypted with RC4, using the 5-byte file key. Compliant PDF viewers will check the password given by the user (by attempting to decrypt the /U entry using the file key, and comparing it against the padding string) and allow or refuse certain operations based on the permission settings.

First, I padded my password "123456" using a hardcoded 32-byte string, which gives me

31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C

I tried to compute the hash with RC4 using the 5-byte file key as the key. According to the article:

The encryption key is generated as follows:

    1. Pad the user password out to 32 bytes, using a hardcoded
       32-byte string:
           28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
           2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A
       If the user password is null, just use the entire padding
       string.  (I.e., concatenate the user password and the padding
       string and take the first 32 bytes.)

    2. Append the hashed owner password (the /O entry above).

    3. Append the permissions (the /P entry), treated as a four-byte
       integer, LSB first.

    4. Append the file identifier (the /ID entry from the trailer
       dictionary).  This is an arbitrary string of bytes; Adobe
       recommends that it be generated by MD5 hashing various pieces
       of information about the document.

    5. MD5 hash this string; the first 5 bytes of output are the
       encryption key.  (This is a 40-bit key, presumably to meet US
       export regulations.)

I appended the hashed owner key to the padded password, which gives me

31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6

Then, I appended the /P entry (-4), treated as a four-byte integer, encoded with little endian, which gives me

31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6
FC FF FF FF

Last, I appended the file identifier to it. The trailer of my PDF is:

trailer
<<
/Size 13
/Root 2 0 R
/Encrypt 1 0 R
/Info 4 0 R
/ID [<B5185D941CC0EA39ACA809F661EF36D4> <393BE725532F9158DC9E6E8EA97CFBF0>]
>>

and the result is

31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6
FC FF FF FF B5 18 5D 94 1C C0 EA 39 AC A8 09 F6
61 EF 36 D4 39 3B E7 25 53 2F 91 58 DC 9E 6E 8E
A9 7C FB F0

MD5 hashing this block of data returns 942c5e7b2020ce57ce4408f531a65019. I RC4-ed the padded password with cryptii using the first 5 bytes of the MD5 hash as the key. However, it returns

90 e2 b5 21 2a 7d 53 05 70 d9 5d 26 95 c7 c2 05
6e 2a 28 40 63 e7 4a d4 e9 05 86 71 43 d1 39 d6

while the hash in PDF is

58 81 CA 74 65 DC 2E A7 5D D2 39 D4 43 9C 0D DE
28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08

Which step am I doing wrong? I suspect that the problem happens because

  • I am appending the File Idenifier in a wrong format
  • I am using the wrong drop bytes with RC4.
  • The hash function is not for PDF 1.6
  • I make some mistake during those process
  • Or maybe the article is actually wrong

Files: Original PDF dummy.pdf, dummy-protected.pdf (Password: 123456)

Please help

CodePudding user response:

There are two issues in your calculation:

  • The article to use refers to PDF encryption algorithms available for PDF-1.3 but your document is encrypted using an algorithm introduced with PDF-1.5.

  • You make an error when appending the file identifier - actually only the first entry of the ID array shall be appended, not both (which is not really clear from the article you use).

In a comment you asked accordingly

where can I find the password hashing detail for >V1.3 PDF?

I would propose using the PDF specification, ISO 32000.

As ISO specifications go, they are not free, but Adobe used to provide a version of ISO 32000-1 with merely the ISO header removed on their web site. Some days ago it has been removed (By design? By error? I don't know yet.) but you still find copies of it googl'ing for "PDF32000".

The relevant section in ISO 32000-1 is 7.6 Encryption and in particular 7.6.3 Standard Security Handler.

Following that information you should be able to correctly calculate the value in question.

(Alternatively you can also use old Adobe PDF references, the editions for PDF 1.5, 1.6, and 1.7 should also give you the information required for decrypting your document. But these references have been characterized as not normative in nature by prominent Adobe employees, so I would go for the ISO norm.)

Beware, though: After ISO 32000-1 had been published, Adobe introduced an AES-256 encryption scheme as an extension which obviously is not included in ISO 32000-1. You can find a specification in "Adobe Supplement to ISO 32000, base version 1.7, extension level 3".

Furthermore, with ISO 32000-2 that Adobe AES-256 encryption scheme and all older schemes became deprecated, the only encryption scheme to use with PDF-2.0 is a new AES-256 encryption scheme described in ISO 32000-2 which is based on the Adobe scheme but introduces some extra hashing iterations.

  • Related