Embedded OpenType (CFF) font in a PDF shows strange behaviour in some viewers-CodePudding

When embedding a subsetted OpenType font with CFF outlines (

Chrome rendering (Noto HK top, PingFang HK bottom): Preview rendering (Noto HK invisible, PingFang HK as expected): Other HK Chinese CFF fonts like PingFang HK render perfectly in every PDF reader I have tested, but Noto Sans HK just won't. As far as embedding restrictions go, FontBook shows Noto Sans HK as having "no restrictions", so nothing there either.

I am embedding all fonts as CIDFontType0C fonts with Identity-H encoding, and although I'm not providing ToUnicode maps yet as they are the next thing on the roadmap, that should make no difference to rendering.

Noto HK Font objects (Widths removed for conciseness):

6 0 obj
<< /Ascent 1160 /CapHeight 733 /Descent -288 /Flags 4 /FontBBox [ -991 -1050 2930 1810 ] /FontFile3 10 0 R /FontName /NZGUSD NotoSansHK-Thin /ItalicAngle 0 /StemV 58 /Type /FontDescriptor >>
endobj
7 0 obj
<< /BaseFont /NZGUSD NotoSansHK-Thin /DescendantFonts [ 8 0 R ] /Encoding /Identity-H /Subtype /Type0 /Type /Font >>
endobj
8 0 obj
<< /BaseFont /NZGUSD NotoSansHK-Thin /CIDSystemInfo << /Ordering (Identity) /Registry (Adobe) /Supplement 0 >> /FontDescriptor 6 0 R /Subtype /CIDFontType0 /Type /Font /W 9 0 R >>
endobj

Equivalent PingFang objects:

11 0 obj
<< /Ascent 1060 /CapHeight 860 /Descent -340 /Flags 4 /FontBBox [ -72 -212 1126 952 ] /FontFile3 15 0 R /FontName /DYBBAB PingFangHK-Regular /ItalicAngle 0 /StemV 95 /Type /FontDescriptor >>
endobj
12 0 obj
<< /BaseFont /DYBBAB PingFangHK-Regular /DescendantFonts [ 13 0 R ] /Encoding /Identity-H /Subtype /Type0 /Type /Font >>
endobj
13 0 obj
<< /BaseFont /DYBBAB PingFangHK-Regular /CIDSystemInfo << /Ordering (Identity) /Registry (Adobe) /Supplement 0 >> /FontDescriptor 11 0 R /Subtype /CIDFontType0 /Type /Font /W 14 0 R >>
endobj

Relevant Page objects:

3 0 obj
<< /F4v0 12 0 R /F5v0 7 0 R >>
endobj
4 0 obj
<< /Contents 5 0 R /CropBox [ 2.5 4 595 842 ] /MediaBox [ 0 0 600 850 ] /Parent 2 0 R /Resources << /Font 3 0 R >> /Type /Page >>
endobj
5 0 obj
<< /Length 462 >>
stream
q 1 1 1 rg 0 0 600 850 re F Q  BT /F5v0 15.000000 Tf 0 0 0 rg 0 Tr 27.500000 802.000000 Td [<0AFD292728192FFF3162282746BB112F14E410E20E96201D0D820A9111440EC016922CB046A10AFD0EC039AF1D0B272D17D431C92A2B4F4D384719160F2C29C9297634F34F4D1846>] TJ ET  BT /F4v0 15.000000 Tf 0 0 0 rg 0 Tr 27.500000 780.280000 Td [<05487DE1129E161216D412A7726A08C175A77465074A7A1706A504E4748207710B1814B5726605480771641D0E4D12580BD481D113A37267628146D107BE7E0D1358AD3772670C18>] TJ ET endstream
endobj

I'm using HarfBuzz to generate subsets with the HB_SUBSET_FLAGS_RETAIN_GIDS flag set, and when I view the generated subset in FontForge, the glyphs expected are present with the correct GIDs.

In chrome the result is the same as before:

The glyphs being rendered definitely do not correspond to the glyph IDs being provided (again, verified with FontForge).

As before, PingFang and other fonts render perfectly in either case.

I'm starting to think I might be missing an edge case here with respect to glyph indexing, where Cairo and other PDF generators are remapping GIDs to low numbers so they have no issues, but I'm retaining the original GIDs (still fitting in 2 bytes, but could be an implementation limitation I haven't seen?).

I'll try remapping the GIDs to see if that helps and report back.

CodePudding user response：

This is happening because of a misunderstanding on my part of how CID fonts work in PDFs.

Let me explain.

When using a font in PDF you will provide several structures (font descriptor, font dictionary, and for Type0 a descendant font) describing the font, and categorising it into one of the predefined types (Type0, Type1, Type3, or TrueType), and in the case of Type0 a subtype (/CIDFontType0 or /CIDFontType2).

What I didn't understand was that Type0 fonts with subtype /CIDFontType0 actually have one further implicit distinction between those that use CIDFont operators in their TopDICT structure, and those that don't (which includes all CFF2 fonts).

The way glyph lookup works differs based on the type of font used too: With "Simple" fonts (Type1, TrueType) you would use the actual string ((like this) or <0074006800690073>) as the operand to text showing operators, whereas for Composite fonts (Type0) you would typically use hex encoded strings of CIDs (<DEADBEEF...>).

When using Identity mappings with CID fonts, CID == GID so we can use GIDs directly in these strings — unless you're using a CID Font with CFF outlines that has CIDFont operators in its TopDICT. In this (now rather rare) case, CIDs may or may not equal GIDs — in my testing NotoSansHK was the only font that used a different mapping, hence why other fonts worked fine.

What I needed was to parse the charset array in the TopDICT structure, and look up the GID in question to obtain a SID. Normally each SID corresponds to a string in the string index, but in OpenType fonts the SIDs seem to actually encode the CID for the font. Once the CID is obtained, this can be used to encode text in the PDF.

In my case, 人 (U 4EBA) had a GID of 2813, but the PDF reader interpreted that as a CID, which in this case didn't exist. When using the CID of 9749 instead, however, the glyph is shown as expected.