HTML to PDF with Cyrillic characters

Time:02-10

I'm building a Spring Boot application and want to generate a PDF from HTML:

        String htmlString = "<!DOCTYPE html>\n" +
                "<html lang=\"ru\">\n" +
                "<head>\n" +
                "    <meta charset=\"UTF-8\"/>\n" +
                "    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\"/>\n" +
                "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>\n" +
                "</head>\n" +
                "<body>\n" +
                "    <h3>ПРЕДСТАВЛЕНИЕ</h3>\n" +
                "</body>\n" +
                "</html>";

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        String path = FileSystemView.getFileSystemView().getDefaultDirectory().getPath() + "/A.pdf";
        OutputStream outputStream = new FileOutputStream(path);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(htmlString);
        renderer.layout();
        renderer.createPDF(outputStream);

        byteArrayOutputStream.writeTo(outputStream);

As you can see, there is an h3 tag with Cyrillic characters. The problem is that after conversion and saving, these characters do not appear in the PDF (the page is simply empty, because there is nothing else visible in the HTML). Other characters are displayed properly, by the way.

For HTML-to-PDF conversion I use:

<dependency>
    <groupId>org.xhtmlrenderer</groupId>
    <artifactId>flying-saucer-pdf-itext5</artifactId>
    <version>9.0.1</version>
</dependency>

I suspect the problem is related to the charset, fonts, etc. How can I fix it?
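One way to separate a charset problem from a font problem is to check whether the Java string really contains Cyrillic code points before it reaches the renderer. The sketch below (the class name `CyrillicCheck` is hypothetical) uses only the JDK's `Character.UnicodeBlock`. If the heading literal yields 13, the source file was compiled with the right encoding and the text is intact, which points at the font used for rendering rather than the charset. If it yields 0, the literal was mangled at compile time (for example, by a wrong `-encoding` for `javac`).

```java
import java.lang.Character.UnicodeBlock;

public class CyrillicCheck {

    /** Counts the code points of {@code text} that belong to the Cyrillic Unicode block. */
    public static long countCyrillic(String text) {
        return text.codePoints()
                .filter(cp -> UnicodeBlock.of(cp) == UnicodeBlock.CYRILLIC)
                .count();
    }

    public static void main(String[] args) {
        String heading = "ПРЕДСТАВЛЕНИЕ";
        // 13 letters: if all 13 are counted, the string itself is fine
        System.out.println(countCyrillic(heading) + " of " + heading.length() + " chars are Cyrillic");
    }
}
```

As far as I know, Flying Saucer falls back to the PDF built-in fonts when no other font is registered, and those cannot render Cyrillic, which would match the symptom of an intact string producing an empty page.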

Answer:

This worked for me!

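import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextRenderer;

import javax.swing.filechooser.FileSystemView;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class HtmlToPdf {
    public static void main(String[] args) throws DocumentException, IOException {
        String htmlString = "<!DOCTYPE html>\n" + "<html lang=\"ru\">\n" + "<head>\n"
                + "    <meta charset=\"UTF-8\"/>\n" + "    <meta http-equiv=\"Content-Type\" content=\"text/html\"/>\n"
                + "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>\n"
                // The CSS font-family must name the font registered with the renderer below
                + "    <style type='text/css'>\n"
                + "        * { font-family: Verdana; }\n"
                + "    </style>\n"
                + "</head>\n"
                + "<body>\n" + "    <h3>ПРЕДСТАВЛЕНИЕ</h3>\n" + "</body>\n" + "</html>";

        String path = FileSystemView.getFileSystemView().getDefaultDirectory().getPath() + "/A.pdf";
        try (OutputStream os = new FileOutputStream(path)) {
            ITextRenderer renderer = new ITextRenderer();
            // Register a TrueType font that has Cyrillic glyphs (Windows-specific path)
            renderer.getFontResolver().addFont("c:/windows/fonts/verdana.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
            renderer.setDocumentFromString(htmlString);
            renderer.layout();
            renderer.createPDF(os);
        } // try-with-resources closes the stream
    }
}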


I think the trick is to add CSS to the HTML that sets the font-family, and that font must be the same one you register with the renderer via addFont.
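To keep the CSS font-family and the registered font from drifting apart, one option is to generate the HTML from a single constant. This is a hypothetical sketch: the class `HtmlTemplate`, the constant `FONT_FAMILY` and its value `"DejaVu Sans"` are assumptions, not part of Flying Saucer, and the constant has to equal the family name stored inside the .ttf file passed to `addFont`. The text is also XML-escaped, because Flying Saucer parses its input as XML, so a stray `&` or `<` would make the document malformed.

```java
public class HtmlTemplate {

    // Assumed value: must equal the family name inside the registered .ttf file
    public static final String FONT_FAMILY = "DejaVu Sans";

    /** Minimal XML escaping so arbitrary text keeps the XHTML well-formed. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a page whose only font-family rule names FONT_FAMILY. */
    public static String build(String heading) {
        return "<!DOCTYPE html>\n"
                + "<html lang=\"ru\">\n"
                + "<head>\n"
                + "    <meta charset=\"UTF-8\"/>\n"
                + "    <style type=\"text/css\">\n"
                + "        * { font-family: \"" + FONT_FAMILY + "\", sans-serif; }\n"
                + "    </style>\n"
                + "</head>\n"
                + "<body>\n"
                + "    <h3>" + escapeXml(heading) + "</h3>\n"
                + "</body>\n"
                + "</html>";
    }

    public static void main(String[] args) {
        System.out.println(build("ПРЕДСТАВЛЕНИЕ"));
    }
}
```

The output of `HtmlTemplate.build(...)` would then go to `renderer.setDocumentFromString(...)` after the matching font file has been registered.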

Answer:

You must register a font that supports Cyrillic with the renderer (for example, DejaVu Sans).

String htmlString = getHtml();
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("font/dejavu-sans/DejaVuSans.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
OutputStream out = new FileOutputStream(new File("so.pdf"));
renderer.setDocumentFromString(htmlString);
renderer.layout();
renderer.createPDF(out);
out.close();

Then, in your HTML, set font-family to that font:

<html>
<head>
    <style>
        body{font-family: "DejaVu Sans", Arial, sans-serif }
    </style>
</head>
<body>
Лорем ипсум долор сит амет, цу вел оратио постеа импедит
</body>
</html>
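The two answers register fonts from different locations ("c:/windows/fonts/verdana.ttf" and "font/dejavu-sans/DejaVuSans.ttf"), and a hard-coded path that is missing on the deployment machine is an easy way to end up without Cyrillic glyphs. The sketch below is a hypothetical helper (`FontLocator` and `firstReadable` are illustrative names, not a Flying Saucer API) that picks the first candidate that exists as a readable file, so the choice can be logged or fail fast. Relative paths are resolved against the process working directory, not the classpath, so a font packed inside a Spring Boot jar would not be found this way.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FontLocator {

    /**
     * Returns the first candidate that is a readable regular file, or empty.
     * Hypothetical helper: the result is meant to be passed to
     * renderer.getFontResolver().addFont(path, BaseFont.IDENTITY_H, ...).
     */
    public static Optional<String> firstReadable(List<String> candidates) {
        for (String candidate : candidates) {
            Path p = Paths.get(candidate); // relative paths use the working directory
            if (Files.isRegularFile(p) && Files.isReadable(p)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Candidate paths taken from the two answers; adjust for your environment
        List<String> candidates = Arrays.asList(
                "c:/windows/fonts/verdana.ttf",
                "font/dejavu-sans/DejaVuSans.ttf");
        String font = firstReadable(candidates).orElseThrow(
                () -> new IllegalStateException("No Cyrillic-capable font found in " + candidates));
        System.out.println("Registering font: " + font);
    }
}
```

If the candidates belong to different families, the CSS should presumably list both names (for example `font-family: "DejaVu Sans", Verdana, sans-serif`) so that whichever font was registered is the one the page asks for.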