I'm using PDFBox 2.0.26 to convert a PDF to an image. The Maven dependencies are as follows:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.26</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.26</version>
</dependency>
The program I wrote looks like this:
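import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfToImage {
    public static void main(String[] args) {
        // try-with-resources closes the document even if rendering fails
        try (PDDocument doc = PDDocument.load(new File("/path/to/sample.pdf"))) {
            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            // Render the first page (index 0) at 300 DPI
            BufferedImage bim = pdfRenderer.renderImageWithDPI(0, 300, ImageType.RGB);
            // ImageIOUtil is in the separate pdfbox-tools artifact in 2.0.x, so plain ImageIO is used here
            ImageIO.write(bim, "png", new File("/path/to/sample.png"));
        } catch (IOException e) {
            e.printStackTrace(); // print the actual cause instead of just "error"
        }
    }
}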
It works fine on my macOS machine (although the fonts in the image differ from those in the PDF), but the Chinese characters are missing when I run it on the Linux server.
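Different fonts in the rendered image usually suggest that the PDF does not embed those fonts, so PDFBox substitutes whatever is installed on the machine it runs on. A hedged sketch for checking this with the PDFBox 2.0 API follows; the class `FontEmbeddingCheck` and the helper `describeFonts` are hypothetical names, and the check only looks at fonts in each page's own resource dictionary, so fonts used inside form XObjects or annotations are not listed.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

public class FontEmbeddingCheck {

    /** Hypothetical helper: one "page N: BaseFont (embedded|NOT embedded)" line per font in the page resources. */
    static List<String> describeFonts(PDDocument doc) throws IOException {
        List<String> lines = new ArrayList<>();
        int pageNumber = 0;
        for (PDPage page : doc.getPages()) {
            pageNumber++;
            PDResources resources = page.getResources();
            if (resources == null) {
                continue; // no resource dictionary on this page, hence no fonts
            }
            for (COSName name : resources.getFontNames()) {
                PDFont font = resources.getFont(name);
                if (font == null) {
                    continue; // entry could not be loaded as a font
                }
                lines.add("page " + pageNumber + ": " + font.getName()
                        + (font.isEmbedded() ? " (embedded)" : " (NOT embedded)"));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/path/to/sample.pdf";
        try (PDDocument doc = PDDocument.load(new File(path))) {
            describeFonts(doc).forEach(System.out::println);
        }
    }
}
```

Any font reported as "NOT embedded" has to be supplied by the operating system, which would explain why the output differs between macOS and Linux.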
The source PDF file can be found here
The resulting image looks like this:
What can I do to solve this problem? Thank you.
Answer:
Thanks to Tilman Hausherr's suggestion, I realized that when the fonts specified in the PDF are not available, "PDFBox will try to find one that is close". My problem was that PDFBox could not find a close enough substitute for the fonts used in the PDF file. After I copied some Chinese fonts to the server (on Linux, I put them in /usr/share/fonts), the problem was solved. The fonts I used belong to the company I work for, but I believe fonts such as SimSun will also work; just try it out.
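As far as I know, PDFBox 2.0 finds substitute fonts by scanning a fixed set of font directories (on Linux these include /usr/share/fonts), so it can help to confirm that the copied files really ended up there before re-running the conversion. The sketch below is a hypothetical JDK-only helper (`FontDirCheck.findFontFiles` is not a PDFBox API) that recursively lists `.ttf`, `.ttc` and `.otf` files under a directory; it only checks that files are present, not whether a font actually contains Chinese glyphs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontDirCheck {

    /** Hypothetical helper: recursively lists .ttf/.ttc/.otf files below dir, sorted; empty if dir does not exist. */
    static List<Path> findFontFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        // compare extensions case-insensitively, e.g. SimSun.TTC
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".ttf") || name.endsWith(".ttc") || name.endsWith(".otf");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts");
        List<Path> fonts = findFontFiles(dir);
        System.out.println(fonts.size() + " font file(s) under " + dir);
        fonts.forEach(System.out::println);
    }
}
```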