Help needed: garbled characters when extracting text from a PDF

Time:11-23

First, a description of the problem.
For the past two days I have been parsing a PDF document and need to extract some of the text in it. The existing code is a method written by my predecessor at the company; the code is as follows:


The four parameters x1, y1, x2, y2 define a rectangle; the text to extract is the part of the document inside that rectangle, as shown in the document below:

But the text actually extracted is as follows:
I searched Baidu, and most of what turns up there attributes this kind of problem to character encoding. In my case part of the extracted text is right and part is wrong, so it does not look like the encoding problem described online. Could an expert please point me in the right direction?

CodePudding user response:

Has no one run into this problem?

CodePudding user response:

Same as usual: it is an encoding problem.

CodePudding user response:

Quoting assky124's reply on the 2nd floor:
Same as usual: it is an encoding problem.
If it is an encoding problem, why is some of the text fine? For example, the two characters for "Bluetooth" have a problem, and the character for "ear" is wrong. If it is the encoding problem you describe, can it be changed in the PdfTextExtractor.GetTextFromPage method?

CodePudding user response:

Different text blocks in a PDF may use different fonts; open the file in PDF editing software to check.
Set a breakpoint, step in, and take a look.
If that does not work, switch to a mature component such as Aspose.PDF or iText.
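
To illustrate why a per-font difference can make only part of the text come out wrong, here is a minimal toy sketch in C#. It is not iTextSharp code and not the original poster's method. The class ToyFont, its glyph-code table, and the fallback rule (emit the raw code as a character when no mapping exists) are all assumptions made for the example; real extractors handle a missing mapping in their own ways.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model (hypothetical, not an iTextSharp type): every font carries its own
// glyph-code -> Unicode table, similar in spirit to a PDF ToUnicode map.
public sealed class ToyFont
{
    private readonly Dictionary<int, char> _toUnicode;

    public string Name { get; }

    public ToyFont(string name, Dictionary<int, char> toUnicode)
    {
        Name = name;
        _toUnicode = toUnicode;
    }

    // Decode a run of glyph codes with this font's table.
    // Assumption for the demo: a code with no mapping falls back to the raw
    // code value, which is how a wrong character can slip into the output.
    public string Decode(IEnumerable<int> glyphCodes)
    {
        var sb = new StringBuilder();
        foreach (int code in glyphCodes)
        {
            sb.Append(_toUnicode.TryGetValue(code, out char ch) ? ch : (char)code);
        }
        return sb.ToString();
    }
}
```

With a font that maps both 0x30 and 0x31, the run { 0x30, 0x31 } decodes fully. With a second font that maps only 0x30, the same run comes out half right and half wrong, which matches the symptom in this thread: one global encoding setting would not explain it, but a per-font table would.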

CodePudding user response:

Actually, there is no need to go to that much trouble! Find the place in your predecessor's code where the encoding is set and switch it to another encoding format.

CodePudding user response:

Quoting assky124's reply on the 4th floor:
Different text blocks in a PDF may use different fonts; open the file in PDF editing software to check.
Set a breakpoint, step in, and take a look.
If that does not work, switch to a mature component such as Aspose.PDF or iText.
Yes, I have found that the font is the cause. I am now looking for a way to convert the font.
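
While looking for a font fix, it can help to locate which parts of the extracted text are affected. The following is a rough C# sketch under stated assumptions: the class SuspectCharScanner and its rules are hypothetical, and it only flags characters that often signal a failed glyph-to-Unicode mapping (the U+FFFD replacement character, Private Use Area code points, and stray control characters). It will not catch a wrong but otherwise ordinary character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: scans already-extracted text for suspicious characters.
public static class SuspectCharScanner
{
    // Returns the position and code point of each suspicious character.
    public static List<(int Index, int CodePoint)> FindSuspects(string text)
    {
        var hits = new List<(int Index, int CodePoint)>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool replacement = c == '\uFFFD';                 // decoder gave up
            bool privateUse = c >= '\uE000' && c <= '\uF8FF'; // no standard meaning
            bool strayControl = char.IsControl(c)
                                && c != '\r' && c != '\n' && c != '\t';
            if (replacement || privateUse || strayControl)
            {
                hits.Add((i, (int)c));
            }
        }
        return hits;
    }
}
```

Running this over the string returned for each page shows where the suspicious characters cluster, which can then be compared against the fonts used in those text blocks.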
Tags: C#