Home > database >  Tesseract OCR read text from photo
Tesseract OCR read text from photo

Time:06-22

I have more than 5000 of images like below. Text in the images have 10 variants like T122 R-2 T123 R-12 T45 R-1 etc.

enter image description here

I want to read text in the image but result is like below.

, | 5 ğ ve . > | >
İmes <ğ
de 3
U | | i
( ' g
i : .
İM ; »
| ? > e
* >
| NN | va
, İ
( di
(Ny ,
i
>
i |
, > N i | Ni

How can I improve OCR accuracy ?

I have tried various of filters but result is almost same.

I am familiar with tesseract so it's better for me to use it but If there is better OCR I can try it as well.

P.S. I know Google Vision has better results but I couldn't find a way to automate it.

CodePudding user response:

OCR-Tesseract have many corner points which gives inappropriate results. One of these is rotation. Documentation says:

The quality of Tesseract’s line segmentation reduces significantly if a page is too skewed, which severely impacts the quality of the OCR.

Before come to the others, the biggest problem is rotation in your case. So you should figure it out firstly.

Second is noise which gives different kind of letters as result. To improve it you can check the enter image description here

so using CLI deskew enter image description here

  • Related