Tesseract OCR reports "Empty page" when training on single Chinese characters

Time:12-31

I am building a Chinese training set from a single font. I render one image per Chinese character, generate a box file and a .tr file for each, and then merge the box and .tr files of all the characters into one large character library.
The problem: when I run this command on the image of a single Chinese character to generate its box file,
tesseract chi_sim.songti.xx.jpg chi_sim.songti.xx -l chi_sim batch.nochop makebox
Tesseract reports "Empty page", like this:


I don't know why. I searched a lot online and couldn't find the specific cause. When I looked at the contents of the box file that was generated, it only contained the character followed by its height, width and X/Y coordinates.
So I created a box file by hand and wrote that content into it, and the problem went away. The next important step is to generate the .tr file with:
tesseract chi_sim.songti.xx.tif chi_sim.songti.xx nobatch box.train
This step also needs the Chinese character images, and the same problem comes back: generating the .tr file also reports "Empty page".

Sometimes generating the .tr file reports this instead:

Then I wanted to write the .tr file by hand, the way I did with the box file, but I don't understand its contents, so I can't write it manually. I have never done OCR before and I'm stuck here. Has anyone run into this kind of problem? I keep suspecting the generated images are the cause, because each single-character image is only 1-2 KB. I don't understand it; any help is greatly appreciated.

Reply:

Make your sample images black text on a white background.

Reply:

Original poster, did you solve it? I have the same problem as you with the RMB symbol: when I try to train that character on its own, I always get "Empty page".

Reply:

Bookmarking this. Even with black text on a white background it still isn't solved. A single letter or a single digit can't be recognized either.

Reply:

Solved.
Generate the box file with this command: tesseract xx.tif xx --psm 10 batch.nochop makebox. The key is the psm parameter.
For recognition in Java: instance.setPageSegMode(TessPageSegMode.PSM_SINGLE_CHAR); (the key is the TessPageSegMode interface; look it up in the source).
I hope this helps whoever reads this later.

Reply:

Why do I get "read_params_file: Can't open 10"? (crying)