OpenVINO scene text detection and recognition


OpenVINO provides a scene text detection model whose accuracy is very high, fully good enough for practical use. OpenVINO also provides a companion scene text recognition model. Overall it does not feel quite as strong as the scene text detection model, and it only recognizes English letters and digits, not Chinese, which is a bit of a pity. Still, for reasonably clean document images its accuracy is quite high and it is fast, typically returning results within milliseconds.

Model introduction
The text recognition (OCR) model is built on a CNN + bidirectional LSTM architecture, with VGG16 as the backbone network. Recognition covers 26 letters plus 10 digits, 36 characters in total (case is not distinguished).

The model input shape is:

[B x C x H x W] = 1 x 1 x 32 x 120

where B is the batch size, C the number of channels, H the height, and W the width.

The model output shape is:

[W x B x L] = 30 x 1 x 37

where B is the batch size, W the output sequence length, and L the scores over 37 characters; the 37th character is '#', the CTC blank.

The output is parsed with CTC greedy decoding.
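The decoding code later in the post indexes into an `alphabet` string that is never shown; an assumed definition consistent with the 36 + 1 symbol layout above (digits, lowercase letters, and '#' as the blank) would be:

# assumed 37-symbol alphabet: 10 digits + 26 lowercase letters + '#' as the CTC blank
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz#"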
Code

Load model

# loading IR
log.info("Reading IR...")
net = IENetwork(model=model_xml, weights=model_bin)
text_net = IENetwork(model=text_xml, weights=text_bin)
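The snippets assume the IR paths (`model_xml`, `model_bin`, `text_xml`, `text_bin`) are already defined; a minimal sketch with placeholder file names (point them at whichever detection and recognition IR files you actually converted or downloaded):

# placeholder IR paths - replace with your own detection / recognition model files
model_xml = "D:/models/text-detection.xml"      # scene text detection IR (placeholder path)
model_bin = "D:/models/text-detection.bin"
text_xml = "D:/models/text-recognition.xml"     # text recognition IR (placeholder path)
text_bin = "D:/models/text-recognition.bin"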


Scene text detection
# image = cv2.imread("D:/images/openvino_ocr.png")
image = cv2.imread("D:/images/cover_01.jpg")
cv2.imshow("image", image)
inf_start = time.time()
in_frame = cv2.resize(image, (w, h))
in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
in_frame = in_frame.reshape((n, c, h, w))
exec_net.infer(inputs={input_blob: in_frame})
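The next snippet indexes into a `contours` list, so between detection and recognition the pixel output has to be turned into a binary mask and contours extracted. A minimal sketch of that step, assuming the first output holds the pixel scores, channel 1 is the text score, and a fixed 0.5 threshold is acceptable:

# assumed post-processing: threshold the pixel map and extract candidate text contours
res1 = exec_net.requests[0].outputs[out_blob]       # pixel / segmentation output
pixel_map = np.squeeze(res1)                        # e.g. [2 x H x W] after dropping the batch dim
score_map = pixel_map[1]                            # channel 1 taken as the text score (assumption)
mask = (score_map > 0.5).astype(np.uint8) * 255     # the 0.5 threshold is also an assumption
mask = cv2.resize(mask, (image.shape[1], image.shape[0]))
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]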


ROI extraction and character recognition
x, y, width, height = cv2.boundingRect(contours[c])
roi = image[y - 5: y + height + 10, x - 5: x + width + 10, :]
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
text_roi = cv2.resize(gray, (tw, th))
text_roi = np.expand_dims(text_roi, 2)
text_roi = text_roi.transpose((2, 0, 1))
text_roi = text_roi.reshape((tn, tc, th, tw))
text_exec_net.infer(inputs={text_input_blob: text_roi})
text_out = text_exec_net.requests[0].outputs[text_out_blob]
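One caveat not handled in the snippet above: `y - 5` or `x - 5` can become negative for boxes touching the image border, and negative indices wrap around and can yield an empty ROI. A small defensive variant (my addition, not from the post) clamps the padded ROI to the image bounds:

# clamp the padded ROI to the image bounds before slicing (defensive addition)
y1, y2 = max(0, y - 5), min(image.shape[0], y + height + 10)
x1, x2 = max(0, x - 5), min(image.shape[1], x + width + 10)
roi = image[y1:y2, x1:x2, :]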


CTC decoding of the output
# parse the output text
ocrstr = ""
prev_pad = False
for i in range(text_out.shape[0]):
    ctc = text_out[i]
    ctc = np.squeeze(ctc, 0)
    index, prob = ctc_soft_max(ctc)
    if alphabet[index] == '#':
        prev_pad = True
    else:
        # append only when the character is not a repeat of the last one (CTC collapsing)
        if len(ocrstr) == 0 or prev_pad or (len(ocrstr) > 0 and alphabet[index] != ocrstr[-1]):
            prev_pad = False
            ocrstr += alphabet[index]
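The `ctc_soft_max` helper is not shown in the post; it presumably applies a softmax over the 37 class scores and returns the index and probability of the best class. A minimal sketch of that assumption:

def ctc_soft_max(data):
    # softmax over the 37 class scores, then return the best index and its probability
    exp_scores = np.exp(data - np.max(data))   # subtract the max for numerical stability
    probs = exp_scores / np.sum(exp_scores)
    index = int(np.argmax(probs))
    return index, float(probs[index])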


Display the text detection and recognition results
 
# show recognition results
print("result: %s" % ocrstr)
cv2.drawContours(image, [box], 0, (0, 255, 0), 2)
cv2.putText(image, ocrstr, (x, y), cv2.FONT_HERSHEY_COMPLEX, 0.75, (255, 0, 0), 1)
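The `box` passed to `cv2.drawContours` is not defined in the excerpts; it is presumably the rotated bounding box of the same contour, something along these lines (an assumption):

# rotated bounding box of the contour, used only for drawing (assumed helper)
rect = cv2.minAreaRect(contours[c])
box = np.int32(cv2.boxPoints(rect))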

Finally, the complete demo code
import sys
import time
import logging as log
import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin


def demo():
    # plugin_dir, cpu_extension and the model_xml/bin, text_xml/bin paths are assumed
    # to be defined at module level

    # load the MKLDNN CPU target
    log.basicConfig(format="[ %(levelname)s ] %(message)s", level=log.INFO, stream=sys.stdout)
    plugin = IEPlugin(device="CPU", plugin_dirs=plugin_dir)
    plugin.add_cpu_extension(cpu_extension)

    # loading IR
    log.info("Reading IR...")
    net = IENetwork(model=model_xml, weights=model_bin)
    text_net = IENetwork(model=text_xml, weights=text_bin)

    if plugin.device == "CPU":
        supported_layers = plugin.get_supported_layers(net)
        not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
        if len(not_supported_layers) != 0:
            log.error("Following layers are not supported by the plugin for specified device {}:\n {}".
                      format(plugin.device, ', '.join(not_supported_layers)))
            log.error("Please try to specify cpu extensions library path in demo's command line parameters "
                      "using -l or --cpu_extension command line argument")
            sys.exit(1)

    # get the input and output layers
    input_blob = next(iter(net.inputs))
    outputs = iter(net.outputs)

    # the detection network has multiple output layers
    out_blob = next(outputs)
    second_blob = next(outputs)
    log.info("Loading IR to the plugin...")
    print("pixel output: %s, link output: %s\n" % (out_blob, second_blob))

    text_input_blob = next(iter(text_net.inputs))
    text_out_blob = next(iter(text_net.outputs))
    print("text_out_blob: %s" % text_out_blob)

    # create the executable networks
    exec_net = plugin.load(network=net)
    text_exec_net = plugin.load(network=text_net)

    # Read and pre-process the input image
    n, c, h, w = net.inputs[input_blob].shape
    tn, tc, th, tw = text_net.inputs[text_input_blob].shape
    del net
    del text_net

    log.info("Starting inference in async mode...")
    log.info("To switch between sync and async modes, press Tab button")
    log.info("To stop the demo execution press Esc button")

    image = cv2.imread("D:/images/openvino_ocr.png")
    # image = cv2.imread("D:/images/cover_01.jpg")
    cv2.imshow("image", image)
    inf_start = time.time()
    in_frame = cv2.resize(image, (w, h))
    in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
    in_frame = in_frame.reshape((n, c, h, w))
    exec_net.infer(inputs={input_blob: in_frame})
    inf_end = time.time()
    det_time = inf_end - inf_start

    # get output
    res1 = exec_net.requests[0].outputs[out_blob]
    res2 = exec_net.requests[0].outputs[second_blob]

    # dimension reduction