ai-form-recognizer vs. cognitiveservices-computervision

Time:02-14

I'm currently using @azure/ai-form-recognizer 3.2.0 to OCR images and PDFs, like this:

const poller = await MsClient.beginRecognizeInvoices(stream, {
    onProgress: (state) => {}
});
const [ocrResult] = await poller.pollUntilDone();

How does this differ from, or relate to, @azure/cognitiveservices-computervision? I'm only interested in OCR.

CodePudding user response:

There are several key differences between the two. Form Recognizer's primary goal is to extract structured data from forms and other digitized documents for further processing. The key point is that Form Recognizer does more than stand-alone optical character recognition: it also contextualizes the text it reads, for example by mapping field relationships as key-value pairs. From the Form Recognizer documentation:

Azure Form Recognizer is a cloud-based Azure Applied AI Service that uses machine-learning models to extract and analyze form fields, text, and tables from your documents. Form Recognizer analyzes your forms and documents, extracts text and data, maps field relationships as key-value pairs, and returns a structured JSON output. You quickly get accurate results that are tailored to your specific content without excessive manual intervention or extensive data science expertise. Use Form Recognizer to automate your data processing in applications and workflows, enhance data-driven strategies, and enrich document search capabilities.
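
For the OCR-only case in the question, one option that stays within Form Recognizer is to run layout ("content") recognition rather than the invoice model and keep only the text lines. A minimal sketch, assuming `client` is a `FormRecognizerClient` from version 3.x of @azure/ai-form-recognizer; `pagesToText` and `ocrWithFormRecognizer` are hypothetical helper names, and the client is passed in rather than constructed so the snippet carries no credentials:

```javascript
// Sketch only. `client` is assumed to be a FormRecognizerClient (3.x).

// Flatten recognized pages into plain text: one line of text per row,
// with a blank line between pages. Pages without lines contribute "".
function pagesToText(pages) {
  return pages
    .map((page) => (page.lines || []).map((line) => line.text).join("\n"))
    .join("\n\n");
}

// Run layout recognition (no invoice field extraction) and return only text.
async function ocrWithFormRecognizer(client, stream) {
  const poller = await client.beginRecognizeContent(stream);
  const pages = await poller.pollUntilDone();
  return pagesToText(pages);
}
```

This drops the key/value structure entirely, so it only makes sense when the raw text is all that is needed.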

On the other hand, Azure Computer Vision provides three distinct services. Its OCR service, described below, overlaps with Form Recognizer but is more general-purpose: it does not contextualize text into key/value pairs as robustly as Form Recognizer does. Computer Vision also provides higher-level AI functionality for processing images and video, such as identifying people/celebrities, landmarks, and common objects (among others). From the Computer Vision documentation:

Service | Description
Optical Character Recognition (OCR) | The Optical Character Recognition (OCR) service extracts text from images. You can use the new Read API to extract printed and handwritten text from photos and documents. It uses deep-learning-based models and works with text on a variety of surfaces and backgrounds. These include business documents, invoices, receipts, posters, business cards, letters, and whiteboards. The OCR APIs support extracting printed text in several languages...
Image Analysis | The Image Analysis service extracts many visual features from images, such as objects, faces, adult content, and auto-generated text descriptions. Follow the Image Analysis quickstart to get started.
Spatial Analysis | The Spatial Analysis service analyzes the presence and movement of people on a video feed and produces events that other systems can respond to. Install the Spatial Analysis container to get started.
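
A rough sketch of plain-text extraction through the Read API mentioned in the OCR entry above, assuming `client` is a `ComputerVisionClient` from @azure/cognitiveservices-computervision. The function name `ocrWithComputerVision` and the `pollMs` parameter are hypothetical; `body` is assumed to be a Buffer or, in Node, a function returning a readable stream:

```javascript
// Sketch only. `client` is assumed to be a ComputerVisionClient.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function ocrWithComputerVision(client, body, pollMs = 1000) {
  // Submit the image or PDF; the response carries an Operation-Location URL
  // whose last path segment is the operation ID.
  const submitted = await client.readInStream(body);
  const operationId = submitted.operationLocation.split("/").pop();

  // The Read operation is asynchronous, so poll until it leaves the
  // "notStarted"/"running" states.
  let result = await client.getReadResult(operationId);
  while (result.status === "notStarted" || result.status === "running") {
    await sleep(pollMs);
    result = await client.getReadResult(operationId);
  }
  if (result.status !== "succeeded") {
    throw new Error(`Read operation ended with status: ${result.status}`);
  }

  // One line of text per row, with a blank line between pages.
  return result.analyzeResult.readResults
    .map((page) => page.lines.map((line) => line.text).join("\n"))
    .join("\n\n");
}
```

The result here is text lines only; there is no notion of invoice fields, which is the contextualization that Form Recognizer adds on top.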

At first glance there is some overlap between the two, but on closer inspection their primary use cases are clearly delineated.
