Handwriting recognition algorithm that takes feedback?

Time:10-11

I'm developing an app that helps people with sloppy handwriting convert their handwriting into text. I'll probably just use it myself, but I might share it on the Internet too.

I'm creating it with Electron. I need a JavaScript library (or a few libraries that will help me achieve the result) that can process a picture of someone's handwriting, then use that information for handwriting recognition.

I think I could have the user write out the alphabet one letter at a time, with the computer prompting for each letter individually, and then let the AI try to recognize those letters when the user writes a real sentence, using image processing, although I don't know a good library for that either.

Furthermore, I want the user to be able to tell the computer when it got something wrong and what they actually wrote, so that the computer can decide whether it needs to ask the user to write out more samples to further train it. I didn't clarify it earlier, but this is one of the most important things I need to be able to accomplish.

As someone in the comments suggested, I could look into the technology that the Apple Newton used, but I still think my idea might be interesting to experiment with.

I do not have any experience with AI, so if someone could give me any guidance on where to get started with this, I would be very grateful. Thanks! I don't want someone to do it for me - I just need some guidance on where to get started.

CodePudding user response:

You could try looking into TensorFlow.js, an ML library for both the browser and Node.js. There is a tutorial for handwritten digit recognition which could serve as a starting point. There is also a tutorial for OCR using TensorFlow Lite with relevant information and references.
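Before a picture can go into a model like the one in the digit tutorial, it has to be turned into numbers. Here is a minimal sketch in plain JavaScript, assuming the model wants single-channel values in [0, 1] with light strokes on a dark background (as MNIST-style digit models do). `rgbaToGrayscale` is a hypothetical helper, and the RGBA layout is the one a canvas `ImageData.data` array uses:

```javascript
// Hypothetical helper: converts RGBA pixel data (4 bytes per pixel, as in a
// canvas ImageData.data array) into grayscale values in the range [0, 1].
function rgbaToGrayscale(rgba, width, height, invert = true) {
  if (rgba.length !== width * height * 4) {
    throw new Error("Pixel buffer size does not match width * height * 4");
  }
  const gray = new Float32Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Common luminance weights; the alpha channel is ignored.
    const luminance = (0.299 * r + 0.587 * g + 0.114 * b) / 255;
    // Handwriting photos are usually dark ink on light paper, so invert to
    // get light strokes on a dark background (an assumption about the model).
    gray[i] = invert ? 1 - luminance : luminance;
  }
  return gray;
}
```

With TensorFlow.js, the result could then probably be wrapped with something like `tf.tensor4d(gray, [1, height, width, 1])`, but the exact shape depends on the model you end up using.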

It is probably most effective if you base your app on a pretrained model. This way, you don't need a huge dataset for training (i.e. you can limit the number of times the user would have to write out each character). In this regard, it might be handy to look into how to convert existing models to the TensorFlow.js format.

For the feedback part, you would have to store the input image along with the user's correction (what they actually wrote), so that the pair can be used to retrain the model.
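The bookkeeping for that could look roughly like this. It is only a sketch: `FeedbackStore`, its method names, and the 0.3 threshold are all made up for illustration, and a real app would persist the records together with the image data rather than keep them in memory:

```javascript
// Hypothetical in-memory store for user corrections.
class FeedbackStore {
  constructor() {
    this.records = [];
  }

  // Save one attempt: which image, what the model predicted, and what the
  // user says they actually wrote.
  add(imageId, predicted, corrected) {
    this.records.push({ imageId, predicted, corrected });
  }

  // Attempts the model got wrong; these are the candidates for retraining.
  mistakes() {
    return this.records.filter((r) => r.predicted !== r.corrected);
  }

  // Fraction of stored attempts for `label` that the model got wrong.
  errorRate(label) {
    const forLabel = this.records.filter((r) => r.corrected === label);
    if (forLabel.length === 0) return 0;
    const wrong = forLabel.filter((r) => r.predicted !== label).length;
    return wrong / forLabel.length;
  }

  // Labels whose error rate is above the threshold. The app could ask the
  // user to write more samples of exactly these characters.
  labelsNeedingSamples(threshold = 0.3) {
    const labels = [...new Set(this.records.map((r) => r.corrected))];
    return labels.filter((l) => this.errorRate(l) > threshold).sort();
  }
}

const store = new FeedbackStore();
store.add("img-1", "a", "a");
store.add("img-2", "o", "a"); // model read "o", user wrote "a"
store.add("img-3", "b", "b");
console.log(store.labelsNeedingSamples());
```

Tracking the error rate per character is what lets the app decide when to ask for more handwriting, instead of asking for the whole alphabet again.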

CodePudding user response:

Amazon Textract is a machine learning (ML) cloud service that allows you to instantly "read" virtually any type of document, printed text, handwriting, numbers etc. using OCR technology and then extract this information (it can also handle forms & tables but that is out of scope).

It uses machine learning behind the scenes, and while I understand that "you want the user to be able to tell the computer if the computer got something wrong", Textract's machine learning models have been trained on millions of documents so that virtually any document type you upload should be automatically recognized and processed for text extraction. Amazon Textract is also always learning from new data, and Amazon is continually adding new features to the service. The deep-learning technology is proven, highly scalable, fast, easy to use via their APIs & used by Amazon themselves.

It returns a confidence score with each piece of detected text. If the user flags a result as wrong (which should be extremely rare, as it pretty much works on anything that you can decipher with the human eye), you can use that score to decide whether to alert them to perhaps write a tad neater. Genuinely, it should work on pretty much anything, and when it doesn't for critical sensitive data (think medical insurance claims, prescriptions, mortgage applications etc.), it integrates with Amazon A2I, which allows you to tap into a pool of human reviewers to review the data based on a specified confidence score threshold.
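A rough sketch of acting on that score, assuming the result has the usual Textract shape (a list of blocks with `BlockType`, `Text`, and a `Confidence` between 0 and 100). `splitByConfidence` is a hypothetical helper and the cutoff of 90 is an arbitrary example, not a recommended value:

```javascript
// Splits detected lines into accepted and flagged groups by confidence.
// `minConfidence` is on the 0-100 scale.
function splitByConfidence(blocks, minConfidence = 90) {
  const accepted = [];
  const flagged = [];
  for (const block of blocks) {
    // Only look at whole lines; skip PAGE and WORD blocks.
    if (block.BlockType !== "LINE") continue;
    const item = { text: block.Text, confidence: block.Confidence };
    if (block.Confidence >= minConfidence) {
      accepted.push(item);
    } else {
      flagged.push(item);
    }
  }
  return { accepted, flagged };
}
```

The flagged lines are the ones you could show to the user for confirmation or ask them to rewrite.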

As with 100% of AWS services, AWS offers a JavaScript SDK which you can use as a Textract client to get up and running ASAP. You can either store the images in S3, Amazon's cloud object storage, or just keep the handwriting images locally to avoid paying for S3 (even though it is extremely cheap).

The main action you want to carry out via the API call will be DetectDocumentText (API/JS SDK), which detects text in the input document provided & returns the information. We don't want to analyse the full document and identify key information etc. so this will suffice.
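As a sketch of what that call involves: the request carries the image bytes under `Document.Bytes`, and the response contains `Blocks`, of which the `LINE` ones hold the recognized text. The helper names below are hypothetical, and `send` is a placeholder callback so that the sketch does not depend on a particular SDK version:

```javascript
// Builds the DetectDocumentText input from raw image bytes
// (a Buffer or Uint8Array holding a PNG or JPEG).
function buildDetectTextInput(imageBytes) {
  return { Document: { Bytes: imageBytes } };
}

// Joins the LINE blocks of a DetectDocumentText response into plain text.
function textFromResponse(response) {
  return (response.Blocks || [])
    .filter((block) => block.BlockType === "LINE")
    .map((block) => block.Text)
    .join("\n");
}

// `send` is a hypothetical callback that performs the actual API call and
// resolves with the response object.
async function recognizeHandwriting(send, imageBytes) {
  const response = await send(buildDetectTextInput(imageBytes));
  return textFromResponse(response);
}
```

With the AWS SDK for JavaScript v3, `send` would probably be something like `(input) => client.send(new DetectDocumentTextCommand(input))`, where `client` is a `TextractClient` from `@aws-sdk/client-textract` configured with your region and credentials.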

To answer the cost question, it depends on the region you use for setting up Amazon Textract, but as of now, in the US West (Oregon) region, you pay only $0.0015 per page for the first one million pages and $0.0006 per page beyond one million pages. As a bonus, new AWS accounts on the free tier can analyze up to 1,000 pages per month free for 3 months, so you can test any prototypes before paying 15% of a cent per page.
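To put numbers on that, here is a small estimate using only the prices quoted above. `estimateCost` is a hypothetical helper, and the prices are whatever they were at the time of writing, so check the current pricing page before relying on it:

```javascript
// Prices quoted above for US West (Oregon), stored in millionths of a
// dollar per page so the arithmetic stays in exact integers.
const FIRST_TIER_PAGES = 1000000;
const FIRST_TIER_MICRO_USD = 1500; // $0.0015 per page
const OVER_TIER_MICRO_USD = 600; // $0.0006 per page

// Estimated cost in dollars for `pages` pages. `freePages` models a
// free-tier allowance and defaults to 0.
function estimateCost(pages, freePages = 0) {
  const billable = Math.max(0, pages - freePages);
  const firstTier = Math.min(billable, FIRST_TIER_PAGES);
  const overTier = billable - firstTier;
  const microUsd =
    firstTier * FIRST_TIER_MICRO_USD + overTier * OVER_TIER_MICRO_USD;
  return microUsd / 1000000;
}

console.log(estimateCost(5000)); // 7.5
```

So 5,000 pages of handwriting would come to $7.50 at those rates, and nothing at all while you stay within a 1,000-page free allowance.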

With regard to the Electron app, the details will vary, but at the end of the day you need some way of getting the handwriting from the user (an image file) and sending it to the API, alongside some form of UI to display the results of the analysis.

If you really need a way to train your own models and don't trust AWS's accurate and cheap ML abstraction, I would say check out TensorFlow but be warned that bespoke machine learning is not an easy route to go down.

Just don't reinvent the wheel. You have nothing to lose, as you can test Textract for free for 3 months and check the accuracy on some demo data if you like before committing to extremely cheap costs (which will pay off in terms of how much time you save).
