I have extracted text from an image using EasyOCR, and the resulting list of words contains many spelling mistakes. I need to separate the words into meaningful and non-meaningful (misspelled) words and count each group.
I have this:
example = ["kaaggl","woryse","good","hey","otherwise","orrsy","taken","sometimes"]
I need like this:
meaning_full_words = ["good","hey","otherwise","taken","sometimes"]
Non-meaning_full_words = ["kaaggl","woryse","orrsy"]
Please help me if there is any possible way to do this; I have a huge dataset.
CodePudding user response:
You want to iterate through the list of words and check each one against the English dictionary. A library such as PyEnchant has the functionality you need.
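For example, here is a minimal sketch with PyEnchant (pip install pyenchant); the list and variable names simply mirror the question, and the exact split depends on the dictionary installed on your system:

import enchant

checker = enchant.Dict("en_US")  # English word dictionary

example = ["kaaggl", "woryse", "good", "hey", "otherwise", "orrsy", "taken", "sometimes"]

# check() returns True if the word is found in the dictionary
meaning_full_words = [w for w in example if checker.check(w)]
Non_meaning_full_words = [w for w in example if not checker.check(w)]

print(meaning_full_words)      # e.g. ['good', 'hey', 'otherwise', 'taken', 'sometimes']
print(Non_meaning_full_words)  # e.g. ['kaaggl', 'woryse', 'orrsy']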
CodePudding user response:
If you want to check whether words are meaningful or complete, you can use the language_check module.
Some examples:
- To check for mistakes:
import language_check
tool = language_check.LanguageTool('en-US')
text = u'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
matches = tool.check(text)
- To correct it:
language_check.correct(text, matches)
You can use a for loop to iterate over the list and sort the words into correct and wrong ones, as in the sketch below.
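A rough sketch of that loop using language_check; note that each word is checked as a tiny text on its own, so whether a misspelling is flagged depends on the tool's spell-checking rules:

import language_check

tool = language_check.LanguageTool('en-US')

example = ["kaaggl", "woryse", "good", "hey", "otherwise", "orrsy", "taken", "sometimes"]

correct_words = []
wrong_words = []
for word in example:
    matches = tool.check(word)  # list of problems detected for this single word
    if matches:
        wrong_words.append(word)
    else:
        correct_words.append(word)

print(correct_words)
print(wrong_words)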
Alternatively, you can compare the words against a dictionary word list.