I have extracted text from an image using EasyOCR, and the resulting list of words contains many spelling mistakes. I need to separate the words into meaningful and non-meaningful (misspelled) words and count each group.
I have this:
example = ["kaaggl","woryse","good","hey","otherwise","orrsy","taken","sometimes"]
I need like this:
meaning_full_words = ["good","hey","otherwise","taken","sometimes"]
Non-meaning_full_words = ["kaaggl","woryse","orrsy"]
Please help me if there is any possible way to do this; I have a huge dataset.
CodePudding user response:
You want to iterate through the list of words and check each one against the English dictionary. A library such as PyEnchant has the functionality you need.
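For example, here is a minimal sketch with PyEnchant (pip install pyenchant); the list and variable names simply mirror the question, and the exact split depends on the dictionary installed on your system:

import enchant

checker = enchant.Dict("en_US")  # English word dictionary

example = ["kaaggl", "woryse", "good", "hey", "otherwise", "orrsy", "taken", "sometimes"]

# check() returns True if the word is found in the dictionary
meaning_full_words = [w for w in example if checker.check(w)]
Non_meaning_full_words = [w for w in example if not checker.check(w)]

print(meaning_full_words)      # e.g. ['good', 'hey', 'otherwise', 'taken', 'sometimes']
print(Non_meaning_full_words)  # e.g. ['kaaggl', 'woryse', 'orrsy']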
CodePudding user response:
If you want to check whether words are meaningful or complete, you can use the language_check module.
Some examples:
- To check for mistakes:
import language_check
tool = language_check.LanguageTool('en-US')
text = u'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
matches = tool.check(text)
- To correct it:
language_check.correct(text, matches)
You can use a for loop to iterate over the list and sort the words into correct and wrong ones, as in the sketch below.
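A rough sketch of that loop using language_check; note that each word is checked as a tiny text on its own, so whether a misspelling is flagged depends on the tool's spell-checking rules:

import language_check

tool = language_check.LanguageTool('en-US')

example = ["kaaggl", "woryse", "good", "hey", "otherwise", "orrsy", "taken", "sometimes"]

correct_words = []
wrong_words = []
for word in example:
    matches = tool.check(word)  # list of problems detected for this single word
    if matches:
        wrong_words.append(word)
    else:
        correct_words.append(word)

print(correct_words)
print(wrong_words)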
Alternatively, you can compare the words against a dictionary word list.