Tokenization (remove punctuation, numbers, etc.)
Keyword removal (int, public, while, if, for, ...)
Stop-word removal (remove common words such as a, it, on)
Word segmentation (NextData -> Next Data)
Abbreviation expansion (attr -> attribute)
Pruning (remove words that occur with high probability, i.e. very frequent words)
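The six steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the poster's actual tool: the keyword, stop-word, and abbreviation tables are small hypothetical samples, and pruning is assumed to mean dropping words that appear in more than a given fraction (the hypothetical `max_ratio` parameter) of the files in a corpus.

```python
import re
from collections import Counter

# Hypothetical sample tables; a real tool would use complete lists.
KEYWORDS = {"int", "public", "while", "if", "for", "return", "void", "class"}
STOP_WORDS = {"a", "an", "the", "it", "on", "of", "to", "is"}
ABBREVIATIONS = {"attr": "attribute", "num": "number", "idx": "index"}


def tokenize(text):
    """Step 1: keep runs of letters/underscores, dropping punctuation and digits."""
    return re.findall(r"[A-Za-z_]+", text)


def split_identifier(token):
    """Step 4: split snake_case and camelCase, e.g. NextData -> Next, Data."""
    parts = []
    for piece in token.split("_"):
        # Either an all-caps run not followed by a lowercase letter (HTTP),
        # or an optional capital followed by lowercase letters (Next, data).
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", piece))
    return parts


def preprocess(text):
    """Steps 1-5 for a single source file; returns a list of lowercase words."""
    tokens = tokenize(text)                                           # 1. tokenization
    tokens = [t for t in tokens if t not in KEYWORDS]                 # 2. keywords
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]       # 3. stop words
    words = [w.lower() for t in tokens for w in split_identifier(t)]  # 4. segmentation
    return [ABBREVIATIONS.get(w, w) for w in words]                   # 5. abbreviations


def prune(docs, max_ratio=0.5):
    """Step 6: drop words that occur in more than max_ratio of the documents."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    limit = max_ratio * len(docs)
    return [[w for w in doc if doc_freq[w] <= limit] for doc in docs]


if __name__ == "__main__":
    print(preprocess("public int getAttrNum() { return nextData; }"))
    # ['get', 'attribute', 'number', 'next', 'data']
```

Stop words are checked before identifiers are split, following the order of the list; running the filter again after splitting would also catch fragments such as "is" in isValid.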
Reply:
What does this mean?
Reply:
In text processing, the continuous text is first cut into small units (tokens).
Then the attribute of each unit is identified (whitespace, keyword, punctuation, ...).
At this point the units can be modified as needed.
Finally, the units are joined back together into a continuous text file.
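The cut, classify, modify, rejoin loop described in this reply can be sketched in Python as below. It assumes a hypothetical token grammar (whitespace, keyword, identifier, number, single-character punctuation), a small hypothetical keyword set, and a hypothetical `rename` step as the example modification.

```python
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"int", "public", "while", "if", "for", "return"}

# Alternatives are tried in order; the final "." catches any leftover
# character, so every character of the input ends up in some unit.
TOKEN_RE = re.compile(
    r"(?P<space>\s+)"
    r"|(?P<word>[A-Za-z_]\w*)"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<punct>.)",
    re.DOTALL,
)


def lex(text):
    """Cut text into (kind, value) units and classify each one."""
    units = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "word":
            kind = "keyword" if value in KEYWORDS else "identifier"
        units.append((kind, value))
    return units


def rename(units, old, new):
    """Example modification: rename one identifier, leaving keywords and other names alone."""
    return [(k, new if k == "identifier" and v == old else v) for k, v in units]


def join(units):
    """Glue the (possibly modified) units back into continuous text."""
    return "".join(value for _, value in units)


if __name__ == "__main__":
    src = "int count = 0;"
    print(join(rename(lex(src), "count", "n")))  # int n = 0;
```

Keeping whitespace as its own unit is what lets `join(lex(text))` reproduce the input exactly, and because renaming works on whole units, a longer name such as counter is not touched when count is renamed.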
Reply:
Isn't this just the legendary lexical analysis?
Reply:
Sounds like you're writing a compiler!