As shown in the images below, I crawled a web page, but the scraped content still contains source code and links. How do I delete or replace the whitespace?
The first image is the page source, the second is the crawled content, and the third is the code I wrote. I want to extract only the Chinese text and nothing else. How should I do that?
CodePudding user response:
What exactly are you trying to extract? Please print the fetched web page content again and post it.
CodePudding user response:
If you only want the Chinese text, regular expressions are a good way to extract it.
For the specifics, it is worth studying regular expressions.
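A minimal sketch of the regex approach, assuming "Chinese" here means characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF). The sample string and the helper names `extract_chinese` and `strip_whitespace` are made up for illustration and are not from the original post.

```python
import re
from typing import List

# Assumption: the wanted text falls in the basic CJK Unified Ideographs block.
# Chinese punctuation and rarer extension blocks are NOT matched by this range.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese(text: str) -> List[str]:
    """Return every consecutive run of Chinese characters found in text."""
    return CHINESE_RUN.findall(text)


def strip_whitespace(text: str) -> str:
    """Delete all whitespace (spaces, tabs, newlines, full-width spaces)."""
    return re.sub(r"\s+", "", text)


# Hypothetical scraped content that still contains tags, links and spaces.
sample = '<a href="/news/1.html">今日 新闻</a>\n<span class="x">code 123</span> 天气预报'

print(extract_chinese(sample))           # ['今日', '新闻', '天气预报']
print("".join(extract_chinese(sample)))  # 今日新闻天气预报
```

To replace whitespace rather than delete it, pass a different replacement string to `re.sub`, for example a single space.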
CodePudding user response:
You can use pyquery: traverse the relevant nodes and output each one's text, and that text is the content you want.