Could someone experienced help me remove the unwanted parts from crawled content?

Time:11-26

I'm a beginner teaching myself web crawling and there are some things I don't understand. I couldn't find an answer on Baidu, so I'm asking for advice here.

1. As shown in the screenshots below, the content I crawl from the page source contains code and links. How do I delete them or replace them with spaces?


The first screenshot is the page source, the second is the crawled content, and the third is the code I wrote. I want to extract only the Chinese text and nothing else. What should I do?

CodePudding user response:

What exactly are you extracting here? Print out the web page content and ask again; it can't be worked out from this.

CodePudding user response:

If you only want the Chinese text, regular expressions are a good way to extract it.
For the details, it is worth studying regular expressions.
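
A minimal sketch of the regular-expression approach, assuming "Chinese" means characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The function name `extract_chinese` and the sample string are made up for illustration; the original poster's code and page are not shown in the thread.

```python
import re

# Assumption: "Chinese text" = runs of characters in U+4E00..U+9FFF.
# Chinese punctuation and rarer extension blocks are NOT matched by this range.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese(text):
    """Return every consecutive run of Chinese characters found in text."""
    return CHINESE_RUN.findall(text)


if __name__ == "__main__":
    # Hypothetical crawled fragment containing markup, a link, and Latin text.
    sample = '<a href="http://example.com">你好</a> world 世界'
    print(extract_chinese(sample))
    print("".join(extract_chinese(sample)))
```

Because tags, URLs, and script code are made of ASCII characters, they fall outside the range and are dropped without any separate clean-up step. If punctuation such as "，" or "。" should be kept, the character class would need to be widened.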

CodePudding user response:

You can use pyquery: traverse the relevant nodes and output each node's text. That text is the content.
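
The idea of walking the nodes and keeping only their text can be sketched as follows. To stay free of third-party dependencies, this sketch swaps in the standard library's `html.parser` rather than pyquery itself; the class name `TextExtractor`, the helper `visible_text`, and the sample HTML are assumptions for illustration only. It also skips the contents of `<script>` and `<style>` elements, which is one likely source of the stray code in the crawled output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = ("script", "style")

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def visible_text(html):
    """Return the text content of html, with text nodes joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # Hypothetical page fragment with a link and an inline script.
    page = ('<div><a href="http://example.com">链接</a>'
            '<script>var a = 1;</script><p>正文 内容</p></div>')
    print(visible_text(page))
```

Tag names and attributes such as `href` never reach `handle_data`, so link URLs disappear while the link's visible text is kept. With pyquery installed, the text of a selection is available through its `.text()` method, which serves the same purpose.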