Help needed: crawling a site whose URLs change

Time:09-18

Dear experts,

I am a novice and have no idea how to handle a URL that keeps changing. The URL looks like this:

http://www.****.cn/newepaper/PC/layout/202005/13/node_001.html

This is a newspaper's e-paper site. Only the 202005/13/node_001.html part of the URL changes.

/202005/13/ is the date part. The paper is not published on holidays, so those dates are missing in each month.

/node_001.html is the page part. After the date, each day has pages from node_001 up to node_008, but it is not always 1-8: sometimes it is only 1-4, sometimes 1-8, and sometimes one number in the range is missing.

So how can I use Python to build a simple URL pattern and loop through the content of every page for a whole month?


Please help, thanks!

CodePudding user response:

You are learning to write crawlers and don't understand this?

Besides, since this is a crawler, you don't need to work out the URL pattern at all: get the real URLs from somewhere else, such as an index page.
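A rough sketch of that idea, using only the standard library. Everything here is an assumption rather than something known about the real site: the host in `BASE` is a placeholder (the real one is masked in the question), the helper names are made up for illustration, and it assumes that each day's node_001.html links to that day's other pages with hrefs ending in node_NNN.html. If the site has no such links, `candidate_urls` simply tries every date and page number, and you skip whatever does not load.

```python
import calendar
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder host: the real one is masked in the question, substitute it here.
BASE = "http://www.example.cn/newepaper/PC/layout/"


def candidate_urls(year, month, max_node=8):
    """Yield every date/page combination for one month.

    Many of these will not exist (holidays, short issues), so the
    caller should skip any URL that fails to load.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        for node in range(1, max_node + 1):
            yield f"{BASE}{year:04d}{month:02d}/{day:02d}/node_{node:03d}.html"


class NodeLinkParser(HTMLParser):
    """Collect <a href> values that end in node_NNN.html."""

    NODE_RE = re.compile(r"node_\d{3}\.html$")

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.NODE_RE.search(href) and href not in self.links:
            self.links.append(href)


def page_urls_from_html(page_url, html):
    """Return absolute URLs of the page links found in one page's HTML."""
    parser = NodeLinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links]


def fetch(url):
    """Return the page text, or None if the page cannot be loaded."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        # HTTPError (e.g. 404), URLError and timeouts are all OSError subclasses.
        return None


def crawl_month(year, month):
    """Yield (url, html) for every page that exists in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        first = f"{BASE}{year:04d}{month:02d}/{day:02d}/node_001.html"
        html = fetch(first)
        if html is None:
            continue  # no paper that day (holiday), move on
        yield first, html
        for url in page_urls_from_html(first, html):
            if url != first:
                page = fetch(url)
                if page is not None:
                    yield url, page
```

Something like `for url, html in crawl_month(2020, 5): ...` would then walk May 2020. Reading the links from the first page means you never have to guess whether a day has 4 pages, 8 pages, or a gap in the numbering.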