Help with code for crawling Douban films [Top 250]

Time:12-05

I'm a beginner who has only learned the bare basics so far, and this is for my final project.
The code is as follows:

import requests, csv, random
from bs4 import BeautifulSoup

csv_file = open('movie.csv', 'w', newline='', encoding='utf-8-sig')
writer = csv.writer(csv_file)
writer.writerow(['movie name', 'director', 'actors', 'types', 'area', 'language', 'release date', 'running time'])

headers = {
    'Host': 'movie.douban.com',
    'Origin': 'movie.douban.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
}
proxies = {'http': '163.204.240.175'}


def format_url(num):
    # Build the list-page URLs, 20 films per page
    urls = []
    base_url = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start={}'
    for i in range(0, 20 * num, 20):
        url = base_url.format(i)
        urls.append(url)
    return urls

urls = format_url(500)



for url in urls:
    html = requests.get(url, headers=headers, proxies=proxies, timeout=5)
    soup = BeautifulSoup(html.text, 'lxml')

    # Don't know what to do here
    print(soup)
The printed result contains each film's name, rating, director, actors, and the URL of the film's own Douban page.
What I want, though, is to follow that URL
and get more information, including the film's types, area, language, release date, and running time.
I don't know how to do that here.

Writing to the CSV file also keeps failing. Why can't I write to it?

I hope someone can help!

CodePudding user response:

Bumping my own thread.

CodePudding user response:

Nobody?

CodePudding user response:

Use the Scrapy framework for this; requests will struggle.

CodePudding user response:

Define a new method that takes a film's page URL. In the loop, pick up each film's URL and pass it in; the new method sends another request to that URL, gets the response content, and parses it.
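
A minimal sketch of that approach, under some assumptions that are not confirmed in this thread: the list URL returns JSON with a "data" list whose items carry a "url" key, and each film page keeps its details as "label: value" lines inside a <div id="info"> block. The helper names film_urls, parse_detail and crawl, and the Chinese labels in FIELDS, are made up for illustration. It also opens the CSV in a with block and writes one row per film; the posted loop never calls writer.writerow for the films, which may be one reason the file stays empty.

```python
import csv
import json

import requests
from bs4 import BeautifulSoup

# Assumed labels for types, area, language, release date and running time.
FIELDS = ['类型', '制片国家/地区', '语言', '上映日期', '片长']


def film_urls(list_json):
    """Return the film page URLs from one JSON response of the list URL."""
    return [item['url'] for item in json.loads(list_json).get('data', [])]


def parse_detail(html):
    """Collect 'label: value' lines from the assumed <div id="info"> block."""
    info = BeautifulSoup(html, 'html.parser').find('div', id='info')
    details = {}
    if info is None:  # different layout, or the request was blocked
        return details
    for br in info.find_all('br'):
        br.replace_with('\n')  # make every <br> a real line break
    for line in info.get_text().splitlines():
        label, sep, value = line.partition(':')
        if sep:
            details[label.strip()] = ' '.join(value.split())
    return details


def crawl(list_urls, headers, out_path='movie.csv'):
    """Visit every film page and write one CSV row per film."""
    with open(out_path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['url'] + FIELDS)
        for list_url in list_urls:
            listing = requests.get(list_url, headers=headers, timeout=5)
            for url in film_urls(listing.text):
                page = requests.get(url, headers=headers, timeout=5)
                details = parse_detail(page.text)
                writer.writerow([url] + [details.get(k, '') for k in FIELDS])
```

With the names from the question, a small trial run could be crawl(format_url(1), headers). If the columns come out empty, the assumed labels or the info block probably do not match the real page and need adjusting.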