The code is as follows:
import requests, csv, random
from bs4 import BeautifulSoup

# Open the output file and write the header row
csv_file = open('movie.csv', 'w', newline='', encoding='utf-8-sig')
writer = csv.writer(csv_file)
writer.writerow(['movie name', 'director', 'actors', 'types', 'area', 'language', 'release date', 'running time'])
# Named "headers" so it matches the headers=headers argument used below
headers = {
    'Host': 'movie.douban.com',
    'Origin': 'movie.douban.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
}
proxies = {'http': '163.204.240.175'}

def format_url(num):
    # Build one listing URL per page of 20 results
    urls = []
    base_url = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start={}'
    for i in range(0, 20 * num, 20):
        url = base_url.format(i)
        urls.append(url)
    return urls

urls = format_url(500)
for url in urls:
    html = requests.get(url, headers=headers, proxies=proxies, timeout=5)
    soup = BeautifulSoup(html.text, 'lxml')
    # Don't know what to do here
    print(soup)
The result contains each film's name, rating, director, actors, and the URL of the film's own Douban page.
But what I want is to follow that URL
and get more information, including the film's 'types', 'area', 'language', 'release date', and 'running time'.
I don't know how to do that from here.
Writing to the CSV file also always fails. Why can't it write?
Hope someone can help!
CodePudding user response:
Bumping my own thread.
CodePudding user response:
No one?
CodePudding user response:
Use the Scrapy framework; requests will struggle with this.
CodePudding user response:
Define a function that takes the URL of a film's own page. Inside the loop, pass each film's URL to it; the function sends a second request to that URL, gets the response content, and parses it.