First of all, I have already scraped the information. What I want to know is how to delete the "/" before and after the English name.
CodePudding user response:
Your final assignment is probably over by now, but let me reply anyway:

import requests
from bs4 import BeautifulSoup


def get_movies():
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
        'Host': 'movie.douban.com',
    }
    movie_list = []
    # The Top 250 list is split into 10 pages of 25 movies each.
    for i in range(0, 10):
        url = "https://movie.douban.com/top250?start=" + str(i * 25) + '&filter='
        r = requests.get(url, headers=headers, timeout=10)
        print(str(i + 1), 'page response status code:', r.status_code)
        soup = BeautifulSoup(r.text, 'lxml')
        div_list = soup.find_all('div', class_='hd')
        for each in div_list:
            # contents[3] is the second title span, which holds the English name
            movie = each.a.contents[3].text.strip()
            movie = movie[2:]  # this statement solves the problem: it drops the leading "/" and the space after it
            movie_list.append(movie)
    return movie_list


movies = get_movies()
print(movies)
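If the separator around the name is not always exactly two characters, a slightly more defensive variant of the slicing step might look like the sketch below. It assumes the raw text looks something like "\xa0/\xa0English Name" (non-breaking spaces around a slash), and clean_title is a hypothetical helper name, not something from the code above.

```python
def clean_title(raw):
    """Strip spaces, non-breaking spaces and '/' from both ends of a title."""
    # Hypothetical helper: turn non-breaking spaces into plain spaces first,
    # so a single strip() call can remove everything around the name.
    return raw.replace('\xa0', ' ').strip(' /')


# Sample inputs are assumptions about what the scraped text may look like.
samples = ['\xa0/\xa0The Shawshank Redemption', ' / Forrest Gump / ']
print([clean_title(s) for s in samples])
```

Inside the loop this would replace the two lines that call strip() and slice with [2:]. A slash in the middle of a name is left alone, since only the ends are stripped.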
CodePudding user response:
Or you can try:
EnglishName = li.xpath('.//a/span[@class="title"][2]/text()')[0].strip()[2:]