I am trying to scrape the top episode data from IMDb and extract the name of the show and the name of the episode. However, the show name and the episode name are both anchor tags under the same h3 header, so store.h3.a only ever gives me the first one (the show name).
Here is the code:
import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/search/title/?title_type=tv_episode&num_votes=1000,&sort=user_rating,desc&ref_=adv_prv"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
series_name = []
episode_name = []
# Each result on the page is one div with this class
episode_data = soup.findAll('div', attrs={'class': 'lister-item mode-advanced'})
for store in episode_data:
    # store.h3.a is the first anchor in the header, i.e. the show name
    sName = store.h3.a.text
    series_name.append(sName)
    # eName = store.h3.a.text  # same anchor again, so not the episode name
    # episode_name.append(eName)
Does anyone know how to solve this problem?
CodePudding user response:
In the loop at the end you need to be more specific: select the h3 header, collect all of its anchor tags, and pick each one by index.
for store in episode_data:
    h3 = store.find('h3', attrs={'class': 'lister-item-header'})
    sName = h3.findAll('a')[0].text
    series_name.append(sName)
    eName = h3.findAll('a')[1].text
    episode_name.append(eName)
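To make the indexing approach easy to try without hitting the site, here is a minimal sketch that runs on inline HTML. The markup is a simplified stand-in that I am assuming resembles the page (the show and episode titles are made up), and the guard for headers with a single anchor is my addition, not part of the code above.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the IMDb markup (an assumption, not the real page).
html = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <a href="/title/show-a/">Show A</a>
    <small>Episode:</small>
    <a href="/title/episode-1/">Episode One</a>
  </h3>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <a href="/title/show-b/">Show B</a>
  </h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
series_name, episode_name = [], []

for store in soup.find_all("div", class_="lister-item mode-advanced"):
    links = store.find("h3", class_="lister-item-header").find_all("a")
    # First anchor in the header is the show, second is the episode.
    series_name.append(links[0].get_text(strip=True))
    # Headers with only one anchor get None instead of raising IndexError.
    episode_name.append(links[1].get_text(strip=True) if len(links) > 1 else None)

print(series_name)
print(episode_name)
```

Calling find_all('a') once and reusing the list also avoids searching the header twice per item.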
Note that the title of 'Attack on Titan' comes back as its Japanese name, which differs from the HTML shown in the browser, and I don't know why.
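One possible explanation, which I have not confirmed for this page, is that the site picks the title language from the request's Accept-Language header (or from the client's location), and requests does not send the header your browser does. A minimal sketch of sending it explicitly follows; the HEADERS value and the fetch_listing helper are hypothetical names for illustration.

```python
import requests

# Assumed header value asking for English titles; adjust to taste.
HEADERS = {"Accept-Language": "en-US,en;q=0.5"}


def fetch_listing(url, headers=HEADERS):
    """Hypothetical helper: download a page while sending Accept-Language."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.content
```

The returned bytes can be passed to BeautifulSoup(..., 'html.parser') exactly as in the question.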
CodePudding user response:
You can either use find_all() and pick each anchor by its index in the returned list, or find the first anchor tag and then call find_next() on it.
Farhang beat me to the find_all() solution, so here's the find_next() version:
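for store in episode_data:
    h3 = store.find('h3', attrs={'class': 'lister-item-header'})
    # find() returns a single tag, not a list, so no [0] index here
    sName = h3.find('a').text
    series_name.append(sName)
    # find_next('a') returns the next anchor after the first one
    eName = h3.find('a').find_next('a').text
    episode_name.append(eName)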
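One caveat with find_next(): it searches forward through the rest of the document, not just inside the h3, so if a header ever has only one anchor it will quietly return a link from somewhere else. The sketch below shows this on made-up markup (an assumption, not the real page) and compares it with find_next_sibling(), which stays under the same parent.

```python
from bs4 import BeautifulSoup

# Made-up markup: the header has one anchor, and another link follows it.
html = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><a href="/a/">Show A</a></h3>
  <p><a href="/other/">Unrelated link</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3", class_="lister-item-header")
first = h3.find("a")

# find_next walks forward in document order, so it leaves the h3.
leaked = first.find_next("a")
print(leaked.get_text())

# find_next_sibling only considers later siblings under the same parent.
scoped = first.find_next_sibling("a")
print(scoped)
```

If the two anchors really are siblings inside the h3, as the question describes, find_next_sibling('a') should work as a stricter drop-in, and you can check for None before reading .text.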