Scraping multiple anchor tags which are under the same header/class

Time:02-18

I am trying to scrape the top episode data from IMDb and extract the name of the show and the name of the episode. However, I am facing an issue: the show name and the episode name are both anchor tags under the same header.

Here is the code:

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/search/title/?title_type=tv_episode&num_votes=1000,&sort=user_rating,desc&ref_=adv_prv"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

series_name = []
episode_name = []

episode_data = soup.find_all('div', attrs={'class': 'lister-item mode-advanced'})

for store in episode_data:
    sName = store.h3.a.text
    series_name.append(sName)
    # eName = store.h3.a.text
    # episode_name.append(eName)

Does anyone know how to solve this problem?

CodePudding user response:

In the loop at the end, you need to be more specific about which anchor tag you want:

for store in episode_data:
    h3 = store.find('h3', attrs={'class': 'lister-item-header'})
    sName = h3.find_all('a')[0].text  # first anchor: show name
    series_name.append(sName)
    eName = h3.find_all('a')[1].text  # second anchor: episode name
    episode_name.append(eName)
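
For anyone who wants to try the indexing approach above without hitting the site, here is a minimal sketch that runs on an inline snippet. The markup, titles, and hrefs are made up to imitate the structure described in the question, and extract_names is a hypothetical helper name; the real page may differ. It also guards against a header with only one anchor, where [1] would raise an IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two search results; not copied from IMDb.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <span class="lister-item-index">1.</span>
    <a href="/title/tt0000001/">Example Show</a>
    <small class="text-primary">Episode:</small>
    <a href="/title/tt0000002/">Example Episode</a>
  </h3>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <a href="/title/tt0000003/">Standalone Title</a>
  </h3>
</div>
"""


def extract_names(html):
    """Return (series_names, episode_names) for each result block."""
    soup = BeautifulSoup(html, "html.parser")
    series_names, episode_names = [], []
    for item in soup.find_all("div", class_="lister-item mode-advanced"):
        header = item.find("h3", class_="lister-item-header")
        if header is None:
            continue  # skip blocks without the expected header
        links = header.find_all("a")
        if not links:
            continue  # skip headers without any anchor
        series_names.append(links[0].get_text(strip=True))
        # Use None when there is no second anchor instead of raising IndexError.
        second = links[1].get_text(strip=True) if len(links) > 1 else None
        episode_names.append(second)
    return series_names, episode_names


series, episodes = extract_names(SAMPLE_HTML)
print(series)
print(episodes)
```

Calling find_all once and reusing the list also avoids searching the header twice per item.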

Note that the title of 'Attack on Titan' comes back as its Japanese name, which is different from the HTML shown in the browser. I don't know why.
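
One possible explanation, offered as a guess rather than a confirmed cause: requests does not send an Accept-Language header by default, while a browser does, so the site may be falling back to the original-language title. The sketch below only builds the request with such a header so it can be inspected without network access. The header value and the helper names build_request and fetch_page are assumptions for illustration.

```python
import requests

URL = (
    "https://www.imdb.com/search/title/"
    "?title_type=tv_episode&num_votes=1000,&sort=user_rating,desc&ref_=adv_prv"
)

# Assumed header value; adjust to whatever language preference you need.
HEADERS = {"Accept-Language": "en-US,en;q=0.8"}


def build_request(url, headers):
    """Prepare a GET request without sending it, so headers can be checked."""
    return requests.Request("GET", url, headers=headers).prepare()


def fetch_page(url, headers):
    """Send the request for real; not called here to avoid network access."""
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.content


prepared = build_request(URL, HEADERS)
print(prepared.headers["Accept-Language"])
```

If the titles still come back in another language with this header set, the cause is something else.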

CodePudding user response:

You can either use find_all and pick the anchor by its index in the list, or find the first anchor tag and then use find_next.

Farhang beat me to the find_all() solution, so here's the find_next version:

for store in episode_data:
    h3 = store.find('h3', attrs={'class': 'lister-item-header'})
    sName = h3.find('a').text  # find() returns a single tag, so no index is needed
    series_name.append(sName)
    eName = h3.find('a').find_next('a').text
    episode_name.append(eName)
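
One caveat with the find_next approach above: find_next searches forward through the rest of the document, not just inside the same header. The sketch below shows this on an inline snippet; the markup and titles are made up for illustration. If a header could contain only one anchor, find_next_sibling may be the safer choice, since it stays within the same parent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one header with two anchors, then an unrelated link.
SAMPLE_HTML = """
<h3 class="lister-item-header">
  <a href="/title/tt0000001/">Example Show</a>
  <small>Episode:</small>
  <a href="/title/tt0000002/">Example Episode</a>
</h3>
<p><a href="/other">Unrelated Link</a></p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
header = soup.find("h3", class_="lister-item-header")

first_link = header.find("a")             # show name
second_link = first_link.find_next("a")   # next anchor in document order
print(first_link.get_text(strip=True))
print(second_link.get_text(strip=True))

# From the last anchor in the header, find_next leaves the <h3> entirely.
outside = second_link.find_next("a")
print(outside.get_text(strip=True))

# find_next_sibling only looks at later siblings under the same parent,
# so it returns None here instead of wandering into the <p>.
sibling = second_link.find_next_sibling("a")
print(sibling)
```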