Web scraping IMDB with Python's Beautiful Soup

Time:08-18

I am trying to parse this page "https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1", but I can't find the href that I need (href="/title/tt0068112/episodes?ref_=tt_eps_sm").

I tried with this code:

url="https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1"
page = requests.get(url)
soup=BeautifulSoup(page.content,"html.parser")
for a in soup.find_all('a'):
    print(a['href'])

What's wrong with this? I also tried to check "manually" with print(soup.prettify()) but it seems that that link is hidden or something like that.

CodePudding user response:

To get the Episodes link, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
print(soup.select_one("a:-soup-contains(Episodes)")["href"])

Prints:

/title/tt0068112/episodes?ref_=tt_eps_sm
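The selector above matches on the link text rather than on the exact href. Here is a minimal offline sketch of the same idea, assuming a made-up HTML snippet (the markup and the helper name `find_link_by_text` are illustrative, not IMDb's actual page or part of any library). It also guards against `select_one` returning `None` when nothing matches:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div>
  <a href="/title/tt0068112/fullcredits">Cast</a>
  <a href="/title/tt0068112/episodes?ref_=tt_eps_sm">Episodes<span>25</span></a>
  <a>No href here</a>
</div>
"""


def find_link_by_text(html, text):
    """Return the href of the first <a href> whose text contains `text`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors without an href attribute;
    # :-soup-contains() is a Soup Sieve extension that matches on text content.
    link = soup.select_one(f'a[href]:-soup-contains("{text}")')
    return link["href"] if link is not None else None


print(find_link_by_text(SAMPLE_HTML, "Episodes"))
print(find_link_by_text(SAMPLE_HTML, "Trailers"))
```

Returning `None` instead of indexing directly avoids a `TypeError` if the page layout changes and the link text is no longer present.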

CodePudding user response:

You can get the page HTML with requests; the href is in there, so there is no need for special APIs. I tried this and it worked:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1")
soup = BeautifulSoup(page.content, "html.parser")

scooby_link = ""
for item in soup.find_all("a", href="/title/tt0068112/episodes?ref_=tt_eps_sm"):
    print(item["href"])
    scooby_link = "https://www.imdb.com" + item["href"]

print(scooby_link)

I'm assuming you also wanted to save the link to a variable for further scraping, so I did that as well.
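If the goal is to reuse the link for further requests, one option is to build the absolute URL with the standard library's `urljoin` rather than concatenating strings by hand. This is a small sketch; `BASE_URL` and `to_absolute` are names introduced here for illustration:

```python
from urllib.parse import urljoin

# Site root used to resolve relative links (taken from the URL in the question).
BASE_URL = "https://www.imdb.com"


def to_absolute(href, base=BASE_URL):
    """Resolve a relative href against the base URL; absolute hrefs pass through unchanged."""
    return urljoin(base, href)


episodes_url = to_absolute("/title/tt0068112/episodes?ref_=tt_eps_sm")
print(episodes_url)
```

The resulting string can then be passed to `requests.get` in the same way as the original page URL.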
