Web scraping IMDB with Python's Beautiful Soup

I am trying to parse this page "https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1", but I can't find the href that I need (href="/title/tt0068112/episodes?ref_=tt_eps_sm").

I tried with this code:

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
for a in soup.find_all('a'):
    print(a['href'])

What's wrong with this? I also checked manually with print(soup.prettify()), but that link seems to be hidden or something like that.

CodePudding user response：

To get the Episodes link, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
print(soup.select_one("a:-soup-contains(Episodes)")["href"])

Prints:

/title/tt0068112/episodes?ref_=tt_eps_sm
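One caveat with the snippet above: select_one returns None when nothing matches, so indexing ["href"] directly would raise a TypeError if the page layout changes. Here is a minimal sketch of a more defensive version. The helper name find_link_by_text and the SAMPLE_HTML string are made up for illustration (the sample stands in for the downloaded page so the code runs offline), and the :-soup-contains selector assumes a reasonably recent soupsieve.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<div>
  <a>Anchor without an href</a>
  <a href="/title/tt0068112/episodes?ref_=tt_eps_sm">Episodes</a>
</div>
"""


def find_link_by_text(html, text):
    """Return the href of the first <a href> whose text contains `text`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors that have no href attribute at all.
    link = soup.select_one(f'a[href]:-soup-contains("{text}")')
    return link["href"] if link is not None else None


print(find_link_by_text(SAMPLE_HTML, "Episodes"))
print(find_link_by_text(SAMPLE_HTML, "Trailers"))
```

With the real page you would pass requests.get(url).content instead of SAMPLE_HTML and check the result for None before using it.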

CodePudding user response：

You can get the page HTML with requests; the href is in there, so there is no need for special APIs. I tried this and it worked:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.imdb.com/title/tt0068112/?ref_=fn_al_tt_1")
soup = BeautifulSoup(page.content, "html.parser")

scooby_link = ""
for item in soup.find_all("a", href="/title/tt0068112/episodes?ref_=tt_eps_sm"):
    print(item["href"])
    # Prepend the site root to turn the relative href into an absolute URL.
    scooby_link = "https://www.imdb.com" + "/title/tt0068112/episodes?ref_=tt_eps_sm"

print(scooby_link)

I'm assuming you also wanted to save the link to a variable for further scraping, so I did that as well.
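If you want to avoid hard-coding the site root, the standard library's urllib.parse.urljoin can resolve a relative href against the URL of the page it came from. This is only a sketch: the helper name absolute_links, the BASE_URL constant, and the SAMPLE_HTML string are illustrative assumptions, with the sample standing in for the downloaded page so the code runs without a network call.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base: the URL of the page the links were scraped from.
BASE_URL = "https://www.imdb.com/title/tt0068112/"

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = '<a href="/title/tt0068112/episodes?ref_=tt_eps_sm">Episodes</a>'


def absolute_links(html, base_url):
    """Return every <a href> in `html` resolved against `base_url`."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that actually have an href attribute.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


links = absolute_links(SAMPLE_HTML, BASE_URL)
print(links)
```

Filtering with href=True also avoids the KeyError that a["href"] raises on anchors without that attribute.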