BeautifulSoup cannot find <h3> tags

Time:06-29

The URL in this question is: https://www.empireonline.com/movies/features/best-movies-2/

CodePudding user response:

All of the information is in the returned HTML, inside a <script> tag that contains JSON data.

JavaScript in the browser normally turns this JSON into HTML, but you can still extract it yourself: use BeautifulSoup to find the tag, then Python's json library to convert the data into a Python structure.

For example:

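import json

import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML and fail early on an HTTP error.
response = requests.get("https://www.empireonline.com/movies/features/best-movies-2/")
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# The page data is embedded as JSON inside a <script type="application/json"> tag.
script = soup.find("script", type="application/json")
data = json.loads(script.string)

# Index 7 of "_layout" held the film list when this was written; it may change.
films = data["props"]["pageProps"]["data"]["getArticleByFurl"]["_layout"][7]["content"]["images"]
for film in films:
    print(film["titleText"])
    print(film["description"])
    print("-------------")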

The hard part is finding the information you want inside the data structure. I suggest you print data and have a closer look.
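
One way to take that closer look is to walk the parsed structure and print the path to every occurrence of a key you recognise. The following is only a minimal sketch, assuming the data is the usual mix of dicts and lists that json.loads returns. The helper name find_key_paths and the small sample document are made up for illustration and do not reproduce the site's real layout.

```python
import json


def find_key_paths(node, target, path=()):
    """Yield the path (a tuple of keys and indexes) to every `target` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target:
                yield path + (key,)
            # Keep descending: the same key may also appear deeper down.
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, target, path + (index,))


# Hypothetical sample, shaped loosely like the page data (not the real layout).
sample = json.loads("""
{"props": {"pageProps": {"items": [
    {"kind": "text"},
    {"content": {"images": [{"titleText": "A"}, {"titleText": "B"}]}}
]}}}
""")

paths = list(find_key_paths(sample, "titleText"))
for found in paths:
    print(found)
```

Running the same helper on the real data with "titleText" as the target should show which index of "_layout" holds the film list, so the index does not have to be guessed.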

This would give you output starting:

100) Reservoir Dogs
**1992**<br>[Quentin Tarantino](https://www.empireonline.com/people/quentin-tarantino/)'s terrific twist on the heist-gone-wrong thriller ricochets the zing and fizz of its dialogue around a gloriously intense single setting (for the most part) and centres the majority of its action around one long and incredibly bloody death scene. Oh, and by the way: Nice Guy Eddie was shot by Mr. White. Who fired twice. Case closed.<br>[Read Empire's review of Reservoir Dogs](https://www.empireonline.com/movies/reviews/empire-essay-reservoir-dogs-review/)<br>
-------------
99) Groundhog Day
**1993**<br>[Bill Murray](https://www.empireonline.com/people/bill-murray/) at the height of his loveable (eventually) schmuck powers. [Andie McDowell](https://www.empireonline.com/people/andie-macdowell/) bringing the brains and the heart. And [Harold Ramis](https://www.empireonline.com/people/harold-ramis/) (directing and co-writing with Danny Rubin) managing to find gold in the story of a man trapped in a time loop. It might not have been the first to tap this particular trope, but it's head and shoulders above the rest. Murray's snarktastic delivery makes the early going easy to laugh at, but as the movie finds deeper things to say about existence and morals, it never feels like a polemic.<br>[Read Empire's review of Groundhog Day](https://www.empireonline.com/movies/reviews/groundhog-day-review/)<br>
-------------
98) Paddington 2
**2017**<br>When the first *Paddington* was on the way, early trailers didn't look entirely promising. Yet co-writer/director [Paul King](https://www.empireonline.com/people/paul-king/) delivered a truly wonderful film bursting with joy, imagination, kindness and just one or two hard stares. How was he going to follow that? Turns out, with more of the same, but also plenty of fresh pleasures. Paddington (bouncily voiced by [Ben Whishaw](https://www.empireonline.com/people/ben-whishaw/)) matches wits with washed-up actor Phoenix Buchanan ([Hugh Grant](https://www.empireonline.com/people/hugh-grant/), chewing scenery like fine steak), being framed for theft and getting sent to prison. Like all great sequels, it works superbly as a double bill with the original.<br>[Read Empire's review of Paddington 2](https://www.empireonline.com/movies/reviews/paddington-2-review/)<br>
-------------

CodePudding user response:

You can't scrape that page statically because part of it is rendered dynamically: some of its content (including the h3 tags) exists only after your browser executes the page's JavaScript. This is common on sites built with modern web frameworks such as React, which is the case here.

To solve this, use a scraping tool that can run a site's scripts, such as Selenium.
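
A minimal sketch of that approach with Selenium 4 might look like the following. It assumes a recent Chrome is installed and that Selenium can locate a matching driver. The function name fetch_h3_texts and the 10-second timeout are illustrative choices, not something taken from the question.

```python
def fetch_h3_texts(url, timeout=10):
    """Load `url` in headless Chrome and return the text of each <h3> element."""
    # Imported here so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has rendered at least one <h3>.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "h3"))
        )
        return [element.text for element in driver.find_elements(By.TAG_NAME, "h3")]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_h3_texts with the page URL would then return the heading texts once the page has rendered. The explicit wait matters because the headings do not exist at the moment the initial HTML arrives.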
