I'm trying to print the text inside a tag, but my script prints nothing.
The website is: https://mubi.com/it/films/25-watts/cast?type=cast
I'm trying to print all the actors' names.
Here is my code:
import random
import requests
from bs4 import BeautifulSoup

url = 'https://mubi.com/it/films/25-watts/cast?type=cast'  # winners

def main():
    response = requests.get(url)
    html = response.text
    soup1 = BeautifulSoup(html, 'html.parser')
    cast = soup1.find_all('span', {'class': 'css-1marmfu e1a7pc1u9'})
    for tag in cast:
        print(tag)

if __name__ == '__main__':
    main()
Thank you for your help ;)
CodePudding user response:
The data you see on the page is loaded from an external URL via JavaScript, so BeautifulSoup doesn't see it in the HTML that requests downloads. You can use the requests module to simulate the Ajax request:
import json
import requests
from bs4 import BeautifulSoup
url = "https://mubi.com/it/films/25-watts/cast?type=cast"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
film = data["props"]["initialProps"]["pageProps"]["film"]
cast_url = "https://api.mubi.com/v3/films/{}/cast_members?sort=relevance&type=cast&page=1"
cast = requests.get(
    cast_url.format(film["id"]),
    headers={"CLIENT": "web", "Client-Country": "US"},
).json()
# print(json.dumps(cast, indent=4))
for m in cast["cast_members"]:
    print("{:<30} {:<30}".format(m["name"], m["primary_type"] or "-"))
Prints:
Daniel Hendler                 Actor
Jorge Temponi                  Actor
Alfonso Tort                   Actor
Valentín Rivero                -
Federico Veiroj                Director
Valeria Mendieta               -
Roberto Suárez                 Actor
Gonzalo Eyherabide             -
Robert Moré                    Actor
Ignacio Mendy                  -
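If you want to see the extraction step in isolation, here is a minimal offline sketch of pulling the embedded JSON out of the `#__NEXT_DATA__` script tag. The `SAMPLE_HTML` string, the film id `123`, and the helper name `extract_next_data` are made up for illustration; the real page is much larger and its structure may change at any time.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests.get(url).text returns.
# The film id and title here are invented for the example.
SAMPLE_HTML = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"film": {"id": 123, "title": "25 Watts"}}}}}
</script>
</body></html>
"""


def extract_next_data(html):
    """Return the JSON embedded in the #__NEXT_DATA__ script tag as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("#__NEXT_DATA__")
    if tag is None:
        # The page layout changed, or this is not the page we expected.
        raise ValueError("no #__NEXT_DATA__ script tag found")
    return json.loads(tag.string)


# The static HTML has no cast <span> elements, which is why find_all() is empty.
spans = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("span")
print(spans)

data = extract_next_data(SAMPLE_HTML)
film = data["props"]["initialProps"]["pageProps"]["film"]
print(film["id"], film["title"])
```

Raising an error when the tag is missing makes a layout change on the site show up immediately, instead of as a confusing `AttributeError` on `None` further down.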