I am using the Python requests package to scrape a webpage. This is the code:
import requests
from bs4 import BeautifulSoup
# Configure Settings
url = "https://mangaabyss.com/read/"
comic = "the-god-of-pro-wrestling"
# Run Scraper
page = requests.get(url + comic + "/")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
The URL it uses is "https://mangaabyss.com/read/the-god-of-pro-wrestling/". But in the output of soup, I only get the first div and none of the child elements inside it. This is the output I get:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<link href="/favicon.ico" rel="icon"/>
<meta content="width=device-width,initial-scale=1,minimum-scale=1,maximum-scale=1,viewport-fit=cover" name="viewport"/>
<meta content="#250339" name="theme-color"/>
<title>
MANGA ABYSS
</title>
<script crossorigin="" src="/assets/index.f4dc01fb.js" type="module">
</script>
<link href="/assets/index.9b4eb8b4.css" rel="stylesheet"/>
</head>
<body>
<div id="manga-mobile-app">
</div>
</body>
</html>
The content that I want to scrape is deep inside that div. I am looking to extract the number of chapters. This is the selector for it:
#manga-mobile-app > div > div.comic-info-component > div.page-normal.with-margin > div.comic-deatil-box.tab-content.a-move-in-right > div.comic-episodes > div.episode-header.f-clear > div.f-left > span
Can anyone help me see where I'm going wrong?
CodePudding user response:
The data is loaded from an external URL by JavaScript after the page loads, so BeautifulSoup doesn't see it in the HTML that requests downloads. You can use the requests module to simulate this call:
import json
import requests
slug = "the-god-of-pro-wrestling"
url = "https://mangaabyss.com/circinus/Manga.Abyss.v1/ComicDetail?slug="
data = requests.get(url + slug).json()
# uncomment to print all data:
# print(json.dumps(data, indent=4))
for ch in data["data"]["chapters"]:
    print(
        ch["chapter_name"],
        "https://mangaabyss.com/read/{}/{}".format(slug, ch["chapter_slug"]),
    )
Prints:
...
Chapter 4 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-4
Chapter 3 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-3
Chapter 2 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-2
Chapter 1 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-1
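Since the question asks for the number of chapters, here is a minimal sketch of how the same JSON could be used to count them. It assumes the response keeps the {"data": {"chapters": [...]}} layout used above and that the list holds every chapter. The helper names count_chapters and fetch_comic_detail are made up for illustration and are not part of requests or the site's API.

```python
# Endpoint taken from the answer above. The response layout is an
# assumption based on the keys that answer reads.
API_URL = "https://mangaabyss.com/circinus/Manga.Abyss.v1/ComicDetail"


def count_chapters(payload):
    """Return the number of chapters in an already-parsed ComicDetail payload."""
    # Fall back to empty containers so a missing or null key yields 0
    # instead of raising KeyError or TypeError.
    chapters = (payload.get("data") or {}).get("chapters") or []
    return len(chapters)


def fetch_comic_detail(slug, timeout=10):
    """Fetch and decode the JSON for one comic (hypothetical helper)."""
    # Imported here so count_chapters works even without requests installed.
    import requests

    # Passing params lets requests build and escape the ?slug=... query string.
    response = requests.get(API_URL, params={"slug": slug}, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()
```

Calling count_chapters(fetch_comic_detail("the-god-of-pro-wrestling")) should then give the count, provided the endpoint still responds as shown in the answer.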