I'm trying to get the links in all the li tags under the ul tag.
HTML code:
<div id="chapter-list" style="">
<ul>
<li>
<a href="https://example.com/manga/name/2">
<div >
<span >
Chapter 2 </span>
</div>
</a>
</li>
<li>
<a href="https://example.com/manga/name/1">
<div >
<span >
Chapter 1 </span>
</div>
</a>
</li>
</ul>
</div>
The code I wrote:
from bs4 import BeautifulSoup
import requests
html_page = requests.get('https://example.com/manga/name/')
soup = BeautifulSoup(html_page.content, 'html.parser')
chapters = soup.find('div', {"id": "chapter-list"})
children = chapters.findChildren("ul", recursive=False)  # when printed, it gives the whole ul content
for litag in children.find('li'):
    print(litag.find("a")["href"])
When I try to print the links in the li tags, I get the following error:
Traceback (most recent call last):
  File "C:\0.py", line 12, in <module>
    for litag in children.find('li'):
  File "C:\Users\hs\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 2289, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
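The traceback comes from the difference between the two kinds of search methods: find() returns a single Tag (or None), while find_all() and its alias findChildren() return a ResultSet, which is a list of tags. A small sketch using a trimmed copy of the HTML above shows the difference:

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Trimmed copy of the chapter-list markup from the question
html = """
<div id="chapter-list">
<ul>
<li><a href="https://example.com/manga/name/2"><span>Chapter 2</span></a></li>
<li><a href="https://example.com/manga/name/1"><span>Chapter 1</span></a></li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
chapters = soup.find("div", {"id": "chapter-list"})

single = chapters.find("ul")                     # one Tag
many = chapters.find_all("ul", recursive=False)  # same as findChildren(...) in the question: a ResultSet

print(isinstance(single, Tag))      # True
print(isinstance(many, ResultSet))  # True
print(len(many))                    # 1

# Calling a Tag method on the ResultSet reproduces the error from the question
try:
    many.find("li")
except AttributeError as exc:
    print("AttributeError:", exc)
```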
Answer:
The error happens because findChildren() (an alias of find_all()) returns a ResultSet, which is a list of tags, so it has no find() method.

Instead, you can use find to get the ul in the chapter list, and then find_all to get the list items in the ul. Finally, use find_all again to find the links in each list item and print each URL. Details of these two methods can be found in the find and find_all method documentation for bs4. To get a link's text, such as Chapter 1, search each link for the span with the class chapternum and call get_text() on it. Searching by class is covered in the bs4 documentation on searching by CSS class.
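Applied to the code in the question, the fix is to loop over the ResultSet (or call find() instead of findChildren()) before searching for li tags. This is a minimal sketch, assuming the same chapter-list markup and using a local string instead of the requests call; the helper name chapter_links is hypothetical:

```python
from bs4 import BeautifulSoup

html = """
<div id="chapter-list">
<ul>
<li><a href="https://example.com/manga/name/2"><span>Chapter 2</span></a></li>
<li><a href="https://example.com/manga/name/1"><span>Chapter 1</span></a></li>
</ul>
</div>
"""


def chapter_links(markup):
    """Hypothetical helper: return the href of every link in #chapter-list > ul > li."""
    soup = BeautifulSoup(markup, "html.parser")
    chapters = soup.find("div", {"id": "chapter-list"})
    links = []
    # find_all() returns a ResultSet, so iterate over it instead of calling .find() on it
    for ul in chapters.find_all("ul", recursive=False):
        for li in ul.find_all("li", recursive=False):
            a = li.find("a", href=True)  # only <a> tags that actually have an href
            if a is not None:
                links.append(a["href"])
    return links


for href in chapter_links(html):
    print(href)
```

Checking for None keeps the loop from crashing on list items that have no link, which the original litag.find("a")["href"] would not survive.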
(Updated) Code:
from bs4 import BeautifulSoup
html_doc = """
<div id="chapter-list" style="">
<ul>
<li>
<a href="https://example.com/manga/name/2">
<div >
<span class="chapternum">
Chapter 2 </span>
</div>
</a>
</li>
<li>
<a href="https://example.com/manga/name/1">
<div >
<span class="chapternum">
Chapter 1 </span>
</div>
</a>
</li>
</ul>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
chapters = soup.find('div', {"id": "chapter-list"})
list_items = chapters.find('ul').find_all('li')
for list_item in list_items:
    for link in list_item.find_all('a'):
        title = link.find('span', class_='chapternum').get_text().strip()
        href = link.get("href")
        print(f"{title}: {href}")
Output:
Chapter 2: https://example.com/manga/name/2
Chapter 1: https://example.com/manga/name/1
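Another option, not used in the answer above, is BeautifulSoup's select() method, which takes a CSS selector (it relies on the soupsieve package, a dependency of beautifulsoup4). This sketch also falls back to the link's own text when the span with class chapternum is missing, instead of raising an AttributeError on None:

```python
from bs4 import BeautifulSoup

html_doc = """
<div id="chapter-list">
<ul>
<li><a href="https://example.com/manga/name/2"><div><span class="chapternum">Chapter 2</span></div></a></li>
<li><a href="https://example.com/manga/name/1"><div><span class="chapternum">Chapter 1</span></div></a></li>
</ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

chapters = []
# Every <a> with an href inside an li that is a direct child of the ul in #chapter-list
for link in soup.select("#chapter-list > ul > li a[href]"):
    span = link.select_one("span.chapternum")
    # Fall back to the link's whole text if the chapternum span is missing
    title = (span or link).get_text(strip=True)
    chapters.append((title, link["href"]))

for title, href in chapters:
    print(f"{title}: {href}")
```

This prints the same two lines as the output above.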