Problem finding the links in all li tags under a ul tag

Time:06-17

I'm trying to get the links from all of the li tags under the ul tag.

HTML code:

<div id="chapter-list"  style="">
<ul>
<li>
<a href="https://example.com/manga/name/2">
<div>
<span class="chapternum">
Chapter 2 </span>
</div>
</a>
</li>
<li>
<a href="https://example.com/manga/name/1">
<div>
<span class="chapternum">
Chapter 1 </span>
</div>
</div>
</a>
</li>
</ul>
</div>

The code I wrote:

from bs4 import BeautifulSoup
import requests

html_page = requests.get('https://example.com/manga/name/')

soup = BeautifulSoup(html_page.content, 'html.parser')
chapters = soup.find('div', {"id": "chapter-list"})

children = chapters.findChildren("ul", recursive=False)  # when printed, it gives the whole ul content

for litag in children.find('li'):
    print(litag.find("a")["href"])

When I try to print the links in the li tags, I get the following error:

Traceback (most recent call last):
  File "C:\0.py", line 12, in <module>
    for litag in children.find('li'):
  File "C:\Users\hs\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 2289, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Answer:

The error occurs because findChildren returns a ResultSet (a list of matching elements), and a ResultSet has no find method. Use find to get the single ul inside the chapter list, then find_all to get the list items in that ul. Finally, call find_all on each list item to get its links and print each URL. Both methods are described in the find and find_all sections of the bs4 documentation.

To get a link's text, such as Chapter 1, search within the link for the span with the class chapternum and call get_text() on it. The bs4 documentation covers this under searching by CSS class.
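To make the cause of the error concrete, here is a minimal sketch. The html_doc string is a trimmed stand-in for the page in the question, not the real site, and the variable names are only illustrative. It uses find_all, which as far as I know returns the same kind of list-like ResultSet as the older findChildren name, so find has to be called on the elements of the set rather than on the set itself.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Trimmed stand-in for the page in the question (an assumption for illustration).
html_doc = """
<div id="chapter-list">
<ul>
<li><a href="https://example.com/manga/name/2">Chapter 2</a></li>
<li><a href="https://example.com/manga/name/1">Chapter 1</a></li>
</ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
chapters = soup.find("div", {"id": "chapter-list"})

# find_all returns a list-like ResultSet, even when only one element matches.
children = chapters.find_all("ul", recursive=False)
is_result_set = isinstance(children, ResultSet)

# find returns a single Tag (or None), and a Tag does support find/find_all.
ul = chapters.find("ul", recursive=False)
is_tag = isinstance(ul, Tag)

# Minimal fix for the original loop: iterate over the ResultSet first,
# then search inside each ul for its li elements.
hrefs = [
    li.find("a")["href"]
    for ul_tag in children
    for li in ul_tag.find_all("li")
]
print(hrefs)
```

Either approach works: take one element with find, or keep the ResultSet and loop over it before calling find or find_all.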

(Updated) Code:

from bs4 import BeautifulSoup

html_doc = """
<div id="chapter-list"  style="">
    <ul>
        <li>
            <a href="https://example.com/manga/name/2">
                <div>
                    <span class="chapternum">Chapter 2</span>
                </div>
            </a>
        </li>
        <li>
            <a href="https://example.com/manga/name/1">
                <div>
                    <span class="chapternum">Chapter 1</span>
                </div>
            </a>
        </li>
    </ul>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
chapters = soup.find('div', {"id": "chapter-list"})

list_items = chapters.find('ul').find_all('li')

for list_item in list_items:
    for link in list_item.find_all('a'):
        title = link.find('span', class_='chapternum').get_text().strip()
        href = link.get("href")
        print(f"{title}: {href}")

Output:

Chapter 2: https://example.com/manga/name/2
Chapter 1: https://example.com/manga/name/1
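As an alternative, the same lookup can probably be written with a CSS selector through select, which bs4 provides via the soupsieve package that is normally installed alongside it. This is only a sketch under that assumption. The selector string and the chapter_links name are my own choices, not part of the original answer, and the text is read from the whole link rather than from the chapternum span.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question, shortened for the example.
html_doc = """
<div id="chapter-list" style="">
    <ul>
        <li>
            <a href="https://example.com/manga/name/2">
                <div><span>Chapter 2 </span></div>
            </a>
        </li>
        <li>
            <a href="https://example.com/manga/name/1">
                <div><span>Chapter 1 </span></div>
            </a>
        </li>
    </ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Match every <a> that has an href and sits inside an <li> that is a direct
# child of a <ul> that is a direct child of the element with id "chapter-list".
links = soup.select("#chapter-list > ul > li a[href]")

# get_text(strip=True) strips each text fragment and drops the empty ones.
chapter_links = [(a.get_text(strip=True), a["href"]) for a in links]

for title, href in chapter_links:
    print(f"{title}: {href}")
```

The selector version is shorter, while the find and find_all version makes each step explicit, which can be easier to debug when one of the elements is missing.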
