Python BS4 Sort HTML in alphabetical order?

Time:10-07

How can I sort these HTML elements alphabetically by their text?

from bs4 import BeautifulSoup

html = '''[<div>Red </div>, <div>Green </div>, <div>Orange </div>, <div>Blue </div>]'''

soup = BeautifulSoup(html, 'html.parser')
paints = soup.find_all("div")  # find_all is the current name; findAll is a legacy alias

print(str(paints).strip('[]'))

Output:

<div>Red </div>, <div>Green </div>, <div>Orange </div>, <div>Blue </div>

Wanted Output:

<div>Blue </div>, <div>Green </div>, <div>Red </div>, <div>Orange </div>

CodePudding user response:

Pass the list of tags to the built-in sorted() function and use each tag's text, obtained with get_text(), as the sort key. sorted() returns a new list and leaves paints unchanged.

sorted(paints, key=lambda x: x.get_text())

Output:

[<div>Blue </div>, <div>Green </div>, <div>Orange </div>, <div>Red </div>]
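For reference, here is a minimal end-to-end sketch that combines the question's code with the sorted() call and prints the comma-separated form the question asked for. The strip() and casefold() calls in the key and the name result are additions of mine, not part of the answer above; they assume you want leading or trailing whitespace and letter case ignored when comparing.

```python
from bs4 import BeautifulSoup

html = '''[<div>Red </div>, <div>Green </div>, <div>Orange </div>, <div>Blue </div>]'''

soup = BeautifulSoup(html, 'html.parser')
paints = soup.find_all("div")

# Sort by the visible text of each tag.
# strip() and casefold() are assumptions: drop them for a plain, case-sensitive sort.
sorted_paints = sorted(paints, key=lambda tag: tag.get_text().strip().casefold())

# Join the tags back into one comma-separated string instead of printing a list.
result = ", ".join(str(tag) for tag in sorted_paints)
print(result)
```

Joining with ", " avoids the str(paints).strip('[]') trick from the question, which depends on how a list happens to be printed.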

CodePudding user response:

Your expected output is not alphabetical: it puts Orange after Red. As an alternative to sorted(), you can iterate over the alphabet and collect the elements whose text starts with each letter.

Example

from bs4 import BeautifulSoup
import string

html = '''[<div>Red </div>, <div>Green </div>, <div>Orange </div>, <div>Blue </div>]'''

sorted_html_tags = []
soup = BeautifulSoup(html, 'html.parser')

# Walk the alphabet a..z and collect the divs whose text starts with each letter.
for c in string.ascii_lowercase:
    for e in soup.find_all("div"):
        if e.text.lower().startswith(c):
            sorted_html_tags.append(e)

print(sorted_html_tags)

Output

[<div>Blue </div>, <div>Green </div>, <div>Orange </div>, <div>Red </div>]
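One caveat with the alphabet loop: it orders elements by their first letter only, and it skips any element whose text does not start with an ASCII letter. The sketch below compares it with sorted() on a hypothetical input of my own (not from the question) where two names share a first letter and one starts with a digit.

```python
from bs4 import BeautifulSoup
import string

# Hypothetical input for illustration only.
html = '<div>Blue </div><div>Black </div><div>7 Up </div><div>Amber </div>'
soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all("div")

# Alphabet loop, as in the answer above: groups by first letter only.
by_first_letter = []
for c in string.ascii_lowercase:
    for e in divs:
        if e.text.lower().startswith(c):
            by_first_letter.append(e)

# Full-text sort: compares the whole string, so ties on the first letter are resolved.
by_full_text = sorted(divs, key=lambda e: e.get_text().strip().lower())

first_letter_texts = [e.get_text().strip() for e in by_first_letter]
full_text_texts = [e.get_text().strip() for e in by_full_text]
print(first_letter_texts)  # ['Amber', 'Blue', 'Black']
print(full_text_texts)     # ['7 Up', 'Amber', 'Black', 'Blue']
```

In the loop version, Blue stays ahead of Black because both match "b" in document order, and "7 Up" is dropped entirely, so sorted() is usually the safer choice.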