Python BeautifulSoup HTML parse class select

Time:01-19

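import requests
from bs4 import BeautifulSoup

@bot.event
async def on_message(message):
    msg = message.content
    # Only react to messages that look like a URL
    if msg.startswith("https:"):
        # Look the profile URL up on steamid.io and parse the returned page
        response = requests.get(f"https://steamid.io/lookup/{msg}")
        soup = BeautifulSoup(response.text, "html.parser")
        print(soup.prettify())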

When I run the code, the site's HTML is printed:

  </div>
  <dl >
   <dt >
    steamID
   </dt>
   <dd >
    <img alt="copy to clipboard"  data-clipboard-text="STEAM_0:0:444916529" data-placement="bottom" data-toggle="tooltip" src="https://steamid.io/static/img/copy.png" title="copy to clipboard"/>
    <a href="https://steamid.io/lookup/STEAM_0:0:444916529" id="a" rel="nofollow">
     STEAM_0:0:444916529
    </a>
   </dd>
   <dt >
    steamID3
   </dt>
   <dd >
    <img alt="copy to clipboard"  data-clipboard-text="[U:1:889833058]" data-placement="bottom" data-toggle="tooltip" src="https://steamid.io/static/img/copy.png" title="copy to clipboard"/>
    <a href="https://steamid.io/lookup/[U:1:889833058]" rel="nofollow">
     [U:1:889833058]
    </a>
   </dd>
   <dt >
    steamID64
   </dt>
   <dd >
    <img alt="copy to clipboard"  data-clipboard-text="76561198850098786" data-placement="bottom" data-toggle="tooltip" src="https://steamid.io/static/img/copy.png" title="copy to clipboard"/>
    <a href="https://steamid.io/lookup/76561198850098786">
     76561198850098786
    </a>
   </dd>
   <dt >
    customURL
   </dt>
   <dd >
    not set
   </dd>
   <dt >
    profile state
   </dt>

I want to parse that HTML and select the following part:

<dt >
    steamID64
   </dt>
   <dd >
    <img alt="copy to clipboard"  data-clipboard-text="76561198850098786" data-placement="bottom" data-toggle="tooltip" src="https://steamid.io/static/img/copy.png" title="copy to clipboard"/>
    <a href="https://steamid.io/lookup/76561198850098786">
     76561198850098786
    </a>

I want to extract the value "76561198850098786" from this part. How can I do that?

CodePudding user response:

You can use the :-soup-contains pseudo-class to target the element with class key that contains the text steamID64, then an adjacent sibling combinator (+) to move to the next element with class value, and finally a child combinator (>) to reach the a tag that holds the desired value:

soup.select_one('.key:-soup-contains("steamID64") + .value > a').text
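
To try the selector without calling the site, here is a minimal, self-contained sketch. The class="key" and class="value" attributes are an assumption taken from this answer (they are not visible in the output pasted in the question), the markup is a trimmed copy of that output, and extract_steamid64 is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page markup from the question.
# ASSUMPTION: <dt> carries class="key" and <dd> carries class="value".
HTML = """
<dl>
 <dt class="key">steamID3</dt>
 <dd class="value">
  <a href="https://steamid.io/lookup/[U:1:889833058]">[U:1:889833058]</a>
 </dd>
 <dt class="key">steamID64</dt>
 <dd class="value">
  <a href="https://steamid.io/lookup/76561198850098786">
   76561198850098786
  </a>
 </dd>
 <dt class="key">customURL</dt>
 <dd class="value">not set</dd>
</dl>
"""


def extract_steamid64(html):
    """Return the steamID64 link text, or None if the selector finds nothing."""
    soup = BeautifulSoup(html, "html.parser")
    # .key containing "steamID64" -> next sibling with class value -> child <a>
    link = soup.select_one('.key:-soup-contains("steamID64") + .value > a')
    return link.get_text(strip=True) if link else None


steam_id64 = extract_steamid64(HTML)
print(steam_id64)
```

Checking for None before reading the text avoids an AttributeError when the page layout differs from what is assumed here, and get_text(strip=True) drops the whitespace that surrounds the value in the markup.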

CodePudding user response:

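You can select the last element with class key and then call find_previous("a") to step back to the nearest preceding a tag. This relies on the steamID64 link being the last a tag before that element, as it is in the snippet from the question: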

soup = BeautifulSoup(html, "html.parser")

print(
    soup.select_one(".key:last-of-type").find_previous("a").get_text(strip=True)
)

Output:

76561198850098786
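
If the class names or the position of the element are uncertain, a different technique from the two answers above is to match the dt by its label text and then walk to the following dd. The sketch below uses only find_all, find_next_sibling and find; the markup is a trimmed copy of the output in the question, and steamid64_by_label is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; no class attributes are needed.
HTML = """
<dl>
 <dt>
  steamID64
 </dt>
 <dd>
  <img alt="copy to clipboard" data-clipboard-text="76561198850098786"/>
  <a href="https://steamid.io/lookup/76561198850098786">
   76561198850098786
  </a>
 </dd>
 <dt>
  customURL
 </dt>
 <dd>
  not set
 </dd>
</dl>
"""


def steamid64_by_label(html):
    """Find the <dt> labelled steamID64 and return the link text in the next <dd>."""
    soup = BeautifulSoup(html, "html.parser")
    for dt in soup.find_all("dt"):
        # Compare the stripped label so surrounding whitespace does not matter
        if dt.get_text(strip=True) == "steamID64":
            dd = dt.find_next_sibling("dd")
            link = dd.find("a") if dd else None
            return link.get_text(strip=True) if link else None
    return None


print(steamid64_by_label(HTML))
```

Matching on the label rather than on position keeps the lookup working if more dt/dd pairs or links are added after steamID64.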