get src link from img tag nested under div tag with BeautifulSoup4

Time:12-07

Goal: Extract the link of the image marked with a circle in the screenshot.

It's nested as:

<div >
    <img src="https://cdn.brawlstats.com/ranked-ranks/ranked_ranks_l_10.png" >
    <div  style="color:#FFFFFF;font-size:18px;">
    </div><!----></div>

I want to get the "https://cdn.brawlstats.com/ranked-ranks/ranked_ranks_l_10.png" link stored in a variable.
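As a quick sanity check, the nesting shown above can be navigated offline by parsing the snippet itself. This is a minimal sketch that assumes the markup is exactly as posted. The class attributes the live page uses are not visible in the question, so they are left out, and the variable names are illustrative only. If this works on the snippet but the live page yields nothing, the problem lies in the HTML that was downloaded, not in the selector.

```python
from bs4 import BeautifulSoup

# Markup from the question; the page's class attributes are not shown there,
# so they are omitted here too.
snippet = """
<div>
    <img src="https://cdn.brawlstats.com/ranked-ranks/ranked_ranks_l_10.png">
    <div style="color:#FFFFFF;font-size:18px;">
    </div><!----></div>
"""

soup = BeautifulSoup(snippet, "html.parser")

# find("div") returns the first (outer) <div>; .find("img") then searches inside it.
outer_div = soup.find("div")
img = outer_div.find("img") if outer_div is not None else None
rank_image_url = img["src"] if img is not None else None

# CSS-selector equivalent: an <img> that is a direct child of a <div>.
rank_image_url_css = soup.select_one("div > img")["src"]

print(rank_image_url)
```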

Attempts:

Latest version of the code I tried:

import discord
import requests
from bs4 import BeautifulSoup

async def league_rank(interaction: discord.Interaction, tag: str):
    url = "https://brawlstats.com/profile/" + tag.upper()
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    all_imgs = [img["src"] for img in soup.select(".mo25VS9slOfRz6jng3WTf img")]
    print(all_imgs)

It prints an empty list.

Why I'm asking: I have also tried several methods from other Stack Overflow questions, but none of them worked. How do I get the src of this image?

Answer:

You have to send valid request headers. Without them, the site returns an internal server error page instead of the profile HTML, so there is no matching img tag to select.
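To make that failure visible instead of getting a silent empty list, the download step can check the response before parsing. This is a minimal sketch that assumes the server signals the error with a 4xx/5xx status code; the answer only says the HTML contains an internal server error, so the status might also be 200, in which case counting the img tags is the more reliable check. get_page_html and count_images are hypothetical helper names.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def get_page_html(url: str, headers: Optional[dict] = None) -> str:
    """Download a page and raise instead of silently returning an error page."""
    response = requests.get(url, headers=headers, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text


def count_images(html: str) -> int:
    """Number of <img> tags the parser finds; an error page usually has none."""
    return len(BeautifulSoup(html, "html.parser").find_all("img"))


# Offline illustration: an error page gives the selector nothing to match.
error_page = "<html><body><h1>Internal Server Error</h1></body></html>"
print(count_images(error_page))
```

If count_images reports 0 for the downloaded page, any selector will come back empty, which matches the symptom in the question.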

For example, set the headers and use a CSS selector that matches the image by its src:

import requests
from bs4 import BeautifulSoup


# Browser-like headers; without them the site returns an error page.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0",
    "Host": "brawlstats.com",
    "Referer": "https://brawlstats.com/profile/9J8LRGQU2",
}

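url = "https://brawlstats.com/profile/RYJUGR8L"
# img[src*='ranked_ranks_l'] matches an <img> whose src contains "ranked_ranks_l".
# The "lxml" parser needs the lxml package; the built-in "html.parser" can be used instead.
image = (
    BeautifulSoup(requests.get(url, headers=headers).text, "lxml")
    .select_one("img[src*='ranked_ranks_l']")["src"]
)
print(image)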

Output:

https://cdn.brawlstats.com/ranked-ranks/ranked_ranks_l_10.png
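Since the question's league_rank is an async discord.py command, a blocking requests.get call inside it stalls the bot's event loop while the page downloads. One way to combine the answer with that setup is to run the blocking download in a worker thread with asyncio.to_thread (Python 3.9+). This is a sketch under assumptions: the headers are copied from the answer and are assumed to be sufficient, and extract_rank_image, fetch_rank_image and get_rank_image are hypothetical names, not part of any library.

```python
import asyncio
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Assumption: the answer's headers are enough for the site to serve the real page.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0",
    "Host": "brawlstats.com",
    "Referer": "https://brawlstats.com/profile/9J8LRGQU2",
}


def extract_rank_image(html: str) -> Optional[str]:
    """Return the src of the first <img> whose src contains 'ranked_ranks_l', or None."""
    img = BeautifulSoup(html, "html.parser").select_one("img[src*='ranked_ranks_l']")
    return img["src"] if img is not None else None


def fetch_rank_image(tag: str) -> Optional[str]:
    """Blocking helper: download the profile page and extract the rank image URL."""
    url = "https://brawlstats.com/profile/" + tag.upper()
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return extract_rank_image(response.text)


async def get_rank_image(tag: str) -> Optional[str]:
    # Run the blocking request in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fetch_rank_image, tag)
```

Inside the command, image = await get_rank_image(tag) would then store the link in a variable. Returning None instead of indexing ["src"] directly avoids a TypeError when the page does not contain the image.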