Can't get the src attribute of an img using BeautifulSoup4

Time:01-09

I'm trying to scrape a website using BeautifulSoup and get the src attribute of an image, but it returns something completely different from what I expect.

This is the img element: (screenshot not available)

This is the code I'm using to scrape it (it returns other attributes perfectly fine, so I'm sure I'm getting the right element):

pic = hrefs.a.div.div.span.img.get('src')

And the output of the pic variable is this:

data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7

Answer:

I tried to reproduce your example from the screenshot (it is hard to reproduce exactly from an image). Did you try this?

from bs4 import BeautifulSoup

html = """
<div >
    <img alt="Air Jordan 1" src="https://cdn.myikas.com/images/blablabla"/>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
print(soup.find('img')['src'])

Output:

https://cdn.myikas.com/images/blablabla

Answer:

I am using the following HTML document:

<!DOCTYPE html>
<html>
  <head>
    <title>Title of the document</title>
  </head>
  <body>
    <div>
      <p>From wikipedia</p>
      <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
        //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
    </div>
  </body>
</html>

The above is the HTML I am scraping.

This is the image element:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
        //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />

and this is the base64 part we need to decode (everything after the comma):

iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
        //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==

So, from your value data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7 you need to take the string after the last comma:

src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
print(src.split(",")[-1])  # the text after the last comma is the base64 payload
# OUTPUT: R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
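Splitting on the last comma works here because the base64 alphabet never contains a comma. A slightly more defensive sketch, assuming the src is a standard data: URI of the form data:<mediatype>;base64,<payload> (the function name split_data_uri is hypothetical, for illustration only), also keeps the media type so you know what kind of file you are decoding:

```python
import base64


def split_data_uri(src: str) -> tuple[str, bytes]:
    """Hypothetical helper: return (media type, decoded bytes) for a base64 data: URI."""
    if not src.startswith("data:"):
        raise ValueError("not a data: URI")
    # Everything before the first comma is the header, everything after is the payload.
    header, sep, payload = src[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a ';base64' data: URI")
    media_type = header[: -len(";base64")]
    return media_type, base64.b64decode(payload)


src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
media_type, data = split_data_uri(src)
print(media_type, len(data))  # image/gif 42
```

Applied to the "Red dot" src in the HTML above, the same call would report image/png instead.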

OK, so now you have the string. (I am not going to use your string, because when I tried it I didn't see an image; I will use the string from the HTML code above instead.)

This is how you decode it in Python 3.10.4:

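import base64

# The base64 payload from the "Red dot" example above. b64decode() discards the
# embedded newline and spaces because its validate argument defaults to False.
image_base64_string = """iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
        //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="""

image_decoded = base64.b64decode(image_base64_string)

# Now it's time to save the image. The src says data:image/png, so use a .png
# file name, and open the file in "wb" mode ("wb " with a space raises ValueError).
with open("red_dot.png", "wb") as myfile:
    myfile.write(image_decoded)
# You should be able to see the file and open it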
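The string from the question can be checked the same way. A short sketch that decodes it and reads the GIF header suggests why nothing visible showed up: the data is a valid GIF whose logical screen width and height (stored as little-endian 16-bit integers right after the 6-byte "GIF89a" signature) are both 1, i.e. a single 1x1 pixel image.

```python
import base64
import struct

# The src value returned in the question, copied verbatim.
src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
data = base64.b64decode(src.split(",")[-1])

# A GIF starts with a 6-byte signature ("GIF87a" or "GIF89a"), followed by the
# logical screen width and height as little-endian unsigned 16-bit integers.
signature = data[:6]
width, height = struct.unpack("<HH", data[6:10])
print(signature, width, height)  # b'GIF89a' 1 1
```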