I'm trying to scrape a website using BeautifulSoup. I'm trying to get the src attribute of an image, but it returns something completely different from the URL I expect.
This is the img element: [screenshot of the element]
This is the code I'm using to scrape it (it returns other attributes perfectly fine so I'm sure I'm getting the right element):
pic = hrefs.a.div.div.span.img.get('src')
And the output of the pic variable is this:
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
CodePudding user response:
I tried to reproduce your example from the screenshot (a screenshot is hard to reproduce from; pasted HTML would be better). Did you try this?
from bs4 import BeautifulSoup
html = """
<div >
<img alt="Air Jordan 1" src="https://cdn.myikas.com/images/blablabla"/>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('img')['src'])
Output:
https://cdn.myikas.com/images/blablabla
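If the live page still gives you the data:image/gif placeholder, one possible cause (an assumption, I can't confirm it from the screenshot) is lazy loading: the page puts a tiny inline image in src and keeps the real URL in another attribute until a script swaps it in. Below is a rough sketch of a fallback. The attribute names data-src, data-lazy-src and data-original are guesses at common conventions, not something confirmed for this site, and real_image_url is a made-up helper name. It works on a plain attribute dict, which is what BeautifulSoup exposes as tag.attrs.

```python
# Hypothetical attributes: the real URL is ASSUMED to sit in data-src,
# while src holds the inline placeholder from the question.
attrs = {
    "alt": "Air Jordan 1",
    "src": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
    "data-src": "https://cdn.myikas.com/images/blablabla",
}

def real_image_url(attrs, fallbacks=("data-src", "data-lazy-src", "data-original")):
    """Return src unless it is an inline data: URI; then try the fallback attributes."""
    src = attrs.get("src", "")
    if not src.startswith("data:"):
        return src
    for name in fallbacks:
        value = attrs.get(name)
        if value:
            return value
    return src  # nothing better found, hand back the placeholder

url = real_image_url(attrs)
print(url)
```

With the code from the question this would be called as real_image_url(hrefs.a.div.div.span.img.attrs). Printing the tag's attrs first shows which attributes the site actually uses.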
CodePudding user response:
I am using the following HTML document:
<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>
<body>
<div>
<p>From wikipedia</p>
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
</div>
</body>
</html>
The above is the HTML I am scraping, and this is the image element:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
And this is the part we need to decode (the base64 payload):
iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==
So you need the part of the string after the last comma in your data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
src.split(",")[-1] # This will get the last sequence of text after a comma
# OUTPUT : R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
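If you also want the media type from the prefix (useful for picking a file extension later), the header can be split too. This is a minimal sketch that assumes the usual data:<media type>;base64,<payload> layout; split_data_uri is a made-up helper name. It cuts at the first comma, which is the same as the last comma for base64 payloads because the base64 alphabet contains no commas.

```python
def split_data_uri(uri):
    """Split 'data:<media type>[;base64],<payload>' into its three parts."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    # Everything before the first comma is the header, the rest is the payload
    header, _, payload = uri[len("data:"):].partition(",")
    media_type, *params = header.split(";")
    return media_type, "base64" in params, payload

src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
media_type, is_base64, payload = split_data_uri(src)
print(media_type, is_base64)
print(payload)
```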
OK, so now you have the string. I am not going to use your string, because when I decoded it I didn't see an image: its header describes a 1x1 pixel GIF, so there is nothing visible to look at. I will use the string from the HTML above instead.
This is how to decode it in Python 3.10.4:
import base64
# Base64 payload of the PNG from the HTML above
image_base64_string = """iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="""
# b64decode ignores the embedded newline by default
image_decoded = base64.b64decode(image_base64_string)
# Save the bytes in binary write mode ("wb"); the payload is a PNG, so use .png
with open("red_dot.png", "wb") as myfile:
    myfile.write(image_decoded)
# You should now be able to open red_dot.png in an image viewer
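To check what a decoded payload really is before saving it, you can look at its first bytes. The sketch below uses two made-up helper names, sniff_image_type and gif_size, and only knows the PNG and GIF signatures; it is an illustration, not a full format detector. It is run here on the string from the question, whose header describes a 1x1 pixel GIF.

```python
import base64
import struct

def sniff_image_type(data):
    """Guess the format from the leading signature bytes (PNG and GIF only)."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None

def gif_size(data):
    """Read width and height from a GIF header (two little-endian 16-bit ints)."""
    return struct.unpack("<HH", data[6:10])

payload = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
data = base64.b64decode(payload)
kind = sniff_image_type(data)
size = gif_size(data)
print(kind, size)
```

A 1x1 result like this usually means the src is only a placeholder and the real image URL has to come from somewhere else on the page.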