i was trying to parse a image link from a website.
When i inspect the link on the website, it is this one :https://static.nike.com/a/images/c_limit,w_592,f_auto/t_product_v1/df7c2668-f714-4ced-9f8f-1f0024f945a9/chaussure-de-basketball-zoom-freak-3-MZpJZF.png but when i parse it with my code the output is data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
.
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').text
soup = BeautifulSoup(source, 'lxml')
pair = soup.find('div', class_='product-card__body')
image_scr = pair.find('img', class_='css-1fxh5tw product-card__hero-image')['src']
print(image_scr)
I think the code isn't the issue but i don't know what's causing the link to come out in base64 format. So how could i set the code to render the link as .png ?
CodePudding user response:
As you want to grab src meaning image data, so downloading data from server using requests, you need to use .content
format as follows:
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content
CodePudding user response:
What happens?
First at all, take a look into your soup
- There is the truth. Website provides not all information static, there are a lot things provided dynamically and also done by the browser -> So requests
wont get this info this way.
Workaround
Take a look at the <noscript>
next to your selection, it holds a smaller version of the image and is providing the src
Example
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.nike.com/fr/w/hommes-chaussures-nik1zy7ok').content
soup = BeautifulSoup(source, 'lxml')
pair = soup.find('div', class_='product-card__body')
image_scr = pair.select_one('noscript img.css-1fxh5tw.product-card__hero-image')['src']
print(image_scr)