My issue is that I am scraping products from a website that loads more products automatically as you scroll down. My scraping only picks up the first 24 items, so my question is: what code can I use to loop through all the products at the following link? The URL contains nothing that indicates which page I am on.
from bs4 import BeautifulSoup
import requests
import pandas as pd
product_name = []
product_brand = []
product_price = []
product_img = []
relative_url = []

website = 'https://en-saudi.ounass.com/women/beauty/fragrance'
response = requests.get(website)
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find_all('div', {'class': 'Product-contents'})

for result in results:
    # name
    try:
        product_name.append(result.find('div', {'class': 'Product-name'}).get_text())
    except AttributeError:
        product_name.append('n/a')
    # brand
    try:
        product_brand.append(result.find('div', {'class': 'Product-brand'}).get_text())
    except AttributeError:
        product_brand.append('n/a')
    # price
    try:
        product_price.append(result.find('span', {'class': 'Product-minPrice'}).get_text())
    except AttributeError:
        product_price.append('n/a')
    # image
    try:
        product_img.append(result.find('img', {'class': 'Product-image'}).get('data-src'))
    except AttributeError:
        product_img.append('n/a')
    # relative url
    try:
        relative_url.append(result.find('a', {'class': 'Product-link'}).get('href'))
    except AttributeError:
        relative_url.append('n/a')
CodePudding user response:
You just need to use the site's public API. It returns far more information than the rendered HTML, and it is also much faster than Selenium. Here is an example with the fields that were in your question:
import requests
import pandas as pd

results = []
page = 0
while True:
    url = f"https://en-saudi.ounass.com/api/women/beauty/fragrance?sortBy=popularity-asc&p={page}&facets=0"
    hits = requests.get(url).json()['hits']
    if hits:
        page += 1
        for hit in hits:
            results.append({
                'Name': hit['analytics']['name'],
                'Brand': hit['analytics']['brand'],
                'Price': hit['price'],
                'Image': hit['_imageurl'],
                'Link': f"https://en-saudi.ounass.com/{hit['slug']}.html"
            })
    else:
        break

df = pd.DataFrame(results)
print(df)
OUTPUT:
Name ... Link
0 Cœur de Jardin Eau de Parfum, 100ml ... https://en-saudi.ounass.com/shop-miller-harris...
1 Patchouli Intense Eau de Parfum, 100ml ... https://en-saudi.ounass.com/shop-nicolai-parfu...
2 Blue Sapphire Eau de Parfum, 100ml ... https://en-saudi.ounass.com/shop-boadicea-the-...
3 Ambre Vanillé Eau de Toilette, 50ml ... https://en-saudi.ounass.com/shop-laura-mercier...
4 Baccarat Rouge 540 Scented Body Oil, 70ml ... https://en-saudi.ounass.com/shop-maison-franci...
... ... ... ...
2368 Olene Eau de Toilette, 100ml ... https://en-saudi.ounass.com/shop-diptyque-olen...
2369 Magnolia Nobile Leather Purse Spray, 20ml ... https://en-saudi.ounass.com/shop-acqua-di-parm...
2370 Eau du Soir Eau de Parfum, 100ml ... https://en-saudi.ounass.com/shop-sisley-eau-du...
2371 Yvresse Eau de Toilette, 80ml ... https://en-saudi.ounass.com/shop-ysl-beauty-yv...
2372 Lalibela Eau de Parfum, 75ml ... https://en-saudi.ounass.com/shop-memo-paris-la...
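The loop relies on the API returning an empty hits array once p runs past the last page, which is what ends the while True. If you want to persist the result, pandas can write it out directly; a minimal sketch (the filename is just an example):
# Save the collected products to CSV; the filename is illustrative.
df.to_csv('ounass_fragrance.csv', index=False)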
CodePudding user response:
You will need Selenium to do this. Selenium opens a webpage (using drivers) and performs the actions you specify, like scrolling.
The code itself will depend on the website structure, but here are the main steps to get you started:
Download the Chrome or Firefox driver
Import Selenium
Configure Selenium to use the driver
Open the website
Find the element with the scroll and use the arrow-down key to scroll down
Get the information you need from the loaded products; use Python's sleep to make sure everything is loaded, and scroll again as long as you need (see the window-scrolling sketch after the example code below)
# Imports
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Open a driver (using Firefox in the example)
profile = webdriver.FirefoxProfile()
profile.set_preference('intl.accept_languages', 'en-us')
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile, executable_path='executable_path')

# Open the site
driver.get('https://www.example.com/products')

# Find the element with the scroll and scroll using the arrow-down key (10 times)
elem = driver.find_element_by_xpath('xpath_to_element_with_scroll')
i = 0
while i < 10:
    elem.send_keys(Keys.ARROW_DOWN)
    i += 1

# Here you will find the products, save them somewhere, and do it all again if needed.
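If the products load when the whole window scrolls rather than inside one scrollable element, a common alternative is to scroll with JavaScript and stop once the page height stops growing. A minimal sketch under that assumption (the 2-second pause is illustrative and may need tuning):
import time

# Scroll the window until the page height stops increasing,
# i.e. until no more products are lazy-loaded.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give newly loaded products time to appear
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded, so we have reached the end
    last_height = new_height

# All products are now in the DOM; for example, re-parse the page with BeautifulSoup:
# soup = BeautifulSoup(driver.page_source, 'html.parser')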