My code is unable to load the next pages. Also, when I refresh the page manually, the website shows "Access Denied".
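import time

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

options = ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
service = Service(executable_path="C:/Users/samira.zade/AppData/Local/Programs/Python/Driver/chromedriver_win32/chromedriver.exe")
driver = Chrome(service=service, options=options)  # the options must be passed in, or headless is ignored
driver.get("https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#1st Matches~12~List")  # change this to your link
driver.maximize_window()
time.sleep(5)
wait = WebDriverWait(driver, 10)
pagenum = 10
data_connection = []
for i in range(pagenum):
    # only the part after '#' (the page number) changes between iterations
    driver.get(f"https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#{i+1}~Best Matches~12~List")
    time.sleep(5)
    wait = WebDriverWait(driver, 10)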
Answer:
I usually try to avoid Selenium when scraping a site (in other words, I use it as a last resort). The product data comes from https://www.connection.com/product/searchpage, and you can pass the page number to it as a query parameter.
I just used pandas to parse the table so I could show you quickly. If you want to pull other or more data from the page, you can do that with BeautifulSoup.
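To make "pass the page number as a query parameter" concrete, here is a minimal sketch that builds the GET URL with the standard library's `urllib.parse.urlencode`, which is the same encoding `requests` applies when you hand it a dict as `params`. It uses a subset of the parameter names from the payload below; the helper name `build_search_url` is hypothetical, and whether the server needs the omitted parameters is not checked here.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.connection.com/product/searchpage"


def build_search_url(term: str, page: int, page_size: int = 36) -> str:
    """Build the GET URL for one page of results (hypothetical helper)."""
    params = {
        "SearchType": "1",
        "term": term,
        "pageNumber": page,      # the page you want; increment this to paginate
        "pageSize": page_size,   # results per page
        "mode": "List",
    }
    # urlencode percent-encodes each value and joins them as key=value&key=value
    return f"{BASE_URL}?{urlencode(params)}"


for page in range(1, 4):
    print(build_search_url("tp-link", page))
```

Only `pageNumber` changes from one URL to the next, which is why a simple counter is enough to walk through the results.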
from io import StringIO

import pandas as pd
import requests

url = "https://www.connection.com/product/searchpage"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

page = 1
frames = []  # one DataFrame per page, combined at the end
while True:
    payload = {
        'SearchType': '1',
        'term': 'tp-link',
        '1st Matches~12~List': '',
        'pageNumber': page,
        'pageSize': '36',
        'url': 'https://www.connection.com/IPA/Shop/Product/Search',
        'mode': 'List'}
    response = requests.get(url, headers=headers, params=payload)

    # The endpoint returns this message once you request a page past the last one
    if 'Pagination limit reached.' in response.text:
        print('Pagination limit reached.')
        break

    # read_html returns a list of every <table> it finds; use the first one
    df = pd.read_html(StringIO(response.text))[0]
    frames.append(df)
    print(f'Collected page: {page}')
    page += 1

# DataFrame.append was removed in pandas 2.0, so concatenate all pages once instead
result_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(result_df)
Output:
Product Image ... Price
0 Compare Image Link ... $38.70 Qty: Add To Cart Add to Quicklist
1 Compare Image Link ... $55.42 Qty: Add To Cart Add to Quicklist
2 Compare Image Link ... $19.04 Qty: Add To Cart Add to Quicklist
3 Compare Image Link ... $52.03 Qty: Add To Cart Add to Quicklist
4 Compare Image Link ... $104.07 Qty: Add To Cart Add to Quicklist
.. ... ... ...
175 Compare Image Link ... $73.98 Qty: Add To Cart Add to Quicklist
176 Compare Image Link ... $99.77 Qty: Add To Cart Add to Quicklist
177 Compare Image Link ... $24.99 Qty: Add To Cart Add to Quicklist
178 Compare Image Link ... $24.99 Qty: Add To Cart Add to Quicklist
179 Compare Image Link ... $44.99 Qty: Add To Cart Add to Quicklist
[180 rows x 4 columns]
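The loop above stops as soon as a response contains "Pagination limit reached.". Here is a minimal, network-free sketch of that stop condition, assuming a function that returns one page's HTML as text; `fetch_page` and `fake_fetch` are hypothetical stand-ins for `requests.get(...).text`, and the five-page fake server is invented purely for illustration. A hard page cap is added so that a missing marker cannot make the loop run forever.

```python
from typing import Callable, List

STOP_MARKER = "Pagination limit reached."


def collect_pages(fetch_page: Callable[[int], str], max_pages: int = 100) -> List[str]:
    """Fetch pages 1, 2, 3, ... until the stop marker appears or max_pages is hit."""
    pages = []
    for page in range(1, max_pages + 1):  # hard cap instead of an unbounded while loop
        html = fetch_page(page)
        if STOP_MARKER in html:
            break  # asked for a page past the last one: stop
        pages.append(html)
    return pages


# Hypothetical fake server with 5 pages of results, standing in for the real endpoint
def fake_fetch(page: int) -> str:
    if page > 5:
        return "<p>Pagination limit reached.</p>"
    return f"<table><tr><td>page {page}</td></tr></table>"


pages = collect_pages(fake_fetch)
print(len(pages))  # 5
```

With the real site, `fetch_page` would wrap the `requests.get` call from the answer, setting `pageNumber` to the page argument.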
Answer:
You can paginate by looping over a range and building each URL with the str.format method; this type of pagination is a bit faster. Please just run the code and check whether it works.
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

for page in range(1, 11):
    # Insert the page number into the URL with str.format
    url = "https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#{page}~Best Matches~12~List".format(page=page)
    print(url)
    # A fresh browser for each page, so every URL gets a full page load
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()
    driver.get(url)
    time.sleep(5)
    driver.quit()  # quit() also stops the chromedriver process; close() only closes the window
Output:
https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#1~Best Matches~12~List
https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#2~Best Matches~12~List
https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#3~Best Matches~12~List
https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#4~Best Matches~12~List
...
https://www.connection.com/IPA/Shop/Product/Search?SearchType=1&term=tp-link#10~Best Matches~12~List
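Both the question and this answer put the page number after the `#`, in the URL fragment. Browsers do not send the fragment to the server, so all of these URLs request the same resource, and any paging has to be done by the page's own JavaScript. That may be part of why the question's single browser does not load new pages when only the fragment changes. The sketch below, an illustration rather than part of either answer, uses the standard library's `urllib.parse.urlsplit` to show which part of the URL reaches the server.

```python
from urllib.parse import urlsplit

url = ("https://www.connection.com/IPA/Shop/Product/Search"
       "?SearchType=1&term=tp-link#3~Best Matches~12~List")

parts = urlsplit(url)

# Sent to the server in the HTTP request: the path plus the query string
print(parts.path + "?" + parts.query)

# Kept by the browser only (never sent): the fragment, which holds the page number
print(parts.fragment)
page_number = int(parts.fragment.split("~")[0])
print(page_number)
```

Changing the page number therefore changes only `parts.fragment`, while the path and query that the server sees stay identical.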