Unable to scrape the "View Details" button links as a list for the page "https://www.

Time:10-29

I am unable to scrape the "View Details" button links as a list from the page "https://www.bmstores.co.uk/stores?location=KA8 9BF". I have tried both BeautifulSoup and Selenium in multiple ways. With Selenium I used the find-element methods with XPath, CSS selectors, and class names, but nothing worked. Selenium also hit a pop-up on the site, though that was resolved with a pop-up blocker.

I searched various sites but kept finding the same BeautifulSoup Python snippets and still could not complete the task. When I run my code I get two recurring errors:

1. `ElementNotInteractableException: element not interactable`
2. `NoSuchElementException: Message: no such element: Unable to locate element`

My code is here:

from bs4 import BeautifulSoup
import requests
import pandas as pd
from selenium import webdriver as wd
import time
from selenium.common.exceptions import WebDriverException

local_path_of_chrome_driver = "E:\\chromedriver.exe"
driver = wd.Chrome(executable_path=local_path_of_chrome_driver)
driver.maximize_window()

data_links=[]

xpaths = [
    "/html/body/div[9]/div/div/div/div/ul/li[1]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[2]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[4]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[5]/div/div[2]/a[1]",
]
for j in xpaths:
    try:
        driver.find_element_by_xpath(j).click()
        time.sleep(3)
        driver.switch_to.window(driver.window_handles[-1])
        data_links.append(driver.current_url)
        time.sleep(3)
        driver.back()
    except WebDriverException:
        pass

driver.close()

Can someone help me out?

CodePudding user response:

To scrape the View Details button links as a list from the page https://www.bmstores.co.uk/stores?location=KA8 9BF you need to induce WebDriverWait for the visibility of all elements located, and you can use the following locator strategy:

  • Code Block:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    view_details = WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details"))
    )
    for i in view_details:
        print(i.get_attribute("href"))
    
  • Console output:

    https://www.bmstores.co.uk/stores/ayr-heathfield-retail-park-90
    https://www.bmstores.co.uk/stores/prestwick-113
    https://www.bmstores.co.uk/stores/irvine-307
    https://www.bmstores.co.uk/stores/kilmarnock-310
    https://www.bmstores.co.uk/stores/stevenston-319
    https://www.bmstores.co.uk/stores/darnley-414
    https://www.bmstores.co.uk/stores/east-kilbride-304
    https://www.bmstores.co.uk/stores/paisley-linwood-423
    https://www.bmstores.co.uk/stores/linwood-hart-street-33
    https://www.bmstores.co.uk/stores/paisley-renfrew-road-428
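
As a side note, the query string in that URL contains a literal space (`KA8 9BF`). A browser encodes it on the fly, but when building the URL in code it is safer to percent-encode it yourself; a minimal sketch using only the standard library:

```python
from urllib.parse import quote

# Percent-encode the postcode so the literal space does not break the URL.
base = "https://www.bmstores.co.uk/stores?location="
postcode = "KA8 9BF"
url = base + quote(postcode)
print(url)  # https://www.bmstores.co.uk/stores?location=KA8%209BF
```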
    

CodePudding user response:

Here is a working example that also collects data from each listing page.

Code:

from selenium import webdriver
import pandas as pd
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.maximize_window()
time.sleep(8)

link_lists=[]
data = []
url='https://www.bmstores.co.uk/stores?location=KA8 9BF'
driver.get(url)

view_details = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details")))
for i in view_details:
     link_lists.append(i.get_attribute("href"))

#print(link_lists)
#soup = BeautifulSoup(driver.page_source, 'lxml')
for link in link_lists:
    # Reuse the existing browser session instead of launching
    # a new Chrome instance for every link.
    url = link
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'lxml')

    name = soup.select_one('span[itemprop="name"]').get_text()
    phone = soup.select_one('span[itemprop="telephone"]').get_text()
    #print(phone.get_text())

    data.append ([name,phone,url])

cols = ["Name", "Phone Number", "link"]

df = pd.DataFrame(data, columns= cols)
print(df)
#df.to_csv('info.csv',index = False)

driver.close()

Output:

      Name  ...                                               link
0  B&M AYR - HEATHFIELD RETAIL PARK  ...  https://www.bmstores.co.uk/stores/ayr-heathfie...       
1                     B&M PRESTWICK  ...    https://www.bmstores.co.uk/stores/prestwick-113       
2                        B&M IRVINE  ...       https://www.bmstores.co.uk/stores/irvine-307       
3                    B&M KILMARNOCK  ...   https://www.bmstores.co.uk/stores/kilmarnock-310       
4                    B&M STEVENSTON  ...   https://www.bmstores.co.uk/stores/stevenston-319       
5                       B&M DARNLEY  ...      https://www.bmstores.co.uk/stores/darnley-414       
6                 B&M EAST KILBRIDE  ...  https://www.bmstores.co.uk/stores/east-kilbrid...       
7             B&M PAISLEY - LINWOOD  ...  https://www.bmstores.co.uk/stores/paisley-linw...       
8         B&M LINWOOD - HART STREET  ...  https://www.bmstores.co.uk/stores/linwood-hart...       
9        B&M PAISLEY - RENFREW ROAD  ...  https://www.bmstores.co.uk/stores/paisley-renf...       

[10 rows x 3 columns]
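
The BeautifulSoup extraction step can also be checked offline; here is a small sketch against an invented HTML fragment that mimics the `itemprop` markup the selectors above target (the fragment and the phone number are made up for illustration):

```python
from bs4 import BeautifulSoup

# Invented stand-in for a store detail page; the real pages expose the
# same itemprop="name" and itemprop="telephone" attributes.
html = """
<div itemscope>
  <span itemprop="name">B&amp;M PRESTWICK</span>
  <span itemprop="telephone">01234 567890</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
name = soup.select_one('span[itemprop="name"]').get_text()
phone = soup.select_one('span[itemprop="telephone"]').get_text()
print(name, phone)  # B&M PRESTWICK 01234 567890
```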