I would like to parse addresses from the following website: https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/
So far I can open the website and dismiss any pop-ups. I then need to open the drop-down menu labelled "1163 STANDORTE", but my code cannot locate it. My code so far:
import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import time
import itertools
import os
import numpy as np
import csv
import pdb
os.chdir("Directory")
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
driver = webdriver.Chrome('Directory/chromedriver.exe', options=options)  # pass options so --incognito takes effect
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
time.sleep(1)
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click() # if there is smth to click away
except:
    pass
time.sleep(4)
These are my attempts, targeting the span and the button element with several different locators:
#Version 1
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='sc-hKFxyN jdMjfs']"))).click()
#Version 2
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].scrollIntoView();", element)
driver.execute_script("arguments[0].click();", element)
# Version 3
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].click();", element)
#Version 4
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='sc-eCApnc kiXUNl sc-jSFjdj lcZmPE']"))).click()
# Version 5
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div/main/nav/header/button[1]"))).click()
# Version 6
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='1163 STANDORTE']"))).click()
Actually, there are three problems:
- If I open the link in Chrome manually, "1163 STANDORTE" appears. If I open the same link in Chrome through Python, fewer STANDORTE appear and I cannot zoom out. I need ALL 1163 STANDORTE to appear.
- I cannot locate the button, either by class name or by XPath.
- The button seems to load a linked XML file, and the addresses only appear after the button has been clicked. In the end I want to scrape the text contained in that linked file.
Any suggestions?
My question is similar to these previous questions: "How to parse several attributes of website with same class name in python?" and "Selenium-Debugging: Element is not clickable at point (X,Y)".
CodePudding user response:
The data you are looking for is loaded by a fetch/XHR call.
You can request it directly, without scraping the page. See below.
import requests
headers = {
    'Origin': 'https://filialen.migros.ch',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
}
# Call the JSON endpoint that the page itself requests.
r = requests.get(
    'https://web-api.migros.ch/widgets/stores?key=loh7Diephiengaiv&aggregation_options[empty_buckets]=true&filters[markets][0][0]=super&filters[markets][0][1]=mno&filters[markets][0][2]=voi&filters[markets][0][3]=mp&filters[markets][0][4]=out&filters[markets][0][5]=spx&filters[markets][0][6]=doi&filters[markets][0][7]=mec&filters[markets][0][8]=mica&filters[markets][0][9]=res&filters[markets][0][10]=flori&filters[markets][0][11]=gour&filters[markets][0][12]=alna&filters[markets][0][13]=cof&filters[markets][0][14]=chng&verbosity=store&offset=0&limit=5000',
    headers=headers)
if r.status_code == 200:
    print('stores data below:')
    data = r.json()
    print(data)
else:
    print(f'Oops. Status code is {r.status_code}')
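The long query string above follows a regular pattern: one `filters[markets][0][i]` entry per market code, plus a handful of fixed parameters. A minimal sketch of building it from a list, using only the values already present in that URL, might look like this. The names `BASE_URL`, `MARKETS` and `build_store_params` are made up for illustration, and whether the endpoint accepts other values for `offset` or `limit` is an assumption that has not been checked.

```python
from urllib.parse import urlencode, parse_qsl

BASE_URL = "https://web-api.migros.ch/widgets/stores"

# Market codes copied from the query string in the request above.
MARKETS = ["super", "mno", "voi", "mp", "out", "spx", "doi", "mec",
           "mica", "res", "flori", "gour", "alna", "cof", "chng"]


def build_store_params(markets, key, offset=0, limit=5000):
    """Return the query parameters as an ordered list of (name, value) pairs."""
    params = [("key", key), ("aggregation_options[empty_buckets]", "true")]
    # One indexed filter entry per market code: filters[markets][0][0], [0][1], ...
    params += [(f"filters[markets][0][{i}]", code) for i, code in enumerate(markets)]
    params += [("verbosity", "store"), ("offset", str(offset)), ("limit", str(limit))]
    return params


params = build_store_params(MARKETS, key="loh7Diephiengaiv")
query = urlencode(params)  # square brackets are percent-encoded here
print(f"{BASE_URL}?{query}")
```

The same list can be handed to requests directly, as in `requests.get(BASE_URL, params=params, headers=headers)`, which performs the encoding itself.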
CodePudding user response:
A few points:
- Launch the browser in full-screen mode.
- Use explicit waits.
- Use this XPath:
//span[contains(@aria-label, 'Standorte anzeigen')]/..
Sample code:
driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver executable
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click() # if there is smth to click away
except:
    pass
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(@aria-label, 'Standorte anzeigen')]/.."))).click()
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
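A likely reason Versions 2 and 3 in the question fail is that `find_element_by_class_name` expects a single class name, while 'sc-eCApnc kiXUNl sc-jSFjdj lcZmPE' is four names separated by spaces. If matching on classes is still wanted, the names can be joined into one CSS selector. The sketch below shows only that string conversion; `classes_to_css` is a made-up helper name, not part of Selenium.

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a single CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("no class names given")
    # Chain every class with a dot so the element must carry all of them.
    return tag + "".join(f".{name}" for name in names)


selector = classes_to_css("sc-eCApnc kiXUNl sc-jSFjdj lcZmPE")
print(selector)  # .sc-eCApnc.kiXUNl.sc-jSFjdj.lcZmPE
```

The result could then be passed to `driver.find_element(By.CSS_SELECTOR, selector)`. Class names of this shape look auto-generated and may change when the site is rebuilt, so the aria-label XPath above is probably the more stable locator.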