How to click on an element and parse text from a linked XML file (Python)?

I would like to parse addresses from the following website: https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/

So far I can open the website and dismiss any pop-ups. Next I need to click the drop-down button labeled "1163 STANDORTE", but I cannot locate it with my code. My code so far:

import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import time
import itertools
import os
import numpy as np
import csv
import pdb

os.chdir("Directory")
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
driver = webdriver.Chrome('Directory/chromedriver.exe', options=options)  # pass options, otherwise --incognito is ignored
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
time.sleep(1)
try:
    # Dismiss the pop-up if one appears.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click()
except Exception:
    pass
time.sleep(4)

Then come my attempts, which target the span and button elements in several ways:

#Version 1
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='sc-hKFxyN jdMjfs']"))).click() 

#Version 2
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].scrollIntoView();", element)
driver.execute_script("arguments[0].click();", element)

# Version 3    
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].click();", element)

#Version 4
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='sc-eCApnc kiXUNl sc-jSFjdj lcZmPE']"))).click() 

# Version 5
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div/main/nav/header/button[1]"))).click() 

# Version 6
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='1163 STANDORTE']"))).click()

There are actually three problems:

1. If I open the link manually in Chrome, "1163 STANDORTE" appears. If I open it in Chrome through Python, fewer STANDORTE appear and I cannot zoom out. I need ALL 1163 STANDORTE to appear.
2. I cannot locate the button by class name or by XPath.
3. The button appears to be backed by a linked XML file, and the addresses only show up after the button has been clicked. In the end I want to scrape the text contained in that XML file.

Any suggestions?

My question is similar to these previous questions: How to parse several attributes of website with same class name in python? and to Selenium-Debugging: Element is not clickable at point (X,Y)

CodePudding user response:

The data you are looking for comes from a fetch/XHR call that the page makes in the background.

You can request it directly, without driving a browser. See below.

import requests

headers = {'Origin': 'https://filialen.migros.ch',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'}

r = requests.get(
    'https://web-api.migros.ch/widgets/stores?key=loh7Diephiengaiv&aggregation_options[empty_buckets]=true&filters[markets][0][0]=super&filters[markets][0][1]=mno&filters[markets][0][2]=voi&filters[markets][0][3]=mp&filters[markets][0][4]=out&filters[markets][0][5]=spx&filters[markets][0][6]=doi&filters[markets][0][7]=mec&filters[markets][0][8]=mica&filters[markets][0][9]=res&filters[markets][0][10]=flori&filters[markets][0][11]=gour&filters[markets][0][12]=alna&filters[markets][0][13]=cof&filters[markets][0][14]=chng&verbosity=store&offset=0&limit=5000',
    headers=headers)
if r.status_code == 200:
    print('stores data below:')
    data = r.json()
    print(data)
else:
    print(f'Oops. Status code is {r.status_code}')
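
The structure of the returned JSON is not shown here, so it helps to flatten it before picking out the address fields. The following is a minimal sketch that uses only plain Python. The helper name `flatten` and the keys in the sample (`stores`, `location`, `address`, `zip`, `city`) are assumptions made for illustration and may not match the real response.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict with dotted key paths."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix[:-1]] = obj
    return flat


# Hypothetical sample; the real payload from r.json() may look different.
sample = {
    "stores": [
        {
            "name": "Example store",
            "location": {"address": "Musterstrasse 1", "zip": "8000", "city": "Zürich"},
        }
    ]
}

# One flat row per store; print the available columns to see what is there.
rows = [flatten(store) for store in sample["stores"]]
print(sorted(rows[0]))
```

With the real response, printing the flattened keys of the first record shows which paths hold the street, postal code and city. The rows can then go into `pandas.DataFrame(rows)` or `csv.DictWriter`.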

CodePudding user response:

A few points:

1. Launch the browser with a maximized window.
2. Use explicit waits.
3. Use this XPath, which matches on the aria-label attribute rather than on the class names: //span[contains(@aria-label, 'Standorte anzeigen')]/..

Sample code:

driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver executable
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)

driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")

try:
    # Dismiss the pop-up if one appears.
    wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click()
except Exception:
    pass

# Click the parent element of the span whose aria-label contains 'Standorte anzeigen'.
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(@aria-label, 'Standorte anzeigen')]/.."))).click()

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
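
As a side note on why Versions 2 and 3 in the question fail: find_element_by_class_name expects a single class name, so a space-separated value such as 'sc-eCApnc kiXUNl sc-jSFjdj lcZmPE' does not match anything. If you do want to match on classes, a compound CSS selector is the usual route. Below is a minimal sketch. The helper name `class_attr_to_css` is made up for illustration, and it assumes the class names contain no characters that need CSS escaping.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    # Each class becomes ".name"; the optional tag name goes in front.
    return tag + "".join(f".{name}" for name in classes)


selector = class_attr_to_css("sc-eCApnc kiXUNl sc-jSFjdj lcZmPE")
print(selector)  # .sc-eCApnc.kiXUNl.sc-jSFjdj.lcZmPE

# With a running driver, the selector would be used like this:
# from selenium.webdriver.common.by import By
# element = driver.find_element(By.CSS_SELECTOR, selector)
```

Class names like these look auto-generated and may change when the site is rebuilt, so the aria-label XPath above is likely the more robust locator.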