Python Selenium not returning values at all, but also not for all elements in class

Time:09-12

I am trying to figure out web scraping using Selenium, but keep bumping into issues with syntax. Can someone please help me understand how to modify my code so that it displays the day, weather, and temps for the first 8 days that pop up in a Google search? Thanks!

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

url = 'https://www.google.com/search?q=perth weather'
driver.get(url)

days = driver.find_elements(By.CLASS_NAME, "wob_df wob_ds")

for day in days:
    name = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[1]').text
    rain = day.find_element(By.XPATH, './html/body/div[7]/div/div[11]/div[1]/div[2]/div[2]/div/div/div[1]/div/div/div/div/div[3]/div[3]/div/div[1]/div[2]').text
    temp_min = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[3]/div[1]').text
    temp_max = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[3]/div[2]').text
    print(name, rain, temp_min, temp_max)

CodePudding user response:

You were generally headed in the right direction. Here's some feedback and suggestions:

  1. Your first locator uses a class name of "wob_df wob_ds". That is actually two class names, "wob_df" and "wob_ds", because class names are separated by spaces. By.CLASS_NAME accepts only a single class name: depending on your Selenium version, passing two either raises an invalid-selector error or silently matches nothing. You can keep using By.CLASS_NAME with either class alone or, better yet, use a CSS selector with both classes, e.g. ".wob_df.wob_ds" (the "." indicates a class name).

  2. XPaths that start with html, have many levels, or use indices are likely to be very brittle (they will break with the smallest change to the page). It's better to learn how to hand-craft an XPath if you are going to use them, but...

  3. You should prefer locators in this order: By.ID, then By.CSS_SELECTOR, and finally, only when required, By.XPATH. XPath is the only locator type that can locate an element by its contained text or do complicated DOM traversal. CSS selectors are better supported, faster, and simpler in syntax.
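To make point 1 concrete, here is a minimal sketch of turning a multi-class attribute value into a CSS selector that requires every class. The helper names class_attr_to_css and find_by_classes are made up for illustration and are not part of Selenium; only find_elements and By.CSS_SELECTOR are Selenium's own.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn a class attribute value such as "wob_df wob_ds" into ".wob_df.wob_ds"."""
    names = class_attr.split()  # class names are separated by whitespace
    if not names:
        raise ValueError("expected at least one class name")
    # Prefix each class with "." and join without spaces, so the selector
    # requires all of the classes on the same element.
    return "".join("." + name for name in names)


def find_by_classes(driver, class_attr: str):
    """Hypothetical helper: return every element that carries all given classes."""
    # Imported lazily so class_attr_to_css works without Selenium installed.
    from selenium.webdriver.common.by import By

    return driver.find_elements(By.CSS_SELECTOR, class_attr_to_css(class_attr))


print(class_attr_to_css("wob_df wob_ds"))  # .wob_df.wob_ds
```

With a live driver, find_by_classes(driver, "wob_df wob_ds") would return only the elements that have both classes, which may be fewer than all of the day tiles on the page.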

Making changes based on these suggestions, the updated code would look like this:

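from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

url = 'https://www.google.com/search?q=perth weather'
driver.get(url)

# Each direct child div of #wob_dp is assumed to be one day's forecast tile
days = driver.find_elements(By.CSS_SELECTOR, "#wob_dp > div")
for day in days:
    # All descendant divs of the tile, in document order
    parts = day.find_elements(By.CSS_SELECTOR, "div")
    name = parts[0].text
    # The weather description is read from the icon image's alt text
    weather = parts[1].find_element(By.CSS_SELECTOR, "img").get_attribute("alt")
    # The sample output lists the higher temperature first, so these two names may be swapped
    temp_min = parts[3].text
    temp_max = parts[4].text
    print(name, weather, temp_min, temp_max)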

This should print one line per day, for example:

Tue Thunderstorm 62° 52°