I am trying to figure out web scraping with Selenium, but I keep running into syntax issues. Can someone please help me understand how to modify my code so that it displays the day, weather, and temperatures for the first 8 days that show up in a Google search? Thanks!
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url = 'https://www.google.com/search?q=perth weather'
driver.get(url)
days = driver.find_elements(By.CLASS_NAME, "wob_df wob_ds")
for day in days:
    name = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[1]').text
    rain = day.find_element(By.XPATH, './html/body/div[7]/div/div[11]/div[1]/div[2]/div[2]/div/div/div[1]/div/div/div/div/div[3]/div[3]/div/div[1]/div[2]').text
    temp_min = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[3]/div[1]').text
    temp_max = day.find_element(By.XPATH, './/*[@id="wob_dp"]/div[1]/div[3]/div[2]').text
    print(name, rain, temp_min, temp_max)
CodePudding user response:
You were headed in the right general direction. Here's some feedback and a few suggestions:
Your first locator uses a class name of "wob_df wob_ds". That is actually two class names, "wob_df" and "wob_ds"; in HTML, class names are separated by spaces. By.CLASS_NAME accepts only a single class, so passing two will not work: depending on your Selenium version, that line either raises an invalid-selector error or matches nothing. You can keep using By.CLASS_NAME with just one of the classes or, better yet, use a CSS selector that requires both, e.g. ".wob_df.wob_ds" (a leading "." indicates a class name).
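To make the class-name point concrete, here is a minimal sketch of turning a space-separated class attribute into a compound CSS selector. The helper name `classes_to_css` is hypothetical and not part of Selenium; it only builds the selector string.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry BOTH classes a and b
    return "".join(f".{name}" for name in names)


selector = classes_to_css("wob_df wob_ds")
print(selector)  # .wob_df.wob_ds
```

The resulting string could then be passed to Selenium as `driver.find_elements(By.CSS_SELECTOR, selector)`.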
XPaths that start at html, have many levels, or use indices are likely to be very brittle (they will break with the smallest change to the page). If you are going to use XPaths, it's better to learn how to hand-craft them, but...
You should prefer using locators in this order: By.ID, then By.CSS_SELECTOR, and then finally, and only when required, By.XPATH. XPath is the only locator type that can locate an element by contained text and do complicated DOM traversal. CSS selectors are better supported, faster, and the syntax is simpler.
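The brittleness of long, index-based paths can be illustrated without a browser. The sketch below uses Python's standard-library ElementTree, which supports only a small subset of XPath and is not what Selenium uses, so treat it as an illustration rather than a Selenium example. The markup, the two path constants, and the `first_day` helper are all made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is far more complex.
ORIGINAL = (
    "<html><body>"
    "<div>header</div>"
    '<div><div id="wob_dp"><div>Tue</div><div>Wed</div></div></div>'
    "</body></html>"
)

# The same content after a hypothetical redesign adds a banner at the top.
CHANGED = ORIGINAL.replace("<div>header</div>", "<div>banner</div><div>header</div>")

ABSOLUTE = "./body/div[2]/div[1]/div[1]"  # depends on every ancestor's position
ANCHORED = ".//*[@id='wob_dp']/div[1]"    # depends only on the id


def first_day(markup, path):
    """Return the text of the first element matching `path`, or None."""
    node = ET.fromstring(markup).find(path)
    return node.text if node is not None else None


for label, markup in (("original", ORIGINAL), ("changed", CHANGED)):
    print(label, first_day(markup, ABSOLUTE), first_day(markup, ANCHORED))
```

On the changed markup the absolute path no longer finds the day, while the path anchored on the id still does.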
Making changes based on these suggestions, the updated code would look like
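from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'https://www.google.com/search?q=perth weather'
driver.get(url)
# Each direct child <div> of #wob_dp is one day in the forecast strip
days = driver.find_elements(By.CSS_SELECTOR, "#wob_dp > div")
for day in days:
    # All descendant <div>s of this day, in document order
    parts = day.find_elements(By.CSS_SELECTOR, "div")
    name = parts[0].text
    # The weather description is stored in the icon's alt text
    weather = parts[1].find_element(By.CSS_SELECTOR, "img").get_attribute("alt")
    # The sample output (62° before 52°) suggests the high comes first
    temp_max = parts[3].text
    temp_min = parts[4].text
    print(name, weather, temp_max, temp_min)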
This should print one line per day, for example
Tue Thunderstorm 62° 52°