Scraping Availability data from Booking.com

Time:09-15

I would like to scrape the availability from booking.com.

I managed to click and open the dates tab, but I am getting an error and I can't understand why.

The code I used to click on the date tab is below, but it fails to scrape the availability and the dates.

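import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))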
driver.maximize_window()
wait = WebDriverWait(driver,20)
test_url = 'https://www.booking.com/hotel/gr/diamandi-20.en-gb.html?label=gen173nr-1DCAEoggI46AdIM1gEaFyIAQGYAQm4ARjIAQzYAQPoAQGIAgGoAgS4ApGp7ZgGwAIB0gIkZTBjOTA2MTQtYTc0MC00YWUwLTk5ZWEtMWNiYzg3NThiNGQ12AIE4AIB&sid=47583bd8c0122ee70cdd7bb0b06b0944&aid=304142&ucfs=1&arphpl=1&checkin=2022-10-24&checkout=2022-10-30&dest_id=-829252&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=2&hapos=2&sr_order=popularity&srpvid=f0f16af3449102aa&srepoch=1662736362&all_sr_blocks=852390201_352617405_2_0_0&highlighted_blocks=852390201_352617405_2_0_0&matching_block_id=852390201_352617405_2_0_0&sr_pri_blocks=852390201_352617405_2_0_0__30000&from=searchresults#hotelTmpl'
driver.get(test_url)
time.sleep(3)
soup2 = BeautifulSoup(driver.page_source, 'lxml')
date= []
blocked = []
info = []
button = driver.find_element(By.CLASS_NAME, 'xp__dates__checkin')
driver.execute_script('arguments[0].click()', button)
xpath = '//div[@data-bui-ref="calendar-content"]//span[@aria-label]'
for item in driver.find_elements(By.XPATH,xpath):
    date.append(item.get_attribute("aria-label"))
    blocked.append(item.get_attribute("calendar-day__price"))

The error is: StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

CodePudding user response:

There are two check-in and check-out date fields on that page, and your XPath targets the wrong one. That is why the StaleElementReferenceException occurs.
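
Separately from the locator fix, a stale reference can also show up whenever the calendar re-renders between locating an element and reading it. One general guard is to retry the whole locate-and-read step. The following is only a minimal sketch: `retry_on` is a hypothetical helper (not part of Selenium), and the default attempt count and delay are arbitrary assumptions.

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Call action() until it succeeds or the attempts run out.

    exceptions: an exception class (or tuple of classes) that triggers a retry.
    action:     a zero-argument callable that locates AND reads the elements,
                so every retry starts from fresh element references.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)  # give the page time to finish re-rendering


# Possible use with the question's variables (driver, By, xpath):
#
# from selenium.common.exceptions import StaleElementReferenceException
# labels = retry_on(
#     StaleElementReferenceException,
#     lambda: [el.get_attribute("aria-label")
#              for el in driver.find_elements(By.XPATH, xpath)],
# )
```

The key design point is that the lookup happens inside the callable; retrying only the `get_attribute` call on an old element would fail again with the same error.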

I modified your code and it works for me. Try this:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By


driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.maximize_window()
driver.get("https://www.booking.com/hotel/gr/diamandi-20.en-gb.html?label=gen173nr-1DCAEoggI46AdIM1gEaFyIAQGYAQm4ARjIAQzYAQPoAQGIAgGoAgS4ApGp7ZgGwAIB0gIkZTBjOTA2MTQtYTc0MC00YWUwLTk5ZWEtMWNiYzg3NThiNGQ12AIE4AIB&sid=47583bd8c0122ee70cdd7bb0b06b0944&aid=304142&ucfs=1&arphpl=1&checkin=2022-10-24&checkout=2022-10-30&dest_id=-829252&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=2&hapos=2&sr_order=popularity&srpvid=f0f16af3449102aa&srepoch=1662736362&all_sr_blocks=852390201_352617405_2_0_0&highlighted_blocks=852390201_352617405_2_0_0&matching_block_id=852390201_352617405_2_0_0&sr_pri_blocks=852390201_352617405_2_0_0__30000&from=searchresults#hotelTmpl")
driver.implicitly_wait(15)

date = []
blocked = []

checkin_date_ele = driver.find_element(By.XPATH,"(.//*[@class='sb-date-field__icon sb-date-field__icon-btn bk-svg-wrapper calendar-restructure-sb'])[3]")
# To scroll the page to view check-in date field
driver.execute_script("arguments[0].scrollIntoView(true)", driver.find_element(By.XPATH,"(.//*[contains(text(),'Most popular facilities')])[1]"))
time.sleep(1)
checkin_date_ele.click()
time.sleep(2)
checkin_dates = driver.find_elements(By.XPATH,"(.//*[@class='bui-calendar__main b-a11y-calendar-contrasts'])[2]//*[@class='calendar-day__number']")
length = len(checkin_dates)

# XPath positions are 1-based, so count from 1 to length inclusive
for i in range(1, length + 1):
    print(driver.find_element(By.XPATH,
                              "(((.//*[@class='bui-calendar__main b-a11y-calendar-contrasts'])[2]//*[@class='calendar-day__number'])[" + str(i) + "]//parent::span//parent::span)[1]").get_attribute('aria-label'))
    print(driver.find_element(By.XPATH,"((.//*[@class='bui-calendar__main b-a11y-calendar-contrasts'])[2]//*[@class='calendar-day__price'])[" + str(i) + "]").text)
    print()

time.sleep(3)
# driver.quit()

The output will be:

1 October 2022
₹4.0K

2 October 2022
₹4.0K

3 October 2022
₹4.0K

4 October 2022
₹4.0K

5 October 2022
₹4.0K

and so on...
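
If you want the printed values as data rather than text, something like the sketch below could convert them. It assumes the labels always look like "1 October 2022" (English month names, which also requires an English or C locale for `strptime`) and the prices look like "₹4.0K". Both function names are hypothetical, and treating an empty price cell as "no price shown" is my assumption, not something Booking.com documents.

```python
from datetime import datetime

# Compact suffixes as seen in the calendar output, e.g. "4.0K"
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_calendar_label(label):
    """Turn an aria-label such as '1 October 2022' into a datetime.date."""
    return datetime.strptime(label.strip(), "%d %B %Y").date()


def parse_compact_price(text):
    """Turn '₹4.0K' into ('₹', 4000.0); return None if no number is present."""
    text = text.strip()
    # Position of the first digit; everything before it is the currency symbol
    first_digit = next((i for i, ch in enumerate(text) if ch.isdigit()), None)
    if first_digit is None:
        return None  # empty cell: assumed to mean no price is shown
    currency = text[:first_digit].strip()
    number = text[first_digit:]
    multiplier = 1
    if number[-1].upper() in _SUFFIXES:
        multiplier = _SUFFIXES[number[-1].upper()]
        number = number[:-1]
    return currency, float(number.replace(",", "")) * multiplier


if __name__ == "__main__":
    print(parse_calendar_label("1 October 2022"))
    print(parse_compact_price("₹4.0K"))
```

Inside the loop you would append `parse_calendar_label(...)` and `parse_compact_price(...)` to the `date` and `blocked` lists instead of printing.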
