Trying to grab href URLs from Kayak website using BeautifulSoup

Time:02-01

I'm trying to grab the URLs from each of the flight cards that appear in the Kayak search results.

CodePudding user response:

In Selenium, an XPath (or any other locator) selects web elements, not their attributes.
To extract the href attribute values, collect all the matching a elements into a list, then iterate over that list and read the href attribute from each element, as follows:

hrefs = [link.get_attribute('href') for link in driver.find_elements(By.XPATH,"//div[@class='above-button']//a[contains(@class,'booking-link')]")]

The code above collects all matching web elements into a list and, for each link element in that list, calls link.get_attribute('href') to extract the href attribute value.
The results are collected into the hrefs list.
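
As a slightly more defensive variant of the one-liner above, here is a minimal sketch that skips elements without an href and drops duplicate URLs while keeping their order. The names extract_hrefs and booking_links are hypothetical helpers, not part of Selenium, and the XPath is the one from this answer, so it assumes Kayak's markup still matches it.

```python
def extract_hrefs(elements):
    """Return the non-empty href values of the given elements, in order, without duplicates."""
    seen = set()
    hrefs = []
    for element in elements:
        # get_attribute returns None when the attribute is absent
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def booking_links(driver):
    """Collect booking URLs from an already-loaded results page (requires Selenium)."""
    # Imported lazily so extract_hrefs itself has no Selenium dependency.
    from selenium.webdriver.common.by import By

    # XPath taken from the answer above; the site's markup may have changed since.
    xpath = "//div[@class='above-button']//a[contains(@class,'booking-link')]"
    return extract_hrefs(driver.find_elements(By.XPATH, xpath))
```

extract_hrefs works with any objects that expose get_attribute, so it can be tried out with simple stand-ins before pointing it at a live driver.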

CodePudding user response:

To extract the values of all the href attributes on the page, wait for the links with WebDriverWait and visibility_of_all_elements_located(); you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a&fs=stops=-2&attempt=1&lastms=1675195877028")
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[role='link'][href]")))])
    
  • Using XPATH:

    driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a&fs=stops=-2&attempt=1&lastms=1675195877028")
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@role='link' and @href]")))])
    
  • Console Output:

    ['https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81197.36c89f7717e84ac7a4ee2898627fa251&h=40d03211086c&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M0', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.47F3EeHCWiIEdn9PX-8xhQ.41000.a6f675f0a632a9d55b0fab7f1b09f9d8&h=8dce29003385&sub=E-10f42a14593&pageOrigin=F..RP.FE.M1', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81397.58fb639ccf8938f61eec808f1e13c556&h=ba02be2bf0dc&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M2', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81197.aca9104db06bae99e4f55a158dfd3ff2&h=61a4dc653dc3&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M4', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81397.bcc92e8ae656b0e298dbe8a6555bd825&h=ece97a1b9509&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M5', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80697.732461bd95055d2478850abf1741221f&h=c94d3b283c0a&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M6', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80997.215cabd8ee10582a0d6b94c20dfb95ad&h=b493996d9e9d&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M7', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80697.cabd6f6051c17b3cd7f9129454607d0e&h=917f7fb0f2f5&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M8', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80997.07496f1f93e916d757ec284da1ef4638&h=52abbdb7d13a&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M11', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.a1633218d7cbd5eb2fe950504a6207a9&h=c8fe9769a628&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M12', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.ebd834f5c265ae428e1bdbb3637a606b&h=d80290f43a93&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M13', 
'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.c7a2ace471ba5c35334014e91956f849&h=2f3b292c166e&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M14', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.a281f919b379469a223fb34ed5510409&h=913a810b8e80&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M15', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.4fdb4fded43ccbf47dcdcad01bf919e6&h=ea07410d1dda&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M16', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.23101dac562249519c55956ba4cc7abf&h=45c62f765f1f&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M17', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.1377d5650be1523cc39b1849b7d9bbdf&h=c04d73c3ac61&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M18']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
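
Putting the imports and the explicit wait together, the whole flow might look like the sketch below. It is only an illustration under a few assumptions: fetch_booking_urls and is_booking_url are hypothetical names, webdriver.Chrome() assumes a local Chrome with a matching driver available, and the /book/ path filter is inferred from the console output above rather than from any documented Kayak URL scheme.

```python
from urllib.parse import urlparse


def is_booking_url(url):
    """True for Kayak booking URLs shaped like the ones in the console output above."""
    parts = urlparse(url)
    on_kayak = parts.netloc == "kayak.com" or parts.netloc.endswith(".kayak.com")
    return on_kayak and parts.path.startswith("/book/")


def fetch_booking_urls(search_url, timeout=20):
    """Open search_url in Chrome and return the booking URLs found on the page."""
    # Selenium imports are kept local so is_booking_url can be used without it.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(search_url)
        try:
            # Same CSS selector as in the answer above
            links = WebDriverWait(driver, timeout).until(
                EC.visibility_of_all_elements_located(
                    (By.CSS_SELECTOR, "a[role='link'][href]")
                )
            )
        except TimeoutException:
            return []  # no matching links became visible within the timeout
        hrefs = [link.get_attribute("href") for link in links]
        return [href for href in hrefs if href and is_booking_url(href)]
    finally:
        driver.quit()  # always close the browser, even on errors
```

Filtering with is_booking_url is optional; it only narrows the result in case the broad a[role='link'][href] selector also matches links that are not booking links.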
    