Home > Software design >  Cannot get all HTML when using requests.get or webdriver.get
Cannot get all HTML when using requests.get or webdriver.get

Time:11-26

I am trying to scrape 100.0% from style attribute:

<div class="w-full mt-1 bg-white rounded-lg shadow">
  <div class="py-1 bg-purple-900 rounded-lg" style="width: 100.0%"></div>
</div> 

The page source response does not go this deep into the page, I have tried:

def ScrapePercents():
    URL = "https://citystrides.com/cities/26013/search_striders?page=1"
    page = requests.get(URL)

    soup = BeautifulSoup(page.content, "html.parser")    
    results = soup.find("div", class_="flex flex-wrap space-y-4")    
    percents = results.find_all("div", class_="py-1 bg-purple-900 rounded-lg")

    pl = []

    for percent in percents:
        cleantext = percent['style'].lstrip('width: ')
        percent_neat = (cleantext.strip('%'))
        percent_float = float(percent_neat)
        pl.append(percent_float)
        print("pl as it appends ", pl)

    return pl

And

def Selenium():
    print("shell for selenium")
    pl = "shell for selenium percents"
    driver = webdriver.Chrome("chromedriver.exe")
    driver.get("https://citystrides.com/cities/26013/search_striders")
    content = driver.page_source

    soup = BeautifulSoup(content)
    results = soup.find("div", class_="flex flex-wrap space-y-4")

    return soup

The second code is set just to see whether the content includes the nested div. I'm struggling to work out how to get the nested div.

CodePudding user response:

the element/div you are looking for with class w-full and py-1 are actually not there in the source of the website, I just tried to inspect and could not find them,can you see them when you open that website on your browser?

See the image

CodePudding user response:

Using Selenium to extract the DIV content you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    driver.get("https://citystrides.com/cities/26013/search_striders")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".flex.flex-wrap.space-y-4"))).get_attribute("innerHTML"))
    
  • Using XPATH:

    driver.get("https://citystrides.com/cities/26013/search_striders")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='flex flex-wrap space-y-4']"))).get_attribute("innerHTML"))
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Console Output:

<div id="user_10277" class="w-full max-w-lg p-2 transition duration-150 rounded-lg shadow-md bg-gradient-to-br from-gray-50 to-gray-200">
    <div class="flex-grow text-left">
      <h2 class="overflow-hidden text-sm font-bold text-gray-900 truncate">Pedro Queiroz</h2>

      <div class="text-xs uppercase">5233 streets</div>



    </div>
  </div>



  <div class="flex justify-between mt-5 text-sm">
    <div>
      <a data-turbo-frame="_top" class="purple-button flex items-center" href="/users/10277/map">
    <svg class="float-left w-6 h-6 mr-1" fill="none" stroke-linecap="round" stroke-linejoin="round" stroke-width="1" viewBox="0 0 24 24" stroke="currentColor"><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z"></path></svg>
    LifeMap
</a>    </div>

    <div title="Pedro Queiroz is a Supporter">           
  • Related