How to get invisible data from website with BeautifulSoup

Time:03-14

I need Fiverr service delivery times, but I could only get the delivery time of the first (Basic) package. How can I get the delivery times of the second and third packages? Is there any way to get them without using Selenium?


import requests
from bs4 import BeautifulSoup


response = requests.get("https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent")

# BEAUTIFULSOUP

soup = BeautifulSoup(response.text, 'lxml')
print(soup.find_all("b", class_ = "delivery"))

CodePudding user response:

The data at that URL is dynamic: it is generated by JavaScript, and BeautifulSoup cannot execute JavaScript. So you need a browser automation tool such as Selenium together with BeautifulSoup. Just run the code below.

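import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude fixed wait for the JavaScript-rendered content to load

# Hand the rendered HTML to BeautifulSoup, then shut the browser down
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# find() returns only the first matching element
print(soup.find("b", class_="delivery").text)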

Output:

7 Days Delivery
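
The snippet above prints a single value because find() stops at the first match. A minimal sketch of collecting every match with find_all() follows. It assumes, without having checked the live page, that each package's delivery time ends up in the rendered HTML as a b element with class "delivery". SAMPLE_HTML and its values are made up for illustration and stand in for driver.page_source; the built-in html.parser is used so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source; the real page may differ.
SAMPLE_HTML = """
<div class="package"><b class="delivery">7 Days Delivery</b></div>
<div class="package"><b class="delivery">10 Days Delivery</b></div>
<div class="package"><b class="delivery">14 Days Delivery</b></div>
"""


def extract_delivery_times(html: str) -> list[str]:
    """Return the text of every <b class="delivery"> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("b", class_="delivery")]


print(extract_delivery_times(SAMPLE_HTML))
```

If the list still contains only one entry for the real page, the other packages are probably not in the DOM until their tab is opened, and a different approach is needed.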

CodePudding user response:

To print the text 7 Days Delivery with Selenium, you can use either of the following locator strategies:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
    print(driver.find_element(By.CSS_SELECTOR, "b.delivery").get_attribute("innerHTML"))
    
  • Using xpath and text attribute:

    driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')   
    print(driver.find_element(By.XPATH, "//b[@class='delivery']").text)
    

To extract the text 7 Days Delivery reliably, you should wait for the element with WebDriverWait and visibility_of_element_located() instead of reading it immediately. You can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "b.delivery"))).get_attribute("innerHTML"))
    
  • Using XPATH and text attribute:

    driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//b[@class='delivery']"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    7 Days Delivery
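
The locators above return one element, and WebElement.text gives an empty string for elements that are not currently displayed. A hedged sketch for the other packages: read the textContent property from every match instead. The helper name delivery_texts is hypothetical, and whether the other packages are present in the DOM under the same b.delivery selector is an assumption, not something confirmed for this page.

```python
def delivery_texts(elements) -> list[str]:
    """Return the non-empty textContent of each element, visible or hidden."""
    texts = []
    for element in elements:
        # textContent is a DOM property, so it can be read even when the
        # element is hidden and WebElement.text would be an empty string.
        content = element.get_attribute("textContent") or ""
        if content.strip():
            texts.append(content.strip())
    return texts
```

With a live driver, this could be called as delivery_texts(driver.find_elements(By.CSS_SELECTOR, "b.delivery")), using the same By import listed in the note above.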
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python


CodePudding user response:

With requests.get('https://...').text, you receive only the HTML that the server returns. The problem is that many modern websites use client-side rendering to build the page content, so you need something that executes JavaScript to render the page as your web browser does. You can use Selenium to achieve this.
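
One way to find out whether Selenium is needed at all is to check whether the element you want is already in the HTML that requests receives. The sketch below is only an illustration: the helper name is_in_static_html and both sample strings are hypothetical, and the built-in html.parser is used in place of lxml.

```python
from bs4 import BeautifulSoup


def is_in_static_html(html: str, tag: str, css_class: str) -> bool:
    """Return True if the server-sent HTML already contains the element."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag, class_=css_class) is not None


# Hypothetical responses: one rendered on the server, one filled in by JavaScript.
server_rendered = '<div><b class="delivery">7 Days Delivery</b></div>'
client_rendered = '<div id="root"></div><script src="app.js"></script>'

print(is_in_static_html(server_rendered, "b", "delivery"))
print(is_in_static_html(client_rendered, "b", "delivery"))
```

In practice you would pass response.text from requests.get(). A False result suggests the element is added later by JavaScript, so a browser-driven tool is the likely route.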
