How to get invisible data from a website with BeautifulSoup

I need Fiverr service delivery times, but I could only get the delivery time of the first (Basic) package. How can I get the delivery times of the second and third packages? Is there any way to get them without using Selenium?

import requests
from bs4 import BeautifulSoup


response = requests.get("https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent")

# BEAUTIFULSOUP

soup = BeautifulSoup(response.text, 'lxml')
print(soup.find_all("b", class_="delivery"))

CodePudding user response：

The data at that URL is dynamic, meaning it is generated by JavaScript, and BeautifulSoup can't render JavaScript. So you need an automation tool such as Selenium together with BeautifulSoup. Just run the code below.

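import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent"

# Selenium 4: pass the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude fixed wait for the JavaScript-rendered content to load

# Parse the rendered page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # close the browser and end the driver session

print(soup.find("b", class_="delivery").text)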

Output:

7 Days Delivery

CodePudding user response：

To print the text 7 Days Delivery using Selenium, you can use either of the following locator strategies:

Using css_selector and get_attribute("innerHTML"):

driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
print(driver.find_element(By.CSS_SELECTOR, "b.delivery").get_attribute("innerHTML"))

Using xpath and text attribute:

driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')   
print(driver.find_element(By.XPATH, "//b[@class='delivery']").text)

To extract the text 7 Days Delivery reliably, you should ideally use WebDriverWait with visibility_of_element_located(), so that the element is visible before it is read. You can use either of the following locator strategies:

Using CSS_SELECTOR and get_attribute("innerHTML"):

driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "b.delivery"))).get_attribute("innerHTML"))

Using XPATH and text attribute:

driver.get('https://www.fiverr.com/volkeins/provide-10x-dofollow-backlinks-from-amazon-da96-permanent')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//b[@class='delivery']"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:
```
7 Days Delivery
```
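
The idea behind WebDriverWait, as opposed to a fixed time.sleep(), can be sketched without a browser: poll a condition until it returns something truthy or a timeout expires. The helper wait_until and the stand-in find_delivery below are hypothetical names used only for illustration; this is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical stand-in for an element that only appears after a few polls,
# the way JavaScript-rendered content appears some time after page load.
attempts = {"count": 0}


def find_delivery():
    attempts["count"] += 1
    return "7 Days Delivery" if attempts["count"] >= 3 else None


text = wait_until(find_delivery, timeout=2.0, poll=0.05)
print(text)
```

Unlike a fixed sleep, the loop returns as soon as the condition is met and fails loudly if it never is.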

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

References

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium

CodePudding user response：

With requests.get('https://...').text, you receive only the HTML content that the server sends. The problem is that most modern websites use client-side rendering to build up the page content, so you need JavaScript to render the page the way your web browser does. You can use Selenium to achieve this.
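
Before reaching for a browser, it can be worth checking whether the elements are already present in the HTML that requests returns. Below is a minimal sketch using only the standard library. The class name delivery comes from the question; DeliveryCollector, delivery_texts, and the sample markup (including the 10 and 14 day values) are made up for illustration, and the real page structure may differ. In practice you would pass response.text instead of the sample string.

```python
from html.parser import HTMLParser


class DeliveryCollector(HTMLParser):
    """Collect the text inside every <b> tag whose class list contains 'delivery'."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "b" and "delivery" in classes:
            self._inside = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "b":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.results[-1] += data


def delivery_texts(html):
    """Return the stripped text of every matching tag found in the HTML string."""
    parser = DeliveryCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Made-up sample markup standing in for a fully rendered page.
sample_html = """
<div class="package"><b class="delivery">7 Days Delivery</b></div>
<div class="package"><b class="delivery">10 Days Delivery</b></div>
<div class="package"><b class="delivery">14 Days Delivery</b></div>
"""

print(delivery_texts(sample_html))
# Made-up empty shell, as a client-side-rendered page might look before JavaScript runs.
print(delivery_texts("<div id='app'></div>"))
```

An empty list for the real response would suggest that the content is rendered client-side (or that the markup differs from what is assumed here), in which case a browser-based tool such as Selenium is the likely fallback.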