How do I webscrape nested lists in Python?

Time:05-12

Link of website: https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance

What I want to scrape: Short sleeves style, Relaxed fit for comfort (Basically the bullet points under Description)

This is the code I'm using currently:

from selenium import webdriver
import re
from bs4 import BeautifulSoup
import requests

result = requests.get("https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance")

soup = BeautifulSoup(result.text, 'lxml')
page = soup.find('div', id="product-page")
description = page.find('div', id="product-basicdetail")
point1 = description.find('div', id="ff-rm text-size pd-b5")
print(point1)

CodePudding user response:

The data is embedded in the page source as JSON-like data inside a script tag, so you can extract it directly from the source page without a browser.

import requests
from lxml import html

r = requests.get('https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance')
source_page = html.fromstring(r.text)
json_value = source_page.xpath("//script[contains(.,'window.__product=')]/text()")[0]
json_value = json_value.split("{features:{values:[{list:[")[1].split("]}],count:1}}},modelMetaData:")[0]
print(json_value.split(','))
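
Splitting on long literal strings breaks as soon as the surrounding object changes. As a rough sketch of a slightly more tolerant approach, the helper below pulls the quoted strings out of the first array that follows a marker. The function name `extract_list_items`, the `list:[` marker, and the sample string are assumptions for illustration (the sample only imitates the shape implied by the split strings above), and it assumes the items are double-quoted and contain no `]` character.

```python
import re


def extract_list_items(script_text, marker="list:["):
    """Return the double-quoted strings in the first array after `marker`.

    Hypothetical helper: assumes items are double-quoted and that no item
    contains a closing square bracket.
    """
    start = script_text.find(marker)
    if start == -1:
        return []
    start += len(marker)
    end = script_text.find("]", start)
    if end == -1:
        return []
    # Match double-quoted string literals, allowing backslash escapes.
    return re.findall(r'"((?:[^"\\]|\\.)*)"', script_text[start:end])


# Made-up sample shaped like the inline script, not the real page content.
sample = (
    'window.__product={features:{values:[{list:'
    '["Short sleeves style","Relaxed fit for comfort"]}],count:1}}'
)
print(extract_list_items(sample))
```

With the real page, the string returned by the XPath query above would be passed in place of `sample`.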

CodePudding user response:

The page depends entirely on JavaScript, so you can grab data such as the bullet points under Description using Selenium with bs4.

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options


option = webdriver.ChromeOptions()
option.add_argument("start-maximized")

#chrome to stay open
option.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=option)
driver.get('https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance')
time.sleep(2)

soup = BeautifulSoup(driver.page_source,'lxml')
#page = soup.find('div', id="product-page")
#description = page.find('div', id="product-basicdetail")
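# NOTE: the attribute inside div[...] was lost from the post; the class
# names below are taken from the question and are an assumption.
items = soup.select('div[class="ff-rm text-size pd-b5"] ul li')

# The first two items are the description bullet points
for point in items[0:2]:
    point1 = point.get_text(strip=True)
    print(point1)

# The remaining items are the fabric and care details
for point in items[2:]:
    point2 = point.get_text(strip=True)
    print(point2)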

Output:

Short sleeves style
Relaxed fit for comfort

Fabric: Polyester Cotton
Do Not Dry Clean
Do Not Bleach
Dry In Shade
Tumble Dry At Lower Temperature
Wash Dark Colours Separately
Machine Wash Allowed
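
One more thing worth noting about the code in the question: "ff-rm text-size pd-b5" looks like a list of class names, not an id, so `find('div', id=...)` would return None even on fully rendered HTML. Below is a minimal sketch of selecting nested list items by class with bs4. The markup in `SAMPLE_HTML` and the helper name `bullet_points` are made up for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described in the question.
SAMPLE_HTML = """
<div id="product-basicdetail">
  <div class="ff-rm text-size pd-b5">
    <ul>
      <li>Short sleeves style</li>
      <li>Relaxed fit for comfort</li>
    </ul>
  </div>
</div>
"""


def bullet_points(html, container_id="product-basicdetail"):
    """Return the text of each <li> nested under the class-matched div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        return []
    # Chained class selectors match a div that carries all three classes.
    items = container.select("div.ff-rm.text-size.pd-b5 ul li")
    return [li.get_text(strip=True) for li in items]


print(bullet_points(SAMPLE_HTML))
```

With Selenium, `driver.page_source` would be passed in place of `SAMPLE_HTML` once the page has finished rendering.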