I'm scraping a web page and need to pull one item from a bulleted list.
I can't use a fixed index like the code below, because the length of the list changes on every page. The link that I'm using to test is https://www.chadwellsupply.com/categories/appliances/Stove-Ranges/hotpoint-24-spacesaver-electric-range---white/
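import requests
from bs4 import BeautifulSoup
url = 'https://www.chadwellsupply.com/categories/appliances/Stove-Ranges/hotpoint-24-spacesaver-electric-range---white/'
response = requests.get(url)
# Parse the HTML before searching it
soup = BeautifulSoup(response.text, 'html.parser')
# Fixed index: picks the wrong item when the list length changes
repo = soup.find('div', class_="tabs").find_all('li')[2]
print(repo.text.strip())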
The code below pulls the entire list, but I need to extract only the "MFG#" item from the output:
import requests
from bs4 import BeautifulSoup
url = 'https://www.chadwellsupply.com/categories/appliances/Stove-Ranges/hotpoint-24-spacesaver-electric-range---white/'
response = requests.get(url)
# Parse the HTML before searching it
soup = BeautifulSoup(response.text, 'html.parser')
# Every <li> inside the tabs container
repo = soup.find('div', class_="tabs").find_all('li')
print(repo)
This is the output I'm trying to pull the "MFG#" from:
[<li >
<a aria-controls="Features" aria-selected="true" data-toggle="tab" href="#mobile-features" id="mobile-tab1" role="tab">
Features
</a>
</li>, <li >
<a aria-controls="Specifications" aria-selected="false" data-toggle="tab" href="#mobile-specifications" id="mobile-tab2" role="tab">
Specifications
</a>
</li>, <li>2.9 cu. Ft. oven capacity</li>, <li>NEW Sensi-Temp Technology</li>, <li>Standard clean oven</li>, <li>(3) 6" 1250W & (1) 8" 2400W coil heating element</li>, <li>2 oven racks</li>, <li>Includes broiler pan with grid</li>, <li>Lift-Up cooktop</li>, <li>Chrome drip bowls</li>, <li>ADA Compliant</li>, <li>41-7/8"H x 23-3/4"W x 26-5/8"D</li>, <li>White</li>, **<li>MFG# RAS240DMWW</li>**, <li>Power cord not included</li>, **<li>
Mfg:
RAS240DMWW**
</li>, <li>
Color:
White
</li>, <li>
Height:
41-7/8"
</li>, <li>
Width:
23-3/4"
</li>, <li>
Depth:
26-5/8"
</li>, <li>
Size:
3.0 cu ft.
</li>, <li>
Type:
Electric
</li>, <li>
ADA Compliant:
True
</li>, <li>
Page:
32
</li>]
CodePudding user response:
Just filter the list items for "MFG" in their text value.
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://www.chadwellsupply.com/categories/appliances/Stove-Ranges/hotpoint-24-spacesaver-electric-range---white/'
response = requests.get(url)
# Take the first <ul> in the tab content
features = BeautifulSoup(response.text, "lxml").select_one(
    ".Chadwell-Pages-CatalogEntry .tabs .tab-content ul"
)
# Keep only the items whose text mentions "MFG" (case-sensitive)
matches = [li.get_text() for li in features.find_all("li") if "MFG" in li.get_text()]
print(matches)
Output:
['MFG# RAS240DMWW']
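If only the part number is wanted rather than the whole line, a regular expression can pull it out of the matched text. The following is a minimal sketch, not a drop-in solution: it runs against an inline HTML fragment copied from the output in the question instead of the live page, and the helper name extract_mfg is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Fragment copied from the output shown in the question; the live page may differ.
HTML = """
<div class="tabs">
  <ul>
    <li>White</li>
    <li>MFG# RAS240DMWW</li>
    <li>Power cord not included</li>
  </ul>
</div>
"""

# "MFG#", optional whitespace, then the part number up to the next whitespace.
MFG_PATTERN = re.compile(r"MFG#\s*(\S+)")


def extract_mfg(html):
    """Hypothetical helper: part number from the first <li> containing 'MFG#', else None."""
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find_all("li"):
        match = MFG_PATTERN.search(li.get_text())
        if match:
            return match.group(1)
    return None


print(extract_mfg(HTML))
```

Because the search goes by text and not by position, it does not matter how many items come before or after the "MFG#" line.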
CodePudding user response:
You can use a CSS selector with :-soup-contains() to search for a tag that contains specific text:
import requests
from bs4 import BeautifulSoup
url = "https://www.chadwellsupply.com/categories/appliances/Stove-Ranges/hotpoint-24-spacesaver-electric-range---white/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
mfg = soup.select_one("li:-soup-contains(Mfg)").text
print(mfg.split(":")[-1].strip())
Prints:
RAS240DMWW
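Note that select_one() returns None when nothing matches, so calling .text on the result raises AttributeError on a page that has no such item. Here is a hedged sketch with a guard for that case. It uses an inline fragment shaped like the specification items in the question, the helper name mfg_from_spec is hypothetical, and it assumes a soupsieve version recent enough to support :-soup-contains().

```python
from bs4 import BeautifulSoup

# Fragment shaped like the specification items in the question's output.
HTML = """
<ul>
  <li>
  Mfg:
  RAS240DMWW
  </li>
  <li>
  Color:
  White
  </li>
</ul>
"""


def mfg_from_spec(html):
    """Hypothetical helper: value after 'Mfg:' in the first matching <li>, else None."""
    soup = BeautifulSoup(html, "html.parser")
    li = soup.select_one('li:-soup-contains("Mfg:")')
    if li is None:
        # No matching item on this page
        return None
    # Keep everything after the first colon and trim surrounding whitespace
    return li.get_text().split(":", 1)[1].strip()


print(mfg_from_spec(HTML))
```

Matching on "Mfg:" with the colon narrows the selector to the labelled specification item instead of any item that merely contains those letters.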