I don't know how to scrape this text:
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
What I've tried:
for n in j.find_all("div","npi_name"):
    n2=n.find("a", href=True, text=True)
    try:
        n1=n2['href']
    except:
        n2=n.find("a")
        n1=n2['href']
    n3=n2.string
    print(n3)
Output:
None
Answer:
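The .string of the <a> is None because the tag has more than one child (the <span> plus the text around it), so Beautiful Soup can't tell which one you mean. Instead, join only the text nodes that are direct children of the <a>: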
from bs4 import BeautifulSoup
html_doc = """
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
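# string=True is the current name of the older text=True argument;
# recursive=False skips the "Stoc limitat!" text inside the <span>
t = "".join(soup.select_one(".npi_name a").find_all(string=True, recursive=False))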
print(t.strip())
Prints:
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
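To see why the question's code printed None, here is a small sketch against the same markup (with the npi_name class that the question selects on restored on the div). .string only returns something when a tag has exactly one child; recursive=False keeps the <span> text out, while get_text() walks all descendants and pulls it in.

```python
from bs4 import BeautifulSoup

# The question's markup, with the npi_name class the question's code selects on.
html_doc = """
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
a = soup.select_one("div.npi_name a")

# <a> has three children (whitespace, <span>, title text), so .string is None
print(a.string)  # None

# Only the text nodes that are direct children of <a>; the <span> text is skipped
direct = "".join(a.find_all(string=True, recursive=False)).strip()
print(direct)

# get_text() visits every descendant, so it also includes "Stoc limitat!"
print(a.get_text(" ", strip=True))
```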
Answer:
I've made a few assumptions, but something like this should work:
for n in j.find_all("div", {"class": "npi_name"}):
    print(n.find("a").contents[2].strip())
This is how I arrived at my answer (I saved the HTML you provided to a.html):
from bs4 import BeautifulSoup

def main():
    with open("a.html", "r") as file:
        html = file.read()
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", {"class": "npi_name"})
    for div in divs:
        a = div.find("a").contents[2].strip()
        # Testing
        print(a)

if __name__ == "__main__":
    main()
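One caveat with contents[2]: the index counts the whitespace text node that html.parser keeps between <a> and <span>. If the page were served minified, the title would be contents[1] and contents[2] would raise an IndexError. Below is a sketch of a position-independent variant, using a hypothetical helper direct_text() and two shortened copies of the question's markup (the href is a placeholder).

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Hypothetical helper: first non-blank text node that is a direct child of tag."""
    for child in tag.children:
        # Tags such as <span> are not NavigableString, so they are skipped
        if isinstance(child, NavigableString) and child.strip():
            return child.strip()
    return None


# Shortened copies of the question's markup; "/x.html" is a placeholder href.
pretty = (
    '<div class="npi_name"><h2><a href="/x.html">\n'
    '<span style="color:red">Stoc limitat!</span>\n'
    'Telefon Mobil Apple iPhone 13\n'
    '</a></h2></div>'
)
minified = (
    '<div class="npi_name"><h2><a href="/x.html">'
    '<span style="color:red">Stoc limitat!</span>'
    'Telefon Mobil Apple iPhone 13'
    '</a></h2></div>'
)

results = []
for html in (pretty, minified):
    soup = BeautifulSoup(html, "html.parser")
    for div in soup.find_all("div", {"class": "npi_name"}):
        a = div.find("a")
        # len(a.contents) shows why a fixed index is fragile: 3 vs 2 children
        results.append((len(a.contents), direct_text(a)))

print(results)
```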
Answer:
texts = []
for a in soup.select("div.npi_name a[href]"):
    texts.append(a.contents[-1].strip())
Or more explicitly:
texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())
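A different technique, not part of the answer above: if you also want the badge, read the <span> first, then remove it from the tree with decompose() and take whatever text remains. This sketch uses the question's product (shortened title, placeholder href) plus a made-up second product without a badge to show the None case. Note that decompose() modifies the parsed soup in place.

```python
from bs4 import BeautifulSoup

# The question's product (shortened title, placeholder href) plus a made-up
# second product without a badge, to show the None case.
html = """
<div class="npi_name"><h2>
<a href="/a.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13
</a>
</h2></div>
<div class="npi_name"><h2>
<a href="/b.html">
Another product without a badge
</a>
</h2></div>
"""

soup = BeautifulSoup(html, "html.parser")
stock, texts = [], []
for a in soup.select("div.npi_name a[href]"):
    span = a.span  # first <span> inside the link, or None
    stock.append(span.get_text(strip=True) if span else None)
    if span:
        span.decompose()  # removes the badge from the parsed tree
    texts.append(a.get_text(" ", strip=True))

print(stock)
print(texts)
```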
Answer:
Select your elements more specifically, e.g. with CSS selectors, and use stripped_strings to get the text, assuming the label is always the last text node in your element:
for e in soup.select('div.npi_name a[href]'):
    text = list(e.stripped_strings)[-1]
    print(text)
This way you can also extract other information from the same element if needed, e.g. the href or the span text.
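A quick sketch of what stripped_strings yields for the question's <a> tag, to make the [-1] indexing concrete: the badge text comes first and the title last, so [-1] stays correct only while the title is the final text node.

```python
from bs4 import BeautifulSoup

# The question's markup, with the npi_name class restored on the div.
html = """
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
e = soup.select_one("div.npi_name a[href]")
parts = list(e.stripped_strings)  # whitespace-only nodes are dropped

print(parts)      # badge text first, then the title
print(parts[-1])  # the label
print(e["href"])  # other attributes come from the same element
```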
Example
Select multiple items, store the information in a list of dicts, and convert it into a DataFrame:
from bs4 import BeautifulSoup
import pandas as pd
html = '''
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
'''
soup = BeautifulSoup(html, "html.parser")
data = []
for e in soup.select('div.npi_name a[href]'):
    data.append({
        'url': e['href'],
        'stock': s.text if (s := e.span) else None,  # walrus operator needs Python 3.8+
        'label': list(e.stripped_strings)[-1]
    })
pd.DataFrame(data)
Output
url | stock | label |
---|---|---|
/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html | Stoc limitat! | Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru) |