Scrape a line of text from a website inside a div

I don't know how to scrape the following text from the HTML below:

Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)

    <div class="npi_name">
        <h2>
            <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
                <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
        </h2>
    </div>

What I've tried:

for n in j.find_all("div","npi_name"):
   n2=n.find("a", href=True, text=True)
   try:
       n1=n2['href']
   except:
       n2=n.find("a")
       n1=n2['href']
   n3=n2.string
   print(n3)

Output:

None

CodePudding user response：

Try:

from bs4 import BeautifulSoup

html_doc = """
    <div class="npi_name">
        <h2>
            <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
                <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
        </h2>
    </div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

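# Join only the text nodes that are direct children of <a>, skipping the <span>.
# (string=True is the current name of the older text=True argument.)
t = "".join(soup.select_one(".npi_name a").find_all(string=True, recursive=False))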
print(t.strip())

Prints:

Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
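
For context, here is a minimal sketch of why the original attempt printed None, assuming a shortened, hypothetical version of the markup (the href and the label are placeholders, not the real page). `.string` only returns a value when a tag has a single child, and here the `<a>` holds whitespace, a `<span>`, and the label:

```python
from bs4 import BeautifulSoup

# Reduced, hypothetical version of the markup from the question.
html_doc = """
<div class="npi_name">
  <h2>
    <a href="/phone.html">
      <span style="color:red">Stoc limitat!</span>
      Telefon Mobil Apple iPhone 13
    </a>
  </h2>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
a = soup.select_one(".npi_name a")

# .string is None because <a> has more than one child node.
string_value = a.string

# get_text() walks all descendants, so it includes the <span> text too.
full_text = a.get_text(" ", strip=True)

# Only the text nodes that are direct children of <a>.
own_text = "".join(a.find_all(string=True, recursive=False)).strip()

print(string_value)
print(full_text)
print(own_text)
```

So `get_text()` alone is not enough if the "Stoc limitat!" badge should be left out; restricting the search to direct text children is what drops it.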

CodePudding user response：

I've made a few assumptions, but something like this should work:

for n in j.find_all("div", {"class": "npi_name"}):
    print(n.find("a").contents[2].strip())

This is how I arrived at my answer (I saved the HTML you provided as a.html):

from bs4 import BeautifulSoup


def main():
    # Read the saved HTML and parse it.
    with open("a.html", "r") as file:
        soup = BeautifulSoup(file.read(), "html.parser")

    for div in soup.find_all("div", {"class": "npi_name"}):
        # contents[0] is whitespace, contents[1] is the <span>,
        # contents[2] is the text node that follows it.
        label = div.find("a").contents[2].strip()
        print(label)


if __name__ == "__main__":
    main()

CodePudding user response：

texts = []
for a in soup.select("div.npi_name a[href]"):
    texts.append(a.contents[-1].strip())

Or more explicitly:

texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string

    texts.append(text.strip())
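
A quick way to check the explicit variant is to run it on links with and without the badge. The sketch below assumes a hypothetical two-product sample, and the `link_labels` helper name is mine, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one link with a stock badge, one without.
sample = """
<div class="npi_name"><h2><a href="/a.html">
    <span style="color:red">Stoc limitat!</span>
    Phone A
</a></h2></div>
<div class="npi_name"><h2><a href="/b.html">
    Phone B
</a></h2></div>
"""


def link_labels(soup):
    """Return the product label of each link, with or without a <span>."""
    labels = []
    for a in soup.select("div.npi_name a[href]"):
        # With a badge, the label is the text node right after the <span>;
        # without one, the link has a single text child, so .string works.
        text = a.span.next_sibling if a.span else a.string
        labels.append(text.strip())
    return labels


labels = link_labels(BeautifulSoup(sample, "html.parser"))
print(labels)
```

Both branches return the bare label, so the same loop should cope with products that are not marked as limited stock.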

CodePudding user response：

Select your elements more specifically, e.g. with CSS selectors, and use stripped_strings to get the text, assuming it is always the last node in the element:

for e in soup.select('div.npi_name a[href]'):
    text = list(e.stripped_strings)[-1]
    print(text)

This way you could also process other information if needed, e.g. the href or the span text.

Example

Select multiple items, store the information in a list of dicts, and convert it into a DataFrame:

from bs4 import BeautifulSoup
import pandas as pd

html = '''
<div class="npi_name">
    <h2>
        <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
            <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
        </a>
    </h2>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

data = []

for e in soup.select('div.npi_name a[href]'):
    data.append({
        'url': e['href'],
        'stock': s.text if (s := e.span) else None,
        'label': list(e.stripped_strings)[-1]
    })

pd.DataFrame(data)

Output

url	stock	label
/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html	Stoc limitat!	Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)