How to get only the value of a parent div element and exclude its child div elements using Beautiful Soup

Time:04-13

I decided to play around with web scraping and got stuck on a tricky div block. I have spent hours searching and experimenting, but I can't work out how to get the output I expected.

I'm having problems with the div with class "listing__details-pricing". It comes in three different forms. Form 3 returns the output I expect; the other two forms also return text from the child divs, which I don't want.

Form 1:

<div class="listing__details-pricing">
   €16,000 
   <div >Private</div>
</div>

Form 2:

<div class="listing__details-pricing">
   €16,000
   <div >
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
         <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
         <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
      </svg>
      €306
      <div >PER MONTH</div>
   </div>
</div>

Form 3:

<div class="listing__details-pricing">€16,250</div>

Code:

from bs4 import BeautifulSoup


html = """<html>
<body>
       <div class="vehicle-search-form__results">
                         <div class="listing__details listing__details--desktop">
                            <div >Meath</div>
                            <div >
                               <h2>VOLKSWAGEN Golf</h2>
                               <p>1.6 TDI MATCH EDITION BLUEMOTION 110PS 5DR</p>
                            </div>
                            <div >
                               <div >
                                  <p>2016</p>
                               </div>
                               <div >(161 REG)</div>
                               <div >140,012 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div >Private</div>
                            </div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                      
         
                 
                         <div class="listing__details listing__details--desktop">
                            <div >Longford</div>
                            <div >
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>2.0 TDI SE BUSINESS</p>
                            </div>
                            <div >
                               <div >
                                  <p>2015</p>
                               </div>
                               <div >(152 REG)</div>
                               <div >164,778 km</div>
                            </div>
                            <div class="listing__details-pricing">€16,250</div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                         
                         <div class="listing__details listing__details--desktop">
                            <div >Monaghan</div>
                            <div >
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>HIGHLINE BE 2.0 TDI MANUAL 6SPEED FWD 150HP 4DR</p>
                            </div>
                            <div >
                               <div >
                                  <p>2016</p>
                               </div>
                               <div >(161 REG)</div>
                               <div >230,000 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div >
                                  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
                                     <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
                                     <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
                                  </svg>
                                  €306
                                  <div >PER MONTH</div>
                               </div>
                            </div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
             <div ></div>
          </div>

</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find(class_="vehicle-search-form__results")

job_elements = results.find_all(class_="listing__details listing__details--desktop")
for job_element in job_elements:
    price = job_element.find(class_="listing__details-pricing")

    print(price.text.strip())

Current output:

€16,000
Private
€16,250
€16,000€306PER MONTH

Expected output:

€16,000
€16,250
€16,000

CodePudding user response:

Change the last line to:

print(price.contents[0].strip())

This prints:

€16,000
€16,250
€16,000

Or:

print(price.find(text=True).strip())
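
To see why both variants work, here is a minimal sketch on a cut-down copy of Form 1. The class attribute is taken from the question; the variable names are only illustrative. `.text` concatenates every string inside the div, including the nested one, while `contents[0]` and `find(string=True)` return only the first text node. `string=` is the name Beautiful Soup has used for the `text=` argument since version 4.4.0, and both spellings select text nodes.

```python
from bs4 import BeautifulSoup

# Cut-down copy of Form 1 from the question.
html = """
<div class="listing__details-pricing">
   €16,000
   <div>Private</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
price = soup.find(class_="listing__details-pricing")

# .text joins all descendant strings, so the nested "Private" comes along.
all_text = price.text.strip()

# The first child of the div is the text node that holds the price.
first_child = price.contents[0].strip()

# find(string=True) returns the first text node found inside the div.
first_string = price.find(string=True).strip()

print(repr(all_text))
print(first_child)
print(first_string)
```

Both shortcuts assume the price is the first thing inside the div, which holds for all three forms in the question.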

CodePudding user response:

Each price comes immediately after the opening <div class="listing__details-pricing"> tag, which makes it the div's first text node. You can select those divs directly with class_="listing__details-pricing" and then read the text node by calling find(text=True).

from bs4 import BeautifulSoup


html = """<html>
<body>
       <div class="vehicle-search-form__results">
                         <div class="listing__details listing__details--desktop">
                            <div >Meath</div>
                            <div >
                               <h2>VOLKSWAGEN Golf</h2>
                               <p>1.6 TDI MATCH EDITION BLUEMOTION 110PS 5DR</p>
                            </div>
                            <div >
                               <div >
                                  <p>2016</p>
                               </div>
                               <div >(161 REG)</div>
                               <div >140,012 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div >Private</div>
                            </div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                      
         
                 
                         <div class="listing__details listing__details--desktop">
                            <div >Longford</div>
                            <div >
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>2.0 TDI SE BUSINESS</p>
                            </div>
                            <div >
                               <div >
                                  <p>2015</p>
                               </div>
                               <div >(152 REG)</div>
                               <div >164,778 km</div>
                            </div>
                            <div class="listing__details-pricing">€16,250</div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                         
                         <div class="listing__details listing__details--desktop">
                            <div >Monaghan</div>
                            <div >
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>HIGHLINE BE 2.0 TDI MANUAL 6SPEED FWD 150HP 4DR</p>
                            </div>
                            <div >
                               <div >
                                  <p>2016</p>
                               </div>
                               <div >(161 REG)</div>
                               <div >230,000 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div >
                                  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
                                     <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
                                     <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
                                  </svg>
                                  €306
                                  <div >PER MONTH</div>
                               </div>
                            </div>
                            <div >
                               <span  style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
             <div ></div>
          </div>

</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
job_elements = soup.find_all(class_="listing__details-pricing")
for job_element in job_elements:
    price = job_element.find(text=True).strip()

    print(price)

Output:

€16,000
€16,250
€16,000
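
If a child element ever appeared before the price, taking the first text node would return the wrong string. That case does not occur in the HTML above, so the first div in the sketch below is a made-up example, and `own_text` is a hypothetical helper name. The idea is to collect only the text nodes that are direct children of the div by passing `recursive=False` to `find_all`:

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join the text nodes that are direct children of `tag`, skipping nested tags."""
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in parts if p.strip())


# Hypothetical markup: in the first div the child element precedes the price.
html = """
<div class="listing__details-pricing">
   <div>Private</div>
   €16,000
</div>
<div class="listing__details-pricing">€16,250</div>
"""

soup = BeautifulSoup(html, "html.parser")
prices = [own_text(div) for div in soup.find_all(class_="listing__details-pricing")]
print(prices)
```

Here `find(string=True)` on the first div would return "Private", while `own_text` still returns only the price.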