BeautifulSoup: How to extract one or two values from li tags under span

Time:01-29

I am trying to scrape this url: Target Data

Can anyone suggest how to scrape this data?

Answer:

Based on this part of the question:

"So, I tried to rewrite the code using the below loop."

To get a single bullet value, select it with a CSS selector that combines the pseudo-class :-soup-contains() with the adjacent sibling combinator +:

soup.select_one('#detailBulletsWrapper_feature_div span:-soup-contains("Manufacturer") + span').text
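The same selector can be wrapped in a small helper that returns None for a missing bullet instead of raising AttributeError on .text. This is a minimal sketch: get_bullet_value and the trimmed sample markup are hypothetical, and :-soup-contains() assumes a soupsieve version recent enough to support it (bs4 delegates .select() and .select_one() to soupsieve).

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup mirroring the structure of the product page.
HTML = """
<div id="detailBulletsWrapper_feature_div"><ul>
<li><span><span class="a-text-bold">ASIN
:
</span> <span>B098BC48PZ</span></span></li>
<li><span><span class="a-text-bold">Manufacturer
:
</span> <span>RELAXO FOOTWEARS LIMITED</span></span></li>
</ul></div>
"""


def get_bullet_value(soup, label):
    """Return the value span's text for a bullet label, or None (hypothetical helper)."""
    # ':-soup-contains()' matches spans whose text contains the label;
    # '+ span' then picks the adjacent sibling span that holds the value.
    selector = f'#detailBulletsWrapper_feature_div span:-soup-contains("{label}") + span'
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element else None


soup = BeautifulSoup(HTML, "html.parser")
print(get_bullet_value(soup, "Manufacturer"))  # RELAXO FOOTWEARS LIMITED
print(get_bullet_value(soup, "Colour"))        # None
```

The outer span of each bullet also contains the label text, but it has no sibling span, so only the inner label span satisfies the + span part of the selector.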

To get a dict of the bullets and their values, use a dict comprehension, which lets you pick or filter based on the available keys:

{e.text.split('\n')[0]:e.find_next_sibling('span').text for e in soup.select('#detailBulletsWrapper_feature_div li .a-text-bold')}
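The key expression e.text.split('\n')[0] works because each label in this markup is followed by a line break. If the page were served without those line breaks, the key would keep the invisible direction marks (U+200F, U+200E) and the colon. A more defensive sketch, assuming every label ends with a colon wrapped in those marks as in the example; clean_label and the minified snippet are hypothetical:

```python
from bs4 import BeautifulSoup


def clean_label(raw):
    """Normalize a bullet label like 'Manufacturer <RLM> : <LRM>' to 'Manufacturer'."""
    # Remove the invisible right-to-left / left-to-right marks around the colon.
    text = raw.replace("\u200f", "").replace("\u200e", "")
    # Keep only the part before the first colon and collapse runs of whitespace.
    return " ".join(text.split(":", 1)[0].split())


# Hypothetical minified markup: no line break after the label text.
html = ('<div id="detailBulletsWrapper_feature_div"><ul>'
        '<li><span><span class="a-text-bold">Country of Origin \u200f : \u200e</span>'
        ' <span>India</span></span></li></ul></div>')
soup = BeautifulSoup(html, "html.parser")

details = {
    clean_label(e.text): e.find_next_sibling("span").text
    for e in soup.select("#detailBulletsWrapper_feature_div li .a-text-bold")
}
print(details)  # {'Country of Origin': 'India'}
```

clean_label(e.text) can replace e.text.split('\n')[0] in the first-value-wins loop in the same way.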

Be aware that if there are duplicated bullets, you have to adjust this to fit your needs, because the keys of a dict must be unique. In the comprehension above, a later bullet with the same label silently overwrites the earlier one.

A possible solution where the first value wins:

details = {}
for e in soup.select('#detailBulletsWrapper_feature_div li .a-text-bold'):
    key = e.text.split('\n')[0]  # label text up to the first line break, e.g. "Manufacturer"
    if key not in details:  # keep only the first value of a duplicated bullet
        details[key] = e.find_next_sibling('span').text
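If neither the first nor the last value is what you need, another option (not part of the original answer) is to keep every value per label in a list using dict.setdefault. A sketch, assuming the same selectors; the snippet with a duplicated Manufacturer bullet is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with a duplicated "Manufacturer" bullet, as on the product page.
html = """
<div id="detailBulletsWrapper_feature_div"><ul>
<li><span><span class="a-text-bold">Manufacturer
:</span> <span>RELAXO FOOTWEARS LIMITED</span></span></li>
<li><span><span class="a-text-bold">Manufacturer
:</span> <span>RELAXO FOOTWEARS LIMITED, Rohini Sector-3, Delhi - 110085</span></span></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

details = {}
for e in soup.select('#detailBulletsWrapper_feature_div li .a-text-bold'):
    key = e.text.split('\n')[0]
    # setdefault creates the list the first time a label is seen; duplicates are appended.
    details.setdefault(key, []).append(e.find_next_sibling('span').text)

print(details['Manufacturer'][0])   # first value
print(details['Manufacturer'][-1])  # last value
```

details['Manufacturer'][0] then matches the first-value-wins result, and details['Manufacturer'][-1] matches what the dict comprehension keeps.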

Example

from bs4 import BeautifulSoup

html = '''
<div id="detailBulletsWrapper_feature_div" data-feature-name="detailBullets" data-template-name="detailBullets"  data-cel-widget="detailBulletsWrapper_feature_div"> <hr aria-hidden="true" > <h2>Product details</h2>
    <div id="detailBullets_feature_div">
             <ul>        <li><span> <span class="a-text-bold">Product Dimensions
                                    ‏
                                        :
                                    ‎
                                </span> <span>33 x 23 x 12 cm; 600 Grams</span> </span></li>          <li><span> <span class="a-text-bold">Date First Available
                                    ‏
                                        :
                                    ‎
                                </span> <span>30 June 2021</span> </span></li>                                  <li><span> <span class="a-text-bold">Manufacturer
                                    ‏
                                        :
                                    ‎
                                </span> <span>RELAXO FOOTWEARS LIMITED</span> </span></li>          <li><span> <span class="a-text-bold">ASIN
                                    ‏
                                        :
                                    ‎
                                </span> <span>B098BC48PZ</span> </span></li>          <li><span> <span class="a-text-bold">Item model number
                                    ‏
                                        :
                                    ‎
                                </span> <span>SX0687G</span> </span></li>          <li><span> <span class="a-text-bold">Country of Origin
                                    ‏
                                        :
                                    ‎
                                </span> <span>India</span> </span></li>          <li><span> <span class="a-text-bold">Department
                                    ‏
                                        :
                                    ‎
                                </span> <span>Mens</span> </span></li>          <li><span> <span class="a-text-bold">Manufacturer
                                    ‏
                                        :
                                    ‎
                                </span> <span>RELAXO FOOTWEARS LIMITED, RELAXO FOOTWEARS LIMITED, Aggarwal City Square, Plot No 10, Mangalam Palace. District Center, Rohini Sector-3, Delhi - 110085</span> </span></li>          <li><span> <span class="a-text-bold">Packer
                                    ‏
                                        :
                                    ‎
                                </span> <span>VIRAJ ENTERPRISES, Killa No. 31/18/1/2(2-4), Surya Nagar, Gali No. 1, Near Parle Factory, Jhajjar, Bahadurgarh, 124507</span> </span></li>  </ul> </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

details = {}
for e in soup.select('#detailBulletsWrapper_feature_div li .a-text-bold'):
    key = e.text.split('\n')[0]  # label text up to the first line break, e.g. "Manufacturer"
    if key not in details:  # keep only the first value of a duplicated bullet
        details[key] = e.find_next_sibling('span').text

print(soup.select_one('#detailBulletsWrapper_feature_div span:-soup-contains("Manufacturer") + span').text)

print(details)

Outputs

RELAXO FOOTWEARS LIMITED

and

{'Product Dimensions': '33 x 23 x 12 cm; 600 Grams',
 'Date First Available': '30 June 2021',
 'Manufacturer': 'RELAXO FOOTWEARS LIMITED',
 'ASIN': 'B098BC48PZ',
 'Item model number': 'SX0687G',
 'Country of Origin': 'India',
 'Department': 'Mens',
 'Packer': 'VIRAJ ENTERPRISES, Killa No. 31/18/1/2(2-4), Surya Nagar, Gali No. 1, Near Parle Factory, Jhajjar, Bahadurgarh, 124507'}