Scraping values of i and span tags that are inside a p tag

I am trying to scrape the release date and the number of downloads from the HTML below:

<p><i >Release date</i> : <span >2022-06-02</span></p>
<p><i >Downloads</i> : <span  data-times-funtouch="">703</span></p>

Here is my function to scrape it:

import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    data = {
        "Release_Date" : sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
        "Downloads" : sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
    }
    print(data)


phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')

Here's my output:

{'Release_Date': '', 'Downloads': ''}

I am unable to see the values next to the keys in the dictionary; both come back as empty strings.

Answer:

I would use :-soup-contains to target the label by its text in addition to the class, and then select the span as the adjacent element rather than the i itself. An adjacent sibling combinator (+) moves from the element matched by the class and :-soup-contains to the span that immediately follows it.

This way each key gets its own value instead of the same element being selected twice, and you can drop the calls to strip() and replace().

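import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    # :-soup-contains() picks the label by its text; "+ span" steps to the value next to it
    data = {
        "Release_Date" : sp.select_one('.no-flip-over:-soup-contains("Release date") + span').text,
        "Downloads" : sp.select_one('.no-flip-over:-soup-contains("Downloads") + span').text,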
    }
    print(data)


phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
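Since the code above needs the live page, here is a minimal offline sketch that runs the same kind of selector against the snippet from the question. It assumes the `<i>` labels carry the class `no-flip-over` (the selectors above imply this, but the class attributes are not visible in the posted HTML), and `value_after_label` is a hypothetical helper name used only for illustration. It uses the built-in `html.parser`, so lxml is not required; `:-soup-contains()` needs Soup Sieve 2.1 or newer.

```python
from bs4 import BeautifulSoup

# Snippet from the question. ASSUMPTION: the <i> labels have class="no-flip-over".
HTML = """
<p><i class="no-flip-over">Release date</i> : <span>2022-06-02</span></p>
<p><i class="no-flip-over">Downloads</i> : <span data-times-funtouch="">703</span></p>
"""

sp = BeautifulSoup(HTML, "html.parser")


def value_after_label(soup, label):
    """Return the text of the <span> right after the label element, or None.

    :-soup-contains() narrows the class match to the element whose text
    contains `label`; the `+` combinator then moves to the adjacent <span>.
    """
    el = soup.select_one(f'.no-flip-over:-soup-contains("{label}") + span')
    return el.text if el is not None else None


data = {
    "Release_Date": value_after_label(sp, "Release date"),
    "Downloads": value_after_label(sp, "Downloads"),
}
print(data)  # {'Release_Date': '2022-06-02', 'Downloads': '703'}
```

Returning `None` instead of calling `.text` on the result directly avoids an `AttributeError` when a label is missing from the page.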

Answer:

I would also recommend the solution provided by @QHarr, since you know exactly which facts you want to scrape. This is just an alternative that approaches the problem from the other side and may fit the title of the question a bit better.

Simply iterate over all the spec paragraphs and build a dict of key/value pairs:

data = dict(e.text.split(' : ',1) for e in sp.select('.msg h1 ~ p:has(i + span)'))
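To see what the one-liner does without hitting the site, here is an offline sketch. The markup is an assumption: a simplified `.msg` container with an `<h1>` followed by the spec paragraphs, filled with the values from the output shown in this answer; the real page may be structured differently. It also uses the `i + span` form inside `:has()`, because in the question's HTML the span follows the `<i>` rather than sitting inside it. The `1` passed to `split()` limits it to one split, so a value that itself contains " : " would stay intact.

```python
from bs4 import BeautifulSoup

# ASSUMED, simplified markup; values copied from the output in this answer.
HTML = """
<div class="msg">
  <h1>Title</h1>
  <p><i>Release date</i> : <span>2022-02-25</span></p>
  <p><i>File size</i> : <span>1.87M</span></p>
  <p><i>Downloads</i> : <span>3545</span></p>
  <p><i>Support system</i> : <span>Windows</span></p>
  <p>A paragraph without a label/value pair is skipped.</p>
</div>
"""

sp = BeautifulSoup(HTML, "html.parser")

# Each <p> after the <h1> that holds an <i> directly followed by a <span> has
# text like "Release date : 2022-02-25"; split it once on " : " into (key, value).
data = dict(e.text.split(" : ", 1) for e in sp.select(".msg h1 ~ p:has(i + span)"))
print(data)
```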

Sure, you will scrape more than these two facts, but you also get a good overview of all the .keys(). Some of them may contain typos, and you can pick and adjust what you need in post-processing.
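A minimal sketch of that post-processing step, working on the dict shown in this answer's output. The mapping `WANTED` and the function `pick` are hypothetical names for illustration. Normalizing case and surrounding whitespace only catches simple inconsistencies, not genuine typos, which you would still spot by looking at `.keys()`.

```python
# Dict as returned by the generic scraper (values from the output in this answer).
scraped = {
    "Release date": "2022-02-25",
    "File size": "1.87M",
    "Downloads": "3545",
    "Support system": "Windows",
}

# HYPOTHETICAL mapping: normalized page label -> key name used in the question.
WANTED = {"release date": "Release_Date", "downloads": "Downloads"}


def pick(specs, wanted):
    """Normalize labels (trim, lowercase), then keep only the wanted ones.

    A wanted label that is missing from `specs` maps to None.
    """
    normalized = {k.strip().lower(): v.strip() for k, v in specs.items()}
    return {new: normalized.get(old) for old, new in wanted.items()}


print(pick(scraped, WANTED))  # {'Release_Date': '2022-02-25', 'Downloads': '3545'}
```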

Example

import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    # every <p> after the <h1> that holds a label/value pair becomes one dict entry
    data = dict(e.text.split(' : ',1) for e in sp.select('.msg h1 ~ p:has(i + span)'))
    return data

print(phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132'))

{'Release date': '2022-02-25',
 'File size': '1.87M',
 'Downloads': '3545',
 'Support system': 'Windows'}