I am trying to scrape the release date and the number of downloads from the HTML below:
<p><i >Release date</i> : <span >2022-06-02</span></p>
<p><i >Downloads</i> : <span data-times-funtouch="">703</span></p>
Here is my function to scrape it:
import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    data = {
        "Release_Date" : sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
        "Downloads" : sp.select_one('i.no-flip-over').text.strip().replace('\n', ' '),
    }
    print(data)

phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
Here's my output:
{'Release_Date': '', 'Downloads': ''}
The values next to the keys in the dictionary are empty.
CodePudding user response:
I would use :-soup-contains to target the label text in addition to the class, and leave the span out of that initial match, since the span is the adjacent element you want to reach. You can use an adjacent sibling combinator (+) to move from the element initially matched by class and :-soup-contains to the adjacent span.
You then avoid getting the same info twice and can remove the calls to strip() and replace().
import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    # Match the label by class and text, then step to the adjacent span with "+"
    data = {
        "Release_Date" : sp.select_one('.no-flip-over:-soup-contains("Release date") + span').text,
        "Downloads" : sp.select_one('.no-flip-over:-soup-contains("Downloads") + span').text,
    }
    print(data)

phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
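To try the selector without hitting the site, here is a minimal offline sketch that runs against the snippet from the question. It assumes the <i> labels carry the class no-flip-over (the question's selector implies this, but the pasted HTML shows no attributes), and value_for is a hypothetical helper name, not part of BeautifulSoup. It also returns None instead of raising when a label is missing.

```python
from bs4 import BeautifulSoup

# Assumption: the <i> labels have class="no-flip-over", as the selector in
# the question implies; the pasted snippet itself shows no class attribute.
HTML = """
<p><i class="no-flip-over">Release date</i> : <span>2022-06-02</span></p>
<p><i class="no-flip-over">Downloads</i> : <span data-times-funtouch="">703</span></p>
"""


def value_for(soup, label):
    """Return the text of the span directly after the matching label, or None."""
    # "+" is the adjacent sibling combinator: the span must be the next
    # element sibling of the <i> that contains the label text.
    tag = soup.select_one(f'i.no-flip-over:-soup-contains("{label}") + span')
    return tag.get_text(strip=True) if tag else None


soup = BeautifulSoup(HTML, "html.parser")
data = {
    "Release_Date": value_for(soup, "Release date"),
    "Downloads": value_for(soup, "Downloads"),
}
print(data)
```

The None fallback is only a design choice for the sketch; calling .text directly, as in the function above, fails with an AttributeError when nothing matches.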
CodePudding user response:
I would also recommend the solution provided by @QHarr if you know exactly which facts you want to scrape. This is just an alternative that comes at it from the other side and may fit the title of the question a bit better.
Simply iterate over all the specs and create a dict of key-value pairs:
data = dict(e.text.split(' : ',1) for e in sp.select('.msg h1 ~ p:has(i + span)'))
Sure, you will scrape more than these two facts, but you also get a very good overview of all the .keys(). Maybe some of them contain typos, for example, and you can pick and adjust them in post-processing.
Example
import requests
from bs4 import BeautifulSoup

def phone_data(url):
    r = requests.get(url)
    sp = BeautifulSoup(r.text, 'lxml')
    # Each matching <p> reads "label : value"; split once on the separator
    data = dict(e.text.split(' : ',1) for e in sp.select('.msg h1 ~ p:has(i + span)'))
    return data

phone_data('https://www.vivo.com/in/support/upgradePackageData?id=132')
Output
{'Release date': '2022-02-25',
'File size': '1.87M',
'Downloads': '3545',
'Support system': 'Windows'}
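As a rough sketch of the post-processing step mentioned above, you could normalize the keys and keep only the facts you care about. The dict literal is copied from the output shown above; normalize_key and the wanted set are hypothetical names chosen for illustration, and converting the download count to int assumes the value is always a plain number.

```python
# Scraped result, copied from the output above so the sketch runs offline.
scraped = {
    "Release date": "2022-02-25",
    "File size": "1.87M",
    "Downloads": "3545",
    "Support system": "Windows",
}


def normalize_key(key):
    """Lowercase the key and join its words with underscores."""
    return "_".join(key.strip().lower().split())


# Hypothetical selection of the facts to keep.
wanted = {"release_date", "downloads"}

cleaned = {normalize_key(k): v.strip() for k, v in scraped.items()}
picked = {k: v for k, v in cleaned.items() if k in wanted}

# Assumption: the download count is always a plain integer string.
picked["downloads"] = int(picked["downloads"])
print(picked)
```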