I am trying to extract the price as well as the number of students from a Udemy course page. I am on Windows, using Python 3.8 and BeautifulSoup in a conda environment.
This is my code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.udemy.com/course/business-analysis-conduct-a-strategy-analysis/'
html = requests.get(url).content
bs = BeautifulSoup(html, 'lxml')
searchingprice = bs.find('div', {'class':'price-text--price-part--2npPm udlite-clp-discount-price udlite-heading-xxl','data-purpose':'course-price-text'})
searchingstudents = bs.find('div', {'class':'','data-purpose':'enrollment'})
print(searchingprice)
print(searchingstudents)
I only get the number of students, not the price. What am I doing wrong? This is the output:
None
<div class="" data-purpose="enrollment">
13,490 students
</div>
Here is a screenshot of the website:
Thanks!
CodePudding user response:
from bs4 import BeautifulSoup

html = """<div
data-purpose="price-text-container"><div
data-purpose="course-price-text">
<span>Current price</span>
<span><span>$14.99</span></span></div>
<div data-purpose="original-price-container">
<div data-purpose="course-old-price-text"><span>Original Price</span>
<span><s><span>$99.99</span></s></span></div></div>
<div
data-purpose="discount-percentage"><span>Discount</span><span>85% off</span>
</div></div>"""
soup = BeautifulSoup(html, 'lxml')
# the snippet above carries no class attributes, so locate the outer price
# container by its data-purpose attribute and collect every span inside it
lst = soup.find('div', {'data-purpose': 'price-text-container'}).find_all('span')
# keep the text of the first span that starts with $
print([span.text for span in lst if span.text.startswith('$')][0]) # -> '$14.99'
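If the price block is missing from the downloaded HTML, as the None in the question suggests, chaining a call onto the result of find raises an AttributeError. A minimal sketch of a more defensive lookup follows. The helper name extract_price and the pattern PRICE_RE are made up for illustration, and it uses the built-in 'html.parser' only so that it runs without lxml:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical pattern: a currency symbol followed by digits, dots or commas.
PRICE_RE = re.compile(r'[$€£]\s?\d[\d.,]*')

def extract_price(html):
    """Return the first price-looking string in the price block, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    block = soup.find('div', attrs={'data-purpose': 'course-price-text'})
    if block is None:
        # The block is not in this HTML (for example, it is rendered later by JavaScript).
        return None
    match = PRICE_RE.search(block.get_text(' ', strip=True))
    return match.group(0) if match else None

sample = ('<div data-purpose="course-price-text"><span>Current price</span>'
          '<span><span>$14.99</span></span></div>')
print(extract_price(sample))
print(extract_price('<div>no price here</div>'))
```

Matching on data-purpose rather than on the generated class names is a design choice here: class names such as price-text--price-part--2npPm look auto-generated and may change between page builds.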
CodePudding user response:
The price is not in the page source; it is fetched with JavaScript, so we have to make the same request ourselves. This code goes after your own, where bs is already loaded.
# get the ID of the course from the <body> tag
course_id = bs.body.attrs['data-clp-course-id']
# build the request URL; feel free to drop fields you do not need
link = f'https://www.udemy.com/api-2.0/pricing/?course_ids={course_id}&fields[pricing_result]=price,discount_price,list_price,price_detail,price_serve_tracking_id'
# fetch the data and decode the JSON response
res = requests.get(link).json()
print(res)
>>> {'courses': {'1596446': {'_class': 'pricing_result', 'price_serve_tracking_id': 'rbNYz3yCSiS2G1J62gtSzg', 'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99', 'currency_symbol': '€'}, 'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}, 'discount_price': {'amount': 17.0, 'currency': 'EUR', 'price_string': '€17', 'currency_symbol': '€'}, 'price_detail': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}}}, 'bundles': {}}
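To pull individual values out of that response, something like the sketch below may help. It runs on a trimmed copy of the output shown above rather than on a live request; the helper name summarize_pricing and the computed discount_percent are illustrative additions, not part of the API:

```python
def summarize_pricing(res, course_id):
    """Pick the current and list price for one course out of the pricing response."""
    # Keys under 'courses' are strings, so normalise the ID before the lookup.
    entry = res['courses'][str(course_id)]
    current = entry['price']
    original = entry['list_price']
    return {
        'current': current['price_string'],
        'list': original['price_string'],
        'currency': current['currency'],
        # Discount relative to the list price, rounded to a whole percent.
        'discount_percent': round((1 - current['amount'] / original['amount']) * 100),
    }

# Trimmed copy of the response printed above.
res = {
    'courses': {
        '1596446': {
            'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99'},
            'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99'},
        }
    },
    'bundles': {},
}

summary = summarize_pricing(res, 1596446)
print(summary)
```

Note that the currency in the response is EUR, so the amounts appear to depend on where the request comes from.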