Scraping prices from the Udemy website with BeautifulSoup4 in Python 3

I am trying to extract the price and the number of students from a Udemy course page. I am on Windows, using Python 3.8 and BeautifulSoup in a conda environment.

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.udemy.com/course/business-analysis-conduct-a-strategy-analysis/'
html = requests.get(url).content
bs = BeautifulSoup(html, 'lxml')
searchingprice = bs.find('div', {'class':'price-text--price-part--2npPm udlite-clp-discount-price udlite-heading-xxl','data-purpose':'course-price-text'})
searchingstudents = bs.find('div', {'class':'','data-purpose':'enrollment'})
print(searchingprice)
print(searchingstudents)

I only get the student count, not the price. What am I doing wrong? This is the output:

None
<div class="" data-purpose="enrollment">
13,490 students
</div>

Thanks!

CodePudding user response：

from bs4 import BeautifulSoup

html = """<div  
data-purpose="price-text-container"><div  
data-purpose="course-price-text">
<span >Current price</span>
<span><span>$14.99</span></span></div>
<div  data-purpose="original-price-container">
<div data-purpose="course-old-price-text"><span >Original Price</span>
<span><s><span>$99.99</span></s></span></div></div>
<div 
data-purpose="discount-percentage"><span >Discount</span><span>85% off</span>
</div></div>"""

soup = BeautifulSoup(html, 'lxml')
# find the price container by its data-purpose attribute, then collect its span descendants
lst = soup.find('div', attrs={'data-purpose': 'price-text-container'}).find_all('span')
# list comprehension to find the span text that starts with $ and keep the first element
print([span.text for span in lst if span.text.startswith('$')][0])  # -> $14.99

CodePudding user response：

The price is not in the page source; it is fetched with JavaScript after the page loads, which is why your search for the price div returns None. We have to make the same request ourselves. This code goes after your own, so bs is already loaded.

# get the id of the course
course_id = bs.body.attrs['data-clp-course-id']
# build the request URL; feel free to drop fields you don't need
link = f'https://www.udemy.com/api-2.0/pricing/?course_ids={course_id}&fields[pricing_result]=price,discount_price,list_price,price_detail,price_serve_tracking_id'
# fetch the data
res = requests.get(link).json()
print(res)
>>> {'courses': {'1596446': {'_class': 'pricing_result', 'price_serve_tracking_id': 'rbNYz3yCSiS2G1J62gtSzg', 'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99', 'currency_symbol': '€'}, 'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}, 'discount_price': {'amount': 17.0, 'currency': 'EUR', 'price_string': '€17', 'currency_symbol': '€'}, 'price_detail': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}}}, 'bundles': {}}
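
To get just the numbers out of that response, something like the following might work. It is only a sketch: the sample dict is trimmed from the output above so it runs without a network call, summarize_prices is a made-up helper name, and it assumes the response keeps the same shape (a 'courses' mapping whose entries have 'price' and 'list_price' with 'amount' and 'currency').

```python
from decimal import Decimal

# Sample response trimmed from the output above (only the fields used here).
res = {
    'courses': {
        '1596446': {
            'price': {'amount': 16.99, 'currency': 'EUR'},
            'list_price': {'amount': 119.99, 'currency': 'EUR'},
        }
    },
    'bundles': {},
}


def to_decimal(entry):
    """Convert an {'amount': ...} entry to Decimal, or None if it is missing."""
    if not entry or 'amount' not in entry:
        return None
    # Go through str() so 16.99 becomes Decimal('16.99'), not a long binary fraction.
    return Decimal(str(entry['amount']))


def summarize_prices(response):
    """Hypothetical helper: map each course id to its current and list price."""
    summary = {}
    for course_id, pricing in response.get('courses', {}).items():
        price = pricing.get('price') or {}
        summary[course_id] = {
            'current': to_decimal(price),
            'list': to_decimal(pricing.get('list_price')),
            'currency': price.get('currency'),
        }
    return summary


for course_id, info in summarize_prices(res).items():
    print(course_id, info['current'], info['list'], info['currency'])
```

With the live response you would pass res from requests.get(link).json() instead of the sample dict. Missing fields come back as None rather than raising, which helps if the API returns a different set of fields for some courses.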