Good day, guys. I need to collect the name and email of each person listed on this site: https://www.espeakers.com/s/nsas/search?available_on=&awards&budget=0,10&bureau_id=304&distance=1000&fee=false&items_per_page=3701&language=en&location=&norecord=false&nt=0&page=0&presenter_type=&q=[]&require&review=false&sort=speakername&video=false&virtual=false
I am using Selenium with Python to scrape it, but I can't work out how to get the URL of each person's profile. A sample person card has this structure:
<div >
<div id="sid12026">
<div style='background-image: url("https://streamer.espeakers.com/assets/6/12026/159445.jpg"); background-size: contain;'>
<div >
<div >
</div>
<div >
<i >
</i>
</div>
</div>
</div>
<div >
<div >
Alex Aanderud
</div>
<div style="margin-top: 15px;">
<div >
<div >
<i >
</i>
AZ
<span>
,
</span>
US
</div>
</div>
<div >
<div >
</div>
</div>
</div>
<div >
<p>
</p>
<div>
Certified Trainer of Advanced Integrative Psychology and Certified John Maxwell Speaker, Trainer, Coach, will transform your organization and improve your results.
</div>
</div>
<div >
<div >
</div>
</div>
<div >
<div >
<div >
<div >
<span >
View Profile
</span>
<span >
Profile
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
When you click on
<span >
View Profile
</span>
it takes you to a page with that person's info, which I can then read. How can I do this with Selenium, or is there another solution that would help? Thanks!
CodePudding user response:
If you look closely, all the profile URLs are of the form https://www.espeakers.com/s/nsas/profile/id, where id is a five-digit number such as 27397.
That number is the part of each card's id attribute after the "sid" prefix (for example, sid12026 gives 12026), so you just need to extract it and append it to the base URL to obtain the profile URL.
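from selenium.webdriver.common.by import By
# Each tile's id looks like "sid12026"; [3:] drops the "sid" prefix and leaves the numeric id.
url = 'https://www.espeakers.com/s/nsas/profile/'
profile_urls = [url + el.get_attribute('id')[3:] for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-tile')]
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-name')]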
names is a list containing all the names, and profile_urls is a list containing the corresponding profile URLs.
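The id-to-URL step can also be sketched as a small helper that runs without a browser, which makes it easy to check on its own. This is only a sketch: the function names profile_url_from_tile_id and pair_names_with_urls are hypothetical, and it assumes every card id has the form "sid" followed by digits, as in the sid12026 example from the question. Ids that don't match are skipped rather than sliced blindly.

```python
import re

# Base profile URL, taken from the answer above.
BASE_URL = "https://www.espeakers.com/s/nsas/profile/"


def profile_url_from_tile_id(tile_id):
    """Return the profile URL for an id like 'sid12026', or None if it does not match."""
    # Assumption: ids are "sid" followed only by digits.
    match = re.fullmatch(r"sid(\d+)", tile_id or "")
    if match is None:
        return None
    return BASE_URL + match.group(1)


def pair_names_with_urls(names, tile_ids):
    """Pair each name with its profile URL, skipping tiles with an unexpected id."""
    pairs = []
    for name, tile_id in zip(names, tile_ids):
        url = profile_url_from_tile_id(tile_id)
        if url is not None:
            pairs.append((name.strip(), url))
    return pairs


# Example with the card shown in the question.
print(pair_names_with_urls(["Alex Aanderud"], ["sid12026"]))
```

With Selenium, tile_ids would be the get_attribute('id') values from the '.speaker-tile' elements and names the text of the '.speaker-name' elements, as in the snippet above.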