Python, Selenium: I need to collect URLs, but there are no <a> tags in the element

Time:05-19

Good day, guys. I have a task to collect the name and email of each person listed on this site: https://www.espeakers.com/s/nsas/search?available_on=&awards&budget=0,10&bureau_id=304&distance=1000&fee=false&items_per_page=3701&language=en&location=&norecord=false&nt=0&page=0&presenter_type=&q=[]&require&review=false&sort=speakername&video=false&virtual=false

I'm using Selenium and Python to scrape it, but I can't get the profile URL for each person, because the card has no <a> tag. A sample person card looks like this:

  <div >
   <div  id="sid12026">
    <div  style='background-image: url("https://streamer.espeakers.com/assets/6/12026/159445.jpg"); background-size: contain;'>
     <div >
      <div >
      </div>
      <div >
       <i >
       </i>
      </div>
     </div>
    </div>
    <div >
     <div >
      Alex Aanderud
     </div>
     <div  style="margin-top: 15px;">
      <div >
       <div >
        <i >
        </i>
        AZ
        <span>
         ,
        </span>
        US
       </div>
      </div>
      <div >
       <div >
       </div>
      </div>
     </div>
     <div >
      <p>
      </p>
      <div>
       Certified Trainer of Advanced Integrative Psychology and Certified John Maxwell Speaker, Trainer, Coach, will transform your organization and improve your results.
      </div>
     </div>
     <div >
      <div >
      </div>
     </div>
     <div >
      <div >
       <div >
        <div >
         <span >
          View Profile
         </span>
         <span >
          Profile
         </span>
        </div>
       </div>
      </div>
     </div>
    </div>
   </div>
  </div>

When you click on

<span >
  View Profile
</span>

it takes you to a page with the person's info, where I can get what I need. How can I do this with Selenium, or are there other solutions that could help? Thanks!

Answer:

If you look at the profile pages, all the profile URLs have the form

https://www.espeakers.com/s/nsas/profile/id

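where id is the speaker's numeric ID, for example 27397 (five digits in the examples). The same number appears in each card's id attribute after the "sid" prefix (id="sid12026" for Alex Aanderud in the card above), so you just need to strip that prefix and append the number to the base URL to get the profile URL.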
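The id-to-URL step can be written as a small function. This is a sketch that follows the observation above and assumes every card id is "sid" followed only by digits, as in the sample card; profile_url is a hypothetical helper name, not part of Selenium.

```python
BASE_URL = "https://www.espeakers.com/s/nsas/profile/"


def profile_url(card_id: str) -> str:
    """Turn a card id such as "sid12026" into that speaker's profile URL."""
    prefix = "sid"
    numeric_part = card_id[len(prefix):]
    # Assumption: ids are "sid" + digits; fail loudly if the markup changes
    if not card_id.startswith(prefix) or not numeric_part.isdigit():
        raise ValueError(f"unexpected card id: {card_id!r}")
    # Drop the "sid" prefix and append the numeric part to the base URL
    return BASE_URL + numeric_part
```

For example, profile_url("sid12026") returns "https://www.espeakers.com/s/nsas/profile/12026". Raising on an unexpected id, instead of blindly slicing [3:], makes a change in the site's markup show up as an error rather than as silently wrong URLs.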

from selenium.webdriver.common.by import By

url = 'https://www.espeakers.com/s/nsas/profile/'
# each card's id looks like "sid12026", so [3:] keeps only the numeric part
profile_urls = [url + el.get_attribute('id')[3:] for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-tile')]
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-name')]

names is a list of all the names, and profile_urls is the list of the corresponding profile URLs, in the same order.
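If you would rather not rely on the two find_elements calls returning tiles and names in matching order, one alternative is to parse driver.page_source once with Python's built-in html.parser and read each name from inside its own tile. This is a sketch that has not been run against the live site: it assumes the speaker-tile and speaker-name classes used above, tile ids of the form "sid12026", and that the name element contains only text. SpeakerCardParser and extract_speakers are hypothetical names.

```python
from html.parser import HTMLParser

BASE_URL = "https://www.espeakers.com/s/nsas/profile/"


class SpeakerCardParser(HTMLParser):
    """Collects (name, profile_url) pairs, reading each name from inside its own tile."""

    def __init__(self):
        super().__init__()
        self.results = []         # list of (name, url) tuples
        self._current_id = None   # numeric id of the tile we are currently inside
        self._in_name = False     # True while inside a .speaker-name element
        self._name_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        card_id = attrs.get("id") or ""
        if "speaker-tile" in classes and card_id.startswith("sid"):
            self._current_id = card_id[3:]   # drop the "sid" prefix
        elif "speaker-name" in classes and self._current_id is not None:
            self._in_name = True
            self._name_parts = []

    def handle_endtag(self, tag):
        if self._in_name:
            # Assumption: .speaker-name holds only text, so the next end tag closes it
            name = " ".join("".join(self._name_parts).split())
            self.results.append((name, BASE_URL + self._current_id))
            self._in_name = False
            self._current_id = None

    def handle_data(self, data):
        if self._in_name:
            self._name_parts.append(data)


def extract_speakers(html):
    """Return a list of (name, profile_url) tuples found in the page HTML."""
    parser = SpeakerCardParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

With Selenium you would call speakers = extract_speakers(driver.page_source) once the results have loaded; each entry keeps a name together with its own URL, so the pairing does not depend on two separate lists staying in step.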
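The answer stops at the profile URLs; the next step from the question is to open each profile and pull out the email. The sketch below assumes, without having checked the profile pages, that the email appears as plain text or a mailto: link somewhere in the page source, and searches for it with a loose regular expression. find_emails and collect_emails are hypothetical helpers, and driver is the Selenium WebDriver you already have.

```python
import re

# Deliberately loose pattern: good enough to spot addresses in page text,
# not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(page_source):
    """Return the distinct email addresses in page_source, in order of appearance."""
    found = []
    for match in EMAIL_RE.findall(page_source):
        if match not in found:
            found.append(match)
    return found


def collect_emails(driver, profile_urls):
    """Open each profile URL with the given driver and map url -> list of emails."""
    results = {}
    for url in profile_urls:
        driver.get(url)
        results[url] = find_emails(driver.page_source)
    return results
```

Call it as collect_emails(driver, profile_urls). If the profile pages fill in their content with JavaScript, page_source may be read too early; in that case wait for a profile element to appear (for example with Selenium's WebDriverWait) before searching.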
