Isolate scrape with BeautifulSoup & Selenium

Time:07-07

I am trying to grab a "master_key" from an href. Could someone please help me isolate the key out of the text? Ideally, I would be able to run a find_all and get all the ones available. Thanks!!

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
import requests

options = Options()
options.headless = False
driver = webdriver.Firefox(options=options)

driver.get("https://annual.asaecenter.org/expo.cfm?")

driver.find_element(By.XPATH, "//*[@id='clickAgreeCookie']").click()

driver.find_element(By.XPATH, "//*[@id='search_table']/tbody/tr[7]/td[2]/input[2]").click()

page = bs(driver.page_source, 'html.parser')

key = page.find("a", href="javascript:void(0)")

print(key)

Output:

<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&amp;master_key=F4DAF300-9A26-4024-9E49-3E29116E5A36&amp;inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&amp;xtemplate','exhibitor_profile');;analytics(1,'','F4DAF300-9A26-4024-9E49-3E29116E5A36')">108 Ideaspace</a>

Desired output:

F4DAF300-9A26-4024-9E49-3E29116E5A36

Answer:

One way to do this is with a regular expression, using the built-in re module. For example:

import re

key = """<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&amp;master_key=F4DAF300-9A26-4024-9E49-3E29116E5A36&amp;inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&amp;xtemplate','exhibitor_profile');;analytics(1,'','F4DAF300-9A26-4024-9E49-3E29116E5A36')">108 Ideaspace</a>"""

print(re.search("master_key=(.*?)&amp", key).group(1))

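This captures the text between "master_key=" and the next "&amp"; the non-greedy (.*?) stops at the first "&amp" it reaches. Because re.search expects a string, the tag has to be converted with str() first. In your case, you'll need to use: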

key = page.find("a", href="javascript:void(0)")
print(re.search("master_key=(.*?)&amp", str(key)).group(1))

Output:

F4DAF300-9A26-4024-9E49-3E29116E5A36
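
Since the question also asks for every key on the page, here is a minimal sketch of one way to collect them all with re.findall. It assumes each key is a 36-character GUID like the one shown above. The html string is a hypothetical stand-in for driver.page_source, the second key in it is made up for illustration, and extract_master_keys is just an illustrative helper name.

```python
import re

# Hypothetical stand-in for driver.page_source: two exhibitor links shaped like
# the one in the question (the second master_key is made up for illustration).
html = """
<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&amp;master_key=F4DAF300-9A26-4024-9E49-3E29116E5A36&amp;inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&amp;xtemplate','exhibitor_profile');">108 Ideaspace</a>
<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&amp;master_key=00000000-0000-0000-0000-000000000001&amp;inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&amp;xtemplate','exhibitor_profile');">Example Exhibitor</a>
"""


def extract_master_keys(source):
    """Return every distinct master_key value in source, in order of appearance."""
    # Assumes GUID-shaped keys: 36 characters made up of hex digits and hyphens.
    found = re.findall(r"master_key=([0-9A-Fa-f-]{36})", source)
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(found))


master_keys = extract_master_keys(html)
print(master_keys)
```

In the original script, the call would presumably be extract_master_keys(driver.page_source). Unlike re.search(...).group(1), which raises an AttributeError when nothing matches, re.findall simply returns an empty list.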