I am trying to grab a "master_key" from a link. The key sits in the onclick attribute of an <a> tag whose href is "javascript:void(0)". Could someone please help me isolate the key from the text? Ideally, I would be able to run a find_all and get every key on the page. Thanks!!
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
import requests
options = Options()
options.headless = False
driver = webdriver.Firefox(options=options)
driver.get("https://annual.asaecenter.org/expo.cfm?")
driver.find_element(By.XPATH, "//*[@id='clickAgreeCookie']").click()
driver.find_element(By.XPATH, "//*[@id='search_table']/tbody/tr[7]/td[2]/input[2]").click()
page = bs(driver.page_source, 'html.parser')
key = page.find("a", href="javascript:void(0)")
print(key)
Output:
<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&master_key=F4DAF300-9A26-4024-9E49-3E29116E5A36&inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&xtemplate','exhibitor_profile');;analytics(1,'','F4DAF300-9A26-4024-9E49-3E29116E5A36')">108 Ideaspace</a>
Desired output:
F4DAF300-9A26-4024-9E49-3E29116E5A36
CodePudding user response:
You can use a regular expression with the built-in re module. For example:
import re
key = """<a href="javascript:void(0)" onclick="javascript:ExhibitorPopup('profile.cfm?profile_name=exhibitor&master_key=F4DAF300-9A26-4024-9E49-3E29116E5A36&inv_mast_key=93A17E5D-A46F-F21E-77E4-77B38A3B30EE&xtemplate','exhibitor_profile');;analytics(1,'','F4DAF300-9A26-4024-9E49-3E29116E5A36')">108 Ideaspace</a>"""
print(re.search("master_key=(.*?)&", key).group(1))
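This captures the text between "master_key=" and the next "&". The "?" makes the match non-greedy, so it stops at the first "&" instead of the last one. In your case, find returns a Tag rather than a string, so convert it with str() first: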
key = page.find("a", href="javascript:void(0)")
print(re.search("master_key=(.*?)&", str(key)).group(1))
Output:
F4DAF300-9A26-4024-9E49-3E29116E5A36
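The snippet above prints only the first key, because find returns a single tag. To collect every key, as the question asks, one option is to loop over find_all and run the regex on each link's onclick attribute. This is a minimal sketch, not a tested scraper for that site: the helper name extract_master_keys and the pattern MASTER_KEY_RE are made up for illustration, and it assumes that every key is a 36-character GUID and that every exhibitor link has the same shape as the one shown in the question.

```python
import re

from bs4 import BeautifulSoup

# Assumption: a master_key is always a 36-character GUID (hex digits and hyphens).
MASTER_KEY_RE = re.compile(r"[?&]master_key=([0-9A-Fa-f-]{36})")


def extract_master_keys(html):
    """Return the master_key of each javascript:void(0) link, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    keys = []
    for link in soup.find_all("a", href="javascript:void(0)"):
        # Links without an onclick attribute, or without a key, are skipped.
        match = MASTER_KEY_RE.search(link.get("onclick", ""))
        if match:
            keys.append(match.group(1))
    return keys
```

With the Selenium session from the question, this would be called as extract_master_keys(driver.page_source). Searching only the onclick attribute, rather than str(tag), keeps the match from picking up anything in the link text, and the "if match" check avoids the AttributeError that .group(1) raises when a link carries no key.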