I am scraping data from a website where a lot of the text is hidden behind "see more" buttons.
My plan is to use Selenium to click all such buttons and then scrape the page with BeautifulSoup.
However, a few of the buttons have extra whitespace in the class attribute of their HTML tags. Copying and pasting that value into browser.find_element_by_class_name('')
always yields an error.
Notice the extra whitespace after artdeco. Could anyone help me with this, please? Manually adding those spaces or putting the class names on the same line doesn't change anything.
CodePudding user response:
As @HedgeHog mentioned in their answer, you can't pass multiple class names to By.CLASS_NAME, as in
driver.find_element(By.CLASS_NAME, "classA classB classC classD")
because it may raise an invalid selector error.
Moreover, since the element carries multiple class names (pv-profile-section__see-more-inline, pv-profile-section__text-truncate-toggle, artdeco-button--tertiary, etc.) separated by extra whitespace, an ideal approach is to pick a single class name, e.g. pv-profile-section__see-more-inline, which appears unique to the function of the element (i.e. "see more"). You can then use any of the following locator strategies:
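Using class_name:
browser.find_element(By.CLASS_NAME, "pv-profile-section__see-more-inline")
Using css_selector:
browser.find_element(By.CSS_SELECTOR, ".pv-profile-section__see-more-inline")
Using xpath:
browser.find_element(By.XPATH, "//*[contains(@class, 'pv-profile-section__see-more-inline')]")
The XPath uses contains(@class, ...) because an exact @class='...' comparison matches only when the attribute holds that single class name and nothing else.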
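Since the goal is to click every "see more" button rather than just the first one, the same locator can be passed to find_elements. A minimal sketch follows, assuming the buttons stay attached to the page while they are being clicked (if the page re-renders after each click, the elements may need to be located again). The function name click_all is made up for illustration, and the strategy is given as the literal string "css selector", which is the value of By.CSS_SELECTOR, so the snippet has no imports:

```python
def click_all(driver, css_selector):
    """Click every element matching css_selector and return how many were clicked.

    `driver` is expected to be a Selenium WebDriver (hypothetical helper,
    not part of Selenium itself).
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    buttons = driver.find_elements("css selector", css_selector)
    for button in buttons:
        button.click()
    return len(buttons)
```

With a real driver this would be called as click_all(browser, ".pv-profile-section__see-more-inline"), after which browser.page_source can be handed to BeautifulSoup.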
CodePudding user response:
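Note: You can't pass multiple class names as the argument to find_element_by_class_name(),
because it only accepts a single class name.
To find an element by multiple class names, use a CSS selector instead:
browser.find_element_by_css_selector(".firstClassName.secondClassName")
In Selenium 4, where the find_element_by_* helpers have been removed, the equivalent is:
browser.find_element(By.CSS_SELECTOR, ".firstClassName.secondClassName")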
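If the class attribute is copied straight out of the page source, extra whitespace and all, one way to get such a selector is to split the value on whitespace and join the pieces with dots. A small sketch; class_attr_to_css is a hypothetical helper name, not something Selenium provides, and it assumes the class names contain no characters that would need escaping in CSS:

```python
def class_attr_to_css(class_attr, tag=""):
    """Turn a raw class attribute value into a compound CSS class selector."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty strings.
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class attribute contains no class names")
    return tag + "".join("." + name for name in class_names)


raw = "pv-profile-section__see-more-inline   artdeco-button--tertiary \n"
selector = class_attr_to_css(raw, tag="button")
print(selector)
```

The resulting string can be passed as the CSS selector argument in either of the calls above. The tag argument is optional; leaving it empty matches any element carrying all of the listed classes.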