My goal is to scrape word definitions in Python.
To begin with, I am trying to get just the first definition of the word "assist", which should be "to help". I am using dictionary.cambridge.org.
from selenium.webdriver.common.by import By
import time
# The web driver goes to the page (driver is an existing WebDriver instance)
driver.get("https://dictionary.cambridge.org/dictionary/english/assist")
# Give the page time to load
time.sleep(4)
# Click "Accept cookies"
driver.find_element(By.XPATH, "/html[@class='i-amphtml-singledoc i-amphtml-standalone']/body[@class='break default_layout amp-mode-mouse']/div[@id='onetrust-consent-sdk']/div[@id='onetrust-banner-sdk']/div[@class='ot-sdk-container']/div[@class='ot-sdk-row']/div[@id='onetrust-button-group-parent']/div[@id='onetrust-button-group']/div[@class='banner-actions-container']/button[@id='onetrust-accept-btn-handler']").click()
Up to this point, everything works correctly. However, when I try to print the first definition with an XPath lookup, I get a NoSuchElementException. I'm fairly familiar with Selenium and have scraped websites hundreds of times before, but on this page I don't know what I'm doing wrong. Here's the code I am using:
print(driver.find_element(By.XPATH, "/html[@class='i-amphtml-singledoc i-amphtml-standalone']/body[@class='break default_layout amp-mode-mouse']/div[@class='cc fon']/div[@class='pr cc_pgwn']/div[@class='x lpl-10 lpr-10 lpt-10 lpb-25 lmax lp-m_l-20 lp-m_r-20']/div[@class='hfr-m ltab lp-m_l-15']/article[@id='page-content']/div[@class='page']/div[@class='pr dictionary'][1]/div[@class='link']/div[@class='pr di superentry']/div[@class='di-body']/div[@class='entry']/div[@class='entry-body']/div[@class='pr entry-body__el'][1]/div[@class='pos-body']/div[@class='pr dsense dsense-noh']/div[@class='sense-body dsense_b']/div[@class='def-block ddef_block ']/div[@class='ddef_h']/div[@class='def ddef_d db']").text)
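One plausible cause, not confirmed by either answer: an absolute path pins about twenty steps, most of them by their exact class string, so a single mismatch between what dev tools showed and what the driver's DOM actually contains (a class added or removed by scripts, or whitespace such as the trailing space in 'def-block ddef_block ') makes the whole path miss. The sketch below reproduces that failure mode offline with the standard library's xml.etree.ElementTree on a tiny, hypothetical fragment of markup, not the real Cambridge HTML; TEMPLATE, ABSOLUTE and find_absolute are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is far larger.
TEMPLATE = """
<html>
  <body>
    <div class="{block_class}">
      <div class="ddef_h">
        <div class="def ddef_d db">to help:</div>
      </div>
    </div>
  </body>
</html>
"""

# Absolute-style path that pins the exact class string of each step,
# the way a path copied from browser dev tools does.
ABSOLUTE = (
    "./body/div[@class='def-block ddef_block ']"
    "/div[@class='ddef_h']/div[@class='def ddef_d db']"
)


def find_absolute(block_class):
    """Return the definition text, or None if the pinned path does not match."""
    root = ET.fromstring(TEMPLATE.format(block_class=block_class))
    node = root.find(ABSOLUTE)
    return None if node is None else node.text


# Matches when the wrapper's class string is exactly what was copied...
print(find_absolute("def-block ddef_block "))
# ...but misses when the same classes are written without the trailing space.
print(find_absolute("def-block ddef_block"))
```

Whether this particular mismatch is what happens on the live page is not established here; the point is only that every pinned class string is one more way for the lookup to fail.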
Answer:
Instead of an absolute XPath, opt for a relative XPath, which does not depend on the exact classes of every ancestor. You can refer to this link.
I tried the code below and it retrieved the data:
from selenium.webdriver.common.by import By
driver.get("https://dictionary.cambridge.org/dictionary/english/assist")
print(driver.find_element(By.XPATH, "(//div[@class='ddef_h'])[1]/div").get_attribute("innerText"))
Console Output:
to help:
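The relative expression (//div[@class='ddef_h'])[1]/div depends only on the definition block itself, not on its ancestors. ElementTree's XPath subset has no (...)[1] grouping, so the sketch below emulates it by taking the first match of .//div[@class='ddef_h'] and then that element's first child div, against hypothetical simplified markup (not the real page). Joining itertext() is used as a rough stand-in for innerText, on the assumption that a definition may contain nested links; first_definition and PAGE are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup with two senses; the real page differs.
PAGE = """
<html>
  <body class="break default_layout">
    <div class="ddef_h">
      <div class="def ddef_d db">to <a href="#">help</a>:</div>
    </div>
    <div class="ddef_h">
      <div class="def ddef_d db">a second, hypothetical sense</div>
    </div>
  </body>
</html>
"""


def first_definition(markup):
    """Emulate (//div[@class='ddef_h'])[1]/div and return its full text."""
    root = ET.fromstring(markup)
    # .//div[@class='ddef_h'] searches at any depth; ancestors' classes are ignored.
    headers = root.findall(".//div[@class='ddef_h']")
    if not headers:
        return None
    # First matching block in document order, then its first child <div>.
    definition = headers[0].find("div")
    if definition is None:
        return None
    # Concatenate text from nested elements too, roughly like innerText.
    return "".join(definition.itertext())


print(first_definition(PAGE))
```

Note that the body's class string plays no part in the lookup, which is why the relative form survives class changes higher up the tree.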
Answer:
To print the scraped definition of a word, you can use either of the following locator strategies. Both use the By class:
from selenium.webdriver.common.by import By
Using xpath and the text attribute:
print(driver.find_element(By.XPATH, "//span[contains(@class, 'epp-xref dxref')]//following::div[1]").text)
Using xpath and innerText:
print(driver.find_element(By.XPATH, "//span[contains(@class, 'epp-xref dxref')]//following::div[1]").get_attribute("innerText"))
Console Output:
to help:
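This answer's locator uses contains(@class, ...) rather than an exact @class='...' test. The plain-Python sketch below mirrors the semantics of those two XPath 1.0 tests, plus a whole-token check like a CSS class selector, applied to a class string taken from the question's absolute path. No browser is involved, the function names are hypothetical, and "ddef_block_extra" is an invented class used only to show over-matching.

```python
def exact_match(class_attr, value):
    """Like XPath @class='value': the whole attribute string must be identical."""
    return class_attr == value


def contains_match(class_attr, value):
    """Like XPath contains(@class, 'value'): a plain substring test."""
    return value in class_attr


def has_classes(class_attr, *tokens):
    """Like a CSS selector such as .def-block.ddef_block: whole class tokens."""
    present = set(class_attr.split())
    return all(token in present for token in tokens)


# Class string from the question's absolute path (note the trailing space).
attr = "def-block ddef_block "

print(exact_match(attr, "def-block ddef_block"))     # False: trailing space differs
print(contains_match(attr, "def-block ddef_block"))  # True
print(has_classes(attr, "def-block", "ddef_block"))  # True

# A substring test can also over-match a longer, hypothetical class name.
print(contains_match("ddef_block_extra", "ddef_block"))  # True
print(has_classes("ddef_block_extra", "ddef_block"))     # False
```

contains(@class, 'epp-xref dxref') tolerates extra classes around that pair but still requires the two names to be adjacent and in that order. In Selenium, the whole-token style corresponds to By.CSS_SELECTOR with a selector like "span.epp-xref.dxref"; whether that selector matches the live page has not been checked here.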