Selenium (Python) - Webscraping verb-conjugation tables (Accessing web elements underneath '#document')

Section 0: Introduction:

This is my first webscraping project and I am not experienced with Selenium. I am trying to scrape Arabic verb-conjugation tables from the website:

After this, I have to click the 'Generate Sarf Table' button.

Section 2: My Attempt:

Here is my code:

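from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup  # used below to print the frame's HTML

#------------------ Just setting up the web driver:
s = Service('/usr/local/bin/chromedriver')
# Set some Selenium Chrome options:
chromeOptions = Options()
# chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)
driver.get('https://sites.google.com/view/sarfgenerator/home')

# I switch the frame once (into the first iframe of the top-level page):
iframe = driver.find_element(by=By.CSS_SELECTOR, value='iframe')
driver.switch_to.frame(iframe)
# I switch the frame again (into the first iframe inside that frame):
iframe = driver.find_element(by=By.CSS_SELECTOR, value='iframe')
driver.switch_to.frame(iframe)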

This takes me to the frame in which the web elements I need are located.
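One way to check which frame the driver actually ends up in is to map the whole iframe nesting before switching anywhere. The sketch below is a hypothetical helper (`list_frames` is not a Selenium API); it only uses standard WebDriver calls (`find_elements`, `get_attribute`, `switch_to.frame`, `switch_to.parent_frame`) and passes the locator strategy as the plain string `"tag name"`, which is the value of Selenium's `By.TAG_NAME`, so the snippet itself needs no Selenium import.

```python
def list_frames(driver, depth=0):
    """Return (depth, id, aria-label) for every iframe reachable from the current frame.

    Hypothetical helper: it recursively enters each iframe, records it, and steps
    back out with switch_to.parent_frame(), so the driver ends where it started.
    """
    found = []
    # "tag name" is the string value of selenium's By.TAG_NAME.
    for frame in driver.find_elements("tag name", "iframe"):
        found.append((depth, frame.get_attribute("id"), frame.get_attribute("aria-label")))
        driver.switch_to.frame(frame)
        try:
            found.extend(list_frames(driver, depth + 1))
        finally:
            # Go back to the document that contains `frame` before visiting its siblings.
            driver.switch_to.parent_frame()
    return found
```

Called right after `driver.get(...)`, e.g. `for depth, frame_id, label in list_frames(driver): print("  " * depth, frame_id, label)`, this prints one indented line per iframe. If the page is nested the way the locators in the answer below suggest, it should show three levels ("Custom embed", then `innerFrame`, then `userHtmlFrame`), not two.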

Now I print the HTML to see where I am:

print(BeautifulSoup(driver.execute_script("return document.body.innerHTML;"),'html.parser'))

Here is the output that I get:

<iframe frameborder="0" id="userHtmlFrame" scrolling="yes">
</iframe>
<script>function loadGapi(){var loaderScript=document.createElement('script');loaderScript.setAttribute('src','https://apis.google.com/js/api.js?checkCookie=1');loaderScript.onload=function(){this.onload=function(){};loadGapiClient();};loaderScript.onreadystatechange=function(){if(this.readyState==='complete'){this.onload();}};(document.head||document.body||document.documentElement).appendChild(loaderScript);}function updateUserHtmlFrame(userHtml,enableInteraction,forceIosScrolling){var frame=document.getElementById('userHtmlFrame');if(enableInteraction){if(forceIosScrolling){var iframeParent=frame.parentElement;iframeParent.classList.add('forceIosScrolling');}else{frame.style.overflow='auto';}}else{frame.setAttribute('scrolling','no');frame.style.pointerEvents='none';}clearCookies();clearStorage();frame.contentWindow.document.open();frame.contentWindow.document.write('<base target="_blank">'+userHtml);frame.contentWindow.document.close();}function onGapiInitialized(){gapi.rpc.call('..','innerFrameGapiInitialized');gapi.rpc.register('updateUserHtmlFrame',updateUserHtmlFrame);}function loadGapiClient(){gapi.load('gapi.rpc',onGapiInitialized);}if(document.readyState=='complete'){loadGapi();}else{self.addEventListener('load',loadGapi);}function clearCookies(){var cookies=document.cookie.split(";");for(var i=0;i<cookies.length;i++){var cookie=cookies[i];var equalPosition=cookie.indexOf("=");var name=equalPosition>-1?cookie.substr(0,equalPosition):cookie;document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:00 GMT";document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:01 GMT ;domain=.googleusercontent.com";}}function clearStorage(){try{localStorage.clear();sessionStorage.clear();}catch(e){}}</script>
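Apart from the script, this body holds just one more element: an empty `<iframe id="userHtmlFrame">`, which the script's `updateUserHtmlFrame` function fills by calling `document.write` on the frame's document. In other words, the driver is still one frame above the content. A quick way to spot leftover nesting in a dump like this is to list the iframe ids it contains. The sketch below uses only Python's standard `html.parser` module; `iframe_ids` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _IframeCollector(HTMLParser):
    """Collect the id attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            # attrs is a list of (name, value) pairs; a missing id gives None.
            self.ids.append(dict(attrs).get("id"))


def iframe_ids(html):
    """Return the ids of all iframes in an HTML fragment (None for iframes without an id)."""
    collector = _IframeCollector()
    collector.feed(html)
    collector.close()
    return collector.ids
```

Passing it the string from `driver.execute_script("return document.body.innerHTML;")` should return `["userHtmlFrame"]` here; a non-empty result means there is at least one more frame to switch into.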

However, the actual HTML on the website looks like this:

Section 3: The main problem with my approach:

I am unable to access anything in the #document contained within the iframe.
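The `#document` of an iframe only becomes reachable after the driver has switched into that iframe, and this has to be repeated for every level of nesting: the body printed above still contains `userHtmlFrame`, so one more switch is needed. Below is a minimal sketch of a reusable helper, assuming you know a locator for each level in order. `switch_into_frames` is a hypothetical name, and its plain polling loop stands in for Selenium's `WebDriverWait` with `expected_conditions.frame_to_be_available_and_switch_to_it`, so the sketch needs nothing beyond the standard library.

```python
import time


def switch_into_frames(driver, locators, timeout=25.0, poll=0.5):
    """Switch from the top-level page into each nested frame in `locators`, in order.

    Each locator is a (strategy, value) pair such as ("xpath", '//*[@id="innerFrame"]');
    "xpath" is the string value of selenium's By.XPATH. Raises TimeoutError if a
    frame does not appear within `timeout` seconds.
    """
    driver.switch_to.default_content()  # always start from the top-level document
    for by, value in locators:
        deadline = time.monotonic() + timeout
        while True:
            # find_elements returns an empty list (no exception) while the frame is missing.
            frames = driver.find_elements(by, value)
            if frames:
                driver.switch_to.frame(frames[0])
                break
            if time.monotonic() >= deadline:
                raise TimeoutError(f"frame {value!r} not found within {timeout} s")
            time.sleep(poll)
```

With the locators used in the answer below, the call would be `switch_into_frames(driver, [("xpath", '//*[@aria-label="Custom embed"]'), ("xpath", '//*[@id="innerFrame"]'), ("xpath", '//*[@id="userHtmlFrame"]')])`. After it returns, `driver.find_element(...)` searches the innermost frame's document.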

Section 4: Conclusion:

Is there a possible solution that can fix my current approach to the problem?
Is there any other way to solve the problem described in Section 1?

Answer:

You put a lot of effort into structuring your question, so I couldn't not answer it, even if it meant a double negation. Here is how you can drill down through the nested iframes to the one with the content, select some options, click the button and read the results:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service("chromedriver_linux64/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 25)

url = 'https://sites.google.com/view/sarfgenerator/home'
driver.get(url)
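# The form is three iframes deep: the Google Sites "Custom embed" frame,
# then "innerFrame" inside it, then "userHtmlFrame", whose #document holds the form.
# Each wait blocks until that frame is available, then switches the driver into it.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@aria-label="Custom embed"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="innerFrame"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="userHtmlFrame"]')))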

first_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root1"]'))))
second_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root2"]'))))
third_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root3"]'))))

first_select.select_by_visible_text("ج")
second_select.select_by_visible_text("ت")
third_select.select_by_visible_text("ص")

wait.until(EC.element_to_be_clickable((By.XPATH, ('//button[@onclick="sarfGenerator(false)"]')))).click()
print('clicked')

result = wait.until(EC.presence_of_element_located((By.XPATH, '//p[@id="demo"]')))
print(result.text)

Result printed in terminal:

clicked
جَتَّصَ      يُجَتِّصُ      تَجتِيصًا      مُجَتِّصٌ
جُتِّصَ      يُجَتَّصُ      تَجتِيصًا      مُجَتَّصٌ
جَتِّصْ       لا تُجَتِّصْ      مُجَتَّصٌ          Highlight Root Letters
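If you need the conjugations as data rather than printed text, `result.text` can be split into rows and cells. The sketch below assumes, based on the terminal output above, that rows are separated by newlines and cells by tabs or by runs of two or more spaces, while a single space stays inside a cell (as in the prohibitive form لا تُجَتِّصْ). `parse_result_rows` is a hypothetical helper, and the trailing "Highlight Root Letters" cell is page text the caller may want to drop.

```python
import re

# A tab or a run of two or more whitespace characters ends a cell;
# a single space (e.g. after the particle "لا") stays inside the cell.
_CELL_SEPARATOR = re.compile(r"\s{2,}|\t")


def parse_result_rows(text):
    """Split Selenium element text into a list of rows, each a list of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([cell for cell in _CELL_SEPARATOR.split(line) if cell])
    return rows
```

For example, `rows = parse_result_rows(result.text)` would give one list per printed line, so `rows[0]` would hold the four active-voice forms of the first line.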

This Selenium setup is for Linux; on other systems, keep the imports and everything after the driver is defined, and adapt only the driver setup. See the Selenium documentation for details.