selenium python iframe get https requests

I'm using Selenium with Python and want to capture all HTTPS requests made from an iframe element. I switch to the iframe, select a row in a table, and press a button, which triggers an HTTP POST request.

Part of my code:

Define the Chrome driver:

import datetime
import json
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--auto-open-devtools-for-tabs")
chrome_options.add_argument("--log-level=2")
chrome_options.add_argument("--disable-features=IsolateOrigins,site-per-process")
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])

# Enable performance logging (DevTools Network.* events) and browser console logging.
# Capabilities are set on the options object; desired_capabilities is deprecated in Selenium 4.
chrome_options.set_capability("goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"})
chrome_options.set_capability("acceptInsecureCerts", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.get(os.environ["URL"])
driver.get(os.environ['URL'])

Switch to the iframe and click the button in the table row:

WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="myiframe"]')))
row_selector = '//*[@id="root"]/div/div/div/div[2]/div[2]/div/div/div/div/table/tbody/tr[1]/th[3]'
row_selector_clickable = '//*[@id="root"]/div/div/div/div[2]/div[2]/div/div/div/div/table/tbody/tr[1]/th[3]/div/div/div[2]/div/button'

WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, row_selector)))
actions = ActionChains(driver)
row_element = driver.find_element(By.XPATH, row_selector)
row_clickable = driver.find_element(By.XPATH, row_selector_clickable)
actions.move_to_element(row_element)
actions.click(row_clickable)
actions.perform()

Then I collect all HTTP POST requests and write them to a file:

def process_browser_logs_for_network_events(logs):
    """
    Yield only logs whose method starts with "Network.response", "Network.request",
    or "Network.webSocket", since we're interested in the network events specifically.
    """
    for entry in logs:
        log = json.loads(entry["message"])["message"]
        if log["method"].startswith(("Network.response", "Network.request", "Network.webSocket")):
            yield log


logs = driver.get_log("performance")
events = process_browser_logs_for_network_events(logs)
li = []
for event in events:
    print(event)
    # Only events that carry a request (params.request) have an HTTP method to check
    if event.get("params", {}).get("request", {}).get("method") == "POST":
        li.append(event)

# Timestamp without spaces or colons, so the file name is valid on every platform
with open(f"log_entries-{datetime.datetime.now():%Y%m%d-%H%M%S}.txt", "wt") as out:
    out.write(json.dumps(li))

The issue is that it seems to show me requests from the first page, even though I switch to the iframe and Selenium selects the right elements inside it. The flow is: I log in to the website and am redirected to the main page. I click a button, which opens a new tab containing the iframe. I switch to the iframe and click a row in the table, which starts an HTTP request that takes 5-10 seconds (it shows as pending during that time). When it succeeds, the page redirects to the Gmail website and the request disappears because of the redirect. I tried enabling "preserve log", but that did not help.

I can't share the actual HTTPS requests because they are work-related, but what I'm seeing are requests from the first page, not from the current iframe.

CodePudding user response：

OK, I'll try to be as clear as I can. The Selenium setup below is for Linux/Selenium/Chrome; you can adapt it to your own. Just note the imports and the code after the browser/driver is defined.

To intercept the browser's requests I used selenium-wire: https://pypi.org/project/selenium-wire/

If you prefer, you can use native selenium request intercepting.
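
As a rough sketch of what a native approach could look like (an assumption on my part, not something tested against the OP's site): Chrome's performance log, enabled through the goog:loggingPrefs capability as in the question, contains DevTools Network.requestWillBeSent events, and each one carries a frameId and a documentURL. Grouping by the documentURL host is one way to separate requests issued for the iframe's document from those of the top page. Whether events from a cross-origin iframe show up at all may depend on Chrome's site isolation settings. The function name and the sample entries below are made up for illustration; real entries would come from driver.get_log("performance").

```python
import json
from urllib.parse import urlsplit


def requests_by_document_host(entries):
    """Group (method, url) pairs by the host of the document that issued them.

    `entries` is shaped like the return value of driver.get_log("performance"):
    a list of dicts whose "message" value is a JSON string.
    """
    grouped = {}
    for entry in entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.requestWillBeSent":
            continue
        params = event["params"]
        host = urlsplit(params.get("documentURL", "")).hostname
        request = params["request"]
        grouped.setdefault(host, []).append((request["method"], request["url"]))
    return grouped


def _entry(method, params):
    # Hypothetical helper: builds a synthetic log entry for the demo below.
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


sample_entries = [
    _entry("Network.requestWillBeSent", {
        "frameId": "MAIN",
        "documentURL": "https://main.example/app",
        "request": {"method": "GET", "url": "https://main.example/app.js"},
    }),
    _entry("Network.requestWillBeSent", {
        "frameId": "CHILD",
        "documentURL": "https://frame.example/table",
        "request": {"method": "POST", "url": "https://frame.example/api/select"},
    }),
    _entry("Network.responseReceived", {"frameId": "CHILD"}),
]

grouped = requests_by_document_host(sample_entries)
print(grouped["frame.example"])
```

Grouping by frameId instead of documentURL would work the same way; the host is used here only because it is easier to read in the output.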

I looked around for an example website containing an iframe you interact with, e.g. by clicking a button (the OP should have done the legwork and provided such an example, but anyway).

Code:

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time as t
from datetime import datetime

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://fasecolda.com/ramos/automoviles/historial-de-accidentes-de-vehiculos-asegurados/'
browser.get(url)
for x in browser.requests:
    print('URL:', x.url)
    print('ORIGIN:', x.headers['origin'])
    print('HOST:', x.headers['Host'])
    print('SEC-FETCH-DEST:', x.headers['sec-fetch-dest'])
    print('TIMESTAMP:', x.date)
    print('______________________________________')

WebDriverWait(browser, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//*[@title='Términos']")))
t.sleep(3)
print('switched to iframe')
button = WebDriverWait(browser,5).until(EC.element_to_be_clickable((By.XPATH, '//*[text()="Acepto los Términos y Condiciones"]')))
print('located the button, bringing it into view')
button.location_once_scrolled_into_view
print('now waiting 30 seconds, for clear separation of requests')
t.sleep(30)
print('printing last request again:')
print('URL:', browser.last_request.url)
print('ORIGIN:', browser.last_request.headers['origin'])
print('HOST:', browser.last_request.headers['Host'])
print('SEC-FETCH-DEST:', browser.last_request.headers['sec-fetch-dest'])
print('TIMESTAMP:', browser.last_request.date)
last_date = browser.last_request.date

print('clicked the button at', datetime.now())
button.click()
print('waiting another 30 seconds')
t.sleep(30)
print('and now printing the requests again (the ones after interacting with iframe)')
for x in browser.requests:
    if x.date > last_date:
        print('URL:', x.url)
        print('ORIGIN:', x.headers['origin'])
        print('HOST:', x.headers['Host'])
        print('SEC-FETCH-DEST:', x.headers['sec-fetch-dest'])
        print('TIMESTAMP:', x.date)
        print('______________________________________')

As you can see, it's pretty straightforward:

1. Go to the website.
2. Print the requests made so far (URL, origin, host, sec-fetch-dest and timestamp).
3. Locate the iframe and switch to it.
4. Locate the button you want to click, and bring it into view.
5. Wait 30 seconds for any requests made by JS in the page.
6. After 30 seconds, print the last request made (URL, origin, host, sec-fetch-dest and timestamp), and save its timestamp in a variable so that subsequent requests can be filtered.
7. Click the button and record the time of the click.
8. Wait another 30 seconds, just to make sure all requests have been performed.
9. Print the requests made after the timestamp saved previously.
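
The timestamp filter at the heart of these steps can be shown without a browser. This is only a minimal sketch: FakeRequest is a hypothetical stand-in that mimics the three attributes the code above reads from selenium-wire requests (url, date, headers), and the URLs and times are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a captured request: only the attributes used here."""
    url: str
    date: datetime
    headers: dict = field(default_factory=dict)


def requests_after(requests, cutoff):
    """Return the requests whose timestamp is strictly later than `cutoff`."""
    return [r for r in requests if r.date > cutoff]


start = datetime(2022, 10, 8, 21, 44, 57)
captured = [
    FakeRequest("https://main.example/page", start - timedelta(seconds=13)),
    FakeRequest("https://main.example/last-before-click", start),
    FakeRequest(
        "https://frame.example/after-click",
        start + timedelta(seconds=22),
        {"sec-fetch-dest": "iframe"},
    ),
]

# `start` plays the role of last_date (browser.last_request.date) in the code above.
for r in requests_after(captured, start):
    print(r.url, r.headers.get("sec-fetch-dest"))
```

The comparison is strict on purpose, so the request that supplied the cutoff timestamp is not reported again.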

The result in terminal:

[...]
______________________________________
URL: https://fonts.gstatic.com/s/roboto/v30/KFOlCnqEu92Fr1MmEU9fBBc4.woff2
ORIGIN: https://siniestroshava.com.co
HOST: None
SEC-FETCH-DEST: font
TIMESTAMP: 2022-10-08 21:44:44.794670
______________________________________
switched to iframe
located the button, bringing it into view
now waiting 30 seconds, for clear separation of requests
printing last request again:
URL: https://optimizationguide-pa.googleapis.com/v1:GetModels?key=AIzaSyCkfPOPZXDKNn8hhgu3JrA62wIgC93d44k
ORIGIN: None
HOST: None
SEC-FETCH-DEST: empty
TIMESTAMP: 2022-10-08 21:44:57.413952
clicked the button at 2022-10-08 21:45:19.036690
waiting another 30 seconds
and now printing the requests again (the ones after interacting with iframe)
URL: https://siniestroshava.com.co/hava/Seguridad/SolicitarCorreo
ORIGIN: None
HOST: siniestroshava.com.co
SEC-FETCH-DEST: iframe
TIMESTAMP: 2022-10-08 21:45:19.209288
______________________________________
URL: https://siniestroshava.com.co/hava/css/hava/estiloslocales.css
ORIGIN: None
HOST: siniestroshava.com.co
SEC-FETCH-DEST: style
TIMESTAMP: 2022-10-08 21:45:19.633076
______________________________________
URL: https://siniestroshava.com.co/hava/css/vendor.css?v=U1BT8Ls9ntdpDS12L5xpMjmSP3Eitncl_SyDnU5LLHk
ORIGIN: None
HOST: siniestroshava.com.co
SEC-FETCH-DEST: style
TIMESTAMP: 2022-10-08 21:45:19.645382
______________________________________
URL: https://siniestroshava.com.co/hava/css/devextreme/dx.material.hava.css
ORIGIN: None
HOST: siniestroshava.com.co
SEC-FETCH-DEST: style
TIMESTAMP: 2022-10-08 21:45:19.646197
______________________________________
[...]
______________________________________

As you can see, the main website is https://fasecolda.com, and the iframe src is https://siniestroshava.com.co/. The output shows the requests made since loading the original page (I didn't post them all; there are too many), the last request made before interacting with the iframe, the time of the click, and the subsequent requests. The first of those has SEC-FETCH-DEST: iframe, so it is the request made by the iframe as a result of clicking the button. The Host and Origin headers are also relevant, when they are present.

This is one way to isolate the requests made from the iframe from those made by the main page.
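
If you want to apply the same header checks in code rather than by eye, a heuristic along these lines might do. It is a sketch under assumptions: the function name is made up, headers are passed as a plain dict, and the three signals (Sec-Fetch-Dest, Origin, and the URL's host) are simply the ones visible in the output above, not a guaranteed way to attribute every request.

```python
from urllib.parse import urlsplit


def looks_like_iframe_request(url, headers, iframe_host):
    """Heuristic: does this request appear to belong to the iframe at `iframe_host`?"""
    # Header names are case-insensitive, so normalise them before the lookups.
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("sec-fetch-dest") == "iframe":
        return True  # navigation of an iframe
    if urlsplit(lowered.get("origin") or "").hostname == iframe_host:
        return True  # sent from a document served by the iframe's host
    return urlsplit(url).hostname == iframe_host  # sent to the iframe's host


iframe_host = "siniestroshava.com.co"
samples = [
    ("https://siniestroshava.com.co/hava/Seguridad/SolicitarCorreo",
     {"Sec-Fetch-Dest": "iframe"}),
    ("https://fonts.gstatic.com/s/roboto/v30/KFOlCnqEu92Fr1MmEU9fBBc4.woff2",
     {"Origin": "https://siniestroshava.com.co", "Sec-Fetch-Dest": "font"}),
    ("https://fasecolda.com/some/page", {"Sec-Fetch-Dest": "document"}),
]

for url, headers in samples:
    print(looks_like_iframe_request(url, headers, iframe_host), url)
```

The last sample URL is invented to show a non-matching case; the first two are taken from the terminal output above.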

I believe this should answer your question as asked.