Python Selenium: close all instances of webdriver

I am working on this browser automation project that performs some browser tasks in parallel. The idea is to:

1. Open four browsers
2. Do some tasks
3. Wait for all browsers to finish their tasks before closing all of them

Here's a simple web driver function for demo purposes.

# For initializing webdriver
import time  # needed for time.sleep() below

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

def initialize_driver(starting_url: str = 'https://www.google.com/'):
    ''' Open a webdriver and go to Google
    '''
    # Webdriver option(s): keep webdriver opened
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True) 

    # Initialize webdriver
    driver = webdriver.Chrome(
         service=Service(ChromeDriverManager().install()), 
         options=chrome_options)
    
    # Open website; get() blocks until the page's load event fires
    driver.get(starting_url)
    # The implicit wait applies to later element lookups, not to page loading
    driver.implicitly_wait(10)
    time.sleep(1)

    return driver

Using this function, I can now create four jobs that will run in parallel using multiprocessing.

# Import package
import multiprocessing as mp

# List of workers
workers = []

# Run in parallel
for _ in range(4):
    worker = mp.Process(target=phm2.worker_bot_test)
    worker.start()
    workers.append(worker)

for worker in workers:
    worker.join()

These already cover the first two points, but as far as I know, we can only close one webdriver at a time using driver.close(). Is there a way to close them all at once? I tried creating a list of webdrivers, appending each webdriver to it at the end of the function, and then closing them one by one. But for some reason, it isn't working.

# I added drivers.append(driver) at the end of the function from earlier
# This will now be a global variable to store the list of drivers
drivers = []

# Insert multiprocessing code here...

# Close all drivers
for driver in drivers:
   driver.close()

What can I try to achieve the last step? I've seen that we can tweak the Process class to include return values (having return values would be a big help), but I'd rather avoid that if possible since it's kinda complex.

CodePudding user response：

Each webdriver object is a completely independent instance.
Just as calling, for example, get() on one webdriver object has no effect on any other webdriver object, calling quit() or close() on one webdriver object has no effect on any other.
So the only way to close ALL your webdriver sessions is to keep all the webdriver objects in some structure, such as a list.
When you need to close all the sessions, iterate over that list and call driver.quit() on every object in it.
BTW, to cleanly end a session you should use quit(), not close(): close() only closes the current window, while quit() ends the whole session and shuts down the driver process.
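
As a rough illustration of the list approach, here is a minimal sketch that assumes the workers run as threads rather than processes. Threads share the module-level list, whereas with multiprocessing each child process works on its own copy of it, so appends made in a worker never show up in the parent. The names worker, quit_all, run and make_driver are hypothetical; make_driver stands for any callable that returns an object with a quit() method, such as the question's initialize_driver.

```python
import threading

drivers = []                     # shared by all threads in this process
drivers_lock = threading.Lock()  # guards appends from concurrent workers


def worker(make_driver):
    """Create a driver, register it in the shared list, then do the work."""
    driver = make_driver()
    with drivers_lock:
        drivers.append(driver)
    # ... the browser tasks for this worker would go here ...


def quit_all(driver_list):
    """Call quit() on every driver; one failure does not stop the rest."""
    errors = []
    for driver in driver_list:
        try:
            driver.quit()
        except Exception as exc:
            errors.append(exc)
    driver_list.clear()
    return errors


def run(make_driver, count=4):
    """Start `count` workers, wait for all of them, then quit every driver."""
    threads = [
        threading.Thread(target=worker, args=(make_driver,))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return quit_all(drivers)
```

With the question's function this would be called as run(initialize_driver). Collecting the errors instead of raising on the first one is a design choice here, so that a single browser that has already died does not leave the others open.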

CodePudding user response：

I would first observe that, since the Selenium driver is already running as a child process, you only really need multithreading. I am assuming that any work done by your threads after a web page and its elements have been retrieved is not particularly CPU-intensive. If this is not the case, you can always create a multiprocessing pool and pass it to the worker_bot_test worker function so that any CPU-intensive operations are executed in parallel.

By using threads we can create a class that creates the driver and has a __del__ finalizer that "quits" the driver when the class instance is garbage collected. We keep a reference to that class instance in thread local storage so that the finalizer is only called when the thread terminates and thread local storage is garbage collected. To ensure this garbage collection we can explicitly call gc.collect after the child threads terminate. If we were using multiprocessing instead of multithreading, this call to gc.collect would have no effect because it only garbage collects the current process.

# For initializing webdriver
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

import threading

class ChromeDriver:
    def __init__(self, starting_url):
        chrome_options = Options()
        chrome_options.add_experimental_option("detach", True)
        # Not a bad option to add:
        #chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
        # If we don't need to see the browsers:
        #chrome_options.add_argument("headless")

        # Initialize webdriver
        self.driver = webdriver.Chrome(
             service=Service(ChromeDriverManager().install()),
             options=chrome_options)

        # Open website; wait until fully loaded
        self.driver.get(starting_url)
        self.driver.implicitly_wait(10)
        # What is the purpose of the following line?
        #time.sleep(1)

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        print('The driver has been "quitted".')

threadLocal = threading.local()

def initialize_driver(starting_url: str = 'https://www.google.com/'):
    chrome_driver =  ChromeDriver(starting_url)
    # Make sure there is a reference to the ChromeDriver instance so that
    # it is not prematurely finalized:
    threadLocal.driver = chrome_driver
    return chrome_driver.driver

def worker_bot_test():
    driver = initialize_driver()
    print(len(driver.page_source))


if __name__ == '__main__':
    # List of workers
    workers = []

    # Run in parallel
    for _ in range(4):
        worker = threading.Thread(target=worker_bot_test)
        worker.start()
        workers.append(worker)

    for worker in workers:
        worker.join()

    # Ensure finalizers are executed:
    import gc
    gc.collect()

Prints:

...
163036
163050
163183
165486
The driver has been "quitted".
The driver has been "quitted".
The driver has been "quitted".
The driver has been "quitted".
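
If relying on finalizers and garbage collection feels too implicit, a different technique (not the one used above) is to let the threads hand their drivers back to the main thread, which then quits them all once every task has finished. The following is only a sketch under assumptions: run_all, make_driver and task are hypothetical names, make_driver is any callable returning an object with a quit() method (for example the question's initialize_driver), and each driver is used by only one thread at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(make_driver, task, count=4):
    """Start `count` drivers, run `task` on each, then quit all of them."""
    drivers = []
    try:
        with ThreadPoolExecutor(max_workers=count) as pool:
            # Start the browsers in parallel; each future returns its driver.
            futures = [pool.submit(make_driver) for _ in range(count)]
            start_error = None
            for future in futures:
                try:
                    drivers.append(future.result())
                except Exception as exc:
                    if start_error is None:
                        start_error = exc
            if start_error is not None:
                raise start_error
            # Run the task on every driver in parallel and collect the results.
            return list(pool.map(task, drivers))
    finally:
        # Reached only after the pool has shut down, i.e. after every task
        # has finished, so no driver is quit while it is still in use.
        for driver in drivers:
            driver.quit()
```

A call such as run_all(initialize_driver, lambda driver: len(driver.page_source)) would return one result per browser. Because futures carry return values back to the caller, this also avoids having to customize the Process class.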