I need to scrape the details of each lawyer listed at https://chambers.com/all-lawyers-asia-pacific-8. About 5k lawyers are listed, but their details are on the individual profile pages linked from the site. I have no problem scraping a single web page. However, it would take forever to visit each lawyer's profile page and scrape it by hand. Is there a way to loop this process?
I'm stuck because I was tasked with getting each lawyer's name, the link to their profile, their law firm, and their ranks.
CodePudding user response:
I recommend using threading to speed up the process. The site may ban you for sending too many requests. In that case, use a different user agent for each thread, or route the requests through Tor or a VPN.
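As a rough illustration of the threading idea, here is a minimal sketch that uses only the standard library. The names fetch_html, fetch_all, USER_AGENT and the worker count are my own assumptions, not anything the site prescribes, and the sketch does no throttling or retrying, which a real run against 5k pages would probably need.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import Request, urlopen

# Hypothetical user agent string; replace it with whatever suits your setup.
USER_AGENT = "Mozilla/5.0 (compatible; profile-scraper-sketch)"


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one page and return its body as text."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch every URL on a pool of worker threads and map url -> result.

    An exception raised by any single fetch propagates to the caller.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as url_list.
        pages = list(pool.map(fetch, url_list))
    return dict(zip(url_list, pages))


# Example call (performs real network requests, so it is left commented out):
# pages = fetch_all(profile_urls, max_workers=8)
```

Keeping max_workers small is the easiest way to stay under whatever request limit the site enforces. The fetch parameter is there so that you can swap in a different downloader, for example one that goes through a proxy.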
CodePudding user response:
Use Selenium WebDriver. With find_elements(By.XPATH, ...), get the href attributes of all records as a list and loop through them. Inside the loop, use each href value to open the details page with driver.get(href_value). That opens the page with the record's details, where you can scrape the info you need.
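The steps above could be sketched roughly as follows. The XPath is a placeholder I made up, since I have not inspected the site's markup, and collect_links, scrape_profiles and the extract callback are hypothetical helper names. The driver argument is expected to be a Selenium WebDriver that you create yourself, e.g. webdriver.Chrome().

```python
from typing import Callable, List

# Hypothetical XPath: adjust it to the real markup of the listing page.
PROFILE_LINK_XPATH = "//a[contains(@href, '/lawyer/')]"


def collect_links(driver, list_url: str, xpath: str = PROFILE_LINK_XPATH) -> List[str]:
    """Open the listing page and return the unique hrefs matched by xpath."""
    driver.get(list_url)
    # "xpath" is the string value of selenium's By.XPATH constant.
    anchors = driver.find_elements("xpath", xpath)
    # Read every href before navigating away, because the elements
    # become stale once the driver loads another page.
    hrefs = [a.get_attribute("href") for a in anchors]
    # dict.fromkeys drops duplicates while keeping page order.
    return list(dict.fromkeys(h for h in hrefs if h))


def scrape_profiles(
    driver,
    list_url: str,
    extract: Callable[[object], dict],
    xpath: str = PROFILE_LINK_XPATH,
) -> List[dict]:
    """Visit each profile link and collect one dict per lawyer."""
    rows = []
    for href in collect_links(driver, list_url, xpath):
        driver.get(href)       # open the details page
        row = extract(driver)  # caller-supplied scraping of name, firm, ranks
        row["profile_url"] = href
        rows.append(row)
    return rows
```

The extract callback is where you would put your own find_element calls for the name, law firm and ranks on a profile page. Collecting all hrefs first and only then navigating avoids stale element errors.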