Is there a way to web scrape more results, using the sites own dropdown menu?

Time:10-18

I am currently using Python to scrape a site with thousands of pages. It works fine, but going through all the pages takes a couple of hours, because I keep a short delay between requests, which I believe is fair to the site's provider. However, the site itself has a dropdown menu with an option to display more results per page. In the HTML it looks like this:

<div class="page-sizer">
    <select id="itemsPerPage" class="form-control input-sm">
            <option value="10" selected>10</option>
            <option value="50" >50</option>
            <option value="200" >200</option>
    </select>
</div>
<script>
    $(document).on('bb:ready', function () {
        var pageSizeOptions = {
            setPageSizeUrl: '/Pager/SetPageSize'
        };

        ScrapeThisWebsite.PageSize.init(pageSizeOptions);
    });
</script>

Is there any way to automatically display 200 results per page instead of only 10, and save some time for both the provider and me? The selection does not appear in the URL, so if I copy the page link into another browser, it falls back to the default.

I'm going through the pages using the following simple steps:

import requests

myheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1", "DNT": "1", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate"}

page = requests.get(url, headers=myheaders)

Is it linked to how the page is loaded?

CodePudding user response:

You can use the Selenium library to interact with the dropdown. It is also worth checking whether the site has an API you could fetch the data from directly. To find out, inspect the page, open the Network tab, and filter by Fetch/XHR. If an API shows up there, you can fetch the data with the requests library.
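If the Network tab shows that changing the dropdown sends a request to the /Pager/SetPageSize URL named in the page's script, you may be able to replay that request yourself. Below is a minimal sketch, not a confirmed recipe. It assumes the endpoint accepts a POST with a form field called pageSize (that field name is a guess, so copy the real method and field from the Network tab) and that the server remembers the choice through a session cookie. set_page_size is a hypothetical helper, and session is meant to be a requests.Session.

```python
from urllib.parse import urljoin


def set_page_size(session, base_url, size=200):
    """Ask the server to remember a larger page size for this session.

    `session` is expected to be a requests.Session; any object with a
    compatible .post() method works.
    """
    # '/Pager/SetPageSize' is taken from the inline script in the question.
    endpoint = urljoin(base_url, "/Pager/SetPageSize")
    # ASSUMPTION: the field name 'pageSize' and the POST method are guesses.
    # Check the real request in the browser's Network tab.
    response = session.post(endpoint, data={"pageSize": size})
    response.raise_for_status()
    return response


# Usage, with the requests library installed:
#   import requests
#   session = requests.Session()
#   session.headers.update(myheaders)
#   set_page_size(session, "https://yourwebsite.com")
#   page = session.get(url)  # same session, so its cookies are sent again
```

The point of using a Session rather than bare requests.get calls is that cookies set by the first response are sent with every later request, which is probably why a fresh browser falls back to 10 results.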

Here is how to select a value in the dropdown using Selenium. The Selenium docs have more on Select.

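# Selenium 4 syntax (find_element_by_id was removed in Selenium 4)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# Path to your ChromeDriver binary
driver = webdriver.Chrome(service=Service('/Users/username/chromedriver'))

# Load the website
driver.get('https://yourwebsite.com')

# Locate the <select> element by its ID and wrap it in Select
dropdown = Select(driver.find_element(By.ID, 'itemsPerPage'))

# Choose the option whose value attribute is "200"
dropdown.select_by_value('200')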
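
If you would rather not drive a browser for thousands of pages, one option is to use Selenium only once to set the page size and then hand its cookies over to requests. This is a rough sketch that assumes the site stores the chosen page size against a session cookie, which the question does not confirm. copy_cookies is a hypothetical helper name.

```python
def copy_cookies(driver, session):
    """Copy all cookies from a Selenium driver into a requests.Session."""
    for cookie in driver.get_cookies():
        # Selenium returns each cookie as a dict with keys such as
        # 'name', 'value', 'domain' and 'path'.
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Usage, continuing from the Selenium snippet above (requires requests):
#   import requests
#   session = copy_cookies(driver, requests.Session())
#   driver.quit()
#   page = session.get(url, headers=myheaders)
```

If the first page fetched this way still shows 10 results, the setting is probably not cookie-based and you would need to stay in Selenium for the whole crawl.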
