Scraping a tab in a JS website without having to click on the tab

Time:10-22

I am trying to scrape this website: https://www.casablanca-bourse.com/bourseweb/Societe-Cote.aspx?codeValeur=12200. The problem is that I only want to extract data from the "Indicateurs Clès" tab, and I can't find a way to access it in the page source without clicking on it. I can't figure out the URL of this specific tab. I checked the page source and found generated code that changes whenever I click on that tab. Any suggestions?

Thanks in advance

CodePudding user response:

The problem is that this website uses AJAX to load the table in the "Indicateurs Clès" tab, so the table is requested from the server only when you click on the tab. To scrape the data, you should send the same request to the server yourself. In other words, mimic the browser's behavior.

You can do it this way (shown for Chromium; other browsers with DevTools work much the same way):

  1. Press F12 to open the DevTools.
  2. Switch to the "Network" tab.
  3. Select the Fetch/XHR filter.
  4. Click on the "Indicateurs Clès" tab on the page.
  5. Inspect the new request(s) you see in the DevTools.
  6. Once you find the request that returns the information you need ("Preview" and "Response"), right-click the request and select "Copy as cURL".
  7. Go to https://curl.trillworks.com/
  8. Select the programming language you're using for scraping.
  9. Paste the cURL command on the left (into the "curl command" textarea).
  10. Copy the code that appeared on the right and work with it. In some cases, you might need to inspect the request further and modify it.

In this particular case, the request data contains `__VIEWSTATE` and other fields, which the server uses to send back only the data needed to update the table that already exists on the page.

However, you can omit everything except `__EVENTTARGET` (the tab ID) and `codeValeur`. In that case the server returns the page's full XHTML, which includes the whole table. You can then parse that table and extract everything you need.
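If the trimmed-down request ever stops working, one fallback is to replay the hidden form fields such as `__VIEWSTATE` as well. Below is a minimal sketch of that idea in Python, using only the standard library. The names `HiddenFieldParser`, `build_postback`, and `sample_page` are made up for illustration, and the field values are placeholders rather than anything taken from the real site; whether the site actually requires these fields has not been checked here.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(page_html, event_target):
    """Return form data for a postback that targets event_target."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    data = dict(parser.fields)
    # Override (or add) the field that tells the server which control fired.
    data["__EVENTTARGET"] = event_target
    data.setdefault("__EVENTARGUMENT", "")
    return data


# Stand-in for the HTML returned by a first GET request (values are made up).
sample_page = """
<form method="post" action="Societe-Cote.aspx?codeValeur=12200">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="search" value="ignored" />
</form>
"""

payload = build_postback(sample_page, "SocieteCotee1$LBFicheTech")
print(payload)
```

In a real run you would feed the text of an initial GET response into `build_postback` and pass the resulting dictionary as the form data of the POST request.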

I don't know what tech stack you were initially going to use for scraping the website, but here is how you can get the tables with Python requests and BeautifulSoup4:

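import requests
from bs4 import BeautifulSoup

# Query-string parameter identifying the listed company
params = {'codeValeur': '12200'}

# Postback target: the ID of the "Indicateurs Clès" tab
data = {
    '__EVENTTARGET': 'SocieteCotee1$LBFicheTech',
}

response = requests.post(
    'https://www.casablanca-bourse.com/bourseweb/Societe-Cote.aspx',
    params=params,
    data=data,
)
response.raise_for_status()  # fail early on an HTTP error
soup = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly

# Parse the XHTML to get exactly the data you need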
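For the parsing step itself, here is a minimal sketch of turning table markup into rows of cell text. It uses only the standard library's `html.parser`, so it can be tried offline; with BeautifulSoup you could write a similar loop over `soup.find_all('tr')`. The `TableRowParser` class and the `sample_html` table are invented for illustration and do not reflect the real page's markup, so the actual table will probably need extra filtering (for example, by element ID).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up table standing in for the markup found in the response.
sample_html = """
<table>
  <tr><th>Indicator</th><th>2019</th><th>2020</th></tr>
  <tr><td>Revenue</td><td>1 000</td><td>1 200</td></tr>
  <tr><td>Net income</td><td>150</td><td>  180 </td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
rows = parser.rows
for row in rows:
    print(row)
```

With the real response you would call `parser.feed(response.text)` instead of feeding `sample_html`. Note that this simple parser does not handle nested tables.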