I am trying to scrape some data from a website, but the data is contained in an iframe. I first extracted the iframe's source link, but I am not able to scrape the data from that source page either. How can I extract the data from this source link? Here it is: https://chartviewer-europublic.bigapis.net/nzgaV/index.html
I am also sharing a screenshot showing the URL of the data download button, which sits under an "a" tag, but I am not able to extract this link either.
Here is the code I have used, with BeautifulSoup for the scraping.
# Libraries
from bs4 import BeautifulSoup
import requests
# Original website link
url_spain_total="https://anfac.com/cifras-clave/matriculaciones-turismos-y-todoterreno/"
page_total=requests.get(url_spain_total).text
soup_spain_total=BeautifulSoup(page_total,"lxml")
print(soup_spain_total.prettify())
# Getting the list of links in the iframe
result_spain=soup_spain_total.find_all("iframe")
result_spain
# Getting the required source link
total_main_link=result_spain[1]["src"]
total_main_link
After getting the source link, I am not able to extract the table contents.
Any help is appreciated. Thanks in advance!
CodePudding user response:
The following is an example of how you can get that data using Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
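import pandas as pd
from io import StringIO
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
# chrome_options.add_argument("--headless")  # uncomment to run without a visible browser window
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1920,1080")
webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)  # wait up to 20 seconds for the table to show up
url = 'https://chartviewer-europublic.bigapis.net/nzgaV/index.html'
browser.get(url)
# Wait for the element with id "datatable" before reading it
table = wait.until(EC.element_to_be_clickable((By.ID, "datatable")))
# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
print(df)
browser.quit()  # close the browser when done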
This gets the data as a DataFrame and prints it to the terminal:
| | Categoría | Ago-22 | Ago-21 | % Variacion | Acumulado 2022 | Acumulado 2021 | % Variacion Acumulado |
---|---|---|---|---|---|---|---|
0 | Gasolina | 22.3402 | 20.0702 | 11311.31 | 231.348 | 279.89 | -17-17.34 |
1 | Diesel | 8.9639 | 8.06481 | 11211.15 | 92.9799 | 119.641 | -22-22.29 |
2 | Resto | 20.6042 | 19.4492 | 595.94 | 208.715 | 188.782 | 1110.56 |
3 | Total combustibles | 51.9075 | 47.5835 | 919.09 | 533.043 | 588.314 | -9-9.39 |
4 | Particular | 24.9512 | 26.0833 | -4,3-4.34 | 233.413 | 236.728 | -1-1.4 |
5 | Empresa | 21.7122 | 17.6732 | 22922.85 | 224.337 | 215.654 | 44.03 |
6 | Alquiler | 5.24452 | 3.82738 | 37037.03 | 75.2928 | 135.931 | -45-44.61 |
7 | Total canales | 51.9075 | 47.5835 | 919.09 | 533.043 | 588.314 | -9-9.39 |
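The page formats numbers the Spanish way (dot as thousands separator, comma as decimal mark), so a value such as 533.043 in the accumulated columns is most likely 533,043 registrations that was read as a decimal. Below is a minimal sketch of converting such strings by hand. The helper name `parse_es_number` is made up for illustration, and it assumes each cell holds a single number as text. `pd.read_html` also accepts `thousands` and `decimal` arguments, which may be a simpler route if the cells are clean.

```python
def parse_es_number(text):
    """Convert a Spanish-formatted number string to a float.

    Assumes "." is the thousands separator and "," the decimal mark,
    e.g. "533.043" means 533043 and "-17,34" means -17.34.
    """
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


# Sample strings written in the page's number format
for raw in ["533.043", "-17,34", "11,3"]:
    print(raw, "->", parse_es_number(raw))
```

Some cells in the output above look like two values run together (for example -17-17.34), so they would need to be split before a helper like this could be applied.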
The Selenium setup above is for Linux. If you are using Windows or Mac, you will find many examples of Selenium/chromedriver setups for those platforms among the Selenium questions on this forum.
The Selenium documentation is also helpful: https://www.selenium.dev/documentation/webdriver/getting_started/
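As for why requests plus BeautifulSoup came up empty: my guess (not verified against the live page) is that the table is built by JavaScript after the iframe document loads, so it is not in the raw HTML that requests downloads. Here is a small sketch of how you could check that, and of picking the iframe by its src instead of by position. The HTML snippets and the helper names `find_chart_iframe_src` and `has_static_table` are stand-ins I made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the text returned by requests.get(page_url).text
PAGE_HTML = """
<html><body>
  <iframe src="https://example.com/ads/banner.html"></iframe>
  <iframe src="//chartviewer-europublic.bigapis.net/nzgaV/index.html"></iframe>
</body></html>
"""

# Hypothetical stand-in for the iframe document before any JavaScript has run
IFRAME_HTML = '<html><body><div id="chart"></div><script src="app.js"></script></body></html>'


def find_chart_iframe_src(page_html, base_url, host_fragment="chartviewer"):
    """Return the absolute src of the first iframe whose src contains host_fragment."""
    soup = BeautifulSoup(page_html, "html.parser")
    for iframe in soup.find_all("iframe", src=True):
        if host_fragment in iframe["src"]:
            # urljoin resolves relative and protocol-relative src values
            return urljoin(base_url, iframe["src"])
    return None


def has_static_table(html, table_id="datatable"):
    """Return True if an element with the given id exists in the raw HTML."""
    return BeautifulSoup(html, "html.parser").find(id=table_id) is not None


src = find_chart_iframe_src(PAGE_HTML, "https://anfac.com/cifras-clave/")
print(src)
print(has_static_table(IFRAME_HTML))
```

If `has_static_table` returns False for the real iframe page, a browser-driven tool such as Selenium is one way to get at the rendered table.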