How to scrape a horizontal bar chart?

Time:09-22

You can find the graph I want to scrape at this address: https://www.algopoly.com/res-rapor.html

The output I want looks like this:

KARABEL RES 5.23 TL/MWh
GÖKÇEDAĞ RES 21.28 TL/MWh
.
.
.
HAMSİ RES 486.47 TL/MWh

I've tried:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url = 'https://www.algopoly.com/res-rapor.html'

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options, service=Service(ChromeDriverManager().install()))
driver.get(url)
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//iframe[@id='518061208646906024']")))

iframe_element = driver.find_element(By.XPATH, "//iframe[@id='518061208646906024']")
data = driver.switch_to.frame(iframe_element)
print(data)

I can't get any data out of it. Is it possible to scrape this chart with Selenium?

CodePudding user response:

This task is not trivial: the data for that bar chart is loaded from a different address. If you inspect the HTML, you will see an iframe, and you need to scrape that iframe's source to get the actual chart data. Here is one way to do it without Selenium:

import json

import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

# The chart lives in an iframe, so request the iframe's source directly.
r = requests.get('https://s3.eu-west-1.amazonaws.com/algopoly.com/res_rapor/index.html', headers=headers)
# Assumption: the page is UTF-8. Without this, Turkish characters may come out garbled.
r.encoding = 'utf-8'

# The chart markup is embedded as an HTML string inside a JSON <script> tag.
soup = bs(r.text, 'lxml')
script = soup.select_one('script[type="application/json"]')
json_obj = json.loads(script.text)
soup = bs(json_obj['x']['html'], 'lxml')
# print(soup.prettify())

# The tooltip text of each bar is stored in the title attribute of a <rect>.
elements = soup.select('rect[fill-opacity="1"]')
for el in elements:
    print(el.get('title'))

Result in terminal:

None
None
Org: EGENER<br>UEVCB: KARABEL RES<br>Toplam Dengesizlik: 4095 TL<br>Toplam KÜPST: 616 TL<br>Toplam Üretim: 901.08 MWh
Org: ROTOR<br>UEVCB: GÖKÇEDAĞ RES<br>Toplam Dengesizlik: 883763 TL<br>Toplam KÜPST: 170890 TL<br>Toplam Üretim: 49571.7 MWh
Org: YENİ BELEN<br>UEVCB: ŞENBÜK RES(YENİ BELEN ENR.)<br>Toplam Dengesizlik: 360862 TL<br>Toplam KÜPST: 97310 TL<br>Toplam Üretim: 17903.76 MWh
Org: BELEN<br>UEVCB: BELEN ELEKTRİK ÜRETİM A.Ş.<br>Toplam Dengesizlik: 467731 TL<br>Toplam KÜPST: 87781 TL<br>Toplam Üretim: 20176 MWh
Org: İMBAT<br>UEVCB: SARITEPE RES<br>Toplam Dengesizlik: 590565 TL<br>Toplam KÜPST: 121483 TL<br>Toplam Üretim: 24895 MWh
[...]
[...]

Expect to do a little data cleanup, such as skipping the None entries and splitting each title on <br>. You can also inspect the response to see whether the HTML extracted from the JSON contains other data you need.
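
As a rough sketch of that cleanup, the snippet below turns one title string into the "NAME value TL/MWh" line asked for in the question. It rests on two assumptions that are not confirmed by the page itself: that the fields always appear in the order shown in the output above (Org, UEVCB, Toplam Dengesizlik, Toplam KÜPST, Toplam Üretim), and that the chart value is (Toplam Dengesizlik + Toplam KÜPST) / Toplam Üretim. The formula is inferred only because it reproduces the 5.23 and 21.28 figures from the question. The helper names parse_values and unit_cost are made up for this example.

```python
import re


def parse_values(title):
    """Return the text after the first ':' of each <br>-separated field."""
    return [part.partition(':')[2].strip() for part in title.split('<br>')]


def number(text):
    """Extract the leading number from strings such as '901.08 MWh'."""
    return float(re.match(r'-?\d+(?:\.\d+)?', text).group())


def unit_cost(title):
    """Return (plant name, TL per MWh) for one tooltip string.

    Assumed field order: Org, UEVCB, imbalance (TL), KÜPST (TL), production (MWh).
    Assumed formula: (imbalance + KÜPST) / production.
    """
    _org, name, imbalance, kupst, production = parse_values(title)
    return name, (number(imbalance) + number(kupst)) / number(production)


sample = ('Org: EGENER<br>UEVCB: KARABEL RES<br>Toplam Dengesizlik: 4095 TL'
          '<br>Toplam KÜPST: 616 TL<br>Toplam Üretim: 901.08 MWh')
name, cost = unit_cost(sample)
print(f'{name} {cost:.2f} TL/MWh')  # KARABEL RES 5.23 TL/MWh
```

In the scraping loop you would call unit_cost(el.get('title')) only for elements whose title is not None.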
