You can find the graph I want to scrape at this address: https://www.algopoly.com/res-rapor.html
The output I want is:
KARABEL RES 5.23 TL/MWh
GÖKÇEDAĞ RES 21.28 TL/MWh
.
.
.
HAMSİ RES 486.47 TL/MWh
I've tried:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.algopoly.com/res-rapor.html'
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options, service=Service(ChromeDriverManager().install()))
driver.get(url)
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//iframe[@id='518061208646906024']")))
iframe_element = driver.find_element(By.XPATH, "//iframe[@id='518061208646906024']")
data = driver.switch_to.frame(iframe_element)
print(data)
I can't get anything out of it. Is it possible to scrape this chart with Selenium?
CodePudding user response:
The task is not trivial: the data for that bar chart is loaded from a different address. If you inspect the HTML, you will see an iframe; you need to scrape that iframe's source to get the actual chart data. (Your code prints None because driver.switch_to.frame() only changes the driver's context and returns nothing.) Here is one way to do it without Selenium:
import json

import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
# This is the document the iframe loads; it holds the chart itself
r = requests.get('https://s3.eu-west-1.amazonaws.com/algopoly.com/res_rapor/index.html', headers=headers)
# Decode as UTF-8 so the Turkish characters come out intact
# (requests may otherwise fall back to ISO-8859-1)
r.encoding = 'utf-8'
soup = bs(r.text, 'lxml')
# The chart widget is embedded as JSON inside a <script> tag
script = soup.select_one('script[type="application/json"]')
json_obj = json.loads(script.text)
# The chart markup is stored as a string under x -> html
soup = bs(json_obj['x']['html'], 'lxml')
# print(soup.prettify())
# Each bar is a <rect>; its title attribute carries the tooltip text
elements = soup.select('rect[fill-opacity="1"]')
for el in elements:
    print(el.get('title'))
Result in terminal:
None
None
Org: EGENER<br>UEVCB: KARABEL RES<br>Toplam Dengesizlik: 4095 TL<br>Toplam KÜPST: 616 TL<br>Toplam Üretim: 901.08 MWh
Org: ROTOR<br>UEVCB: GÖKÇEDAĞ RES<br>Toplam Dengesizlik: 883763 TL<br>Toplam KÜPST: 170890 TL<br>Toplam Üretim: 49571.7 MWh
Org: YENİ BELEN<br>UEVCB: ŞENBÜK RES(YENİ BELEN ENR.)<br>Toplam Dengesizlik: 360862 TL<br>Toplam KÜPST: 97310 TL<br>Toplam Üretim: 17903.76 MWh
Org: BELEN<br>UEVCB: BELEN ELEKTRİK ÜRETİM A.Ş.<br>Toplam Dengesizlik: 467731 TL<br>Toplam KÜPST: 87781 TL<br>Toplam Üretim: 20176 MWh
Org: İMBAT<br>UEVCB: SARITEPE RES<br>Toplam Dengesizlik: 590565 TL<br>Toplam KÜPST: 121483 TL<br>Toplam Üretim: 24895 MWh
[...]
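Expect to do a little data cleanup: the first rect elements have no title, and each title packs several <br>-separated fields into one string. You can also inspect the response to see whether the markup you extract from the JSON contains other data you need.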
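As a rough sketch of that cleanup: the snippet below assumes every title has the five <br>-separated fields seen in the output above (Org, UEVCB, then three numbers with units), and it assumes the figure you are after is (Toplam Dengesizlik + Toplam KÜPST) / Toplam Üretim. That formula is my inference, since it reproduces the 5.23 and 21.28 values from the question; nothing on the page confirms it. The helper names parse_title and cost_per_mwh are made up for illustration.

```python
def parse_title(title):
    """Return (plant name, [imbalance TL, KUPST TL, production MWh]) for one tooltip."""
    # Fields are separated by a literal '<br>' and look like 'Label: value unit'.
    values = [part.split(':', 1)[1].strip() for part in title.split('<br>')]
    name = values[1]  # second field is UEVCB, the plant name
    # Fields 3 to 5 are numeric; keep the number and drop the 'TL' / 'MWh' unit.
    numbers = [float(value.split()[0]) for value in values[2:5]]
    return name, numbers


def cost_per_mwh(title):
    """Assumed metric: (imbalance + KUPST) / production, rounded to 2 decimals."""
    name, (imbalance, kupst, production) = parse_title(title)
    return name, round((imbalance + kupst) / production, 2)


# Sample input: one None (like the first rows of the output) and two real titles.
titles = [
    None,
    'Org: EGENER<br>UEVCB: KARABEL RES<br>Toplam Dengesizlik: 4095 TL'
    '<br>Toplam KÜPST: 616 TL<br>Toplam Üretim: 901.08 MWh',
    'Org: ROTOR<br>UEVCB: GÖKÇEDAĞ RES<br>Toplam Dengesizlik: 883763 TL'
    '<br>Toplam KÜPST: 170890 TL<br>Toplam Üretim: 49571.7 MWh',
]

for title in titles:
    if title is None:  # skip rects that carry no tooltip
        continue
    name, cost = cost_per_mwh(title)
    print(f'{name} {cost:.2f} TL/MWh')
```

For the two sample rows this prints "KARABEL RES 5.23 TL/MWh" and "GÖKÇEDAĞ RES 21.28 TL/MWh". The fields are read by position rather than by label, so the parsing does not depend on how the Turkish labels were decoded; in the real script you would loop over el.get('title') for each element instead of the hard-coded list.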