Scraping data from multiple Highcharts charts using python

Time:09-29

I am trying to use Python (Selenium) to extract the data from all the RSPO CREDITS Highcharts charts on https://rspo.org/palmtrace into a pandas DataFrame with the chart name, year, month, and values (number of credits and price in USD). I have been looking at other posts on scraping Highcharts data, but these charts seem to be formatted a bit differently, so any help is much appreciated.

CodePudding user response:

Since the page has two charts with 22 series each and two charts with 16 series each, a rough solution is to read each series' name and data from window.Highcharts.charts:

from selenium import webdriver
import time
import pandas as pd

driver = webdriver.Chrome()

website = "https://rspo.org/palmtrace"

driver.get(website)
time.sleep(2)  # crude wait so the charts can render; increase it if the page loads slowly

my_data = []

# Charts 0-1 have 22 series each and charts 2-3 have 16 (hard-coded for this page).
for chart, n_series in [(0, 22), (1, 22), (2, 16), (3, 16)]:
    for series in range(n_series):
        prefix = ('return window.Highcharts.charts[{}]'
                  '.series[{}].options'.format(chart, series))
        # Each row is the series name followed by its data points.
        temp = driver.execute_script(prefix + '.data')
        temp.insert(0, driver.execute_script(prefix + '.name'))
        my_data.append(temp)

df = pd.DataFrame(my_data)
print(df)
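
The hard-coded series counts above break as soon as the page changes, and the wide table does not carry the chart name the question asks for. A possible variation, offered only as a sketch, is to collect every chart in one script call and flatten the result into one row per point. The helper names (COLLECT_JS, fetch_charts, point_value, to_rows) are made up for illustration. The sketch assumes the chart title is available in options.title.text and that the x-axis labels, if any, are in xAxis[0].categories; neither has been checked against this particular page.

```python
# JavaScript run in the page: returns every live chart in a single round trip.
# Assumption: titles live in options.title.text and labels in xAxis[0].categories.
COLLECT_JS = """
return window.Highcharts.charts
    .filter(function (c) { return c; })   // destroyed charts are undefined
    .map(function (c) {
        var cats = c.xAxis[0] && c.xAxis[0].categories;
        return {
            title: (c.options.title && c.options.title.text) || null,
            categories: Array.isArray(cats) ? cats : null,
            series: c.series.map(function (s) {
                return {name: s.name, data: s.options.data};
            })
        };
    });
"""


def fetch_charts(driver):
    """Hypothetical helper: run COLLECT_JS through a Selenium WebDriver."""
    return driver.execute_script(COLLECT_JS)


def point_value(point):
    """Return the y value of a point given as a number, [x, y] or {'y': ...}."""
    if isinstance(point, dict):
        return point.get("y")
    if isinstance(point, (list, tuple)):
        return point[-1] if point else None
    return point


def to_rows(charts):
    """Flatten chart dicts into one record per (chart, series, point)."""
    rows = []
    for chart in charts:
        categories = chart.get("categories") or []
        for series in chart["series"]:
            for i, point in enumerate(series["data"] or []):
                rows.append({
                    "chart": chart.get("title"),
                    "series": series["name"],
                    # Axis label for this position, if the chart exposes one.
                    "label": categories[i] if i < len(categories) else None,
                    "value": point_value(point),
                })
    return rows
```

With the driver from the code above, pd.DataFrame(to_rows(fetch_charts(driver))) would give a long-format table. Splitting it into year and month columns depends on how the page encodes them in the series names and axis labels, so that step is left out here.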