How to scrape data from an interactive chart?

I want to scrape the data behind the charts on this page and export it to a CSV file.

I tried requests but failed to get any data.

Here's the code using requests:

from requests_html import HTMLSession
from csv import DictWriter

url = 'https://www.fidelitypensionmanagers.com/Home/PriceHistory'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)

entry = r.html.xpath('//*[@id="fund-I"]', first=True)

columns = ['Fund_Type', 'Valuation_Date', 'Unit Price']

with open('data.csv', 'a') as f:
    w = DictWriter(f, fieldnames=columns)
    w.writerow(entry)

Is there any way to scrape data from those charts using Python?

CodePudding user response:

The data is stored inside <script> tags in the page, so no JavaScript rendering is needed. To parse it, you can use the following example:

import re
import json
import requests


url = "https://crusaderpensions.com/services/fund-history/"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"
}

html_doc = requests.get(url, headers=headers).text
pat = re.compile(
    r"window\['ninja_charts_instance_\d+'\] = (.*)(?=\s*</script>)"
)

for data in map(json.loads, pat.findall(html_doc)):
    # uncomment this to print whole chart data:
    # print(data)
    print(data["chart_name"])
    print(data["chart_data"]["datasets"][0]["data"])

Prints:

FUND I
['1.6582', '1.6586', '1.6589', '1.6593', '1.6595', '1.6596', '1.6604']
FUND II
['6.4274', '6.4290', '6.4306', '6.4322', '6.4322', '6.4297', '6.4379']
FUND III
['1.6776', '1.6780', '1.6785', '1.6790', '1.6799', '1.6800', '1.6814']
Retiree Fund History
['4.8109', '4.8123', '4.8138', '4.8153', '4.8159', '4.8176', '4.8196']
FUND V
['1.1898', '1.1901', '1.1904', '1.1907', '1.1911', '1.1914', '1.1917']
FUND VI
['1.0190', '1.0194', '1.0197', '1.0201', '1.0204', '1.0208', '1.0211']
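
Since the question asks for a CSV export, here is a minimal sketch of how the parsed chart objects could be flattened and written out. It only relies on the chart_name and chart_data["datasets"][0]["data"] keys shown above. The helper names charts_to_rows and write_csv are hypothetical, and reading dates from a Chart.js-style "labels" list is an assumption that should be checked against the real data by printing one chart object first.

```python
import csv
import io

# Sample shaped like the parsed chart objects; values are copied from the
# printed output above and truncated to three points per fund.
sample_charts = [
    {"chart_name": "FUND I",
     "chart_data": {"datasets": [{"data": ["1.6582", "1.6586", "1.6589"]}]}},
    {"chart_name": "FUND II",
     "chart_data": {"datasets": [{"data": ["6.4274", "6.4290", "6.4306"]}]}},
]

FIELDNAMES = ["Fund_Type", "Valuation_Date", "Unit Price"]


def charts_to_rows(charts):
    """Flatten chart objects into one dict per data point (hypothetical helper)."""
    rows = []
    for chart in charts:
        chart_data = chart["chart_data"]
        prices = chart_data["datasets"][0]["data"]
        # ASSUMPTION: dates, if present, live in a Chart.js-style "labels" list.
        # When the key is missing, the date column is left empty.
        labels = chart_data.get("labels") or [""] * len(prices)
        for label, price in zip(labels, prices):
            rows.append({
                "Fund_Type": chart["chart_name"],
                "Valuation_Date": label,
                "Unit Price": price,
            })
    return rows


def write_csv(rows, fileobj):
    """Write the rows with a header line to any text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


rows = charts_to_rows(sample_charts)
buffer = io.StringIO()  # in-memory stand-in for a real file
write_csv(rows, buffer)
print(buffer.getvalue())
```

To use it with the real page, collect the parsed objects from the loop above into a list, pass that list to charts_to_rows, and call write_csv with a file opened via open("data.csv", "w", newline="").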