I want to scrape the data from the charts on this page and export it to a CSV file.
I tried requests_html but failed to get any data.
Here's the code I used:
from requests_html import HTMLSession
from csv import DictWriter
url = 'https://www.fidelitypensionmanagers.com/Home/PriceHistory'
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)
entry = r.html.xpath('//*[@id="fund-I"]', first=True)
colums = ['Fund_Type', 'Valuation_Date', 'Unit Price']
with open('data.csv', 'a') as f:
    w = DictWriter(f, fieldnames=colums)
    w.writerow(entry)
Is there any way to scrape the data from those charts using Python?
CodePudding user response:
The data is stored inside <script> elements in the page. To parse it, you can use the following example:
import re
import json
import requests
url = "https://crusaderpensions.com/services/fund-history/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"
}
html_doc = requests.get(url, headers=headers).text
pat = re.compile(
    r"window\['ninja_charts_instance_\d+'\] = (.*)(?=\s*</script>)"
)
for data in map(json.loads, pat.findall(html_doc)):
    # uncomment this to print whole chart data:
    # print(data)
    print(data["chart_name"])
    print(data["chart_data"]["datasets"][0]["data"])
Prints:
FUND I
['1.6582', '1.6586', '1.6589', '1.6593', '1.6595', '1.6596', '1.6604']
FUND II
['6.4274', '6.4290', '6.4306', '6.4322', '6.4322', '6.4297', '6.4379']
FUND III
['1.6776', '1.6780', '1.6785', '1.6790', '1.6799', '1.6800', '1.6814']
Retiree Fund History
['4.8109', '4.8123', '4.8138', '4.8153', '4.8159', '4.8176', '4.8196']
FUND V
['1.1898', '1.1901', '1.1904', '1.1907', '1.1911', '1.1914', '1.1917']
FUND VI
['1.0190', '1.0194', '1.0197', '1.0201', '1.0204', '1.0208', '1.0211']
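Since the question also asks for a CSV export, here is a minimal sketch of how the parsed chart data could be written out with csv.DictWriter. The helper names charts_to_rows and write_csv are made up for this example, and the "labels" key is an assumption (Chart.js-style data usually keeps the x-axis values there, but I haven't confirmed it for this page), so the code falls back to the position of each value when the key is missing.

```python
import csv
import json
import re

# Same pattern as in the answer: one chart JSON object per <script> line.
PAT = re.compile(r"window\['ninja_charts_instance_\d+'\] = (.*)(?=\s*</script>)")


def charts_to_rows(html_doc):
    """Yield one dict per data point found in the embedded chart JSON."""
    for chart in map(json.loads, PAT.findall(html_doc)):
        chart_data = chart["chart_data"]
        prices = chart_data["datasets"][0]["data"]
        # ASSUMPTION: the dates are stored under "labels"; if that key is
        # absent or empty, use the index of each price instead.
        labels = chart_data.get("labels") or range(len(prices))
        for label, price in zip(labels, prices):
            yield {
                "Fund_Type": chart["chart_name"],
                "Valuation_Date": label,
                "Unit Price": price,
            }


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    columns = ["Fund_Type", "Valuation_Date", "Unit Price"]
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

With html_doc fetched as in the answer, calling write_csv(charts_to_rows(html_doc), "data.csv") should give one row per fund and data point. Opening the file in "w" mode instead of "a" avoids duplicated rows when the script is run more than once.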