I am trying to scrape data from the chart at https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290. I tried accessing the data using the respective xpath of the data in the boxes, but it doesn't seem to work.
I tried using Scrapy:
date = response.xpath('//*[@id="highcharts-0"]/div/span/b[1]').get()
market_value = response.xpath('//*[@id="highcharts-0"]/div/span/b[2]').get()
club = response.xpath('//*[@id="highcharts-0"]/div/span/b[3]').get()
age = response.xpath('//*[@id="highcharts-0"]/div/span/b[4]').get()
How can I scrape all the data from the chart using Scrapy or Selenium?
CodePudding user response:
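This data is rendered on the client (in the browser) by an inline JavaScript block in the HTML body, so the highcharts elements your XPath expressions target do not exist in the response Scrapy downloads.
If you want to stay with Scrapy, you need a regex to pull the chart data out of that script,
e.g. (not tested):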
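import re
import json

# response.text is the decoded HTML; response.body is bytes and is not callable
body = response.text
# Lazily capture everything after 'series': up to the closing "}}]}]"
data = re.findall(r"(?<='series':).*?}}]}]", body)
if not data:
    return None  # assumes this code runs inside a spider callback
# json.loads only succeeds if the captured text is valid JSON
data = json.loads(data[0])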
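A variation on the regex above, as a minimal sketch only: instead of guessing where the array ends, find `'series':` and let `json.JSONDecoder().raw_decode` parse exactly one JSON value from that position. The sample HTML, the keys `y` and `club`, and the helper name `extract_series` are made up for illustration; the real page may be structured differently, and this only works if the embedded literal is valid JSON.

```python
import json
import re

# Hypothetical stand-in for response.text; the real page's markup and keys may differ.
SAMPLE_HTML = """
<script>
var chart = new Highcharts.Chart({'title': {'text': 'demo'},
'series':[{"name":"Market value","data":[{"y":1000,"club":"A"},{"y":2500,"club":"B"}]}],
'credits': {'enabled': false}});
</script>
"""


def extract_series(html):
    """Return the parsed 'series' array, or None if it cannot be found or parsed."""
    match = re.search(r"'series'\s*:\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever text follows it.
        series, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return series


series = extract_series(SAMPLE_HTML)
points = series[0]["data"] if series else []
print(points)
```

In a spider callback you would pass `response.text` instead of `SAMPLE_HTML` and yield one item per entry in `points`.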
CodePudding user response:
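import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290"

chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver.get(url)
time.sleep(5)  # crude wait so the chart script has time to run
# Read the data points directly from the Highcharts object in the page
data = driver.execute_script('return window.Highcharts.charts[0]'
                             '.series[0].options.data')
driver.quit()
print(data)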
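The fixed `time.sleep(5)` can be replaced by polling until the chart object actually exists. Below is a rough sketch of that idea; `wait_for_chart_data` and `FakeDriver` are hypothetical names, and the fake driver is there only so the snippet runs without a browser. With a real driver, Selenium's own `WebDriverWait(driver, timeout).until(...)` does the same job.

```python
import time

# Returns null until Highcharts and its first chart exist in the page.
JS_GET_DATA = (
    "var hc = window.Highcharts;"
    "if (!hc || !hc.charts || !hc.charts[0]) { return null; }"
    "return hc.charts[0].series[0].options.data;"
)


def wait_for_chart_data(driver, timeout=10.0, interval=0.5):
    """Poll the page until the chart data is available or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        data = driver.execute_script(JS_GET_DATA)
        if data:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("chart data did not appear in time")
        time.sleep(interval)


class FakeDriver:
    """Stand-in for a Selenium driver: returns None twice, then some points."""

    def __init__(self):
        self.calls = 0

    def execute_script(self, script):
        self.calls += 1
        return None if self.calls < 3 else [{"y": 1000}, {"y": 2500}]


fake = FakeDriver()
print(wait_for_chart_data(fake, timeout=2.0, interval=0.01))
```

With the real driver from the answer above, the call would be `data = wait_for_chart_data(driver)` in place of the `time.sleep(5)` and `execute_script` lines.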