Web Scraping - Obtaining REGEX expression to pull var from website

Time:09-26

Similar to this: Unable to retrieve data from Macro Trends using selenium and read_html to create a data frame?

I'm trying to use the accepted answer's method of pulling data.

Same website, but I'm trying to pull from 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/pe-ratio' instead, and I want 'chartData' rather than 'originalData'.

The problem is that I can't find the right regex ending (the '\r\n\r\n\r' part) to get the data I want. It always crashes at 'data = json.loads(p.findall(r.text)[0])' with 'IndexError: list index out of range'. I've played around with various lengths of the \r\n\r sequence.

@QHarr, who answered the original question, stated 'I tweaked the regex until it brought back the json string I wanted.', but I can't figure out the combination that returns the data I want; it always crashes in the way described above.

import re
import json
import requests

r = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/pe-ratio')
p = re.compile(r' var chartData = (.*?);\r\n\r\n\r\n\r', re.DOTALL)
data = json.loads(p.findall(r.text)[0])
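
For context on the IndexError: `findall` returns an empty list when the pattern matches nothing, so indexing `[0]` fails regardless of how the `\r\n` tail is tuned. A minimal sketch of checking for that case first is below; `sample_html` is a made-up stand-in for `r.text`, not the real page, and the simplified pattern is only for illustration.

```python
import re

# Hypothetical stand-in for r.text: a page that never defines chartData itself.
sample_html = "<html><body><iframe src='chart.php'></iframe></body></html>"

pattern = re.compile(r"var chartData = (.*?);", re.DOTALL)
matches = pattern.findall(sample_html)

# findall returns [] when nothing matches, so matches[0] would raise IndexError.
if matches:
    print(matches[0])
else:
    print("no chartData found in this document")
```

If the list is empty, the variable is probably not in the fetched document at all, which points to the URL rather than the regex.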

Answer:

The chartData is inside an <iframe>, so you should request a different URL:

import re
import json
import requests

ticker = "TSLA"
url = f"https://www.macrotrends.net/assets/php/fundamental_iframe.php?t={ticker}&type=pe-ratio&statement=price-ratios&freq=Q"

html_doc = requests.get(url).text

data = re.search(r"chartData = (\[\{.*?\}\])", html_doc).group(1)
data = json.loads(data)

print(data)

Prints:

[
    {"date": "2011-03-31", "v1": 5.55, "v2": 0.098, "v3": 56.63},
    {"date": "2011-06-30", "v1": 5.826, "v2": 0.986, "v3": 5.91},
    {"date": "2011-09-30", "v1": 4.878, "v2": 0.936, "v3": 5.21},
    {"date": "2011-12-31", "v1": 5.712, "v2": -0.506, "v3": 0},
    {"date": "2012-03-31", "v1": 7.448, "v2": -0.576, "v3": 0},
    {"date": "2012-06-30", "v1": 6.258, "v2": -0.656, "v3": 0},
    {"date": "2012-09-30", "v1": 5.856, "v2": -0.74, "v3": 0},
    {"date": "2012-12-31", "v1": 6.774, "v2": -0.738, "v3": 0},
    {"date": "2013-03-31", "v1": 7.578, "v2": -0.566, "v3": 0},
    {"date": "2013-06-30", "v1": 21.472, "v2": -0.418, "v3": 0},
    {"date": "2013-09-30", "v1": 38.674, "v2": -0.272, "v3": 0},
    {"date": "2013-12-31", "v1": 30.0858, "v2": -0.124, "v3": 0},
    {"date": "2014-03-31", "v1": 41.69, "v2": -0.204, "v3": 0},
    {"date": "2014-06-30", "v1": 48.012, "v2": -0.252, "v3": 0},
    {"date": "2014-09-30", "v1": 48.536, "v2": -0.308, "v3": 0},
    {"date": "2014-12-31", "v1": 44.482, "v2": -0.472, "v3": 0},
    {"date": "2015-03-31", "v1": 37.754, "v2": -0.636, "v3": 0},
    {"date": "2015-06-30", "v1": 53.652, "v2": -0.826, "v3": 0},
    {"date": "2015-09-30", "v1": 49.68, "v2": -1.062, "v3": 0},
    {"date": "2015-12-31", "v1": 48.002, "v2": -1.386, "v3": 0},
    {"date": "2016-03-31", "v1": 45.954, "v2": -1.568, "v3": 0},
    {"date": "2016-06-30", "v1": 42.456, "v2": -1.696, "v3": 0},
    {"date": "2016-09-30", "v1": 40.806, "v2": -1.312, "v3": 0},
    {"date": "2016-12-31", "v1": 42.738, "v2": -0.936, "v3": 0},
    {"date": "2017-03-31", "v1": 55.66, "v2": -0.918, "v3": 0},
    {"date": "2017-06-30", "v1": 72.322, "v2": -0.908, "v3": 0},
    {"date": "2017-09-30", "v1": 68.22, "v2": -1.676, "v3": 0},
    {"date": "2017-12-31", "v1": 62.27, "v2": -2.366, "v3": 0},
    {"date": "2018-03-31", "v1": 53.226, "v2": -2.796, "v3": 0},
    {"date": "2018-06-30", "v1": 68.59, "v2": -3.232, "v3": 0},
    {"date": "2018-09-30", "v1": 52.954, "v2": -2.142, "v3": 0},
    {"date": "2018-12-31", "v1": 66.56, "v2": -1.14, "v3": 0},
    {"date": "2019-03-31", "v1": 55.972, "v2": -1.122, "v3": 0},
    {"date": "2019-06-30", "v1": 44.692, "v2": -0.74, "v3": 0},
    {"date": "2019-09-30", "v1": 48.174, "v2": -0.93, "v3": 0},
    {"date": "2019-12-31", "v1": 83.666, "v2": -0.98, "v3": 0},
    {"date": "2020-03-31", "v1": 104.8, "v2": -0.14, "v3": 0},
    {"date": "2020-06-30", "v1": 215.962, "v2": 0.422, "v3": 511.76},
    {"date": "2020-09-30", "v1": 429.01, "v2": 0.532, "v3": 806.41},
    {"date": "2020-12-31", "v1": 705.67, "v2": 0.64, "v3": 1102.61},
    {"date": "2021-03-31", "v1": 667.93, "v2": 1.01, "v3": 661.32},
    {"date": "2021-06-30", "v1": 679.7, "v2": 1.93, "v3": 352.18},
    {"date": "2021-09-24", "v1": 774.39, "v3": 401.24},
]
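
The extraction step above could also be wrapped in a small helper that fails with a clear message when the variable is missing. This is only a sketch: `extract_chart_data` is a hypothetical name, and `sample` is a hand-written string built from the last two printed rows, assuming the page embeds the array as `var chartData = [...];`. Note that the last row has no "v2" key, so `.get` is used instead of indexing.

```python
import json
import re


def extract_chart_data(html_doc):
    """Return the chartData array as a list of dicts, or raise ValueError."""
    match = re.search(r"chartData = (\[\{.*?\}\])", html_doc, re.DOTALL)
    if match is None:
        raise ValueError("chartData not found in document")
    return json.loads(match.group(1))


# Hand-written sample (an assumption about the page layout, not a real response).
sample = (
    'var chartData = [{"date":"2021-06-30","v1":679.7,"v2":1.93,"v3":352.18},'
    '{"date":"2021-09-24","v1":774.39,"v3":401.24}];'
)

rows = extract_chart_data(sample)

# Some rows lack "v2", so use .get to avoid a KeyError.
v2_values = [row.get("v2") for row in rows]
print(v2_values)
```

With real data, `html_doc` would be the text returned by `requests.get(url)` as in the answer.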