I would like to create a dataframe by pulling only certain information from this website.
https://www.stockrover.com/build/production/Research/tail.js?1644930560
I would like to pull all the entries like this one. ["0005.HK","HSBC HOLDINGS","",""]
Another problem is, suppose I only want only the first 20,000 lines which is the stock information and there is other information after line 20,000 that I don't want included in the dataframe.
To summarize, could someone show me how to pull out just the information I'm trying to extract and create a dataframe with those results if this is possible.
A sample of the website results
function getStocksLibraryArray(){return[["0005.HK","HSBC HOLDINGS","",""],["0006.HK","Power Assets Holdings Ltd","",""],["000660.KS","SK hynix","",""],["004370.KS","Nongshim","",""],["005930.KS","Samsung Electroni","",""],["0123.HK","YUEXIU PROPERTY","",""],["0336.HK","HUABAO INTL","",""],["0408.HK","YIP'S CHEMICAL","",""],["0522.HK","ASM PACIFIC","",""],["0688.HK","CHINA OVERSEAS","",""],["0700.HK","TENCENT","",""],["0762.HK","CHINA UNICOM","",""],["0808.HK","PROSPERITY REIT","",""],["0813.HK","SHIMAO PROPERTY",
Code to pull all lines including ones not wanted
import requests
import pandas as pd
import requests
url = "https://www.stockrover.com/build/production/Research/tail.js?1644930560"
payload={}
headers = {}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
CodePudding user response:
Use regex to extract the details followed by literal_eval
to convert string to python object
import re
from ast import literal_eval
import pandas as pd
import requests
url = "https://www.stockrover.com/build/production/Research/tail.js?1644930560"
response = requests.request("GET", url, headers={}, data={})
regex_ = re.compile(r"getStocksLibraryArray\(\)\{return(. ?)}", re.DOTALL)
print(pd.DataFrame(literal_eval(regex_.search(response.text).group(1))))
0 1 2 3
0 0005.HK HSBC HOLDINGS
1 0006.HK Power Assets Holdings Ltd
2 000660.KS SK hynix
3 004370.KS Nongshim
4 005930.KS Samsung Electroni
... ... ... ... ..
21426 ZZHGF ZhongAn Online P&C _INSUP
21427 ZZHGY ZhongAn Online P&C _INSUP
21428 ZZLL ZZLL Information Tech _INTEC
21429 ZZZ.TO Sleep Country Canada _SPECR
21430 ZZZOF Zinc One Resources _OTHEI