Does anyone know a way to jump straight to the nth page (e.g. page 100) of this website's pagination without clicking through every page in between?
Here is the link to the website: https://www.sustainalytics.com/esg-ratings
(Note: Just an example, I am not collecting or selling this data)
I could also do it manually in Chrome if there is a way.
Thank you
CodePudding user response:
You would do something like this, with requests/BeautifulSoup:
import requests
from bs4 import BeautifulSoup
data = {
    'industry': '',
    'rating': '',
    'filter': '',
    'page': 100,  # this is where you would select the specific page
    'pageSize': 100,
    'resourcePackage': 'Sustainalytics'
}
r = requests.post('https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings', data=data)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text(strip=True))
This returns the text of the response for that table. It is not a table per se but an HTML fragment, so you would need to select individual elements from it to isolate the fields (see the elegant answer from Andrej):
Sera Prognostics, Inc.NAS:SERA23.9Medium ESG RiskSerba Dinamik Holdings Bhd.KLS:527944.6Severe ESG RiskSerco Group PLCLON:SRP19.1Low ESG RiskSercomm Corp.TAI:538824.3Medium ESG RiskSeres Therapeutics IncNAS:MCRB35.9High ESG RiskSeria Co. Ltd.TKS:278221.7Medium [....]
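Note that `page` and `pageSize` work together, so "page 100" means different rows depending on the page size you send. A minimal sketch of that arithmetic, assuming pages are numbered from 1 and that the endpoint honours whatever `pageSize` it receives (neither is confirmed here). Under those assumptions, page 100 would cover rows 9901 to 10000 at 100 rows per page, but rows 991 to 1000 at 10 rows per page. `build_payload` and `page_for_row` are hypothetical helper names, not part of any library:

```python
def build_payload(page, page_size=10):
    """Build the form data for one page (field names copied from the request above)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "industry": "",
        "rating": "",
        "filter": "",
        "page": page,
        "pageSize": page_size,
        "resourcePackage": "Sustainalytics",
    }


def page_for_row(row_number, page_size):
    """Return the 1-based page holding the 1-based row_number.

    Assumption: pages are 1-indexed and every page is full except the last.
    """
    if row_number < 1 or page_size < 1:
        raise ValueError("row_number and page_size must be positive")
    return (row_number - 1) // page_size + 1
```

The payload can then be passed as `data=build_payload(100)` to the same `requests.post` call used above.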
CodePudding user response:
To get the data from all pages into a pandas DataFrame, you can use the following example:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings"
data = {
    "industry": "",
    "rating": "",
    "filter": "",
    "page": "1",
    "pageSize": "10",
    "resourcePackage": "Sustainalytics",
}
all_rows = []
for data["page"] in range(1, 3):  # <-- increase the range here
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for s in soup.select(".company-row"):
        all_rows.append(s.get_text(strip=True, separator="\n").split("\n"))
df = pd.DataFrame(
    all_rows, columns=["Name", "Symbol", "ESG Risk Rating", "Text"]
)
print(df)
Prints:
                                     Name      Symbol  ESG Risk Rating             Text
0                   1-800-Flowers.com Inc    NAS:FLWS             22.1  Medium ESG Risk
1                                  1&1 AG     ETR:1U1             22.3  Medium ESG Risk
2                      10X Genomics, Inc.     NAS:TXG             22.6  Medium ESG Risk
3                               111, Inc.      NAS:YI             28.7  Medium ESG Risk
4   17 Education & Technology Group, Inc.      NAS:YQ             27.0  Medium ESG Risk
5                  1Life Healthcare, Inc.    NAS:ONEM             24.6  Medium ESG Risk
6                         1st Source Corp    NAS:SRCE             31.7    High ESG Risk
7                       1stdibs.com, Inc.    NAS:DIBS             28.0  Medium ESG Risk
8                  22nd Century Group Inc    NAS:XXII             31.7    High ESG Risk
9                         2i Rete Gas SpA           -             35.3    High ESG Risk
10                               2U, Inc.    NAS:TWOU             19.8     Low ESG Risk
11                     360 DigiTech, Inc.    NAS:QFIN             28.8  Medium ESG Risk
12          360 Security Technology, Inc.  SHG:601360             19.7     Low ESG Risk
13         361 Degrees International Ltd.    HKG:1361             19.1     Low ESG Risk
14                       3D Systems Corp.     NYS:DDD             25.8  Medium ESG Risk
15                           3i Group PLC     LON:III             11.6     Low ESG Risk
16                                  3M Co     NYS:MMM             33.6    High ESG Risk
17                          3M India Ltd.  BOM:523395             23.8  Medium ESG Risk
18             3R Petroleum Óleo e Gás SA   BSP:RRRP3             56.0  Severe ESG Risk
19                              3SBio Inc    HKG:1530             27.1  Medium ESG Risk
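If you do not know how many pages there are, one option is to keep requesting pages until one comes back with no rows instead of hard-coding the range. This is only a sketch: it assumes that a page past the end yields no `.company-row` elements, which has not been verified for this endpoint. `iter_rows` and `fetch_page` are hypothetical names; `fetch_page` stands for any function that takes a page number and returns that page's rows as lists of strings, for example by wrapping the `requests.post` and `soup.select` calls from the loop above.

```python
from typing import Callable, Iterator, List

Row = List[str]


def iter_rows(
    fetch_page: Callable[[int], List[Row]],
    start: int = 1,
    max_pages: int = 1000,
) -> Iterator[Row]:
    """Yield rows page by page until a page is empty or max_pages is reached."""
    for page in range(start, start + max_pages):
        rows = fetch_page(page)
        if not rows:
            # Assumption: an empty page means we are past the last page.
            break
        yield from rows
```

The `max_pages` cap is a safety net so the loop still ends if the endpoint never returns an empty page. The `start` argument also covers the original question: `iter_rows(fetch_page, start=100, max_pages=1)` would request only page 100.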