Navigate to nth page in pagination with Selenium (Python) without clicking on every page?

Time:07-29

Does anyone know a way to go directly to the nth page (e.g. page 100) in this website's pagination without going through each and every page?

Here is the link to the website: https://www.sustainalytics.com/esg-ratings

(Note: Just an example, I am not collecting or selling this data)

I could also do it manually through Chrome, if there is a way.

Thank you

CodePudding user response:

You would do something like this, with requests/BeautifulSoup:

import requests
from bs4 import BeautifulSoup

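data = {
    'industry': '',
    'rating': '',
    'filter': '',
    'page': 100,  # select the specific page here
    'pageSize': 100,  # number of rows returned per page
    'resourcePackage': 'Sustainalytics'
}

# POST the form data straight to the ratings endpoint instead of clicking through pages
r = requests.post('https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings', data=data)
r.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(r.text, 'html.parser')
# Print the text content of the returned HTML
print(soup.get_text(strip=True))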

This returns the text of the ratings list for that page. The response is HTML but not a table per se, so you would need to select the individual elements to separate the fields (see the elegant response from Andrej):

Sera Prognostics, Inc.NAS:SERA23.9Medium ESG RiskSerba Dinamik Holdings Bhd.KLS:527944.6Severe ESG RiskSerco Group PLCLON:SRP19.1Low ESG RiskSercomm Corp.TAI:538824.3Medium ESG RiskSeres Therapeutics IncNAS:MCRB35.9High ESG RiskSeria Co. Ltd.TKS:278221.7Medium  [....]
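One thing to watch: which rows "page 100" gives you depends on the pageSize you send along with it. Here is a minimal sketch of that arithmetic, assuming the endpoint uses 1-based page numbers and serves rows in contiguous blocks of pageSize (I have not verified this against the API). The helper names row_range and page_for_row are made up for illustration.

```python
def row_range(page, page_size):
    """Return the 1-based (first, last) row numbers covered by a page.

    Assumes 1-based pages and contiguous blocks of page_size rows.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    first = (page - 1) * page_size + 1
    return first, first + page_size - 1


def page_for_row(row, page_size):
    """Return the 1-based page that contains a given 1-based row number."""
    if row < 1 or page_size < 1:
        raise ValueError("row and page_size must be positive")
    return (row - 1) // page_size + 1


print(row_range(100, 100))     # (9901, 10000)
print(row_range(100, 10))      # (991, 1000)
print(page_for_row(991, 100))  # 10
```

So if the goal is to match a specific page as shown in the browser, it is probably safest to send the same pageSize the site itself uses and only change page.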

CodePudding user response:

To load the data from several pages into a pandas DataFrame, you can use the following example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings"

data = {
    "industry": "",
    "rating": "",
    "filter": "",
    "page": "1",
    "pageSize": "10",
    "resourcePackage": "Sustainalytics",
}

all_rows = []
for page in range(1, 3):   # <-- increase the range here
    data["page"] = page
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    # each company is one element with class "company-row"
    for s in soup.select(".company-row"):
        # split the row text into its separate fields
        all_rows.append(s.get_text(strip=True, separator="\n").split("\n"))

df = pd.DataFrame(
    all_rows, columns=["Name", "Symbol", "ESG Risk Rating", "Text"]
)
print(df)

Prints:

                                     Name      Symbol ESG Risk Rating             Text
0                   1-800-Flowers.com Inc    NAS:FLWS            22.1  Medium ESG Risk
1                                  1&1 AG     ETR:1U1            22.3  Medium ESG Risk
2                      10X Genomics, Inc.     NAS:TXG            22.6  Medium ESG Risk
3                               111, Inc.      NAS:YI            28.7  Medium ESG Risk
4   17 Education & Technology Group, Inc.      NAS:YQ            27.0  Medium ESG Risk
5                  1Life Healthcare, Inc.    NAS:ONEM            24.6  Medium ESG Risk
6                         1st Source Corp    NAS:SRCE            31.7    High ESG Risk
7                       1stdibs.com, Inc.    NAS:DIBS            28.0  Medium ESG Risk
8                  22nd Century Group Inc    NAS:XXII            31.7    High ESG Risk
9                         2i Rete Gas SpA           -            35.3    High ESG Risk
10                               2U, Inc.    NAS:TWOU            19.8     Low ESG Risk
11                     360 DigiTech, Inc.    NAS:QFIN            28.8  Medium ESG Risk
12          360 Security Technology, Inc.  SHG:601360            19.7     Low ESG Risk
13         361 Degrees International Ltd.    HKG:1361            19.1     Low ESG Risk
14                       3D Systems Corp.     NYS:DDD            25.8  Medium ESG Risk
15                           3i Group PLC     LON:III            11.6     Low ESG Risk
16                                  3M Co     NYS:MMM            33.6    High ESG Risk
17                          3M India Ltd.  BOM:523395            23.8  Medium ESG Risk
18             3R Petroleum Óleo e Gás SA   BSP:RRRP3            56.0  Severe ESG Risk
19                              3SBio Inc    HKG:1530            27.1  Medium ESG Risk
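If you do not know how far to increase the range, one option is to keep requesting pages until one comes back empty. This is only a sketch, assuming the endpoint returns no rows once you go past the last page (not verified). collect_rows and fetch_rows are hypothetical names: fetch_rows stands for any function that takes a page number and returns that page's rows as a list, and a stand-in is used here so the example runs without network access.

```python
def collect_rows(fetch_rows, start_page=1, max_pages=1000):
    """Collect rows page by page until a page comes back empty.

    fetch_rows: callable taking a 1-based page number, returning a list of rows.
    max_pages:  safety cap so the loop ends even if pages never come back empty.
    """
    all_rows = []
    for page in range(start_page, start_page + max_pages):
        rows = fetch_rows(page)
        if not rows:  # assumed signal that we are past the last page
            break
        all_rows.extend(rows)
    return all_rows


# Stand-in for the real request: two pages of made-up rows, then nothing.
FAKE_PAGES = {
    1: [["Company A", "XXX:A", "10.0", "Low ESG Risk"]],
    2: [["Company B", "XXX:B", "30.0", "High ESG Risk"]],
}


def fake_fetch(page):
    return FAKE_PAGES.get(page, [])


rows = collect_rows(fake_fetch)
print(len(rows))  # 2
```

With the code above, fetch_rows would wrap the requests.post call and the ".company-row" loop for a single page, and the collected list can be passed to pd.DataFrame as before.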