How to access data for all pages on a site using BeautifulSoup?

I am trying to access the data on every page of the site at https://apexranked.com/. This is what I have so far:

url = 'https://apexranked.com/'

page = 1

while page != 121:
    url = f'https://apexranked.com/?page={page}'
    print(url)
    page = page + 1

CodePudding user response：

That website uses JavaScript to fetch its pages. You can inspect the URLs that the JavaScript requests (for example, in the network tab of your browser's developer tools) and call them directly, or you can use Selenium to scrape the rendered page.
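As a rough sketch of the first option: once you have spotted the request in the network tab, you can rebuild it for any page number. The endpoint and parameter names below are an assumption taken from the other answer on this page, and `build_page_url` is a hypothetical helper name, not something the site provides.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names as observed in the browser's network tab.
AJAX_ENDPOINT = "https://apexranked.com/wp-admin/admin-ajax.php"


def build_page_url(page: int, total_pages: int = 196) -> str:
    """Return the URL the page's JavaScript would request for the given page."""
    query = urlencode(
        {"action": "get_player_data", "page": page, "total_pages": total_pages}
    )
    return f"{AJAX_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Print the first few URLs so they can be compared with the network tab.
    for page in range(1, 4):
        print(build_page_url(page))
```

If the printed URLs match what the browser requests, you can fetch them with `requests` instead of driving a real browser.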

CodePudding user response：

This example shows how to step through the pages and load the data into a pandas DataFrame:

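from io import StringIO

import pandas as pd
import requests

# AJAX endpoint that returns the table for one page
url = "https://apexranked.com/wp-admin/admin-ajax.php"

params = {
    "action": "get_player_data",
    "page": "2",  # overwritten in the loop below
    "total_pages": "196",
}

all_df = []
for page in range(1, 3):  # <-- increase number of pages here
    params["page"] = page
    html = requests.get(url, params=params).text
    # read_html returns a list of all tables found; take the first one
    df = pd.read_html(StringIO(html))[0]
    all_df.append(df)

final_df = pd.concat(all_df)
# to_markdown needs the optional "tabulate" package
print(final_df.tail(10).to_markdown(index=False))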

Prints:

Rank	Display Name	Rank Score
#108	Imp	20933 252
#110	SephiRuff	20893 2137
#113	tttch1ekyttt_SBI	20846 864
#114	Rue_y	20801 926
#115	FTX_Verhu1st	20793 704
#116	DF_G4isen	20780 1063
#117	iWeakQ	20776 676
#119	Ken	20775 1379
#120	scrappy on twitch	20761 574
#121	KrEy	20759 272
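The loop above hard-codes `range(1, 3)`. If you do not know the last page number, one possible approach is to keep requesting pages until one comes back empty. The following is only a sketch under that assumption; the site may behave differently past its last page, for example by repeating the final page. `collect_pages` and `fetch_rows_from_site` are hypothetical names, and the fetcher is passed in as an argument so the loop can be tried without network access.

```python
from typing import Callable, List


def collect_pages(fetch_rows: Callable[[int], List[list]], max_pages: int) -> List[list]:
    """Call fetch_rows(1), fetch_rows(2), ... and concatenate the results.

    Stops at the first page that yields no rows, or after max_pages pages.
    """
    all_rows: List[list] = []
    for page in range(1, max_pages + 1):
        rows = fetch_rows(page)
        if not rows:  # assumption: an empty page means we ran past the last one
            break
        all_rows.extend(rows)
    return all_rows


def fetch_rows_from_site(page: int) -> List[list]:
    """Hypothetical fetcher for the endpoint used above (needs network access)."""
    # Imported lazily so collect_pages can be used without these packages installed.
    from io import StringIO

    import pandas as pd
    import requests

    url = "https://apexranked.com/wp-admin/admin-ajax.php"
    params = {"action": "get_player_data", "page": str(page), "total_pages": "196"}
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    try:
        tables = pd.read_html(StringIO(response.text))
    except ValueError:  # pandas raises ValueError when the HTML contains no table
        return []
    return tables[0].values.tolist()
```

Usage would look like `rows = collect_pages(fetch_rows_from_site, max_pages=196)`. Keeping `max_pages` as an upper bound means the loop still terminates if the site never returns an empty page.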