How to access data for all pages on a site using BeautifulSoup?

I am trying to access the data on every page that exists on the site https://apexranked.com/. This is what I have so far:

url = 'https://apexranked.com/'

page = 1 

while page != 121: 
    url = f'https://apexranked.com/?page={page}'
    print(url) 
    page = page + 1

CodePudding user response:

That website uses JavaScript to fetch its pages. You can investigate the URLs fetched by the JS and follow them directly, or you can use Selenium to render the pages and scrape them.
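
If you go the Selenium route, a minimal sketch could look like the one below. The table selector and the driver setup are assumptions about the page markup and your environment, so they may need adjusting.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes a Chrome driver is available on PATH

for page in range(1, 3):  # <-- increase number of pages here
    driver.get(f"https://apexranked.com/?page={page}")
    # wait for the JS-rendered table to appear before reading it
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "table"))
    )
    for row in driver.find_elements(By.CSS_SELECTOR, "table tr"):
        print(row.text)

driver.quit()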

CodePudding user response:

You can use this example to see how to navigate the pages and load the data into a pandas DataFrame:

import requests
import pandas as pd

# the site loads its leaderboard through this WordPress AJAX endpoint
url = "https://apexranked.com/wp-admin/admin-ajax.php"

params = {
    "action": "get_player_data",
    "page": "2",
    "total_pages": "196",
}

all_df = []
for page in range(1, 3):  # <-- increase number of pages here
    params["page"] = page
    # each response is an HTML fragment containing the table for that page
    df = pd.read_html(requests.get(url, params=params).text)[0]
    all_df.append(df)

final_df = pd.concat(all_df)
print(final_df.tail(10).to_markdown(index=False))

Prints:

| Rank | Display Name      | Rank Score |      |
|:-----|:------------------|-----------:|-----:|
| #108 | Imp               |      20933 |  252 |
| #110 | SephiRuff         |      20893 | 2137 |
| #113 | tttch1ekyttt_SBI  |      20846 |  864 |
| #114 | Rue_y             |      20801 |  926 |
| #115 | FTX_Verhu1st      |      20793 |  704 |
| #116 | DF_G4isen         |      20780 | 1063 |
| #117 | iWeakQ            |      20776 |  676 |
| #119 | Ken               |      20775 | 1379 |
| #120 | scrappy on twitch |      20761 |  574 |
| #121 | KrEy              |      20759 |  272 |
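
If the goal is every page rather than just the first two, a hedged variant of the loop above (assuming the site really does expose 196 pages, as the total_pages parameter suggests) would be:

import requests
import pandas as pd

url = "https://apexranked.com/wp-admin/admin-ajax.php"
params = {"action": "get_player_data", "total_pages": "196"}

all_df = []
for page in range(1, 197):  # 196 pages, per the total_pages hint above
    params["page"] = page
    all_df.append(pd.read_html(requests.get(url, params=params).text)[0])

final_df = pd.concat(all_df, ignore_index=True)
final_df.to_csv("apexranked.csv", index=False)  # persist the combined table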