I am trying to access the data on every page that exists on the site at
url = 'https://apexranked.com/'
page = 1
while page != 121:
    url = f'https://apexranked.com/?page={page}'
    print(url)
    page = page + 1
CodePudding user response:
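That website uses JavaScript to fetch its pages. You can inspect the URLs that the JavaScript requests (for example, in the Network tab of your browser's developer tools) and request them directly, or you can use Selenium to scrape the rendered page.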
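As a rough sketch of the first option, the snippet below only builds the URL for one page of data. It assumes the `admin-ajax.php` endpoint and the `action`, `page` and `total_pages` parameters reported elsewhere in this thread; the helper name `build_page_url` is made up for illustration, and the site may change its endpoint at any time.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the other answer in this thread.
AJAX_URL = "https://apexranked.com/wp-admin/admin-ajax.php"


def build_page_url(page, total_pages=196):
    """Return the URL the page's JavaScript appears to request for one page."""
    query = urlencode(
        {"action": "get_player_data", "page": page, "total_pages": total_pages}
    )
    return f"{AJAX_URL}?{query}"


print(build_page_url(2))
```

You can paste the printed URL into a browser to check that it returns the table markup before writing the rest of the scraper.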
CodePudding user response:
You can use this example to navigate the pages and load the data into a pandas DataFrame:
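from io import StringIO

import requests
import pandas as pd

url = "https://apexranked.com/wp-admin/admin-ajax.php"
params = {
    "action": "get_player_data",
    "page": "2",
    "total_pages": "196",
}

all_df = []
# The loop variable is params["page"], so every iteration updates the query string.
for params["page"] in range(1, 3):  # <-- increase number of pages here
    html = requests.get(url, params=params).text
    # Newer pandas versions deprecate passing a raw HTML string, hence StringIO.
    df = pd.read_html(StringIO(html))[0]
    all_df.append(df)

final_df = pd.concat(all_df)
print(final_df.tail(10).to_markdown(index=False))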
Prints:
Rank | Display Name | Rank Score |
---|---|---|
#108 | Imp | 20933 252 |
#110 | SephiRuff | 20893 2137 |
#113 | tttch1ekyttt_SBI | 20846 864 |
#114 | Rue_y | 20801 926 |
#115 | FTX_Verhu1st | 20793 704 |
#116 | DF_G4isen | 20780 1063 |
#117 | iWeakQ | 20776 676 |
#119 | Ken | 20775 1379 |
#120 | scrappy on twitch | 20761 574 |
#121 | KrEy | 20759 272 |
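If you do not want to hard-code the number of pages, one possible variation is to keep requesting pages until a response no longer contains any table rows. This is only a sketch: `iter_pages` and its `fetch` argument are hypothetical names, and the "no `<tr` in the response" stop condition is an assumption about how the endpoint behaves past the last page, so check it against a real response first.

```python
import time
from typing import Callable, Iterator, Tuple


def iter_pages(
    fetch: Callable[[int], str], last_page: int, delay: float = 0.0
) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, html) until a page has no table rows or last_page is hit."""
    for page in range(1, last_page + 1):
        html = fetch(page)
        # Assumed stop condition: a page without any <tr> has no player data.
        if "<tr" not in html:
            break
        yield page, html
        # Optional pause between requests to avoid hammering the server.
        time.sleep(delay)
```

Here `fetch` would be a small function that sets `params["page"]` and returns `requests.get(url, params=params).text`, and each yielded `html` can be passed to `pd.read_html` as in the answer above. Keeping the request logic in a separate callable also lets you try the loop on saved responses without touching the network.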