I want to get the HTML of this site, https://www.forebet.com/en/football-predictions, after pressing the More[ ] button enough times to load all games. Each time the More[ ] button at the bottom of the page is pressed, the HTML changes and shows more football games. How do I request the page with all the football games loaded?
from bs4 import BeautifulSoup
import requests

leagues = {"EPL", "UCL", "Es1", "De1", "Fr1", "Pt1", "It1", "UEL"}

class ForeBet:
    # Gets all games from the leagues in `leagues`, returned as a list of strings.
    # Game format: League|Date|Hour|Home Team|Away Team|Prob Home|Prob Tie|Prob Away
    def get_games_and_probs(self):
        response = requests.get('https://www.forebet.com/en/football-prediction')
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        # Game rows alternate between the two row classes
        games = soup.find_all(class_='rcnt tr_0') + soup.find_all(class_='rcnt tr_1')
        for game in games:
            if game.find(class_='shortTag').text.strip() in leagues:
                # The three probabilities follow the 'fprc' element in order
                game = (game.find(class_='shortTag').text + "|" +
                        game.find(class_='date_bah').text.split(" ")[0] + "|" +
                        game.find(class_='date_bah').text.split(" ")[1] + "|" +
                        game.find(class_='homeTeam').text + "|" +
                        game.find(class_='awayTeam').text + "|" +
                        game.find(class_='fprc').find_next().text + "|" +
                        game.find(class_='fprc').find_next().find_next().text + "|" +
                        game.find(class_='fprc').find_next().find_next().find_next().text)
                print(game)
                results.append(game)
        return results
CodePudding user response:
As stated, requests and BeautifulSoup only fetch and parse data; they do not interact with the site. To click the button you need something like Selenium.
Your other option is to check whether you can fetch the data directly, and whether there are parameters that let you make the same request the page makes when you click More. Does this do the trick for you?
import pandas as pd
import requests

results = pd.DataFrame()
i = 0
while True:
    print(i)
    url = 'https://m.forebet.com/scripts/getrs.php'
    payload = {
        'ln': 'en',
        'tp': '1x2',
        'in': '%s' % (i + 11),
        'ord': '0'}
    jsonData = requests.get(url, params=payload).json()
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    results = pd.concat([results, pd.DataFrame(jsonData[0])], sort=False).reset_index(drop=True)
    # Keep requesting until the response starts repeating ids already collected
    if max(results['id'].value_counts()) <= 1:
        i += 1
    else:
        results = results.drop_duplicates()
        break
Output:
print(results)
          id  pr_under  ...    country         full_name
0    1473708        31  ...    England   Isthmian League
1    1473713        35  ...    England   Isthmian League
2    1473745        28  ...    England   Isthmian League
3    1473710        35  ...    England   Isthmian League
4    1473033        28  ...    England  Premier League 2
..       ...       ...  ...        ...               ...
515  1419208        47  ...  Argentina  Torneo Federal A
516  1419156        57  ...  Argentina  Torneo Federal A
517  1450589        50  ...    Armenia    Premier League
518  1450590        35  ...    Armenia    Premier League
519  1450591        52  ...    Armenia    Premier League

[518 rows x 73 columns]
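
If you want to check the stop condition without hitting the site, the "keep requesting until an id repeats" logic can be pulled out into a small function. This is only a sketch: `collect_until_repeat`, `fetch_page`, `max_requests` and `fake_fetch` are names made up for illustration, and it assumes each page is a list of rows carrying an `id` key, as the output above suggests.

```python
from typing import Callable, Dict, List


def collect_until_repeat(
    fetch_page: Callable[[int], List[Dict]],
    max_requests: int = 200,
) -> List[Dict]:
    """Call fetch_page(0), fetch_page(1), ... until a page repeats a known id.

    fetch_page is a hypothetical callable returning a list of rows (dicts
    with an "id" key). Rows come back de-duplicated, in first-seen order.
    """
    seen_ids = set()
    rows: List[Dict] = []
    for i in range(max_requests):  # hard cap so the loop cannot run forever
        page = fetch_page(i)
        repeated = False
        for row in page:
            if row["id"] in seen_ids:
                repeated = True  # the server has started re-sending old rows
            else:
                seen_ids.add(row["id"])
                rows.append(row)
        if repeated or not page:
            break
    return rows


def fake_fetch(i: int) -> List[Dict]:
    # Stand-in for the real request: three distinct pages, then repeats.
    data = [{"id": n} for n in range(6)]
    start = min(i, 2) * 2
    return data[start:start + 2]


games = collect_until_repeat(fake_fetch)
print([g["id"] for g in games])  # [0, 1, 2, 3, 4, 5]
```

With the real endpoint, `fetch_page` could wrap the `requests.get(url, params=payload).json()[0]` call from the answer, passing `i + 11` as the `in` parameter. That assumes the response keeps the shape shown above.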