I am very new to scraping and am trying to pull data from a section of this website: https://projects.fivethirtyeight.com/soccer-predictions/premier-league/. The data I'm trying to get is in the second tab, "Matches," in the section titled "Upcoming Matches."
I have attempted to do this with SelectorGadget and rvest, as follows:
library(rvest)
url <- "https://projects.fivethirtyeight.com/soccer-predictions/premier-league/"
url %>%
  read_html() %>%
  html_nodes(".prob, .name") %>%
  html_text()
This returns values, but they correspond to the first tab on the page, "Standings." How can I reference the section I am actually trying to pull?
CodePudding user response:
First, I don't know R, only Python.
When you click Matches, the page uses JavaScript to generate the matches, and it loads JSON data from:
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_forecast.json
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_clinches.json
I checked only one of them, 2021_premier-league_matches.json, and I see it has data for Completed Matches.
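To check the other files the same way, a small sketch like the following may help. The helper names `forecast_url` and `field_names` are hypothetical, and the URL pattern is only inferred from the three addresses listed above; whether it holds for other seasons or leagues is an assumption.

```python
# Base path taken from the three URLs listed above.
BASE = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts'


def forecast_url(kind, season=2021, league='premier-league'):
    """Build a JSON URL following the pattern seen above (assumed, not documented)."""
    return f'{BASE}/{season}_{league}_{kind}.json'


def field_names(records):
    """Return the sorted set of keys found in a list of dicts."""
    return sorted({key for record in records for key in record})


# Print the three URLs so they can be compared with the ones above.
for kind in ('forecast', 'matches', 'clinches'):
    print(forecast_url(kind))
```

Once a file is loaded with `requests.get(url).json()`, calling `field_names(data)` on a list of records shows which fields it offers before you decide what to print.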
I made an example in Python:
import requests

url = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json'
response = requests.get(url)
data = response.json()

for item in data:
    # search date
    if item['datetime'].startswith('2022-03-16'):
        print('team1:', item['team1_code'], '|', item['team1'])
        print('prob1:', item['prob1'])
        print('score1:', item['score1'])
        print('adj_score1:', item['adj_score1'])
        print('chances1:', item['chances1'])
        print('moves1:', item['moves1'])
        print('---')
        print('team2:', item['team2_code'], '|', item['team2'])
        print('prob2:', item['prob2'])
        print('score2:', item['score2'])
        print('adj_score2:', item['adj_score2'])
        print('chances2:', item['chances2'])
        print('moves2:', item['moves2'])
        print('----------------------------------------')
Result:
team1: BHA | Brighton and Hove Albion
prob1: 0.30435
score1: 0
adj_score1: 0.0
chances1: 1.244
moves1: 1.682
---
team2: TOT | Tottenham Hotspur
prob2: 0.43627
score2: 2
adj_score2: 2.1
chances2: 1.924
moves2: 1.056
----------------------------------------
team1: ARS | Arsenal
prob1: 0.22114
score1: 0
adj_score1: 0.0
chances1: 0.569
moves1: 0.514
---
team2: LIV | Liverpool
prob2: 0.55306
score2: 2
adj_score2: 2.1
chances2: 1.243
moves2: 0.813
----------------------------------------
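Since the question asks for Upcoming Matches, the same date test can be turned into a filter. This is only a sketch: I did not verify that the file also lists fixtures that have not been played yet, and it assumes `datetime` always starts with a `YYYY-MM-DD` date, as the `startswith` check above suggests. The function name `upcoming_matches` and the sample records are made up for illustration.

```python
from datetime import date


def upcoming_matches(matches, today=None):
    """Return matches dated today or later, earliest first."""
    cutoff = (today or date.today()).isoformat()
    # ISO dates compare correctly as plain strings, so no parsing is needed.
    upcoming = [m for m in matches if m['datetime'][:10] >= cutoff]
    return sorted(upcoming, key=lambda m: m['datetime'])


# Hand-made sample shaped like the JSON records. The time part of
# 'datetime' and the second fixture are illustrative assumptions.
sample = [
    {'datetime': '2022-03-16T19:30:00Z', 'team1': 'Brighton and Hove Albion',
     'team2': 'Tottenham Hotspur', 'prob1': 0.30435, 'prob2': 0.43627},
    {'datetime': '2022-04-02T14:00:00Z', 'team1': 'Team A',
     'team2': 'Team B', 'prob1': 0.5, 'prob2': 0.25},
]

for m in upcoming_matches(sample, today=date(2022, 3, 20)):
    print(m['datetime'][:10], m['team1'], 'vs', m['team2'], m['prob1'], m['prob2'])
```

With the real file you would pass the `data` list from the example above instead of `sample`, and leave out `today` to use the current date.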