I am trying to learn how to scrape data. I am very new to Python, so bare with me. Upon searching YouTube, I found a tutorial and tried to scrape some data off of "https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html"
from bs4 import BeautifulSoup
import requests
SCRAPE = requests.get("https://www.pgatour.com/competition/2022/hero-world-challenge/leaderboard.html")
print(SCRAPE)
#Response [200] = Succesful...
#http response status codes
#Information Responses 100-199
#Successful 200-299
#Redirects 300-399
#Client Errors 400-499
#Server Errors 500-599
soup = BeautifulSoup(SCRAPE.content, 'html.parser')
#tells that the data is html and we need to parse it
table = soup.find_all('div', class_="leaderboard leaderboard-table large" )
#pick the large section that contains all the info you need
#then, pick each smaller section, find the type and class.
for list in table:
name = list.find('div', class_="player-name-col")
position = list.find('td', class_="position")
total = list.find('td', class_="total")
print(name, position, total)
Above is my code.. I also included pictures with the inspect open so I can show you what I was thinking when I tried to find the type and class within the leaderboard.
When I print, nothing happens. Any help is appreciated!
CodePudding user response:
Data is loaded dynamically by JavaScript and bs4 can't render JS that's why your code is printing nothing but you can pull the required data from API
.
Example:
import pandas as pd
import requests
api_url= 'https://lbdata.pgatour.com/2022/r/478/leaderboard.json?userTrackingId=eyJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2Njg5OTEzNTcsIm5iZiI6MTY2ODk5MTM1NywiZXhwIjoxNjY4OTkzMTU3fQ.eTvZpdJgVp5yzSQz4J8n8ovzaBnKPmLhZm6gfitKJeU'
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data=[]
res=requests.get(api_url,headers=headers)
#print(res)
for item in res.json()['rows']:
startRanks = item['total']
data.append({'total':startRanks})
df= pd.DataFrame(data)
print(df)
Output:
total
0 -18
1 -17
2 -15
3 -15
4 -14
5 -14
6 -13
7 -13
8 -11
9 -11
10 -11
11 -10
12 -10
13 -8
14 -8
15 -8
16 -7
17 -6
18 1
19 6