I am trying to scrape data from a website, but the code below only pulls the first row of the site's table even though it runs inside a for loop. What am I missing?
import requests
from bs4 import BeautifulSoup
import pandas
import xlsxwriter
r = requests.get("https://www.fantasypros.com/nfl/stats/qb.php")
c = r.content
soup=BeautifulSoup(c, "html.parser")
all=soup.find_all("div",{"class":"mobile-table double-header"})
l=[]
for item in all:
    d={}
    d["Player"] = (item.find("a",{"class","player-name"}).text.strip())
    l.append(d)
df=pandas.DataFrame(l)
df.to_csv("Output.csv")
CodePudding user response:
I used find to get the first occurrence of the table wrapper, and then iterated over all of the a.player-name elements inside it using find_all. Your original loop iterated over the wrapper divs (of which there is only one) and called find, which returns only the first match, so you only ever got the first player.
import requests
from bs4 import BeautifulSoup
import pandas
r = requests.get("https://www.fantasypros.com/nfl/stats/qb.php")
c = r.content
soup=BeautifulSoup(c, "html.parser")
table=soup.find("div",{"class":"mobile-table double-header"})
l = []
for item in table.find_all("a", {"class": "player-name"}):
    d={}
    d["Player"] = item.text.strip()
    l.append(d)
df=pandas.DataFrame(l)
df.to_csv("Output.csv")
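If you want more than the player names, the same pattern extends to the other columns: find the one stats table, walk its body rows, and read each cell. This is just a sketch; the table id ("data") and the cell positions are assumptions about the page's markup at the time, so verify them against the actual HTML before relying on them.
import requests
from bs4 import BeautifulSoup
import pandas

r = requests.get("https://www.fantasypros.com/nfl/stats/qb.php")
soup = BeautifulSoup(r.content, "html.parser")

# Grab the stats table itself (id assumed) and iterate over its body rows.
table = soup.find("table", {"id": "data"})
rows = []
for tr in table.find("tbody").find_all("tr"):
    cells = tr.find_all("td")
    rows.append({
        "Player": cells[1].text.strip(),  # second cell holds the player name
        "CMP": cells[2].text.strip(),     # remaining cells follow the table's column order
        "ATT": cells[3].text.strip(),
    })

df = pandas.DataFrame(rows)
df.to_csv("Output.csv", index=False)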
CodePudding user response:
Of course, you can also do something like this:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.fantasypros.com/nfl/stats/qb.php'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')  # specify a parser to avoid bs4's "no parser was explicitly specified" warning
table = soup.select('table#data')
df = pd.read_html(str(table))[0]
print(df)
This returns that table in full:
Unnamed: 0_level_0 Unnamed: 1_level_0 PASSING RUSHING MISC
Rank Player CMP ATT PCT YDS Y/A TD INT SACKS ATT YDS TD FL G FPTS FPTS/G ROST
0 1 Josh Allen (BUF) 409 646 63.3 4407 6.8 36 15 26 122 763 6 3 17 417.7 24.6 99.9%
1 2 Justin Herbert (LAC) 443 672 65.9 5014 7.5 38 15 31 63 302 3 1 17 395.6 23.3 99.8%
2 3 Tom Brady (TB) 485 719 67.5 5316 7.4 43 12 22 28 81 2 3 17 386.7 22.7 96.1%
3 4 Patrick Mahomes II (KC) 436 658 66.3 4828 7.3 37 13 28 66 381 2 4 17 374.2 22.0 99.9%
4 5 Matthew Stafford (LAR) 404 601 67.2 4886 8.1 41 17 30 32 43 0 2 17 346.8 20.4 90.4%
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
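Because the page uses grouped headers (PASSING/RUSHING/MISC), pd.read_html gives the DataFrame a two-level MultiIndex for its columns, which is why you see the "Unnamed: 0_level_0" entries above. If you want the flat CSV the original question was after, one way to flatten the header before writing it out is the following sketch, assuming the df built above:
# Flatten the two-level header, e.g. ('PASSING', 'YDS') -> 'PASSING YDS',
# and keep plain names for the unlabeled Rank/Player columns.
df.columns = [
    lvl1 if lvl0.startswith('Unnamed') else f'{lvl0} {lvl1}'
    for lvl0, lvl1 in df.columns
]
df.to_csv('Output.csv', index=False)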