Here's the code. I'm trying to parse a spreadsheet with over 2000 items, but whenever I run this script I only get the last hundred or so. What could I do to fix this? I have tried different parsers and haven't found any solutions.
from bs4 import BeautifulSoup
import requests

url = "https://backpack.tf/spreadsheet"
source = requests.get(url).text
soup = BeautifulSoup(source, "html.parser")

try:
    # Print the text of every table row on the page
    for row in soup.find_all("tr"):
        print(row.text)
except Exception as exc:
    # Report the error instead of silently swallowing it
    print(f"Parsing failed: {exc}")
I couldn't get the HTML to work here, sorry, so please go to https://backpack.tf/spreadsheet
CodePudding user response:
The simplest way to read the table from this page is with pd.read_html:
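from io import StringIO

import requests
import pandas as pd

url = "https://backpack.tf/spreadsheet"
# Send a browser-like User-Agent header with the request
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0"
    },
)
# read_html returns a list of DataFrames, one per <table>; take the first.
# Wrapping the text in StringIO avoids the literal-HTML deprecation warning in recent pandas.
df = pd.read_html(StringIO(r.text))[0]
print(df)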
Prints:
                                          Name      Type  Genuine  Vintage           Unique    Strange       Haunted  Collector's
0                           A Brush with Death  Cosmetic      NaN      NaN    5.11–5.22 ref  18.4 keys           NaN     200 keys
1                     A Color Similar to Slate      Tool      NaN      NaN  33.11–33.22 ref        NaN           NaN          NaN
2     A Color Similar to Slate (Non-Craftable)      Tool      NaN      NaN           30 ref        NaN           NaN          NaN
...
2851         Zepheniah's Greed (Non-Craftable)      Tool      NaN      NaN           12 ref        NaN           NaN          NaN
2852                Zipperface (Non-Craftable)  Cosmetic      NaN      NaN        10.55 ref        NaN           NaN          NaN
2853                                Zipperface  Cosmetic      NaN      NaN        1.65 keys        NaN  13–14.66 ref          NaN
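If you would rather stay with BeautifulSoup, it may help to count the parsed rows before printing them. I haven't confirmed the cause of the missing output, but if the count comes out complete, the parser is probably fine and the console may simply be dropping the earlier lines (limited scrollback). Below is a minimal sketch of that check. The extract_rows helper and the SAMPLE_HTML snippet are made up for illustration so the code runs offline; with the real page you would pass in the response text instead.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page (hypothetical markup, values taken from the output above)
SAMPLE_HTML = """
<table>
  <thead><tr><th>Name</th><th>Type</th><th>Unique</th></tr></thead>
  <tbody>
    <tr><td>Zipperface (Non-Craftable)</td><td>Cosmetic</td><td>10.55 ref</td></tr>
    <tr><td>Zipperface</td><td>Cosmetic</td><td>1.65 keys</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one list of cell strings per <tr> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells alike, stripped of surrounding whitespace
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
# Check the count first, then print only a few rows rather than all of them
print(len(rows), "rows parsed")
for row in rows[:5]:
    print(row)
```

Keeping the rows in a list also means you can write them to a file instead of relying on the console, which avoids any scrollback limit.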