This is most likely a lack-of-knowledge issue, as I am generally new to scraping. What I am trying to accomplish with this code is to scrape all of the data on the webpage, which is working. The issue is that before the loop continues, I want pandas to write the current position_text variable to the "Position" column. I confirmed with the print statement that it is pulling exactly what I want to write to the new "Position" column, but it is only writing the last instance, "C", to "Position".
Link: https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php
df_results = pd.DataFrame()
follow_loop = list(range(1, 7))
for i in follow_loop:
    xpath = '//*[@id="main-container"]/div/div/div/div[4]/div[1]/ul/li['
    xpath += str(i)
    xpath += "]"
    driver.find_element(By.XPATH, xpath).click()
    sleep(2)
    driver.execute_script("window.scrollTo(1,1200)")
    sleep(2)
    driver.execute_script("window.scrollTo(1,-1200)")
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    stats_table = soup.find(id="data-table")
    position = '//*[@id="main-container"]/div/div/div/div[4]/div[1]/ul/li['
    position += str(i)
    position += "]"
    position_text = driver.find_element(By.XPATH, position).text
    df_results = df_results.append(pd.read_html(str(stats_table)))
    df_results["Position"] = position_text
    print(position_text)
    sleep(2)
ALL
PG
SG
SF
PF
C
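The behavior can be reproduced with plain pandas, with no scraping involved: assigning df_results["Position"] = position_text after appending stamps the value onto every row of the accumulated frame, so each pass overwrites the previous labels and only the last one survives. A minimal sketch (the column names and labels here are just placeholders):

```python
import pandas as pd

labels = ["PG", "SG", "C"]

# What the question's loop effectively does: label AFTER accumulating.
wrong = pd.DataFrame()
for label in labels:
    chunk = pd.DataFrame({"Team": ["A", "B"]})
    wrong = pd.concat([wrong, chunk], ignore_index=True)
    wrong["Position"] = label  # overwrites the WHOLE column every pass

# The fix: label each chunk BEFORE accumulating.
right = pd.DataFrame()
for label in labels:
    chunk = pd.DataFrame({"Team": ["A", "B"]})
    chunk["Position"] = label  # stamps only this chunk's rows
    right = pd.concat([right, chunk], ignore_index=True)

print(wrong["Position"].unique())   # only the last label remains
print(right["Position"].unique())   # all three labels preserved
```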
CodePudding user response:
Here is one way of getting the data from all tables, in one big dataframe:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)
tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@]/li')))
for x in tables_list:
    x.click()
    print('selected', x.text)
    t.sleep(2)
    table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
    df = pd.read_html(table.get_attribute('outerHTML'))[0]
    df['Category'] = x.text.strip()
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
    print('done, moving to next table')
print(big_df)
big_df.to_csv('fanduel.csv')
This will save the data to a csv file, and also display it in the terminal:
Team PTS REB AST 3PM STL BLK TO FD PTS Category
0 HOUHouston Rockets 23.54 9.10 5.10 2.54 1.88 1.15 2.65 48.55 ALL
1 OKCOklahoma City Thunder 22.22 9.61 5.19 2.70 1.67 1.18 2.52 47.57 ALL
2 PORPortland Trail Blazers 22.96 8.92 5.31 2.74 1.63 0.99 2.65 46.84 ALL
3 SACSacramento Kings 23.00 9.10 5.03 2.58 1.61 0.95 2.50 46.65 ALL
4 ORLOrlando Magic 22.35 9.39 4.94 2.62 1.57 1.04 2.50 46.36 ALL
... ... ... ... ... ... ... ... ... ... ...
175 DENDenver Nuggets 22.96 12.91 3.68 0.96 1.21 1.76 2.62 50.26 C
176 PHIPhiladelphia 76ers 21.95 13.35 3.01 1.15 1.14 1.94 2.07 49.66 C
177 BOSBoston Celtics 19.52 14.46 3.58 0.61 1.40 1.82 2.80 49.10 C
178 NYKNew York Knicks 19.31 14.48 3.02 1.07 1.02 1.98 2.26 47.96 C
179 MIAMiami Heat 19.00 14.44 2.95 0.64 1.24 1.55 2.71 46.41 C
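One more note on the question's code: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, which is why the answer uses pd.concat instead. The conversion is mechanical (toy frames here, just to show the pattern):

```python
import pandas as pd

a = pd.DataFrame({"x": [1]})
b = pd.DataFrame({"x": [2]})

# old, removed in pandas 2.0:  a = a.append(b, ignore_index=True)
merged = pd.concat([a, b], ignore_index=True)

print(merged["x"].tolist())   # [1, 2]
```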