I am gathering sports fixtures and results from a webpage. At first I was going to use pandas to scrape it, but the page has an option for selecting the timezone, so I added Selenium to choose the timezone automatically. Now I do not know how to scrape the page with pandas after using Selenium. Could anyone please help? Thank you very much.
Here is my work:
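from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import pandas as pd
PATH = "C:/Users/XXX/Desktop/chromedriver.exe"
# Selenium 4 passes the driver path through a Service object
driver = webdriver.Chrome(service=Service(PATH))
driver.get("https://fixturedownload.com")
# Pick the timezone from the dropdown and submit the form
select = Select(driver.find_element(By.NAME, "timezone"))
select.select_by_value("SE Asia Standard Time")
driver.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/form/div/input[1]').click()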
List = pd.read_html(...)  # I am stuck here: what should be passed to read_html?
Answer:
You don't need Selenium. Issue a POST request to the server with your desired timezone (provided it appears in the dropdown list).
The values you can use are the value attributes of the option tags within the parent select element.
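As a rough sketch, those values can also be collected programmatically rather than read off the page by hand. This assumes the dropdown is rendered as a select element with name="timezone" (the name the Selenium code targets) and uses the standard library's html.parser instead of BeautifulSoup; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TimezoneOptionParser(HTMLParser):
    """Collects the value attribute of every <option> inside <select name="timezone">."""

    def __init__(self):
        super().__init__()
        self.in_timezone_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "select" and attrs.get("name") == "timezone":
            self.in_timezone_select = True
        elif tag == "option" and self.in_timezone_select and attrs.get("value"):
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_timezone_select = False


def timezone_values(html):
    """Return the timezone values offered by the page's dropdown."""
    parser = TimezoneOptionParser()
    parser.feed(html)
    return parser.values


def fetch_timezone_values(url="https://fixturedownload.com/"):
    # Hypothetical helper: plain GET of the page with a browser-like User-Agent
    req = Request(url, headers={"User-Agent": "Safari/537.36"})
    with urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return timezone_values(html)
```

Any string returned this way is a candidate for the 'timezone' field in the POST data.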
Then parse the response to extract the download links in your desired format. For example, you can grab the header-row links for the CSV downloads of all fixtures within each table as follows:
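import requests
# import pandas as pd
from bs4 import BeautifulSoup as bs
headers = {'User-Agent': 'Safari/537.36'}
# Form fields sent when the "Set Timezone" button is pressed
data = {
    'timezone': 'Nepal Standard Time',
    'command': 'Set Timezone'
}
r = requests.post('https://fixturedownload.com/', headers=headers, data=data)
r.raise_for_status()
# 'lxml' must be installed; 'html.parser' works without extra dependencies
soup = bs(r.content, 'lxml')
# First row of each .fixture table: the third cell links to the CSV download
csv_links = ['https://fixturedownload.com' + i['href'] for i in soup.select('.fixture tr:nth-child(1) td:nth-child(3) a')]
print(csv_links)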
You can then download and store the CSVs, combine them where the headers match, manipulate them, etc.
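A minimal sketch of the combine step, assuming each link returns a CSV with a header row. pd.read_csv accepts a URL, a path or a file-like object; the function name combine_fixture_csvs is hypothetical, it only stacks files whose columns match the first one read, and the column names in the usage are made up for illustration.

```python
import io

import pandas as pd


def combine_fixture_csvs(sources):
    """Read each CSV source (URL, path or file-like) and stack those whose headers match.

    Returns the combined DataFrame and the list of sources that were skipped
    because their columns differ from the first CSV read.
    """
    frames, skipped = [], []
    expected_columns = None
    for source in sources:
        frame = pd.read_csv(source)
        if expected_columns is None:
            expected_columns = list(frame.columns)
        if list(frame.columns) == expected_columns:
            frames.append(frame)
        else:
            skipped.append(source)
    if not frames:
        return pd.DataFrame(), skipped
    return pd.concat(frames, ignore_index=True), skipped


# Usage with in-memory CSVs; with real data you would pass the csv_links list instead.
a = io.StringIO("Round Number,Home Team,Away Team\n1,Carlton,Richmond\n")
b = io.StringIO("Round Number,Home Team,Away Team\n1,Essendon,Hawthorn\n")
combined, skipped = combine_fixture_csvs([a, b])
print(combined)
```

Comparing columns before concatenating avoids pd.concat silently filling mismatched columns with NaN.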
There is no point in using read_html, as by default it keeps only the cell text, so you will lose the links to the actual data.
Answer:
To select SE Asia Standard Time as the timezone and then scrape the table with pandas, you can use the following locator strategies:
Code Block:
from io import StringIO
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
driver = webdriver.Chrome()  # Selenium 4.6+ locates chromedriver automatically
driver.get("https://fixturedownload.com/")
# Choose the timezone from the dropdown, then submit the form
Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//select[@name='timezone']")))).select_by_value("SE Asia Standard Time")
driver.find_element(By.XPATH, "//input[@value='Set Timezone']").click()
# Wait for the fixture table and take its HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='fixture']"))).get_attribute("outerHTML")
# read_html returns a list of DataFrames, one per <table>
df = pd.read_html(StringIO(data))
print(df)
driver.quit()
Console Output:
[ 0 1 ... 4 5
0 Full fixture Preview fixture ... Download fixture for ICAL View JSON
1 Teams Teams ... Teams NaN
2 Adelaide Crows Preview fixture ... Download fixture for ICAL View JSON
3 Brisbane Lions Preview fixture ... Download fixture for ICAL View JSON
4 Carlton Preview fixture ... Download fixture for ICAL View JSON
5 Collingwood Preview fixture ... Download fixture for ICAL View JSON
6 Essendon Preview fixture ... Download fixture for ICAL View JSON
7 Fremantle Preview fixture ... Download fixture for ICAL View JSON
8 Geelong Cats Preview fixture ... Download fixture for ICAL View JSON
9 Gold Coast Suns Preview fixture ... Download fixture for ICAL View JSON
10 GWS Giants Preview fixture ... Download fixture for ICAL View JSON
11 Hawthorn Preview fixture ... Download fixture for ICAL View JSON
12 Melbourne Preview fixture ... Download fixture for ICAL View JSON
13 North Melbourne Preview fixture ... Download fixture for ICAL View JSON
14 Port Adelaide Preview fixture ... Download fixture for ICAL View JSON
15 Richmond Preview fixture ... Download fixture for ICAL View JSON
16 St Kilda Preview fixture ... Download fixture for ICAL View JSON
17 Sydney Swans Preview fixture ... Download fixture for ICAL View JSON
18 West Coast Eagles Preview fixture ... Download fixture for ICAL View JSON
19 Western Bulldogs Preview fixture ... Download fixture for ICAL View JSON
[20 rows x 6 columns]]
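Judging from the console output, the first two rows of the parsed table are header rows and column 0 holds the team names. A small sketch that pulls the names out, assuming that layout holds; team_names is a hypothetical helper, and the sample frame in the usage is a hand-made stand-in for df[0], not real output.

```python
import pandas as pd


def team_names(table, header_rows=2):
    """Return the team names from column 0, skipping the leading header rows.

    Assumes the layout seen in the console output: header rows first,
    then one row per team with the name in the first column.
    """
    names = table.iloc[header_rows:, 0]
    return names.dropna().astype(str).str.strip().reset_index(drop=True).tolist()


# Hand-made stand-in for df[0] (read_html returns a list, so the table is df[0]).
sample = pd.DataFrame(
    [
        ["Full fixture", "Preview fixture"],
        ["Teams", "Teams"],
        ["Adelaide Crows", "Preview fixture"],
        ["Brisbane Lions", "Preview fixture"],
    ]
)
print(team_names(sample))
```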