Hello everyone, this is my first post. I hope you are all well!
I am trying to extract the contents of the table on this site: https://pamestoixima.opap.gr/ . I want to extract the table as is. I have tried several approaches, including BeautifulSoup, pandas, and Selenium, but with no success. The latest code I tried is this:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome( executable_path=r'C:path to /chromedriver.exe')
driver.get('https://pamestoixima.opap.gr/')
soup = BeautifulSoup(driver.page_source,'lxml')
author_element = soup.find("table", class_="results-table")
print(author_element.text)
driver.quit()
The error message I get is this:
USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection:
Thank you in advance for any help!
CodePudding user response:
Do you specifically want to use Selenium? I haven't looked into it in detail, but I think you can find all the data at this API endpoint, which returns it directly: https://api.opap.gr/sb/sport/soccer/coupon?locale=el&onlyLive=false&marketIds=1,2,31,21,18,18,18,14&fromDate=2022-05-03&toDate=2022-05-03
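If you go the API route, here is a minimal sketch of how the request could look, using only the standard library. The helper names `build_coupon_url` and `fetch_coupon` are made up for illustration, the query parameters are copied from the URL above, and the `User-Agent` header is an assumption in case the server rejects the default one. I have not checked the layout of the JSON that comes back, so inspect it before trying to flatten it into a table.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the URL quoted in this answer.
BASE_URL = "https://api.opap.gr/sb/sport/soccer/coupon"


def build_coupon_url(date, market_ids=(1, 2, 31, 21, 18, 18, 18, 14), locale="el"):
    """Build the coupon URL for a single day (hypothetical helper).

    The default market_ids are copied verbatim from the URL above.
    """
    params = {
        "locale": locale,
        "onlyLive": "false",
        "marketIds": ",".join(str(m) for m in market_ids),
        "fromDate": date,
        "toDate": date,
    }
    # safe="," keeps the commas in marketIds unescaped, as in the original URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def fetch_coupon(url, timeout=10):
    """Download the URL and decode the JSON body (hypothetical helper).

    The User-Agent header is an assumption; the server may not need it.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_coupon(build_coupon_url("2022-05-03"))` should return the decoded JSON, provided the endpoint is still available and accepts the request.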
CodePudding user response:
You can use Selenium together with pandas to grab the complete table, as follows:
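import time
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
option = webdriver.ChromeOptions()
option.add_argument("start-maximized")
# Keep Chrome open after the script finishes
option.add_experimental_option("detach", True)
# webdriver_manager downloads a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get('https://pamestoixima.opap.gr/')
# The table is rendered by JavaScript, so pause before reading the page source
# (increase the delay if the table has not loaded yet)
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'lxml')
# read_html returns one DataFrame per <table>; the first one is the table we want.
# StringIO avoids the deprecation warning newer pandas raises for literal HTML strings.
df = pd.read_html(StringIO(str(soup)))[0]
print(df)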
Output:
ΧΩΡΑΔΙΟΡΓ. ΩΡΑΕΝΑΡΞ. ... ΔΙΑΘ.ΣΤΟΙΧ. ΗΜΙ/ΤΕΛ
Unnamed: 0_level_1 Unnamed: 1_level_1 ... Unnamed: 32_level_1 Unnamed: 33_level_1
0 ΒΡΑ2 04:00 ... 74 NaN
1 NaN 04:15 ... 215 NaN
2 NaN 04:15 ... 216 NaN
3 NaN 04:15 ... 214 NaN
4 NaN 04:15 ... 184 NaN
5 NaN 04:15 ... 186 NaN
6 ΚΟΛ2 05:00 ... 15 NaN
7 ΚΟΛ2 05:00 ... 14 NaN
8 ΚΟΛ2 05:00 ... 14 NaN
9 ΚΟΛ2 05:00 ... 14 NaN
10 ΚΟΛ2 05:00 ... 14 NaN
11 ΚΟΛ2 05:00 ... 14 NaN
12 ΚΑΑΜ 05:15 ... 26 NaN
13 ΑΡΓΝ 06:10 ... 14 NaN
14 ΒΡΑ2 06:30 ... 75 NaN
15 NaN 06:30 ... 215 NaN
16 NaN 06:30 ... 211 NaN
17 NaN 06:30 ... 218 NaN
18 NaN 06:30 ... 183 NaN
19 NaN 06:30 ... 178 NaN
20 NaN 06:30 ... 184 NaN
21 ΚΡΙΚ 08:00 ... 26 NaN
22 ΙΑΠ2 10:00 ... 26 NaN
23 ΙΑΠ2 10:00 ... 26 NaN
24 ΙΑΠ3 10:00 ... 14 NaN
25 ΙΑΠ2 11:00 ... 26 NaN
26 ΙΑΠ2 11:00 ... 26 NaN
27 ΙΑΠ2 11:00 ... 26 NaN
28 ΙΑΠ2 11:00 ... 26 NaN
29 ΙΑΠ2 11:00 ... 26 NaN
30 ΙΑΠ1 11:00 ... 211 NaN
31 ΙΑΠ2 11:00 ... 26 NaN
32 ΙΑΠ2 11:00 ... 27 NaN
33 ΙΑΠ2 11:00 ... 28 NaN
34 ΙΑΠ2 13:00 ... 27 NaN
35 ΑΥΣΛ 15:05 ... 207 NaN
36 ΝΚΡ2 16:00 ... 26 NaN
37 ΝΚΡ2 16:30 ... 26 NaN
38 ΑΥΣΛ 17:05 ... 216 NaN
39 ΟΥΓ1 20:30 ... 76 NaN
40 ΟΥΓ1 21:00 ... 75 NaN
41 ΣΛΟ1 21:00 ... 26 NaN
42 ΕΛΛ1 22:00 ... 321 NaN
43 ΔΑΝΚ 22:00 ... 206 NaN
44 ΟΥΓ1 22:30 ... 76 NaN
45 ΤΣΕΚ 23:00 ... 76 NaN
46 ΚΡΙΚ 23:00 ... 26 NaN
47 ΦΙΝΚ 23:00 ... 72 NaN
48 ΙΤΑ3 00:30 ... 27 NaN
49 ΙΤΑ3 00:30 ... 27 NaN
50 ΙΤΑ3 00:30 ... 29 NaN
51 ΙΤΑ3 00:30 ... 27 NaN
52 ΙΤΑ3 00:30 ... 27 NaN
53 ΙΤΑ3 00:30 ... 27 NaN
54 ΣΚΩΤ 00:45 ... 74 NaN
55 NaN 01:00 ... 546 NaN
[56 rows x 34 columns]
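The output above has a two-level header where the second level is only `Unnamed: N_level_1` placeholders. If that gets in the way, here is a rough sketch of how the header could be collapsed to one level. `flatten_columns` is a hypothetical helper, not part of pandas, and the small `demo` frame is a stand-in built from the first printed row rather than the real scraped table; the `("GROUP", "SUB")` column is invented just to show what happens when the second level holds a real label.

```python
import pandas as pd


def flatten_columns(df):
    """Return a copy of df with a single-level header (hypothetical helper).

    Header levels that pandas filled in as "Unnamed: ..." are dropped;
    the remaining levels are joined with a space.
    """
    out = df.copy()
    if not isinstance(out.columns, pd.MultiIndex):
        return out  # already a single-level header
    flat = []
    for col in out.columns:
        parts = [str(p) for p in col if not str(p).startswith("Unnamed:")]
        flat.append(" ".join(parts))
    out.columns = flat
    return out


# Stand-in data: values from row 0 of the printed output, plus one invented column.
demo = pd.DataFrame(
    [["ΒΡΑ2", "04:00", 74, 1.5]],
    columns=pd.MultiIndex.from_tuples(
        [
            ("ΧΩΡΑΔΙΟΡΓ.", "Unnamed: 0_level_1"),
            ("ΩΡΑΕΝΑΡΞ.", "Unnamed: 1_level_1"),
            ("ΔΙΑΘ.ΣΤΟΙΧ.", "Unnamed: 32_level_1"),
            ("GROUP", "SUB"),  # hypothetical column with a real sub-header
        ]
    ),
)
flat_demo = flatten_columns(demo)
```

With the real data you would call `flatten_columns(df)` on the frame returned by `pd.read_html`, then save it with `to_csv` or similar.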