I am trying to scrape the match table from this link:
I tried looking at the network tab, and it eventually led me to datatables.net. However, I can't figure out how to get the data from there. The page seems to make a POST request with certain headers, but it isn't clear to me what that request does.
There is no API call.
Answer:
The table data isn't populated by JavaScript, meaning it is already present in the static HTML,
so you can parse it directly into a pandas DataFrame with pd.read_html.
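One way to check the claim that the data is in the static HTML is to fetch the page without a browser and look for table markup in the raw response. If the rows are already there, no JavaScript or XHR request needs to be replicated. This is a minimal sketch using only the standard library; `TableCounter` and `has_static_table` are hypothetical helper names, and the demo runs on a small sample string rather than the live page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> elements and <tr> rows in raw HTML (tag names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1


def has_static_table(html: str) -> bool:
    """Return True if the HTML already contains at least one table with rows."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables > 0 and counter.rows > 0


# With the real page (requires network access):
#   import requests
#   html = requests.get(url, headers={"user-agent": "Mozilla/5.0"}, timeout=30).text
#   print(has_static_table(html))
sample = "<html><body><table><tr><th>#</th></tr><tr><td>501</td></tr></table></body></html>"
print(has_static_table(sample))                  # True
print(has_static_table("<div id='app'></div>"))  # False
```

If this returns False for a page, the rows are probably loaded later by JavaScript, and inspecting the XHR requests in the network tab becomes necessary.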
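from io import StringIO

import pandas as pd
import requests

headers = {'user-agent': 'Mozilla/5.0'}
url = 'https://www.kayak-polo.info/kpmatchs.php?lang=en&event=0&Saison=2022&Group=CM&Compet=*&J=*&Round=*&Css=&navGroup=1'
resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
# Wrap the HTML in StringIO: passing a literal HTML string to read_html is deprecated in recent pandas
df = pd.read_html(StringIO(resp.text))[0]  # first table on the page
print(df)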
Output:
# Date ... Referee 2 Games
0 501 2022-08-1610:20 ... THOMAS Mark (GBR) 08-16 10:20 - Pitch 1 Group UW ITA U21 Women...
1 503 2022-08-1610:20 ... BELISLE Ricky (AUS) 08-16 10:20 - Pitch 3 Group UW ESP U21 Women...
2 504 2022-08-1610:20 ... ANDZIAK-GINTER Marzena (POL) 08-16 10:20 - Pitch 4 Group UW NED U21 Women...
3 508 2022-08-1613:00 ... BELISLE Ricky (AUS) 08-16 13:00 - Pitch 4 Group UW ITA U21 Women...
4 506 2022-08-1813:15 ... BELISLE Ricky (AUS) 08-18 13:15 - Pitch 2 Group UW POL U21 Women...
.. ... ... ... ... ...
280 464 2022-08-1914:25 ... WOLFF Sandra (GER) 08-19 14:25 - Pitch 4 Classifying 13-16 CZE ...
281 545 2022-08-2016:00 ... NaN 08-20 16:00 - Pitch 5th place GBR U21 Women -
282 195 2022-08-2112:05 ... NaN 08-21 12:05 - Pitch 5 21th place UKR Men Aw...
283 546 2022-08-2016:00 ... NaN 08-20 16:00 - Pitch 8th place ITA U21 Women -
284 # Date ... Referee 2 NaN
[285 rows x 11 columns]
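The output shows two things worth cleaning: the last row (284) repeats the header, and the Date values have the date and time fused together (e.g. 2022-08-1610:20). Below is a minimal sketch of a cleanup step. It assumes the columns are named exactly `#` and `Date`, as in the printed header; `clean_matches` is a hypothetical helper, and the demo uses a few values copied from the output instead of the live page.

```python
import pandas as pd


def clean_matches(df: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated header rows and parse the fused 'Date' column.

    Assumes columns named '#' and 'Date', with dates like '2022-08-1610:20'.
    """
    # A repeated header row has the literal text '#' in the '#' column,
    # so coercing to numbers turns it (and any other non-numeric row) into NaN.
    match_no = pd.to_numeric(df["#"], errors="coerce")
    out = df[match_no.notna()].copy()
    out["#"] = match_no[match_no.notna()].astype(int)
    # Date and time are concatenated: YYYY-MM-DD immediately followed by HH:MM.
    out["Date"] = pd.to_datetime(out["Date"], format="%Y-%m-%d%H:%M")
    return out.reset_index(drop=True)


# Offline demo using a few values copied from the output above
raw = pd.DataFrame({
    "#": ["501", "506", "#"],
    "Date": ["2022-08-1610:20", "2022-08-1813:15", "Date"],
    "Referee 2": ["THOMAS Mark (GBR)", "BELISLE Ricky (AUS)", "Referee 2"],
})
print(clean_matches(raw))
```

With the real page, you would call clean_matches(df) on the DataFrame returned by read_html; if the column names differ from the assumption, adjust the two column keys.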