I'm trying to scrape data from an HTML table on this page https://www.letrot.com/stats/fiche-cheval/enjoy-the-game/ZGJaZgYEBQMW/courses/dernieres-performances#sub_sub_menu_fichecheval , but I get an empty tbody: <tbody></tbody>.
I managed to retrieve another table from the same site, but I don't understand why the same commands fail for this one. Here's my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}
url_perf = 'https://www.letrot.com/stats/fiche-cheval/enjoy-the-game/ZGJaZgYEBQMW/courses/dernieres-performances#sub_sub_menu_fichecheval'
response_perf = requests.get(url_perf, headers=headers)
html_perf = response_perf.text
soup_perf = BeautifulSoup(html_perf, 'html.parser')
last_perf_body = soup_perf.find('tbody')
print(last_perf_body)
That returns the following:
<tbody>
</tbody>
So, how can I retrieve the content of the table?
CodePudding user response:
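The rows are not part of the HTML that requests downloads; the page appears to fetch them separately from a JSON endpoint, which is why the tbody comes back empty. To load the table into a pandas DataFrame, you can query that endpoint directly, as in the following example: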
import requests
import pandas as pd
from bs4 import BeautifulSoup
# the 'ZGJaZgYEBQMW' segment comes from the original page URL
api_url = "https://www.letrot.com/stats/fiche-cheval/enjoy-the-game/ZGJaZgYEBQMW/courses/dernieres-performances-paginate-2?search=&lenght=1000"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0"
}
data = requests.get(api_url, headers=headers).json()
df = pd.DataFrame(data["data"])
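# "allocation" cells contain HTML; keep only the text of the first <div>
df["allocation"] = df["allocation"].apply(
    lambda x: BeautifulSoup(x, "html.parser").div.text
)
# strip any remaining HTML markup from every cell
df = df.apply(lambda x: [BeautifulSoup(str(v), "html.parser").text for v in x])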
print(df)
Prints:
dateCourse prix allocation dateCourseRaw rang nomCheval ferrure reduction nomDriver nomHippodrome nomEntraineur corde categorie specialite distance piste depart avisEntraineur video tracking avisEntraineurImg
0 2022-12-07 07/12/22 PRIX DE VILLERS COTTERETS 22 950 € 2022-12-07 000011 ENJOY THE GAME 3 11161'11"6 GELORMINIG. GELORMINI VINCENNES BONDOH.E. BONDO G CC A 2 100 GP AUT 2 \n
1 2022-11-24 24/11/22 PRIX DE LA CAMARGUE 0 € 2022-11-24 000088 ENJOY THE GAME 3 11421'14"2 GELORMINIG. GELORMINI VINCENNES BONDOH.E. BONDO G CC A 2 850 GP - 2 \n
2 2022-10-22 22/10/22 PRIX DE LA PORTE D'AUBERVILLIERS 22 950 € 2022-10-22 000011 ENJOY THE GAME 3 11261'12"6 GELORMINIG. GELORMINI ENGHIEN BONDOH.E. BONDO G CC A 2 150 - AUT 2 \n
3 2022-10-03 03/10/22 PRIX DE LA BOURSE 19 800 € 2022-10-03 000011 ENJOY THE GAME 3 11201'12"0 GELORMINIG. GELORMINI ENGHIEN BONDOH.E. BONDO G DD A 2 150 - AUT 2 \n
4 2022-09-22 22/09/22 PRIX DE MEHUN SUR YEVRE 8 260 € 2022-09-22 000033 ENJOY THE GAME 3 11121'11"2 GELORMINIG. GELORMINI VINCENNES BONDOH.E. BONDO G BB A 2 100 GP AUT 2 \n
5 2022-09-02 02/09/22 PRIX LAMPETIA 510 € 2022-09-02 000077 ENJOY THE GAME 3 11421'14"2 GELORMINIG. GELORMINI VINCENNES BONDOH.E. BONDO G CC A 2 700 GP - 2 \n
6 2022-08-20 20/08/22 PRIX DE MOLAY 20 700 € 2022-08-20 000011 ENJOY THE GAME 3 11101'11"0 RAFFINE. RAFFIN VINCENNES BONDOH.E. BONDO G EE A 2 100 GP AUT 2 \n
7 2022-08-11 11/08/22 PRIX DE LA HAUTE-SAONE 3 680 € 2022-08-11 000044 ENJOY THE GAME 3 11291'12"9 GELORMINIG. GELORMINI ENGHIEN BONDOH.E. BONDO G EE A 2 150 - AUT 2 \n
8 2022-07-07 07/07/22 PRIX LES BRUYERES CARRE 12 600 € 2022-07-07 000011 ENJOY THE GAME 3 11351'13"5 GELORMINIG. GELORMINI LISIEUX BONDOH.E. BONDO D EE A 2 675 - AUT 2 \n
9 2022-06-06 06/06/22 PRIX DE LA COTE ATLANTIQUE 540 € 2022-06-06 000066 ENJOY THE GAME 3 11501'15"0 FRECELLECL. FRECELLE CHATELAILLON-LA ROCHELLE BONDOH.E. BONDO G DD A 2 625 - - 2 \n
10 2022-05-14 14/05/22 PRIX DE CAMBREMER 1 500 € 2022-05-14 000055 ENJOY THE GAME 3 11301'13"0 LAMYA. LAMY CAEN BONDOH.E. BONDO D CC A 2 200 - AUT 2 \n
11 2022-04-29 29/04/22 PRIX SIRONE 1 020 € 2022-04-29 000066 ENJOY THE GAME 3 11281'12"8 LAMYA. LAMY VINCENNES (A MAUQUENCHY) BONDOH.E. BONDO G CC A 2 150 - AUT 2 \n
12 2022-04-21 21/04/22 PRIX DE BOULOGNE 440 € 2022-04-21 000077 ENJOY THE GAME 3 11321'13"2 LAMYA. LAMY ENGHIEN BONDOH.E. BONDO G DD A 2 150 - AUT 2 \n
13 2022-03-13 13/03/22 PRIX DE SYRACUSE 0 € 2022-03-13 00030DA ENJOY THE GAME 3 9'99"9 GELORMINIG. GELORMINI CAGNES-SUR-MER BONDOH.E. BONDO G EE A 2 700 - AUT 2 \n
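If you need the same table for other pages, the endpoint URL can probably be derived from the page URL rather than typed by hand. The sketch below is only an illustration: `build_api_url` is a hypothetical helper, and it assumes that other "dernieres-performances" pages follow the same pattern as the one above (same `-paginate-2` suffix and the same `search` and `lenght` query parameters, spelled as in the URL used in this answer). That assumption has not been checked against other pages.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_api_url(page_url: str, length: int = 1000) -> str:
    """Derive the JSON endpoint URL from a 'dernieres-performances' page URL.

    Hypothetical helper: it only mirrors the pattern seen in the single
    example URL above and may not hold for every page on the site.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/courses/dernieres-performances"):
        raise ValueError("not a 'dernieres-performances' page URL")

    # Same path as the page, plus the suffix used by the endpoint above.
    api_path = path + "-paginate-2"
    # 'lenght' is spelled exactly as in the endpoint URL used in this answer.
    query = urlencode({"search": "", "lenght": length})
    # An empty fragment drops the '#sub_sub_menu_fichecheval' anchor.
    return urlunsplit((parts.scheme, parts.netloc, api_path, query, ""))


page_url = (
    "https://www.letrot.com/stats/fiche-cheval/enjoy-the-game/ZGJaZgYEBQMW/"
    "courses/dernieres-performances#sub_sub_menu_fichecheval"
)
api_url = build_api_url(page_url)
print(api_url)
```

The resulting `api_url` can then be passed to `requests.get(api_url, headers=headers).json()` exactly as in the example above.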