When I try to parse a wiki page for its tables, the tables are read correctly except for the date of birth column, which comes back as empty. Is there a workaround for this? I've tried using beautiful soup but I get the same result. The code I've used is as follows:
url = 'https://en.wikipedia.org/wiki/2002_FIFA_World_Cup_squads'
pd.read_html(url)
Here's an image of one of the tables in question:
CodePudding user response:
One possible solution can be alter the page content with beautifulsoup
and then load it to pandas
:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/2002_FIFA_World_Cup_squads"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# select correct table, here I select the first one:
tbl = soup.select("table")[0]
# remove the (aged XX) part:
for td in tbl.select("td:nth-of-type(3)"):
td.string = td.contents[-1].split("(")[0]
df = pd.read_html(str(tbl))[0]
print(df)
Prints:
No. Pos. Player Date of birth (age) Caps Club
0 1 GK Thomas Sørensen 12 June 1976 14 Sunderland
1 2 MF Stig Tøfting 14 August 1969 36 Bolton Wanderers
2 3 DF René Henriksen 27 August 1969 39 Panathinaikos
3 4 DF Martin Laursen 26 July 1977 15 Milan
4 5 DF Jan Heintze (c) 17 August 1963 83 PSV Eindhoven
5 6 DF Thomas Helveg 24 June 1971 67 Milan
6 7 MF Thomas Gravesen 11 March 1976 22 Everton
7 8 MF Jesper Grønkjær 12 August 1977 25 Chelsea
8 9 FW Jon Dahl Tomasson 29 August 1976 38 Feyenoord
9 10 MF Martin Jørgensen 6 October 1975 32 Udinese
10 11 FW Ebbe Sand 19 July 1972 44 Schalke 04
11 12 DF Niclas Jensen 17 August 1974 8 Manchester City
12 13 DF Steven Lustü 13 April 1971 4 Lyn
13 14 MF Claus Jensen 29 April 1977 13 Charlton Athletic
14 15 MF Jan Michaelsen 28 November 1970 11 Panathinaikos
15 16 GK Peter Kjær 5 November 1965 4 Aberdeen
16 17 MF Christian Poulsen 28 February 1980 3 Copenhagen
17 18 FW Peter Løvenkrands 29 January 1980 4 Rangers
18 19 MF Dennis Rommedahl 22 July 1978 19 PSV Eindhoven
19 20 DF Kasper Bøgelund 8 October 1980 2 PSV Eindhoven
20 21 FW Peter Madsen 26 April 1978 4 Brøndby
21 22 GK Jesper Christiansen 24 April 1978 0 Vejle
22 23 MF Brian Steen Nielsen 28 December 1968 65 Malmö FF
CodePudding user response:
Try setting the parse_dates parameter to True inside read_html method.