So when I tried to read the life expectancy at birth data for Indonesia (https://data.worldbank.org/indicator/SP.DYN.LE00.IN?locations=ID is the link if you want to check it out), I simply couldn't. Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
print(lifeexpectacion)
And the error is:
  File "D:\programaizar\data economy\main.py", line 4, in <module>
    lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
Answer:
The first 4 rows of the CSV contain metadata (the data source, the last updated date, etc.) rather than the table itself, so you need to skip them. Use pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv", skiprows=4).
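A minimal, self-contained sketch of that fix. It assumes the layout described in this thread (two metadata lines, each followed by a blank line, then the real header row, with every line ending in a trailing comma). SAMPLE_CSV is a hypothetical, cut-down stand-in with only three year columns and made-up placeholder numbers (not real World Bank values), read from an io.StringIO so it runs without the download. With the real file, pass the filename instead.

```python
import io

import pandas as pd

# Hypothetical miniature of the World Bank file: metadata line, blank line,
# metadata line, blank line, then the real header. Numbers are placeholders.
SAMPLE_CSV = '''"Data Source","World Development Indicators",

"Last Updated Date","2022-12-22",

"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020","2021",
"Aruba","ABW","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","70.0","70.5","",
"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","60.0","60.5","",
'''

# skiprows=4 counts physical lines, blank ones included, so the fifth line
# ("Country Name", ...) becomes the header.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=4)

print(df.shape)  # (2, 8)
print(df.columns.tolist())  # last entry is 'Unnamed: 7', from the trailing comma
print(df.loc[df["Country Code"] == "IDN", ["Country Name", "2019", "2020"]])
```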
Answer:
I downloaded the linked file to see if I could recreate the error. This post had a similar issue.
The first four lines of that CSV are (lines 2 and 4 are blank):
"Data Source","World Development Indicators",
"Last Updated Date","2022-12-22",
If you remove these lines, it works as expected. This metadata confuses pandas: it takes the column count from the first line, which is only a few fields wide, and then fails when it reaches the real header row, which has sixty-seven columns.
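A sketch that reproduces the failure and, as an alternative to deleting the lines by hand, locates the header row programmatically. SAMPLE_CSV is a hypothetical miniature of the file with placeholder values, and find_header_row is a hypothetical helper written for this example (not a pandas function); it assumes the real header is the first line beginning with "Country Name".

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical miniature with the same layout as the download; the numbers
# are placeholders, not real World Bank values.
SAMPLE_CSV = '''"Data Source","World Development Indicators",

"Last Updated Date","2022-12-22",

"Country Name","Country Code","Indicator Name","Indicator Code","2020","2021",
"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","60.0","",
'''

# Without skiprows, the first metadata line (3 fields, counting the trailing
# comma) is taken as the header, so the 7-field header row is rejected.
try:
    pd.read_csv(io.StringIO(SAMPLE_CSV))
    parsed_without_skip = True
except ParserError as exc:
    parsed_without_skip = False
    print("ParserError:", exc)


def find_header_row(text, first_column="Country Name"):
    """Hypothetical helper: return the 0-based physical line index of the
    first line whose first quoted field is first_column."""
    for i, line in enumerate(text.splitlines()):
        if line.startswith(f'"{first_column}"'):
            return i
    raise ValueError(f"no header row starting with {first_column!r}")


header_row = find_header_row(SAMPLE_CSV)
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=header_row)
print(header_row, df.shape)
```

For the real download, you could read the file's text with open(), call the helper on it, and pass the result as skiprows to pd.read_csv with the filename; that avoids hard-coding 4 in case the number of metadata lines differs.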
Answer:
Works for me:
df = pd.read_csv(r'D:\temp\API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv',skiprows=4)
df
Out[149]:
Country Name Country Code ... 2021 Unnamed: 66
0 Aruba ABW ... NaN NaN
1 Africa Eastern and Southern AFE ... NaN NaN
2 Afghanistan AFG ... NaN NaN
3 Africa Western and Central AFW ... NaN NaN
4 Angola AGO ... NaN NaN
.. ... ... ... ... ...
261 Kosovo XKX ... NaN NaN
262 Yemen, Rep. YEM ... NaN NaN
263 South Africa ZAF ... NaN NaN
264 Zambia ZMB ... NaN NaN
265 Zimbabwe ZWE ... NaN NaN
[266 rows x 67 columns]
df.columns
Out[150]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
'1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
'1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
'1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
'2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
'Unnamed: 66'],
dtype='object')
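Since the question is ultimately about Indonesia, here is a sketch of a possible next step after reading with skiprows=4: drop the empty "Unnamed: 66"-style column produced by the trailing comma on each line, pick Indonesia's row by its World Bank country code IDN, and turn the string year labels ('1960', ..., '2021' above) into an integer-indexed series that is ready to plot. The cut-down SAMPLE_CSV and its numbers are hypothetical placeholders, not real World Bank values.

```python
import io

import pandas as pd

# Hypothetical miniature of the download; placeholder numbers only.
SAMPLE_CSV = '''"Data Source","World Development Indicators",

"Last Updated Date","2022-12-22",

"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020","2021",
"Aruba","ABW","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","70.0","70.5","",
"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","60.0","60.5","",
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=4)

# The trailing comma on every line yields an empty "Unnamed: N" column; drop it.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Keep only the year columns of Indonesia's row (labels like '2019').
year_cols = [c for c in df.columns if c.isdigit()]
indonesia = df.loc[df["Country Code"] == "IDN", year_cols].iloc[0]

# Use integer years as the index and drop years with no value yet.
indonesia.index = indonesia.index.astype(int)
indonesia = indonesia.dropna()
print(indonesia)
```

With matplotlib installed, indonesia.plot() followed by plt.show() should draw the series; on the real file, replace io.StringIO(SAMPLE_CSV) with the filename.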