pandas read_csv fails when trying to read csv

Time:01-01

So when I tried to read the life expectancy at birth data for Indonesia (https://data.worldbank.org/indicator/SP.DYN.LE00.IN?locations=ID, this is the link if you want to check it out), I simply couldn't. Here is my code:

import pandas as pd
import matplotlib.pyplot as plt

lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
print(lifeexpectacion)

And the error is:

 File "D:\programaizar\data economy\main.py", line 4, in <module>
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")

Answer:

The first 4 rows of the CSV contain metadata such as the data source and the last updated date, so you need to skip them. Use pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv", skiprows=4)
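A minimal sketch of the same call on an in-memory sample, assuming the downloaded file starts with two metadata lines, each followed by a blank line, before the real header (the layout quoted in another answer). `SAMPLE` and its values are placeholders, not real World Bank data; the real file has many more year columns.

```python
import io

import pandas as pd

# Hypothetical excerpt mimicking the download: metadata, blank lines, header.
# All values are placeholders, not real World Bank figures.
SAMPLE = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2022-12-22",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2020","2021",\n'
    '"Country A","AAA","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","70.5","71.0",\n'
    '"Country B","BBB","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","65.2",,\n'
)

# skiprows=4 drops the four physical lines above the header (blank lines
# included), so "Country Name", "Country Code", ... become the column names.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=4)
print(df.columns.tolist())
```

The trailing comma on every line produces one extra, empty column, which pandas names `Unnamed: <position>`; in the real file that is the `Unnamed: 66` column.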

Answer:

I downloaded the linked file to see if I could recreate the error. This post had a similar issue.

The first four lines of that CSV are:

"Data Source","World Development Indicators",

"Last Updated Date","2022-12-22",

If you remove these lines, it works as expected. The metadata confuses pandas: it infers the number of columns from the first line, which holds only two values plus a trailing comma, when the real header further down has sixty-seven columns.
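To see the failure described above and a less brittle alternative to hard-coding `skiprows=4`, here is a sketch on an in-memory copy of that layout. `SAMPLE` and `find_header_row` are hypothetical names introduced for illustration; the helper simply assumes the real header is the first line that starts with `"Country Name"`.

```python
import io

import pandas as pd

# Hypothetical excerpt mimicking the download: metadata, blank lines, header.
# All values are placeholders, not real World Bank figures.
SAMPLE = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2022-12-22",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2020","2021",\n'
    '"Country A","AAA","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","70.5","71.0",\n'
)

# Without skiprows, the first metadata line becomes the header. It has three
# fields (two values plus an empty one from the trailing comma), so the wider
# header row further down makes the C parser raise ParserError.
parser_error = None
try:
    pd.read_csv(io.StringIO(SAMPLE))
except pd.errors.ParserError as exc:
    parser_error = exc
print("ParserError:", parser_error)


def find_header_row(text: str, first_column: str = '"Country Name"') -> int:
    """Hypothetical helper: 0-based index of the first line starting with first_column."""
    for index, line in enumerate(text.splitlines()):
        if line.startswith(first_column):
            return index
    raise ValueError(f"no line starts with {first_column}")


header_row = find_header_row(SAMPLE)  # 4 for this layout
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=header_row)
print(df.columns.tolist())
```

For the downloaded file you would read its text first (for example with `pathlib.Path(path).read_text(encoding="utf-8")`) and pass the returned index as `skiprows`, so a change in the number of metadata lines would not break the call.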

Answer:

This works for me:

df = pd.read_csv(r'D:\temp\API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv', skiprows=4)
df
Out[149]: 
                    Country Name Country Code  ... 2021 Unnamed: 66
0                          Aruba          ABW  ...  NaN         NaN
1    Africa Eastern and Southern          AFE  ...  NaN         NaN
2                    Afghanistan          AFG  ...  NaN         NaN
3     Africa Western and Central          AFW  ...  NaN         NaN
4                         Angola          AGO  ...  NaN         NaN
..                           ...          ...  ...  ...         ...
261                       Kosovo          XKX  ...  NaN         NaN
262                  Yemen, Rep.          YEM  ...  NaN         NaN
263                 South Africa          ZAF  ...  NaN         NaN
264                       Zambia          ZMB  ...  NaN         NaN
265                     Zimbabwe          ZWE  ...  NaN         NaN
[266 rows x 67 columns]

df.columns
Out[150]: 
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
       'Unnamed: 66'],
      dtype='object')
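Once the file parses, getting Indonesia's series only takes a row filter. A sketch on a small placeholder frame (hypothetical values, same column layout as above, with `Unnamed: 66` shrunk to `Unnamed: 6` because there are fewer year columns): it drops the empty column created by the trailing commas, keeps the year columns for country code `IDN`, and turns the year labels into integers.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file. The numbers are placeholders,
# NOT real life-expectancy figures.
SAMPLE = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2022-12-22",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2020","2021",\n'
    '"Country A","AAA","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","60.0","61.0",\n'
    '"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","50.0","51.0",\n'
)

df = pd.read_csv(io.StringIO(SAMPLE), skiprows=4)

# The trailing comma on every line yields an empty last column
# ("Unnamed: 66" in the real file); drop any column whose name starts with "Unnamed".
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Keep Indonesia's row (matched on the ISO3 code) and only the year columns.
year_columns = [column for column in df.columns if column.isdigit()]
indonesia = df.loc[df["Country Code"] == "IDN", year_columns].iloc[0]

# Year labels are strings in the header; convert them to ints for plotting.
indonesia.index = indonesia.index.astype(int)
print(indonesia)
```

With matplotlib installed, `indonesia.plot()` then draws the series as a line chart.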