NaN in single column while importing data from URL

Time:10-01

I am trying to import all 9 columns of the popular MPG dataset from UCI from a URL. The problem is that, instead of showing the string values, Carname (the ninth column) is populated with NaN.

What is going wrong and how can one fix this? The link to the repository shows that the original dataset has 9 columns, so this should work.

Opening the URL shows that the data looks like this:

18.0   8   307.0      130.0      3504.      12.0   70  1    "chevrolet chevelle malibu"
15.0   8   350.0      165.0      3693.      11.5   70  1    "buick skylark 320"

with a unique string value in the Carname field of each row. But when we import it with

import pandas as pd
# Import raw dataset from URL
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 
                'Weight', 'Acceleration', 'Model Year', 'Origin',  'Carname']

data = pd.read_csv(url, names=column_names,
                      na_values='?', comment='\t',
                      sep=' ', skipinitialspace=True)
data.head(3)

we get NaN values in Carname:

    MPG     Cylinders   Displacement    Horsepower  Weight  Acceleration    Model Year  Origin  Carname
0   18.0    8           307.0           130.0       3504.0   12.0           70          1       NaN
1   15.0    8           350.0           165.0       3693.0   11.5           70          1       NaN

CodePudding user response:

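The cause is right there in your read_csv call: comment='\t'. That tells pandas to treat everything from the first tab to the end of the line as a comment. The only tabs in the file are the ones before the Carname field, which means the way you read the file explicitly discards that column.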

You can remove the comment parameter and use the more generic separator \s+ instead to split on any run of whitespace (one or more spaces, a tab, etc.):

>>> pd.read_csv(url, names=column_names, na_values='?', sep=r'\s+')
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin                    Carname
0    18.0          8         307.0       130.0  3504.0          12.0          70       1  chevrolet chevelle malibu
1    15.0          8         350.0       165.0  3693.0          11.5          70       1          buick skylark 320
2    18.0          8         318.0       150.0  3436.0          11.0          70       1         plymouth satellite
3    16.0          8         304.0       150.0  3433.0          12.0          70       1              amc rebel sst
4    17.0          8         302.0       140.0  3449.0          10.5          70       1                ford torino
..    ...        ...           ...         ...     ...           ...         ...     ...                        ...
393  27.0          4         140.0        86.0  2790.0          15.6          82       1            ford mustang gl
394  44.0          4          97.0        52.0  2130.0          24.6          82       2                  vw pickup
395  32.0          4         135.0        84.0  2295.0          11.6          82       1              dodge rampage
396  28.0          4         120.0        79.0  2625.0          18.6          82       1                ford ranger
397  31.0          4         119.0        82.0  2720.0          19.4          82       1                 chevy s-10

[398 rows x 9 columns]
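
To see the difference without downloading anything, here is a minimal sketch that runs both calls on a two-line in-memory sample. The sample string is an assumption: it copies the two rows quoted in the question and puts a single tab before the quoted car name, as described above. The names sample, broken and fixed are illustrative only.

```python
import io

import pandas as pd

# Two rows copied from the question. The tab before the quoted car name
# is an assumption that mirrors the layout described in this answer.
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '15.0   8   350.0      165.0      3693.      11.5   70  1\t"buick skylark 320"\n'
)

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin', 'Carname']

# Original call: comment='\t' discards everything after the first tab,
# so only 8 fields are parsed and the 9th name is filled with NaN.
broken = pd.read_csv(io.StringIO(sample), names=column_names,
                     na_values='?', comment='\t',
                     sep=' ', skipinitialspace=True)

# Fixed call: split on any run of whitespace; the double quotes keep
# each multi-word car name together as a single field.
fixed = pd.read_csv(io.StringIO(sample), names=column_names,
                    na_values='?', sep=r'\s+')

print(broken['Carname'].tolist())
print(fixed['Carname'].tolist())
```

The first list should contain only NaN and the second the two car names, which matches what happens with the full file.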