Selecting columns not working (showing error) in CSV file (with Python)

The following is my CSV file with the name runs.csv:

PLAYER NAME, TEST 1, TEST 2, TEST 3, TEST 4, TEST 5
Sachin Tendulkar, 167, 134, 108, 100, 89
Rohit Sharma, 147, 78, 101, 36, 23
Mayank Aggarwal, 230, 143, 67, 90, 21
Virendar Sehwag, 75, 44, 12, 8, 98
M.S. Dhoni, 176, 234, 106, 86, 33
Yuvraj Singh, 445, 239, 123, 215, 67
KL Rahul, 290,  128, 76, 111, 336
Kapil Dev, 104, 87, 65, 90, 200
Sunil Gavaskar, 202, 103, 65, 21, 460
K. Srikanth, 222, 110, 97, 34, 02
Mahendar Amarnath, 12, 43, 87, 267, 341
Ajinkya Rahane, 123, 38, 01, 17, 66

The following is my program, written in a Jupyter Notebook:

import pandas as pd
cols = ["PLAYER NAME", "TEST 4"]
runs = pd.read_csv("runs.csv")
print(runs[cols])

I am getting the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-1eb63cfedd09> in <module>
      2 cols = ["PLAYER NAME", "TEST 4"]
      3 runs = pd.read_csv("runs.csv")
----> 4 print(runs[cols])

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2906             if is_iterator(key):
   2907                 key = list(key)
-> 2908             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2909 
   2910         # take() does not accept boolean indexers

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1302             if raise_missing:
   1303                 not_found = list(set(key) - set(ax))
-> 1304                 raise KeyError(f"{not_found} not in index")
   1305 
   1306             # we skip the warning on Categorical

KeyError: "['TEST 4'] not in index"

I don't know what the problem is with my code. I think it's written correctly, but I'm still getting an error. Please help me fix this. Thanks in advance.

Answer:

If you check the column names:

runs.columns

You'll notice that every 'TEST' column name starts with a space. Your CSV has a space after each comma, and read_csv keeps that space as part of the header:

Index(['PLAYER NAME', ' TEST 1', ' TEST 2', ' TEST 3', ' TEST 4', ' TEST 5'], dtype='object')
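To reproduce this without the file, here is a minimal sketch that reads a two-row excerpt of runs.csv from a string; io.StringIO stands in for the real file, and the names csv_text and missing are illustrative only. It prints the headers (the quotes make the leading spaces visible) and lists which requested columns do not exist exactly as typed:

```python
import io

import pandas as pd

# Two rows excerpted from runs.csv; note the space after each comma.
csv_text = """PLAYER NAME, TEST 1, TEST 2, TEST 3, TEST 4, TEST 5
Sachin Tendulkar, 167, 134, 108, 100, 89
Rohit Sharma, 147, 78, 101, 36, 23
"""

runs = pd.read_csv(io.StringIO(csv_text))

# The list repr shows the leading space in every header after the first.
print(list(runs.columns))

# Report the requested names that are not exact column labels.
cols = ["PLAYER NAME", "TEST 4"]
missing = [c for c in cols if c not in runs.columns]
print(missing)
```

"PLAYER NAME" is found because it is the first field on the line, so no space precedes it; "TEST 4" is reported as missing because the actual label is " TEST 4".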

To fix it, either strip the whitespace from the column names with the .str.strip() method:

runs.columns = runs.columns.str.strip()

or change your cols variable to:

cols = ["PLAYER NAME", " TEST 4"]

to match the column name.
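Both fixes side by side, again as a sketch on a two-row excerpt read through io.StringIO rather than the real runs.csv (csv_text and runs_raw are illustrative names):

```python
import io

import pandas as pd

csv_text = """PLAYER NAME, TEST 1, TEST 2, TEST 3, TEST 4, TEST 5
Sachin Tendulkar, 167, 134, 108, 100, 89
Rohit Sharma, 147, 78, 101, 36, 23
"""

# Option 1: strip surrounding whitespace from every column name after loading.
runs = pd.read_csv(io.StringIO(csv_text))
runs.columns = runs.columns.str.strip()
print(runs[["PLAYER NAME", "TEST 4"]])

# Option 2: leave the headers untouched and use the exact, space-prefixed name.
runs_raw = pd.read_csv(io.StringIO(csv_text))
print(runs_raw[["PLAYER NAME", " TEST 4"]])
```

Option 1 is usually the more convenient choice, since the rest of the code can then use the names as they appear visually in the file.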

Answer:

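You can either split on a comma followed by any whitespace while reading the file (a regex separator needs the Python engine, and the r prefix keeps \s from being treated as a string escape):

runs = pd.read_csv("runs.csv", sep=r",\s*", engine="python")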

Or this:

runs.columns = runs.columns.str.strip()
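The regex-separator approach can be checked on an excerpt that includes the KL Rahul row, which has two spaces after one comma. The sketch also shows read_csv's skipinitialspace=True option, which this answer does not mention; it tells the default C engine to skip whitespace right after each delimiter. The data is read from a string via io.StringIO as a stand-in for runs.csv:

```python
import io

import pandas as pd

# Excerpt of runs.csv, including the row with a double space ("290,  128").
csv_text = """PLAYER NAME, TEST 1, TEST 2, TEST 3, TEST 4, TEST 5
Sachin Tendulkar, 167, 134, 108, 100, 89
KL Rahul, 290,  128, 76, 111, 336
"""

# Split on a comma plus any following whitespace; regex separators need
# engine="python", and the raw string avoids an invalid "\s" escape warning.
runs_regex = pd.read_csv(io.StringIO(csv_text), sep=r",\s*", engine="python")

# Alternative not in the answer: drop whitespace after each delimiter
# while keeping the default C engine and a plain "," separator.
runs_skip = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(list(runs_regex.columns))
print(list(runs_skip.columns))
```

Either way the headers come out clean, so the original cols = ["PLAYER NAME", "TEST 4"] selection works without any renaming afterwards.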