I am following chapter 3 of Hands-On Machine Learning with Scikit-Learn and TensorFlow, which classifies the MNIST data. The following commands run without problems in a Jupyter notebook:
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1)
>>> mnist.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details',
'categories', 'url'])
>>> X, y = mnist["data"], mnist["target"]
>>> X.shape
(70000, 784)
>>> y.shape
(70000,)
The following command throws an error:
>>> some_digit = X[0]
Error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-43-348a6e96ae02> in <module>
----> 1 some_digit = X[0]
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 0
It is hard for me to understand what the actual error is, as I have not come across a similar one for such a simple assignment. What is causing the issue?
CodePudding user response:
X is a DataFrame, so X[0] looks for a column labelled 0, not for a row. If you want the first row of your DataFrame, you have to use .loc (label-based) or .iloc (position-based). In your case both give the same result, but only because the index is numeric, starts at 0 and is contiguous:
# Extract the first row as a Series
>>> X.loc[0]
pixel1 0.0
pixel2 0.0
pixel3 0.0
pixel4 0.0
pixel5 0.0
...
pixel780 0.0
pixel781 0.0
pixel782 0.0
pixel783 0.0
pixel784 0.0
Name: 0, Length: 784, dtype: float64
# Extract a pixel by label
>>> X.loc[0, 'pixel7']
0.0
# Extract the same pixel by position
>>> X.iloc[0, 6]
0.0
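To illustrate why the two accessors agree only for a default index, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are purely illustrative and are not taken from MNIST:

```python
import pandas as pd

# Hypothetical 3x2 frame standing in for the MNIST DataFrame
toy = pd.DataFrame({"pixel1": [0.0, 1.0, 2.0], "pixel2": [3.0, 4.0, 5.0]})

# toy[0] looks for a column labelled 0, which does not exist here
try:
    toy[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# With the default index, label 0 and position 0 refer to the same row
same_row = toy.loc[0].equals(toy.iloc[0])

# After reversing the row order, label 0 is no longer at position 0
shuffled = toy.iloc[::-1]
first_by_position = shuffled.iloc[0]  # the row labelled 2
first_by_label = shuffled.loc[0]      # the row labelled 0
```

So if the rows have been shuffled or filtered, .iloc[0] is the safer way to ask for "the first row".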
Update
iloc is probably the more appropriate choice here. If all you need is positional indexing, prefer NumPy over pandas and convert the data and target columns to arrays:
X, y = mnist["data"].to_numpy(), mnist["target"].to_numpy()
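As a rough sketch of what this conversion changes, again on made-up data rather than the real dataset (the `data` and `target` objects below are hypothetical stand-ins for mnist["data"] and mnist["target"]): once X is a NumPy array, X[0] is plain positional indexing and returns the first row.

```python
import pandas as pd

# Hypothetical stand-ins for mnist["data"] and mnist["target"]
data = pd.DataFrame({"pixel1": [0.0, 1.0], "pixel2": [2.0, 3.0]})
target = pd.Series(["5", "0"], name="class")

# Convert both to NumPy arrays
X, y = data.to_numpy(), target.to_numpy()

some_digit = X[0]  # first row of the array, no KeyError
some_label = y[0]  # matching label
```

Alternatively, fetch_openml accepts an as_frame argument; passing as_frame=False should return NumPy arrays directly. Its default value depends on the scikit-learn version, which is likely why the book's code behaves differently on newer installs.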