I have seen many people convert the classic "iris" dataset from scikit-learn to a pandas DataFrame, since exploratory data analysis is easier with pandas. Is there a way to convert the "digits" dataset in a similar manner? Working with the scikit-learn dataset as it is is a little harder for me. Can someone please help me with this?
CodePudding user response:
Try:
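import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
digits = load_digits()
# Append the target as the last column, next to the 64 pixel features
df = pd.DataFrame(np.column_stack([digits['data'], digits['target']]), columns=list(digits['feature_names']) + ['target'])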
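If your scikit-learn is recent enough to support the `as_frame` parameter (an assumption about your environment; I believe it was added in version 0.23), the loader can build the DataFrame for you. A minimal sketch:

```python
from sklearn.datasets import load_digits

# as_frame=True asks the loader to return pandas objects
# (assumes a scikit-learn version that supports this parameter).
digits = load_digits(as_frame=True)

# .frame holds the 64 pixel columns plus a 'target' column.
df = digits.frame
print(df.shape)
print(df.columns[:3].tolist(), df.columns[-1])
```

This keeps the target as an integer column, whereas stacking it next to the float pixel data with `np.column_stack` turns it into floats.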
CodePudding user response:
We can load the dataset as shown below. According to the description printed at the end of the dataset, it has 1797 instances, and each instance has 64 attributes: an 8x8 image of integer pixels in the range 0..16.
from sklearn.datasets import load_digits
digits = load_digits()
digits
{'data': array([[ 0., 0., 5., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 10., 0., 0.],
[ 0., 0., 0., ..., 16., 9., 0.],
...,
[ 0., 0., 1., ..., 6., 0., 0.],
[ 0., 0., 2., ..., 12., 0., 0.],
[ 0., 0., 10., ..., 12., 1., 0.]]),
'target': array([0, 1, 2, ..., 8, 9, 8]),
...
'images': array([[[ 0., 0., 5., ..., 1., 0., 0.],
[ 0., 0., 13., ..., 15., 5., 0.],
[ 0., 3., 15., ..., 11., 8., 0.],
...,
[ 0., 4., 11., ..., 12., 7., 0.],
[ 0., 2., 14., ..., 12., 0., 0.],
[ 0., 0., 6., ..., 0., 0., 0.]],
...
}
We can create a pandas DataFrame with the 64 features of each image plus a label column, like below:
import pandas as pd
df = pd.DataFrame(digits['data'])
df['label'] = digits['target']
print(df)
0 1 2 3 4 5 6 7 8 9 ... 55 56 \
0 0.0 0.0 5.0 13.0 9.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
1 0.0 0.0 0.0 12.0 13.0 5.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
2 0.0 0.0 0.0 4.0 15.0 12.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
3 0.0 0.0 7.0 15.0 13.0 1.0 0.0 0.0 0.0 8.0 ... 0.0 0.0
4 0.0 0.0 0.0 1.0 11.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1792 0.0 0.0 4.0 10.0 13.0 6.0 0.0 0.0 0.0 1.0 ... 0.0 0.0
1793 0.0 0.0 6.0 16.0 13.0 11.0 1.0 0.0 0.0 0.0 ... 0.0 0.0
1794 0.0 0.0 1.0 11.0 15.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
1795 0.0 0.0 2.0 10.0 7.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0
1796 0.0 0.0 10.0 14.0 8.0 1.0 0.0 0.0 0.0 2.0 ... 0.0 0.0
57 58 59 60 61 62 63 label
0 0.0 6.0 13.0 10.0 0.0 0.0 0.0 0
1 0.0 0.0 11.0 16.0 10.0 0.0 0.0 1
2 0.0 0.0 3.0 11.0 16.0 9.0 0.0 2
3 0.0 7.0 13.0 13.0 9.0 0.0 0.0 3
4 0.0 0.0 2.0 16.0 4.0 0.0 0.0 4
... ... ... ... ... ... ... ... ...
1792 0.0 2.0 14.0 15.0 9.0 0.0 0.0 9
1793 0.0 6.0 16.0 14.0 6.0 0.0 0.0 0
1794 0.0 2.0 9.0 13.0 6.0 0.0 0.0 8
1795 0.0 5.0 12.0 16.0 12.0 0.0 0.0 9
1796 1.0 8.0 12.0 14.0 12.0 1.0 0.0 8
[1797 rows x 65 columns]
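Since the goal is exploratory analysis, a quick first check on a frame like this is how many samples each digit has. A rough sketch, rebuilding `df` so the snippet runs on its own (the names `counts` and `mean_pixels` are just illustrative):

```python
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
df = pd.DataFrame(digits['data'])
df['label'] = digits['target']

# Number of samples per digit, ordered by digit rather than by count.
counts = df['label'].value_counts().sort_index()
print(counts)

# Mean pixel intensity per digit: one row per label, 64 pixel columns.
mean_pixels = df.groupby('label').mean()
print(mean_pixels.shape)
```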
We can show all the images for a specific digit like below:
import matplotlib.pyplot as plt
num_for_show = 6
for row in df[df['label'].eq(num_for_show)].values:
    # The first 64 values are the pixels; the last one is the label
    plt.imshow(row[:64].reshape(8,8))
    plt.show()
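Showing every matching image one at a time can mean a lot of windows. As an alternative, here is a sketch that draws only the first ten matches in a single 2x5 grid; the grid size, the `gray_r` colormap and the limit of ten are my own choices, not something the dataset requires:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()
num_for_show = 6

# Select the 8x8 images whose label matches, and keep the first ten.
images = digits['images'][digits['target'] == num_for_show][:10]

# Draw them in a 2x5 grid instead of opening one window per image.
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for ax, image in zip(axes.ravel(), images):
    ax.imshow(image, cmap='gray_r')
    ax.axis('off')
fig.suptitle(f'First {len(images)} samples of digit {num_for_show}')
```

Call `plt.show()` afterwards when running this as a script.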
We can also show a single image from digits['images'] like below. (Each entry already has shape 8x8, so we don't need reshape(8,8).)
plt.imshow(digits['images'][10])
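To see how digits['images'] relates to digits['data'], a small check like the following can help. It is only a sketch; the variable name `same` is illustrative:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

print(digits['images'].shape)  # (n_samples, 8, 8)
print(digits['data'].shape)    # (n_samples, 64)

# Flattening one image row by row should give the matching row of 'data'.
same = np.array_equal(digits['images'][10].ravel(), digits['data'][10])
print(same)
```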