How to convert the digits dataset of scikit-learn to a pandas DataFrame?

Time:08-22

I have seen a lot of people convert the classic "iris" dataset of scikit-learn to a pandas DataFrame, since exploratory data analysis is easier with pandas. Is there a way to convert the "digits" dataset in a similar manner? Working with the scikit-learn dataset as it is is a little harder for me. Can someone please help me with this?

CodePudding user response:

Try:

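import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
# feature_names is a list, so "+" appends the name of the target column
df = pd.DataFrame(np.column_stack([digits['data'], digits['target']]), columns=digits['feature_names'] + ['target'])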
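
As a possible alternative, assuming a scikit-learn version recent enough to support the as_frame parameter (0.23 or later, as far as I know), load_digits can build the DataFrame itself. A minimal sketch:

```python
from sklearn.datasets import load_digits

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
digits = load_digits(as_frame=True)

# .frame holds the 64 pixel columns plus a final "target" column
df = digits.frame
print(df.shape)
print(df.columns[-1])
```

This avoids stacking the arrays by hand, and it keeps the target as an integer column instead of casting it to float.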

CodePudding user response:

We can load the dataset as shown below. According to the description stored at the end of the returned object (digits['DESCR']), the dataset has 1797 instances, and each instance has 64 attributes: an 8x8 image of integer pixels in the range 0..16.

from sklearn.datasets import load_digits
digits = load_digits()
digits

{'data': array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
        ...,
        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
        [ 0.,  0., 10., ..., 12.,  1.,  0.]]),
 'target': array([0, 1, 2, ..., 8, 9, 8]),
...
'images': array([[[ 0.,  0.,  5., ...,  1.,  0.,  0.],
         [ 0.,  0., 13., ..., 15.,  5.,  0.],
         [ 0.,  3., 15., ..., 11.,  8.,  0.],
         ...,
         [ 0.,  4., 11., ..., 12.,  7.,  0.],
         [ 0.,  2., 14., ..., 12.,  0.,  0.],
         [ 0.,  0.,  6., ...,  0.,  0.,  0.]],
 ...
}
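
The figures quoted from the description can be checked against the arrays themselves. This is just a quick sanity check, not something the dataset requires you to do:

```python
from sklearn.datasets import load_digits

digits = load_digits()

# One row per image, one column per pixel of the flattened 8x8 grid
n_instances, n_attributes = digits["data"].shape
print(n_instances, n_attributes)

# Pixel intensities are stored as floats but only take integer values
print(digits["data"].min(), digits["data"].max())

# The full text description lives under the "DESCR" key
print(digits["DESCR"][:300])
```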

We can create a pandas DataFrame holding the 64 features of each image plus its label like below:

import pandas as pd
df = pd.DataFrame(digits['data'])
df['label'] = digits['target']
print(df)

        0    1     2     3     4     5    6    7    8    9  ...   55   56  \
0     0.0  0.0   5.0  13.0   9.0   1.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1     0.0  0.0   0.0  12.0  13.0   5.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
2     0.0  0.0   0.0   4.0  15.0  12.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
3     0.0  0.0   7.0  15.0  13.0   1.0  0.0  0.0  0.0  8.0  ...  0.0  0.0   
4     0.0  0.0   0.0   1.0  11.0   0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
...   ...  ...   ...   ...   ...   ...  ...  ...  ...  ...  ...  ...  ...   
1792  0.0  0.0   4.0  10.0  13.0   6.0  0.0  0.0  0.0  1.0  ...  0.0  0.0   
1793  0.0  0.0   6.0  16.0  13.0  11.0  1.0  0.0  0.0  0.0  ...  0.0  0.0   
1794  0.0  0.0   1.0  11.0  15.0   1.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1795  0.0  0.0   2.0  10.0   7.0   0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   
1796  0.0  0.0  10.0  14.0   8.0   1.0  0.0  0.0  0.0  2.0  ...  0.0  0.0   

       57   58    59    60    61   62   63  label  
0     0.0  6.0  13.0  10.0   0.0  0.0  0.0      0  
1     0.0  0.0  11.0  16.0  10.0  0.0  0.0      1  
2     0.0  0.0   3.0  11.0  16.0  9.0  0.0      2  
3     0.0  7.0  13.0  13.0   9.0  0.0  0.0      3  
4     0.0  0.0   2.0  16.0   4.0  0.0  0.0      4  
...   ...  ...   ...   ...   ...  ...  ...    ...  
1792  0.0  2.0  14.0  15.0   9.0  0.0  0.0      9  
1793  0.0  6.0  16.0  14.0   6.0  0.0  0.0      0  
1794  0.0  2.0   9.0  13.0   6.0  0.0  0.0      8  
1795  0.0  5.0  12.0  16.0  12.0  0.0  0.0      9  
1796  1.0  8.0  12.0  14.0  12.0  1.0  0.0      8  

[1797 rows x 65 columns]

We can show multiple images for a specific digit like below:

import matplotlib.pyplot as plt
num_for_show = 6
for row in df[df['label'].eq(num_for_show)].values:
    plt.imshow(row[:64].reshape(8,8))
    plt.show()
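
The loop above opens one figure per matching row, which means many figures in a row for any single digit. If that is too much, a small helper can draw the first few matches in one figure instead. This is only a sketch: plot_digit_grid and its n_images parameter are names I made up, and it assumes a DataFrame with 64 pixel columns followed by a label column, with at least one row matching the requested digit.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_digits


def plot_digit_grid(df, label, n_images=8):
    """Draw the first n_images rows whose label matches, side by side."""
    rows = df[df["label"].eq(label)].head(n_images)
    fig, axes = plt.subplots(1, len(rows), figsize=(1.5 * len(rows), 2), squeeze=False)
    for ax, (_, row) in zip(axes[0], rows.iterrows()):
        # First 64 values are the pixels; reshape them back to the 8x8 image
        ax.imshow(row.iloc[:64].to_numpy(dtype=float).reshape(8, 8), cmap="gray_r")
        ax.axis("off")
    return fig


digits = load_digits()
df = pd.DataFrame(digits["data"])
df["label"] = digits["target"]

fig = plot_digit_grid(df, label=6, n_images=8)
# Call plt.show() here to display the figure
```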



We can show a single image directly from digits['images'] like below. (Each entry already has shape 8x8, so we don't need reshape(8,8).)

plt.imshow(digits['images'][10])
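
To see how digits['images'] relates to digits['data'], we can flatten each 8x8 image and compare it with the matching row. A short check, where flattened is just a name chosen for this example:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# images has shape (n_samples, 8, 8); data has shape (n_samples, 64)
flattened = digits["images"].reshape(len(digits["images"]), -1)

# True if every flattened image matches the corresponding row of data
print(np.array_equal(flattened, digits["data"]))
```

So the two keys hold the same pixel values, and which one to use depends only on whether you want rows for a DataFrame or 2D arrays for plotting.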
