I am loading a dataset of handwritten digit images:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
train_data = np.loadtxt('train.txt')
print('train:', train_data.shape)  # train: (7291, 257)
The first value of each row is a digit from 0-9 (the label), and the remaining 256 values are the image pixels. How can I separate the labels from the images? My idea is to build one tensor from the first value of every row and another from the remaining values. Since I am a beginner, I am not sure how to do this or whether my approach is correct.
CodePudding user response:
You need to learn NumPy indexing: https://numpy.org.cn/en/user/basics/indexing.html
In your case, just do:
labels = train_data[:, 0]
images = train_data[:, 1:]
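To see what these two slices return, here is a minimal sketch that runs the same indexing on a small synthetic array. The random data and the names `fake_labels` and `fake_pixels` are made up for illustration and only stand in for the contents of `train.txt`.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt('train.txt'): 5 rows, 257 columns.
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=(5, 1)).astype(float)  # column 0: digit 0-9
fake_pixels = rng.random((5, 256))                            # columns 1..256: pixels
train_data = np.hstack([fake_labels, fake_pixels])

labels = train_data[:, 0]   # all rows, first column only -> 1-D array
images = train_data[:, 1:]  # all rows, every column after the first

print(labels.shape)  # (5,)
print(images.shape)  # (5, 256)
```

With the real file, the same two lines should give shapes `(7291,)` and `(7291, 256)`, given the `(7291, 257)` shape printed in the question.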
CodePudding user response:
The first value in each row is the label, i.e. the first column holds the labels, so the data looks like this:
col1 | col2 | .... | coln
label| ................
label| ..................
You want to separate the labels from the rest, so you need the first column. To get it in NumPy, you index the array. The syntax is simple:
train_data = np.loadtxt('train.txt')
y_train = train_data[:, 0]  # all rows, column 0 (the first column, since indexing starts at 0)
# y_train is commonly referred to as the training labels
x_train = train_data[:, 1:]  # all rows, from column 1 onwards
Hope this is clear.
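As a possible follow-up, here is a rough sketch of two optional steps after the split: casting the labels to integers (`np.loadtxt` returns floats by default) and reshaping each 256-value row into a 2-D image. The 16x16 layout is an assumption based only on 256 = 16 * 16, since the question does not say how the pixels are arranged, and the zero-filled array is a hypothetical stand-in for the real file.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt('train.txt'): 3 rows, 257 columns.
train_data = np.zeros((3, 257))
train_data[:, 0] = [3, 7, 1]  # made-up labels in the first column

# Labels come back as floats from loadtxt; cast if integer class ids are needed.
y_train = train_data[:, 0].astype(np.int64)

# Assumption: each image is 16x16, stored row by row in the 256 values.
x_train = train_data[:, 1:].reshape(-1, 16, 16)

print(y_train)        # [3 7 1]
print(x_train.shape)  # (3, 16, 16)
```

Whether the reshape is needed depends on the model: a dense network can take the flat 256-value rows as they are, while a convolutional one usually expects 2-D images.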