I am trying to subsample the CIFAR-100 dataset so that I train and test on one subclass from each superclass. The idea is that whenever a value in y_full (the subclass label for each image) is in my list of wanted subclasses, the index of that element is used to grab the image at the same index in X_full (the images).
This is my code so far:
import numpy as np
from tensorflow import keras  # assumed import; the original snippet only shows keras.datasets
from sklearn.model_selection import train_test_split

cifar100 = keras.datasets.cifar100
(X_full, y_full), (X_test_full, y_test_full) = cifar100.load_data(label_mode="fine")
classes = [0,1,2,3,4,5,6,8,9,12,15,22,23,26,27,34,36,41,47,54]
X_tr_full = []
y_tr_full = []
X_test = []
y_test = []
for i in y_full:
    if i in classes:
        X_tr_full.append(X_full[np.where(y_full==i)])
        y_tr_full.append(i)
for i in y_test_full:
    if i in classes:
        X_test.append(X_test_full[np.where(y_test_full==i)])
        y_test.append(i)
The problem with my code is the np.where(y_full==i) call. It returns a tuple of ALL the indices in y_full whose value equals i, so every image in X_full with that label gets added as a single entry. Instead, I want to iterate through all of y_full and, whenever the label at a given index is in my class list, append the image at that same index in X_full.
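A tiny example of what I mean, using a made-up label array (y_demo is hypothetical, not the real CIFAR-100 labels) with the same (N, 1) shape that load_data returns:

```python
import numpy as np

# Made-up stand-in for y_full: CIFAR-100 labels come back with shape (N, 1).
y_demo = np.array([[3], [7], [3], [1], [3]])

# np.where returns a tuple of index arrays, one per dimension,
# covering EVERY position whose label equals 3, not just the current one.
rows, cols = np.where(y_demo == 3)
print(rows)  # [0 2 4]
print(cols)  # [0 0 0]
```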
Sorry if I'm not clear enough, it's hard to explain what I'm trying to do, but hopefully someone can help!
CodePudding user response:
I think I figured it out. It was pretty simple once I worked out how to access each index separately:
# Walk the labels by position so each image is paired with its own label.
for n in range(y_full.size):
    if y_full[n] in classes:
        X_tr_full.append(X_full[n])
        y_tr_full.append(y_full[n])
for n in range(y_test_full.size):
    if y_test_full[n] in classes:
        X_test.append(X_test_full[n])
        y_test.append(y_test_full[n])
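The same selection can probably be done without a Python loop by building a boolean mask with np.isin. This is only a sketch under stated assumptions: X_demo and y_demo are synthetic stand-ins with CIFAR-100's array layout, not the real dataset, and the names X_sub and y_sub are made up for illustration.

```python
import numpy as np

classes = [0, 1, 2, 3, 4, 5, 6, 8, 9, 12, 15, 22, 23, 26, 27, 34, 36, 41, 47, 54]

# Synthetic stand-ins (assumption): images of shape (N, 32, 32, 3), labels of shape (N, 1).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
y_demo = rng.integers(0, 100, size=(200, 1))

# Flatten the labels to (N,) so the mask indexes only the first axis of X_demo.
mask = np.isin(y_demo.ravel(), classes)

X_sub = X_demo[mask]  # kept images, original order preserved
y_sub = y_demo[mask]  # matching labels, still shape (k, 1)
```

With the real data, the same two indexing lines would be applied to X_full / y_full and to X_test_full / y_test_full. The result is a NumPy array rather than a list of arrays, which is usually what a training routine expects anyway.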
CodePudding user response:
To illustrate my comment, I'll use a simple example based on a modulus test:
In [224]: arr = np.arange(10); alist = []
In [225]: for i in [2,3]:
     ...:     alist.append(arr[arr%i>0])
     ...:
In [226]: alist
Out[226]: [array([1, 3, 5, 7, 9]), array([1, 2, 4, 5, 7, 8])]
I get a list of arrays, which can be joined into one array with:
In [227]: np.hstack(alist)
Out[227]: array([1, 3, 5, 7, 9, 1, 2, 4, 5, 7, 8])
Alternatively, with extend:
In [228]: arr = np.arange(10); alist = []
In [229]: for i in [2,3]:
     ...:     alist.extend(arr[arr%i>0])
     ...:
In [230]: alist
Out[230]: [1, 3, 5, 7, 9, 1, 2, 4, 5, 7, 8]
In [231]: np.array(alist)
Out[231]: array([1, 3, 5, 7, 9, 1, 2, 4, 5, 7, 8])
extend replaces your iterative append.
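Applied to image and label arrays, the collect-then-join pattern might look like the sketch below. Everything in it is illustrative: X_demo and y_demo are made-up miniature arrays (labels shaped (N, 1) like CIFAR-100's), and the class list is shortened to two entries.

```python
import numpy as np

classes = [2, 5]  # shortened, illustrative class list

# Made-up data: six tiny 2x2 single-channel "images" and labels of shape (N, 1).
X_demo = np.arange(6 * 2 * 2).reshape(6, 2, 2, 1)
y_demo = np.array([[5], [2], [9], [2], [5], [7]])

X_parts, y_parts = [], []
for c in classes:
    idx = np.flatnonzero(y_demo.ravel() == c)  # 1-D row indices for class c
    X_parts.append(X_demo[idx])                # one array per class
    y_parts.append(y_demo[idx])

# Join the per-class arrays into single arrays.
X_sub = np.concatenate(X_parts)
y_sub = np.concatenate(y_parts)
print(y_sub.ravel())  # [2 2 5 5]
```

Note that this groups the result by class instead of keeping the original sample order, so shuffle before training if the order matters.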