ValueError: `logits` and `labels` must have the same shape, received ((None, 16) vs (None, 1))


I found similar questions, but the few that had accepted answers did not work for me. The following is my code for a binary classifier:

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

from google.colab import drive
drive.mount('/content/drive')

df = pd.read_csv('/content/drive/My Drive/dielectron.csv')
df = df.drop(['Run', 'M'], axis=1)
df.info()
df.head()

index = df.index.to_list()
columns = df.columns.tolist()

scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df)

Df = pd.DataFrame(df_scaled, index=index, columns=columns)
Df.info()

Df = Df.drop('Event', axis=1)
x = Df.drop('Q2', axis=1).to_numpy()
y = Df['Q2']
y = np.asarray(y).astype('float32').reshape((-1, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='sigmoid')
])

epochs = 20

es = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                      patience=3,
                                      mode='min',
                                      restore_best_weights=True)

model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.optimizers.Adam(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

history = model.fit(x, y, epochs=epochs, validation_split=0.3, callbacks=[es])

Running x.shape and y.shape for the x and y being fed into model.fit() returns these values:

x.shape: (100000, 15)
y.shape: (100000, 1)

I'm sorry if there are any blatant mistakes; I'm relatively inexperienced with ML, DL, and tf.keras.

Running this code returns the following error:

ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 919, in compute_loss
        y, y_pred, sample_weight, regularization_losses=self.losses)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
        losses = call_fn(y_true, y_pred)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1932, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 5247, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

    ValueError: `logits` and `labels` must have the same shape, received ((None, 16) vs (None, 1)).

The dataset that I'm using can be found at This website. Also, can anyone explain to me what exactly a logit is? I was going off the context and guessing that it's something to do with the features, but looking it up yielded conflicting answers.

CodePudding user response:


Logits and classification

For classification, you usually need to convert a vector of raw values to a probability distribution, i.e., to a vector whose elements are in [0,1] and sum up to 1. In this context, "logits" refer to the raw values before the conversion.

  • For classification between two classes (binary classification), the converted vector only needs one element, representing the probability p1 of the input belonging to class 1, since p0 is implicitly 1 - p1. The conversion from logits to a distribution in this case is done with the Sigmoid function, and the loss function is usually Binary Cross Entropy (BCE).
  • For classification between more than two classes, you will need to one-hot encode the distribution. That is, you'll want the number of elements in the converted vector to equal the number of classes, so that the n-th element represents pn, the probability of the input belonging to the n-th class. In this case, the conversion from logits to a distribution is done with the Softmax function, and the loss function is usually Categorical Cross Entropy (CCE). Both conversions are illustrated in the sketch below.

Note that nothing prevents you from one-hot encoding a binary distribution, i.e., having a converted vector with two elements representing p0 and p1 separately. However, the TensorFlow implementation of BCE loss assumes that the binary distribution is not one-hot encoded.
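To make the two conversions concrete, here is a minimal sketch with toy logit values (the numbers are arbitrary, chosen only for illustration):

import tensorflow as tf

# Binary case: one raw logit per example; sigmoid maps it to p1 in [0, 1].
binary_logits = tf.constant([[-2.0], [0.0], [3.0]])
print(tf.sigmoid(binary_logits))
# -> approximately [[0.12], [0.50], [0.95]]; p0 is implicitly 1 - p1

# Multi-class case: one logit per class; softmax maps the whole vector
# to a distribution whose elements sum to 1.
multiclass_logits = tf.constant([[1.0, 2.0, 0.5]])
print(tf.nn.softmax(multiclass_logits))
# -> approximately [[0.23, 0.63, 0.14]]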


Answer

Since your dataset has y.shape: (100000, 1), it is a binary classification dataset. This requires the output of your network to be a vector of size 1 per example instead of 16.

Furthermore, if you use the TensorFlow BCE loss function, you also have the option to specify (via the from_logits argument) whether the size-1 prediction vector fed to the function contains the raw logits or the distribution. When from_logits=True, the function first applies sigmoid to the prediction vector and then calculates the usual BCE.
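As a quick sanity check with toy values, you can verify that passing raw logits with from_logits=True gives the same loss as applying sigmoid yourself first:

import tensorflow as tf

# A batch of four raw logits and their binary labels.
logits = tf.constant([[-1.2], [0.3], [2.5], [-0.7]])
labels = tf.constant([[0.0], [1.0], [1.0], [0.0]])

# Option 1: let the loss apply sigmoid internally.
loss_a = tf.keras.losses.BinaryCrossentropy(from_logits=True)(labels, logits)

# Option 2: apply sigmoid first, then use the default loss.
loss_b = tf.keras.losses.BinaryCrossentropy()(labels, tf.sigmoid(logits))

print(float(loss_a), float(loss_b))  # the two values agree up to float precision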

So, simply specify your model and loss function as follows (the # <--- comments mark the changed lines):

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # <---
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.optimizers.Adam(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

or

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)  # <---
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),  # <---
              optimizer=tf.optimizers.Adam(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])
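One caveat with the from_logits=True variant: tf.keras.metrics.BinaryAccuracy thresholds the model output at 0.5 by default, which is only meaningful for probabilities. With raw logits you would want tf.keras.metrics.BinaryAccuracy(threshold=0.0), since sigmoid(0) = 0.5.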