how made cross-validation with python?-CodePudding

Hi i made a neural network and i need to do a cross validation. I don't know how made that, specifically how train or made that.

if someone knows made that please write or give me some indications.

here is my code:

###Division Train / Test
X = df.drop('Peso secado',axis=1)  #Variables de entrada, menos la variable de salida
y = df['Peso secado']              #Variable de salida

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)

###

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train= scaler.fit_transform(X_train)
X_train
X_test = scaler.transform(X_test)
X_test



###Creacion del modelo###
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
import tensorflow as tf

model = Sequential()
num_neuronas = 50
model.add(tf.keras.layers.Dense(units=6, activation='sigmoid', input_shape=(6, )))
model.add(Dense(num_neuronas,activation='relu'))
model.add(tf.keras.layers.Dense(units=1, activation='linear')) 

#Buscar mejor funcion de activacion para capa de salida sigmoid? o linear?
model.summary()
model.compile(optimizer='adam',loss='mse')

###Entrenamiento###
model.fit(x = X_train, y = y_train.values,
          validation_data=(X_test,y_test.values), batch_size=10, epochs=1000) 

losses = pd.DataFrame(model.history.history)  
losses
losses.plot()
   
###Evaluacion###
from sklearn.metrics import mean_squared_error,mean_absolute_error,explained_variance_score,mean_absolute_percentage_error
X_test
predictions = model.predict(X_test)
mean_absolute_error(y_test,predictions)
mean_absolute_percentage_error(y_test,predictions)

mean_squared_error(y_test,predictions)
explained_variance_score(y_test,predictions)  

mean_absolute_error(y_test,predictions)/df['Peso secado'].mean() 
mean_absolute_error(y_test,predictions)/df['Peso secado'].median()

Some recomendation for training or validation would be helpful

CodePudding user response：

My first observation is that the code is pretty ugly and unstructured. You should import the modules on the top part of your code

For performing cross validation first import the module from sklearn (and all other modules that you need)

from sklearn.model_selection import StratifiedKFold

I'd put the model definition in a separate function as such:

def get_model():
  model = Sequential()
  model.add(Dense(4, input_dim=8, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='adam')
  return model

Define your variables and if you are working with tensorflow / Keras, do something like this:

BATCH_SIZE = 64  #  128
EPOCHS = 100

k = 10
# Use stratified k-fold if the data is imbalanced
kf = StratifiedKFold(n_splits=k, shuffle=False, random_state=None)

# here comes the Cross validation
fold_index = 1
for train_index, test_index in kf.split(X, y):
            X_train = X[train_index]
            y_train = y[train_index]

            X_test = X[test_index]
            y_test = y[test_index]

            # fit the model on the training set
            model = get_model()

            model.fit(
                X_train,
                y_train,
                batch_size=BATCH_SIZE,
                epochs=EPOCHS,
                verbose=0,
                validation_data=(X_test, y_test),
            )

            # predict values
            # pred_values = model.predict(X_test)
            pred_values_prob = np.array(model(X_test))

Note: when working with tensorflow you need to define a new model every time in the loop. This is not the case with sklearn as sklearn starts with fresh initialized weights when called. Here you need to do that separately.