I'd like to fit a neural network using brulee, but despite making several changes to the model parameters (I've changed all of them), the predictions always come out at almost the same value. In my case:
# Packages
library(tidymodels)
library(brulee)
# Open the data set
data_train_sub <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/cc_test_ds.csv")
# Model parameters
hidden_units <- c(4)
epochs <- c(50)
dropout <- c(0.01)
learn_rate <- c(0.01)
activation <- c("relu")
penalty <- c(0.01)
validation <- c(0.80)
# Training data set
data_train <- data_train_sub[1:1250,]
# Validation data set
data_test <- data_train_sub[1251:1500,]
# Model fitting
fit <- brulee_mlp(x = as.matrix(data_train[, 2:ncol(data_train)]),
                  y = data_train$cc,
                  hidden_units = hidden_units, epochs = epochs,
                  dropout = dropout, learn_rate = learn_rate,
                  activation = activation, penalty = penalty,
                  validation = validation)
# Plot
predict(fit, data_test) %>%
  bind_cols(data_test) %>%
  ggplot(aes(x = .pred, y = cc)) +
  geom_abline(col = "green") +
  geom_point(alpha = .3) +
  lims(x = c(0, 1.0), y = c(0, 1.0)) +
  coord_fixed(ratio = 1)
This seems strange to me, and I would appreciate any help.
Thanks in advance!
CodePudding user response:
The main issues are two extreme outliers (removed in the code below) and the fact that you need to standardize your predictors so that they are on the same scale.
Although the model still doesn't fit great, here is a modified version with more complexity that does give different predicted values. I also added a PCA step, which helps a small amount (but you could leave that step out of the recipe).
library(tidymodels)
library(brulee)
tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
data_train_sub <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/cc_test_ds.csv")
# Model parameters
hidden_units <- c(50)  # more hidden units
epochs <- c(500)       # more iterations
dropout <- c(0)        # since we are using penalization, no dropout
learn_rate <- c(0.01)
activation <- c("relu")
penalty <- c(0.01)
validation <- c(0.20)  # hold out 20%
# Training data set
data_train <- data_train_sub[1:1250,]
# There are two extreme outliers:
data_train_2 <- data_train %>% slice(-c(64, 162))
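# (Not in the original answer: one hypothetical way to find rows like these
#  yourself, assuming the outliers show up as extreme values in a column,
#  is to scan the column summaries or sort by the suspect column:)
# summary(data_train)
# data_train %>% mutate(row = row_number()) %>% arrange(desc(cc)) %>% head()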
# Validation data set
data_test <- data_train_sub[1251:1500,]
rec <-
  recipe(cc ~ ., data = data_train_2) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors())
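# (Not in the original answer: if you would rather leave the PCA step out, as
#  suggested above, the recipe would simply be the normalized version -- shown
#  commented out here so the PCA recipe above is the one that runs:)
# rec <-
#   recipe(cc ~ ., data = data_train_2) %>%
#   step_normalize(all_predictors())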
set.seed(1)
# Model fitting
fit <- brulee_mlp(rec, data = data_train_2,
                  hidden_units = hidden_units, epochs = epochs,
                  dropout = dropout, learn_rate = learn_rate,
                  activation = activation, penalty = penalty,
                  validation = validation)
# check convergence
autoplot(fit)
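# If the loss shown by autoplot() has flattened by the final epoch, more
# iterations will not help; if it is still dropping, try increasing `epochs`.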
# Plot
predict(fit, data_test) %>%
  bind_cols(data_test) %>%
  ggplot(aes(x = .pred, y = cc)) +
  geom_abline(col = "green") +
  geom_point(alpha = .3) +
  lims(x = c(0, 1.0), y = c(0, 1.0)) +
  coord_fixed(ratio = 1)
Created on 2023-01-04 by the reprex package (v2.0.1)
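If you also want a numeric summary of the fit on the holdout rows (this is a suggestion, not part of the answer above), the same predictions can be passed to yardstick:
# Holdout metrics (RMSE, R-squared, MAE) for the predictions plotted above
predict(fit, data_test) %>%
  bind_cols(data_test) %>%
  metrics(truth = cc, estimate = .pred)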