How to increase the model accuracy of logistic regression in sklearn.linear_model?

Time:10-28

I am trying to predict the y values using LogisticRegression.

Here is a sample of the code I use to train the model.

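from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# data: a pandas DataFrame with feature columns A1..A11 and a target column "y"
x = data[["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11"]]
y = data["y"]

# Split the selected features x and the target y into train/test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)

log_model = LogisticRegression(solver='lbfgs', max_iter=1000)
log_model.fit(x_train, y_train)
predictions = log_model.predict(x_test)
print(accuracy_score(y_test, predictions))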

However, my accuracy score is only 0.712. Is there any feature engineering or anything else I can do to increase it?

Answer:

There are multiple ways to improve your model, such as:

  1. Data Transformation:

    • Apply a logarithmic transform to the original data and see whether the data distribution becomes more obvious
    • Data binning: split continuous features into bins, which can make the classes easier to separate
    • ...
  2. Change your model: There are many other classifiers, such as SVM, Decision Tree, etc. If you have more computing power, you may try Neural Networks.

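  3. Learning Rate: The learning rate controls how much your parameters change at each iteration. If it is too small, training converges slowly and may stop before reaching the optimum; if it is too large, the updates can overshoot and fail to converge. Note that sklearn's LogisticRegression (with the lbfgs solver used here) has no learning-rate parameter; SGDClassifier exposes one through its learning_rate and eta0 parameters.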

  4. Normalization: ...
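Items 1 and 4 above can be compared side by side with scikit-learn pipelines. This is a minimal sketch, not the answerer's code: the question's data is not available, so it generates hypothetical positive, right-skewed features whose label depends on their logarithm (a case where a log transform should help). It compares scaled raw features, a log transform (FunctionTransformer) and quantile binning (KBinsDiscretizer), each followed by LogisticRegression and scored with 5-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, KBinsDiscretizer, StandardScaler

# Hypothetical stand-in for the question's data: 11 positive, right-skewed features
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 11))
# The label depends on the log of two features, so a log transform should help here
y = (np.log(X[:, 0]) + np.log(X[:, 1]) > 0).astype(int)

candidates = {
    # Baseline: standardized raw features
    "raw": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # np.log assumes strictly positive features; use np.log1p if zeros can occur
    "log": make_pipeline(FunctionTransformer(np.log), StandardScaler(),
                         LogisticRegression(max_iter=1000)),
    # Each feature is split into 10 quantile bins and one-hot encoded
    "binned": make_pipeline(KBinsDiscretizer(n_bins=10, encode="onehot", strategy="quantile"),
                            LogisticRegression(max_iter=1000)),
}

scores = {}
for name, model in candidates.items():
    # Cross-validated accuracy is less noisy than a single train/test split
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Putting the transform inside the pipeline means it is applied identically to every training and validation fold. On real data the benefit depends on whether the features are actually skewed, so it is worth checking each transform rather than applying all of them.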

There are many other ways to improve your model; these are just a few tips.

Answer:

You should try three things:

  1. Try sklearn's MinMaxScaler or StandardScaler to normalize/standardize the data.

  2. Try plotting the ROC curve and finding the best decision threshold for your problem. To do this, use log_model.predict_proba() instead of log_model.predict() to get class probabilities, then apply the threshold chosen from the ROC curve and check whether the accuracy increases.
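Steps 1 and 2 can be combined roughly as follows. This is a sketch under assumptions: synthetic, mildly imbalanced data replaces the question's dataset, and the threshold is chosen on a separate validation split (not the test set) to avoid overfitting the cutoff. Because the goal here is accuracy, the threshold picked is the ROC point with the highest validation accuracy, computed from the ROC curve as (TPR · positives + (1 − FPR) · negatives) / total; Youden's J (TPR − FPR) is another common choice but does not necessarily maximize accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic, imbalanced data standing in for the question's dataset
X, y = make_classification(n_samples=2000, n_features=11, n_informative=5,
                           weights=[0.7, 0.3], random_state=42)

# Hold out a test set, then carve a validation set out of the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Step 1: standardize inside a pipeline so the scaler is fit on training data only
model = make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs", max_iter=1000))
model.fit(X_fit, y_fit)

# Step 2: use the predicted probability of the positive class, not hard 0/1 labels
val_proba = model.predict_proba(X_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, val_proba, drop_intermediate=False)
print(f"validation ROC AUC: {roc_auc_score(y_val, val_proba):.3f}")

# Accuracy at each ROC point: (true positives + true negatives) / total
n_pos = y_val.sum()
n_neg = len(y_val) - n_pos
val_accuracy = (tpr * n_pos + (1 - fpr) * n_neg) / len(y_val)
best_threshold = thresholds[np.argmax(val_accuracy)]

# Compare the default 0.5 cutoff with the tuned threshold on the untouched test set
test_proba = model.predict_proba(X_test)[:, 1]
acc_default = accuracy_score(y_test, (test_proba >= 0.5).astype(int))
acc_tuned = accuracy_score(y_test, (test_proba >= best_threshold).astype(int))
print(f"threshold 0.5: {acc_default:.3f}, threshold {best_threshold:.3f}: {acc_tuned:.3f}")
```

The tuned threshold is guaranteed to be at least as accurate as 0.5 on the validation set, but the test-set comparison is the honest check of whether it generalizes.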

If not, move on to step 3.

  3. Logistic regression may not be a good fit for this problem if the accuracy stays this low. Try Random Forest, SVM, Naïve Bayes, or another algorithm (boosting/bagging) and check whether you get better results.
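A simple way to act on step 3 is to score several classifiers on the same cross-validation folds. The sketch below is illustrative only: the data is synthetic (make_classification), the model list mirrors the algorithms named above with mostly default hyperparameters, and n_estimators=200 for the random forest is an arbitrary choice. Scale-sensitive models (logistic regression, SVM) are wrapped with StandardScaler; tree ensembles and Gaussian Naive Bayes are used as-is.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for the question's 11 features
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=42)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    "naive Bayes": GaussianNB(),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    # cv=5 on a classifier uses the same stratified folds for every model
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing means together with their fold-to-fold spread helps avoid picking a model whose advantage is within noise.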