Accuracy not increasing when running multiple LinearRegression tests


I made a very simple program that takes columns of data from a CSV file. Here is a short preview of the file data:


I drop the unnecessary columns and run tests with 30% of the data held out as test data, to predict whether the blue team wins the game:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

df = pd.read_csv('MatchTimelinesFirst15.csv', delimiter=',')

predict = "blue_win"

# drop the columns that are not used as features
df = df.drop(['Unnamed: 0', 'redDragonKills', 'blueDragonKills'], axis=1)
# print(df.describe())

# features: every remaining column except the target; target: blue_win
x = np.array(df.drop([predict], axis=1))
y = np.array(df[predict])

for _ in range(500):
    # a new random 70/30 split is drawn on every iteration
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30)

    # print('{0}, {1}'.format(type(x_train), x_train))

    # a new, untrained model is created on every iteration
    linear = linear_model.LinearRegression()

    # trains model
    linear.fit(x_train, y_train)

    # for LinearRegression, score() returns R^2 on the test data
    acc = linear.score(x_test, y_test)

    print('Accuracy: {0}'.format(acc))

But my accuracy won't increase even though I train it in a loop 500 times. I keep getting the same range of results:

Accuracy: 0.39030223064480596
Accuracy: 0.3980014684661366
Accuracy: 0.3840247556358104
Accuracy: 0.3939949181269252
Accuracy: 0.38657487661026535
Accuracy: 0.3950506154649621
Accuracy: 0.3925506648304995

Any help would be greatly appreciated, including suggestions for improvement, since I am very new to Python and machine learning.

CodePudding user response:

Your loop does not train the model any further. Each of the 500 iterations creates a new model and fits it from scratch; the only difference between iterations is the random train-test split, which is why the score only fluctuates within a narrow range.
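To illustrate the point, here is a minimal sketch on synthetic data. The original CSV is not available here, so `make_classification` stands in for it (an assumption), and the sample and feature counts are arbitrary. It shows that refitting gives the same model, and that looping over random splits only measures how much the score varies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption: not the real CSV).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)

# Fitting twice on the same data yields the same coefficients:
# every fit() call starts from scratch, nothing accumulates.
first = LinearRegression().fit(X, y)
second = LinearRegression().fit(X, y)
same_model = np.allclose(first.coef_, second.coef_)

# Repeating the split-and-fit loop only samples different train/test splits.
scores = []
for seed in range(20):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    model = LinearRegression().fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print("Same coefficients on refit:", same_model)
print("Mean score: {:.3f}, std: {:.3f}".format(np.mean(scores), np.std(scores)))
```

The mean and standard deviation of the scores are a more useful summary of such a loop than the individual values.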

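As for improving your classifier, I would steer away from linear regression. Regression is not the same thing as classification: classification predicts categorical class labels, while regression predicts a continuous quantity. Note also that LinearRegression.score returns the coefficient of determination (R^2), not classification accuracy, so the values you print are not accuracies.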
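To make the difference concrete, the sketch below fits a linear regression to a 0/1 target and compares the number returned by `score` with an actual accuracy. The data is synthetic (`make_classification` is a stand-in, not the original CSV), and the 0.5 threshold used to turn continuous predictions into labels is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Synthetic binary target standing in for "blue_win" (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

reg = LinearRegression().fit(X_train, y_train)

# score() on a regressor is R^2, the same value r2_score computes.
r2 = reg.score(X_test, y_test)

# The regressor outputs continuous values, not class labels.
continuous = reg.predict(X_test)

# Thresholding at 0.5 (an arbitrary choice here) gives labels,
# and only then can an accuracy be computed.
labels = (continuous >= 0.5).astype(int)
acc = accuracy_score(y_test, labels)

print("R^2 from score(): {:.3f}".format(r2))
print("Accuracy after thresholding: {:.3f}".format(acc))
```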

Since you want to find out when the blue team wins, you have a binary classification problem. Either the blue team wins or it doesn't.

Try a classification model such as an SVM.
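As a rough starting point, an SVM classifier could look like the sketch below. It again uses synthetic data in place of the original CSV (an assumption). The RBF kernel and the `StandardScaler` step are choices made for this example, not something the question prescribes; scaling is included because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the match data (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

# Scale the features, then fit a support vector classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# For a classifier, score() is the mean accuracy on the given test data.
acc = clf.score(X_test, y_test)
predictions = clf.predict(X_test)

print("Accuracy: {:.3f}".format(acc))
```

With the real data, `X` and `y` would be the `x` and `y` arrays built from the DataFrame in the question.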

Good luck!
