Predict features/result

Time:09-05

My dataset contains the statistics for every game a basketball team played during a season (points scored, three-pointers, ...). I want to predict whether the team will win its next game (1 or 0), given statistics such as how many points and three-pointers it scores.

I have defined y as the game result in my dataset (0 or 1). x_data is the DataFrame that holds the team's statistics.

y = data.Result.values
x_data = data.drop(["Result"],axis='columns')

I first normalize the data and split it into train and test sets.

# Normalization
x = (x_data - np.min(x_data))/(np.max(x_data) - np.min(x_data)).values
# train test split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state=42)

x_train = x_train.T
x_test = x_test.T
y_train = y_train.T
y_test = y_test.T

Then I fit a model and try to predict:

mlr = LinearRegression()
mlr.fit(x_train.T,y_train.T)
mlr.predict(x_test.T)
print("mlr test accuracy: {}".format(mlr.score(x_test.T,y_test.T)))

The result I get looks like a probability. What I want is a direct answer to the question of whether the team will win its next game.

What should I do in this situation?

CodePudding user response:

Is the result you are talking about the output of the last print line, the "mlr test accuracy"?

If so, that is just the test performance of your linear model (for LinearRegression, score returns the R² of the fit, not a classification accuracy).

To get the actual predictions, you need to assign the result of predict to a variable:

y_test_hat = mlr.predict(x_test.T)

This is an array with one prediction per row of your input data. With LinearRegression the values are continuous rather than exactly 0 or 1.

Last but not least, I strongly suggest replacing your regression model with a more suitable classifier (logistic regression, SVM, etc.), since your outcome is binary (the team either wins or loses).
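As a rough illustration of the difference, here is a minimal sketch with scikit-learn's LogisticRegression. The data is synthetic and only stands in for the season statistics; the feature meaning and the rule that generates y are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the season data (hypothetical): two already-scaled
# features per game and a 0/1 result generated by a made-up rule.
rng = np.random.default_rng(42)
X = rng.random((200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.75).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression()
clf.fit(X_train, y_train)

y_test_hat = clf.predict(X_test)             # hard class labels: 0 or 1
win_proba = clf.predict_proba(X_test)[:, 1]  # estimated probability of class 1

print("first predictions:", y_test_hat[:5])
print("test accuracy:", clf.score(X_test, y_test))
```

Here predict returns the win/lose label directly, while predict_proba is available if the probability is still of interest. For a classifier, score reports accuracy.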

CodePudding user response:

First of all, the normalization parameters (min and max) should be computed from the training set only and then applied to both the training and test sets. At present you normalize before splitting, so information from the test set leaks into training. You need to correct that.
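One way to do this, sketched here with scikit-learn's MinMaxScaler on made-up data (the array shapes and values are assumptions, not the asker's dataset), is to split first and fit the scaler on the training rows only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up statistics: 50 games, 2 features, plus a random 0/1 result.
rng = np.random.default_rng(0)
X = rng.integers(0, 130, size=(50, 2)).astype(float)
y = rng.integers(0, 2, size=50)

# Split first, so the test rows play no part in choosing min and max.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # min/max learned from training rows only
X_test_scaled = scaler.transform(X_test)        # the same min/max reused on the test rows
```

Because the test rows are scaled with the training min and max, some test values may fall slightly outside the 0 to 1 range. That is expected.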

Second, the transpose of a transpose is the original matrix, i.e. X.T.T = X, so the .T calls in your code cancel out and can be removed.
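A quick NumPy check of this point, using small arrays invented for the illustration:

```python
import numpy as np

X = np.arange(6).reshape(2, 3)
print(X.T.shape)                 # (3, 2)
print(np.array_equal(X.T.T, X))  # True

# A 1-D array such as the target vector is unchanged by .T.
y = np.array([1, 0, 1])
print(y.T.shape)                 # (3,)
```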

Third, since this looks like a classification problem (win or lose), LogisticRegression is what you want instead of LinearRegression.
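Putting the three points together, a possible sketch is a scikit-learn pipeline that scales and classifies in one object, then predicts a single upcoming game. The column meanings, the rule generating y, and the next_game values are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# Hypothetical columns: points scored, three-pointers made.
X = np.column_stack(
    [rng.integers(70, 130, 120), rng.integers(3, 20, 120)]
).astype(float)
y = (X[:, 0] > 100).astype(int)  # made-up rule so the toy data is learnable

# The scaler is fitted only on the data passed to fit, so nothing leaks
# from rows that are predicted later.
model = make_pipeline(MinMaxScaler(), LogisticRegression())
model.fit(X, y)

next_game = np.array([[112.0, 11.0]])  # hypothetical stats for one game
print("predicted result:", model.predict(next_game)[0])
print("win probability:", model.predict_proba(next_game)[0, 1])
```

No transposes are needed: scikit-learn expects one row per game and one column per feature.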
