I have a CSV of size 12500 x 3. The first two columns (A and B) are inputs, and the final column (C) is the sum of the two.
I wanted to build a prediction model that returns the value of C for a given A and B. This is just a basic model to improve my understanding of machine learning.
The accuracy score is almost zero (0.00032), even though the problem is far too simple for the model to get the predictions wrong. The code is below:
```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv('Dataset.csv')  # importing dataset
X = data.drop(columns=['C'])
y = data['C']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
score
```
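To see why the score collapses, the sketch below reproduces the setup on synthetic data. It assumes A and B are integers drawn from a hypothetical range of 0 to 999,999 (the question does not say what Dataset.csv contains). Without fitting anything, it measures how many test-set sums never appear among the training labels. A classifier can only output labels it has seen during training, so those rows are guaranteed misses.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Dataset.csv: 12500 rows where C = A + B.
# The integer range 0..999_999 is an assumption; the question does not give it.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "A": rng.integers(0, 1_000_000, size=12_500),
    "B": rng.integers(0, 1_000_000, size=12_500),
})
data["C"] = data["A"] + data["B"]

X = data.drop(columns=["C"])
y = data["C"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A classifier treats every distinct sum as its own class and can only
# predict classes that occur in y_train.
seen_share = y_test.isin(y_train).mean()
unseen_share = 1 - seen_share
print(f"distinct training labels: {y_train.nunique()}")
print(f"test rows whose sum never appears in training: {unseen_share:.1%}")
# Even a perfect classifier would be capped at `seen_share` accuracy here.
```

With sums spread this widely, nearly every test row carries a label the classifier has never seen, which matches a near-zero accuracy score.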
I did not include any outliers in the data, and I created the CSV using Excel formulas. I used a Jupyter notebook to build this prediction model. Can someone please point out if/what I'm doing wrong?
Answer:
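Before you build a model, you should understand how it behaves and what it is meant for. DecisionTreeClassifier is a classification model: it treats every distinct value of C as a separate class and can only predict classes it saw during training, and accuracy_score only counts exact matches. Predicting the sum of A and B is a regression problem, and because C is a linear function of A and B, a simple Linear Regression model is the right choice here, not the Decision Tree classifier. Evaluate it with a regression metric such as R² or mean absolute error rather than accuracy.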
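A minimal sketch of the suggested fix, again on synthetic data because the real Dataset.csv is not available. The 0 to 999,999 integer range for A and B is an assumption. It swaps DecisionTreeClassifier for scikit-learn's LinearRegression and replaces accuracy_score with r2_score and mean_absolute_error.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Dataset.csv (C = A + B); the value range is an assumption.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "A": rng.integers(0, 1_000_000, size=12_500),
    "B": rng.integers(0, 1_000_000, size=12_500),
})
data["C"] = data["A"] + data["B"]

X = data.drop(columns=["C"])
y = data["C"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Regression predicts a continuous value instead of picking one of the training labels.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# accuracy_score is meant for class labels, so use regression metrics instead.
r2 = r2_score(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)
print(f"R^2: {r2:.6f}, MAE: {mae:.6f}")
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```

Because C is exactly A + B, the fitted coefficients should come out very close to 1 and 1 with an intercept near 0, and R² should be essentially 1. Any remaining error is floating-point noise rather than a modelling problem.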