I have never had a problem with the fit() function in Jupyter Notebook before,
but with this code I get an error:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
X = data.drop(columns=['Survived'])
y = data['Survived']
model = DecisionTreeClassifier
model.fit(X, y)
prediction = model.predict(test_data)
prediction
The train.csv and test.csv files were read successfully by pandas (I displayed X and y in Jupyter).
The output:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_23200\3416706318.py in <module>
9
10 model = DecisionTreeClassifier
---> 11 model.fit(X, y)
12 prediction = model.predict(test_data)
13 prediction
TypeError: fit() missing 1 required positional argument: 'y'
How do I fix this bug?
The data used: https://www.kaggle.com/competitions/titanic/data?select=train.csv
CodePudding user response:
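Fix for the error (a TypeError, not a syntax error):
First of all, the error you encountered can be fixed by adding parentheses so that the class is instantiated: model = DecisionTreeClassifier(). Without them, model is the class itself rather than an instance, so in model.fit(X, y) X is bound to self, y is bound to the X parameter, and the y parameter is left unfilled, which is exactly what the TypeError reports.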
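To illustrate why the missing parentheses produce this particular message, here is a minimal sketch that uses a made-up class, ToyClassifier, in place of the real estimator. ToyClassifier and its n_samples_ attribute are hypothetical and exist only for this example; the point is just that calling fit on a class instead of an instance shifts every argument one position to the left.

```python
class ToyClassifier:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def fit(self, X, y):
        # Remember how many samples were seen, then return the fitted object.
        self.n_samples_ = len(X)
        return self


X = [[0], [1], [2]]
y = [0, 1, 1]

# Bug: no parentheses, so `model` is the class itself, not an instance.
model = ToyClassifier
try:
    # X is bound to `self`, y is bound to `X`, and nothing is left for `y`.
    model.fit(X, y)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Fix: call the class to create an instance, then fit it.
model = ToyClassifier()
model.fit(X, y)
print(model.n_samples_)
```

The same reasoning applies to DecisionTreeClassifier: once model is an instance, self is filled in automatically and X and y land in the right parameters.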
After adding the parentheses, the code will raise another error, because your X data has several columns containing string values. When training, scikit-learn's DecisionTreeClassifier only accepts numeric values in X, so those columns have to be encoded as numbers or dropped first. Please see this link for more details.
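One possible way to get numeric-only features is sketched below. It is not the only approach and it does not use the real Kaggle files: the two small DataFrames are invented stand-ins for train.csv and test.csv, the prepare helper is a hypothetical name, and the choices of columns to keep, the 0/1 mapping for Sex, and filling missing Age values with the training median are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up stand-ins for train.csv / test.csv (not the real Kaggle data).
train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Age": [29.0, 35.0, None, 22.0, 54.0, 4.0],
    "Survived": [1, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({
    "Pclass": [2, 3],
    "Sex": ["male", "female"],
    "Age": [40.0, None],
})


def prepare(frame, age_fill):
    """Hypothetical helper: return a copy of `frame` with numeric columns only."""
    out = frame.copy()
    out["Age"] = out["Age"].fillna(age_fill)               # fill missing ages
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})  # encode strings as ints
    return out


# Compute the fill value on the training data and reuse it for the test data.
age_median = train["Age"].median()
X = prepare(train.drop(columns=["Survived"]), age_median)
y = train["Survived"]
X_test = prepare(test, age_median)

model = DecisionTreeClassifier(random_state=0)  # note the parentheses
model.fit(X, y)
prediction = model.predict(X_test)
print(prediction)
```

Whatever preprocessing is applied to X has to be applied to the test data as well, so that both frames have the same numeric columns in the same order when predict is called.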
CodePudding user response:
Either your dataset does not have the column 'Survived' at that level, or data.drop() does an in-place removal, or possibly there is a third, spooky alternative. Either way, this behaviour is caused by supplying a None argument to a function that is not prepared for it; Python just disregards it as not being supplied at all.