Please help me fix this: TypeError: predict_proba() missing 1 required positional argument: 'X'

I was building a binary classifier using a random forest classifier. Before that, I did feature selection, keeping the features with a high AUC score. However, when I tried to get the AUC for the final model, I couldn't. Here is the code below. Sorry for not including the dataset.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.feature_selection import VarianceThreshold

df_process_label1 = 'AAA'
X = df_process.iloc[:,200:500]
y = df_process[df_process_label1].values

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0)

constant_filter = VarianceThreshold(threshold = 0.01)
constant_filter.fit(X_train)
X_train_filter = constant_filter.transform(X_train)
X_test_filter = constant_filter.transform(X_test)


roc_auc = []
for features in X_train.columns:
    clf = RandomForestClassifier(n_estimators = 100, random_state=0)
    clf.fit(X_train[features].to_frame(), y_train)
    y_pred = clf.predict(X_test[features].to_frame())
    roc_auc.append(roc_auc_score(y_test, y_pred))


roc_values = pd.Series(roc_auc)
roc_values.index = X_train.columns
roc_values.sort_values(ascending=False, inplace=True)


sel = roc_values[roc_values>0.5]
sel


X_train_roc = X_train[sel.index]
X_test_roc = X_test[sel.index]

def run_randomForest(X_train, X_test, y_train, y_test):
    clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=1)
    clf.fit(X_train, y_train)
    y_pred1 = clf.predict(X_test)
    print('Accuracy on test set: ', accuracy_score(y_test, y_pred))
    print(roc_auc_score(y_test, RandomForestClassifier.predict_proba(X_test)[:,1]))

%time run_randomForest(X_train_roc, X_test_roc, y_train, y_test)

However, one error keeps appearing over and over again:

TypeError: predict_proba() missing 1 required positional argument: 'X'

Do you know how to fix it? Thanks in advance!
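The message itself can be reproduced without scikit-learn. In the toy class below, `ToyModel` is hypothetical (it only stands in for `RandomForestClassifier` and is not part of any library). Calling `predict_proba` through the class binds the first argument to `self`, so nothing is left for `X`; calling it through an instance works because Python passes the instance as `self` automatically.

```python
class ToyModel:
    """Hypothetical stand-in for an estimator with a predict_proba method."""

    def predict_proba(self, X):
        # Return one dummy [p0, p1] row per sample.
        return [[0.5, 0.5] for _ in X]


X_test = [[1.0], [2.0]]

# Calling through the class: X_test is bound to `self`, so `X` is missing.
error_message = ""
try:
    ToyModel.predict_proba(X_test)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Calling through an instance: the instance is passed as `self` implicitly.
model = ToyModel()
probs = model.predict_proba(X_test)
print(probs)
```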

Answer:

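You should call predict_proba on the fitted model instance, clf.predict_proba(X_test), not on the class. RandomForestClassifier.predict_proba(X_test) calls the method through the class, so X_test is taken as self and nothing is left for X, which is exactly what the TypeError reports. I think you also need to fix this part: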

y_pred1 = clf.predict(X_test)
print('Accuracy on test set: ', accuracy_score(y_test, y_pred))

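You are assigning y_pred1 but then using y_pred. Since y_pred is not assigned inside the function, Python falls back to the global y_pred left over from the feature-selection loop (the predictions of the last single-feature model), so the printed accuracy belongs to the wrong model.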
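Putting both fixes together, a corrected run_randomForest might look like the sketch below. The original dataset isn't available, so the usage part assumes synthetic data from `make_classification` in place of `df_process`; returning the two scores is also an addition not in the original, so they can be checked.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def run_randomForest(X_train, X_test, y_train, y_test):
    clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=1)
    clf.fit(X_train, y_train)

    # Use the predictions this model just made (y_pred1), not a leftover global y_pred.
    y_pred1 = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred1)
    print('Accuracy on test set: ', accuracy)

    # Call predict_proba on the fitted instance. Column 1 is the probability of
    # clf.classes_[1], which is the positive class when labels are 0/1.
    y_score = clf.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_score)
    print('ROC AUC on test set: ', auc)
    return accuracy, auc


# Synthetic stand-in for df_process (assumption: the real data isn't available).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
accuracy, auc = run_randomForest(X_train, X_test, y_train, y_test)
```

With the real data, the same function would be called as in the question, run_randomForest(X_train_roc, X_test_roc, y_train, y_test).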