I have a dataframe for which I predicted the result using XGBoost (all the necessary imports are in place, so I omit them below):
studentId  testId  result  Length  Words  picture
s1         t1      0       10      8.50   0
s1         t2      0       11      9.80   1
s1         t3      1       11      10.40  1
s2         t2      0       11      9.80   1
s2         t4      1       60      9.99   0
s3         t7      1       40      6.45   0
cols_to_drop = ['testId', 'studentId']
df.drop(cols_to_drop, axis=1, inplace=True)
X = df.drop('result', axis=1)
y = df['result']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
I have a part of this dataframe for which I can also predict the result in a different way using surprise, without using all the features above:
studentId  testId  result
s1         t1      0
s1         t2      0
s1         t3      1
s2         t2      0
s2         t4      1
s3         t7      1
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df_small[['studentId', 'testId', 'result']], reader)
trainset, testset = train_test_split(data, test_size=0.25)
algo = KNNWithMeans()
algo.fit(trainset)
test = algo.test(testset)
test = pd.DataFrame(test)
test.drop("details", inplace=True, axis=1)
test.columns = ['userId', 'questionId', 'actual', 'cf_predictions']
Now I want to create a model that combines the two and assigns a different weight to each. I wrapped the code above in functions and then called both from one larger function:
def model_1(df):
    cols_to_drop = ['testId', 'studentId']
    new_df=df.drop(cols_to_drop, axis=1, inplace=True)
    X = new_df.drop('result', axis=1)
    y = new_df['result']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
    model = XGBClassifier()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_test, y_pred

def model_2(df):
    reader = Reader(rating_scale=(0, 1))
    data = Dataset.load_from_df(df[['studentId', 'testId', 'result']], reader)
    trainset, testset = train_test_split(data, test_size=0.25)
    algo = KNNWithMeans()
    algo.fit(trainset)
    test = algo.test(testset)
    test = pd.DataFrame(test)
    test.drop("details", inplace=True, axis=1)
    test.columns = ['studentId', 'testId', 'actual', 'cf_predictions']
    return test

def merged_models(df):
    first_model = model_1(df)
    second_model = model_2(df)
    prediction = 0.5 * first_model + 0.5 * second_model # weights example
    return prediction
The first two work, but merged_models(df) doesn't even get through model_1: it fails with AttributeError: 'NoneType' object has no attribute 'drop' at X = new_df.drop('result', axis=1). The code is probably a mess, but is there any way to combine two such different models and also evaluate this "hybrid"?
CodePudding user response:
df.drop does not return a DataFrame when inplace is set to True. It modifies the DataFrame in place and returns None, so new_df ends up being None. You don't need to assign the result to a new name when you drop in place.
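To make the behaviour concrete, here is a minimal sketch on a toy frame. The values and the scratch name are illustrative only and do not come from the question.

```python
import pandas as pd

# Toy frame with the same kind of columns as in the question (values are made up).
df = pd.DataFrame({
    "studentId": ["s1", "s1", "s2"],
    "testId": ["t1", "t2", "t2"],
    "result": [0, 0, 1],
    "Length": [10, 11, 11],
})
cols_to_drop = ["testId", "studentId"]

# Without inplace, drop returns a new DataFrame and leaves df untouched.
new_df = df.drop(cols_to_drop, axis=1)
print(list(new_df.columns))   # ['result', 'Length']
print(list(df.columns))       # ['studentId', 'testId', 'result', 'Length']

# With inplace=True, drop mutates the frame it is called on and returns None.
scratch = df.copy()
returned = scratch.drop(cols_to_drop, axis=1, inplace=True)
print(returned)               # None
print(list(scratch.columns))  # ['result', 'Length']
```

The non-mutating form also matters for merged_models: model_2 still needs studentId and testId, so dropping them in place from the shared df inside model_1 would break the second call.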
CodePudding user response:
As @TimRoberts pointed out, df.drop with inplace=True does not return anything (in other words, it returns None). You can either leave inplace at its default of False and keep the assignment, or keep inplace=True and not assign the result to new_df.
This will work:
new_df = df.drop(cols_to_drop, axis=1)
And so will this:
new_df = df.copy()
new_df.drop(cols_to_drop, axis=1, inplace=True)
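As for weighting the two models, one possible approach is to average their scores row by row. This is only a rough sketch under several assumptions: both models are scored on the same held-out (studentId, testId) pairs, which the two independent splits in the question do not guarantee; the XGBoost score is a probability such as model.predict_proba(X_test)[:, 1] rather than the hard label from predict; and the ids are kept next to the predictions so the rows can be aligned. The blend function, the xgb_proba column, the weights, and all the numbers below are hypothetical.

```python
import pandas as pd

def blend(xgb_preds, cf_preds, w_xgb=0.5, w_cf=0.5, threshold=0.5):
    """Weighted average of two models' scores on the rows they share."""
    keys = ["studentId", "testId"]
    # Inner join keeps only the pairs that both models scored.
    merged = xgb_preds.merge(cf_preds, on=keys, how="inner")
    merged["hybrid_score"] = (
        w_xgb * merged["xgb_proba"] + w_cf * merged["cf_predictions"]
    )
    # Turn the blended score back into a 0/1 prediction.
    merged["hybrid_pred"] = (merged["hybrid_score"] >= threshold).astype(int)
    return merged

# Made-up scores standing in for the two models' outputs on one shared test set.
xgb_preds = pd.DataFrame({
    "studentId": ["s1", "s2", "s3"],
    "testId": ["t3", "t4", "t7"],
    "actual": [1, 1, 1],
    "xgb_proba": [0.8, 0.3, 0.6],
})
cf_preds = pd.DataFrame({
    "studentId": ["s1", "s2", "s3"],
    "testId": ["t3", "t4", "t7"],
    "cf_predictions": [0.6, 0.5, 0.7],
})

hybrid = blend(xgb_preds, cf_preds, w_xgb=0.5, w_cf=0.5)
# Evaluate the hybrid like any other classifier, here with plain accuracy.
accuracy = (hybrid["hybrid_pred"] == hybrid["actual"]).mean()
print(hybrid[["studentId", "testId", "hybrid_pred"]])
print(accuracy)
```

The weights could then be tuned on a validation split instead of being fixed at 0.5 each.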