I have created a logistic regression model to predict the acceptance of a campaign, where 0 = not accepted and 1 = accepted. Now I need to put together three specific columns: person_id, the actual acceptance score (1 or 0), and the output from sklearn's predict_proba().
I can get both person_id and its actual acceptance score from the test set, so that part is solved. However, merging the output of predict_proba() with person_id and score by row index is a problem, because predict_proba() resets the index. So I cannot guarantee that, if I concatenated it with person_id and acceptance score, each value would match its respective row. These are my questions:
Is there any way to get the output of predict_proba() while keeping the original row index from X_test? Below is the line of code that calls predict_proba() on the X_test set:
df_proba = pd.DataFrame(model.predict_proba(X_test)[:,1], columns=['proba'])
Does predict_proba() maintain row order despite resetting the index? If so, could I simply concatenate by column (axis=1)?
CodePudding user response:
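predict_proba() does not shuffle the data. It returns a NumPy array, so the index labels of X_test are not carried over, but the row order is preserved: the first entry of the predictions corresponds to the first row of X_test. You can therefore concatenate person_id, score and the prediction, as long as all three share the same index. Note that pd.concat with axis=1 aligns on index labels rather than on position, so either pass index=X_test.index when building the DataFrame of probabilities, or reset the index of the other columns first.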
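A minimal sketch of how the probabilities could be tied back to the original rows. The data here is synthetic, and the names x1, x2, accepted and result are placeholders invented for illustration, not taken from the question. The only change to the question's line is the index=X_test.index argument.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the column names x1, x2 and accepted are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "person_id": np.arange(1000, 1200),
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["accepted"] = (df["x1"] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The split shuffles rows, so the test set has a non-sequential index.
train, test = train_test_split(df, test_size=0.25, random_state=0)
features = ["x1", "x2"]
X_test, y_test = test[features], test["accepted"]

model = LogisticRegression().fit(train[features], train["accepted"])

# predict_proba returns a plain NumPy array in the same row order as X_test,
# so reattaching X_test.index restores the original row labels.
df_proba = pd.DataFrame(
    model.predict_proba(X_test)[:, 1],
    index=X_test.index,
    columns=["proba"],
)

# pd.concat(axis=1) aligns on index labels, which now match across all pieces.
result = pd.concat([test["person_id"], y_test.rename("score"), df_proba], axis=1)
```

Without index=X_test.index, df_proba would get a fresh 0..n-1 index and the column-wise concat would align on mismatched labels, producing extra rows filled with NaN.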