How to keep row index after running predict_proba() in Scikit-learn?

I have created a logistic regression model to predict acceptance rate of a campaign, where 0 = not accepted and 1 = accepted. Now, I need to put together three specific columns: person_id, the actual acceptance score (1 or 0), and the output from sklearn's predict_proba().

I can get both person_id and its actual acceptance score from the test set, so that part is solved. However, I run into a problem when I try to merge the output of predict_proba() with person_id and score using their row index, because predict_proba() resets the index. So if I concatenate the probabilities with person_id and acceptance score, I cannot guarantee that each value ends up on its own row. These are my questions:

Is there any way to get the predict_proba() output while keeping the original row index from X_test? Below is the line of code I use for predict_proba() on the X_test set.
```
df_proba = pd.DataFrame(model.predict_proba(X_test)[:,1], columns=['proba'])
```
Does predict_proba() maintain row order despite resetting the index? If so, could I simply concatenate by column (axis=1)?

CodePudding user response：

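predict_proba() does not shuffle the data. It returns a NumPy array, so the index labels of X_test are not carried over, but the row order is preserved: the first entry of the predictions corresponds to the first row of X_test, and so on. The "reset" index you see comes from pd.DataFrame(), which assigns a default 0..n-1 index when none is given. Pass index=X_test.index when you build the DataFrame and you can safely concatenate person_id, score and prediction. Keep in mind that pd.concat(axis=1) aligns rows on index labels rather than on position, so without matching indexes the rows would not line up.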
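
To illustrate the point about row order and index labels, here is a minimal sketch. The toy data, the column names `feature_a`, `feature_b` and `accepted`, and the train/test split settings are assumptions made for the example; only `person_id`, `X_test`, `model` and `df_proba` come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for the campaign dataset (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "person_id": np.arange(1000, 1200),
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
df["accepted"] = (df["feature_a"] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X = df[["feature_a", "feature_b"]]
y = df["accepted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)

# predict_proba returns a plain NumPy array in the same row order as X_test.
# Passing index=X_test.index puts the original row labels back.
df_proba = pd.DataFrame(
    model.predict_proba(X_test)[:, 1],
    index=X_test.index,
    columns=["proba"],
)

# pd.concat(axis=1) aligns on index labels, so the three pieces match row by row.
result = pd.concat(
    [df.loc[X_test.index, "person_id"], y_test, df_proba],
    axis=1,
)
print(result.head())
```

If you would rather not depend on index alignment at all, another option is to assign the array directly, for example `out = X_test.copy(); out["proba"] = model.predict_proba(X_test)[:, 1]`, which relies only on the preserved row order.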