How to loop and .apply a lambda function on a DataFrame?

I'm building an ML model. I would like to run the prediction step a few times and then calculate the mean of the accuracy scores.

My code looks like this:

predictions = test_df[['histor', 'philosoph', 'cook', 'roman', 'bibl']].apply(lambda x: baseline.predict(*x), axis=1)

y_true = test_df["label"].values

print("Accuracy: ", accuracy_score(y_true, predictions))

Is there a way to run the predictions in a loop? For example, with n = 10, the predictions would run 10 times, the accuracy of each run would be printed, and the mean of all 10 accuracies would be printed at the end.

Hope this makes sense.
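To make the apply call above concrete, here is a small self-contained sketch. The Baseline class, its predict(histor, philosoph, cook, roman, bibl) signature and the toy data are hypothetical stand-ins for the asker's model and test_df, inferred only from the baseline.predict(*x) call. With axis=1, pandas passes each row to the lambda as a Series, and *x unpacks that row's five values into positional arguments.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

FEATURES = ['histor', 'philosoph', 'cook', 'roman', 'bibl']

class Baseline:
    """Hypothetical rule-based model: 'history' if the histor count is the largest, else 'other'."""
    def predict(self, histor, philosoph, cook, roman, bibl):
        return "history" if histor >= max(philosoph, cook, roman, bibl) else "other"

# Hypothetical toy data standing in for test_df
test_df = pd.DataFrame({
    'histor':    [5, 0, 3, 1],
    'philosoph': [1, 4, 0, 1],
    'cook':      [0, 2, 1, 0],
    'roman':     [2, 0, 0, 3],
    'bibl':      [0, 1, 0, 0],
    'label':     ["history", "other", "history", "history"],
})

baseline = Baseline()
# axis=1: the lambda receives one row (a Series) at a time; *x unpacks its values
predictions = test_df[FEATURES].apply(lambda x: baseline.predict(*x), axis=1)
y_true = test_df["label"].values
print("Accuracy: ", accuracy_score(y_true, predictions))
```

Three of the four toy rows are classified correctly, so this prints an accuracy of 0.75. Note that apply with axis=1 calls the Python function once per row, which can be slow on large DataFrames.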

Answer:

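I would use scikit-learn's cross_val_score for this. It refits a copy of the model on each of the 10 folds, so baseline has to be a scikit-learn-compatible estimator (one with fit and predict methods), and it returns an array with one score per fold (accuracy, for a classifier):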

from sklearn.model_selection import cross_val_score
X = test_df[['histor', 'philosoph', 'cook', 'roman', 'bibl']]
y = test_df["label"].values
scores = cross_val_score(baseline, X, y, cv=10)
print("Accuracies: ", scores)
print("Mean Accuracy: ", scores.mean())
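A runnable sketch of the same idea on synthetic data. make_classification and LogisticRegression are assumptions standing in for the asker's features and baseline model; any scikit-learn classifier could be swapped in. When scoring is not given, cross_val_score uses the estimator's own score method, which is accuracy for classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: 200 rows, 5 feature columns, binary label
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Stand-in for baseline; must be a scikit-learn estimator with fit/predict
model = LogisticRegression()

# Fits and scores the model 10 times, once per fold; returns 10 accuracies
scores = cross_val_score(model, X, y, cv=10)
for i, score in enumerate(scores, start=1):
    print("Fold", i, "Accuracy:", score)
print("Mean Accuracy:", scores.mean())
```

If ten random train/test splits are wanted rather than ten folds, cv also accepts a splitter object such as ShuffleSplit(n_splits=10, test_size=0.2, random_state=0) from sklearn.model_selection.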

Answer:

You can store the accuracy of each run in a NumPy array (or a plain list) and then calculate the mean accuracy at the end:

import numpy as np
from sklearn.metrics import accuracy_score

n = 10
y_true = test_df["label"].values
accuracies = np.zeros(n)  # one slot per run
for i in range(n):
    predictions = test_df[['histor', 'philosoph', 'cook', 'roman', 'bibl']].apply(lambda x: baseline.predict(*x), axis=1)
    accuracy = accuracy_score(y_true, predictions)
    accuracies[i] = accuracy
    print("Run ", i + 1, " Accuracy: ", accuracy)

mean_accuracy = np.mean(accuracies)
print("Mean Accuracy: ", mean_accuracy)

The same loop with a plain Python list instead of a NumPy array:

n = 10
accuracies = []
for i in range(n):
    predictions = test_df[['histor', 'philosoph', 'cook', 'roman', 'bibl']].apply(lambda x: baseline.predict(*x), axis=1)
    accuracy = accuracy_score(y_true, predictions)
    accuracies.append(accuracy)
    print("Run ", i + 1, " Accuracy: ", accuracy)

mean_accuracy = sum(accuracies) / n
print("Mean Accuracy: ", mean_accuracy)
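One caveat for both loops: if baseline.predict is deterministic, every run produces the same predictions on the same test_df, so all n accuracies are identical. Repeating the run is only informative when the predictor involves randomness or each run uses different data. The sketch below assumes a hypothetical stochastic NoisyBaseline that guesses a label at random, seeds it differently on each run for reproducibility, and reports the mean and standard deviation using the standard library's statistics module. The toy DataFrame is likewise hypothetical.

```python
import random
import statistics

import pandas as pd
from sklearn.metrics import accuracy_score

FEATURES = ['histor', 'philosoph', 'cook', 'roman', 'bibl']

class NoisyBaseline:
    """Hypothetical stochastic model: picks one of two labels at random."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def predict(self, histor, philosoph, cook, roman, bibl):
        return self.rng.choice(["history", "other"])

# Hypothetical toy data standing in for test_df
test_df = pd.DataFrame({
    'histor': [5, 0, 3, 1], 'philosoph': [1, 4, 0, 1], 'cook': [0, 2, 1, 0],
    'roman': [2, 0, 0, 3], 'bibl': [0, 1, 0, 0],
    'label': ["history", "other", "history", "history"],
})
y_true = test_df["label"].values
X = test_df[FEATURES]  # select the feature columns once, outside the loop

n = 10
accuracies = []
for i in range(n):
    baseline = NoisyBaseline(seed=i)  # a different seed per run gives different predictions
    predictions = X.apply(lambda x: baseline.predict(*x), axis=1)
    accuracy = float(accuracy_score(y_true, predictions))
    accuracies.append(accuracy)
    print("Run", i + 1, "Accuracy:", accuracy)

print("Mean Accuracy:", statistics.mean(accuracies))
print("Std Dev:", statistics.stdev(accuracies))
```

The standard deviation is worth printing alongside the mean: it shows how much the accuracy actually varies between runs.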