How to use BIC and AIC score for Lasso-GridSearchCV in sklearn?


I want to use AIC & BIC to select the parameter alpha for the Lasso. However, sklearn only provides LassoLarsIC for this, which does not accept sparse matrices and therefore does not fit my case. As a result, I decided to use GridSearchCV with a customized scorer. Below is my attempt:

import numpy as np

def bic_error_func(y_true, y_pred, coefs):
    n_samples = len(y_true)
    mse = np.average((y_true - y_pred) ** 2, axis=0)
    sigma2 = np.var(y_true)
    eps64 = np.finfo("float64").eps
    K = np.log(n_samples)
    # degrees of freedom: number of nonzero coefficients
    mask = np.abs(coefs) > np.finfo(coefs.dtype).eps
    df = np.sum(mask)

    score = n_samples * mse / (sigma2 + eps64) + K * df
    return score

from sklearn.metrics import make_scorer
bic_scorer = make_scorer(bic_error_func, greater_is_better=False)

However, unlike the example in Defining your scoring strategy from metric functions, I need the additional argument coefs to calculate this score. How can I make the wrapped scoring function work in this case?

CodePudding user response:

The output of make_scorer (and the expected form of a scoring method for a grid search) is a callable with the signature (estimator, X, y); you should skip make_scorer and define such a callable directly. Then you can access the estimator's fitted attribute coef_ directly. (The greater_is_better=False option of make_scorer just negates the score, so you should define this alternate custom scorer as negative BIC yourself.)
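A minimal sketch of such a callable, based on the BIC formula from the question. The function name `neg_bic_scorer` and the alpha grid in the usage comment are illustrative choices, not sklearn API; the only sklearn-specific assumptions are the `(estimator, X, y)` scorer signature and the fitted `coef_` attribute of a linear estimator such as Lasso:

```python
import numpy as np

def neg_bic_scorer(estimator, X, y):
    """Score callable with the (estimator, X, y) signature that
    GridSearchCV accepts. Returns negative BIC, since GridSearchCV
    maximizes the score."""
    y_pred = estimator.predict(X)
    n_samples = len(y)
    mse = np.mean((y - y_pred) ** 2)
    sigma2 = np.var(y)
    eps64 = np.finfo("float64").eps
    # degrees of freedom: number of nonzero fitted coefficients
    coefs = estimator.coef_
    df = np.sum(np.abs(coefs) > np.finfo(coefs.dtype).eps)
    bic = n_samples * mse / (sigma2 + eps64) + np.log(n_samples) * df
    return -bic

# usage sketch (hypothetical alpha grid):
# from sklearn.linear_model import Lasso
# from sklearn.model_selection import GridSearchCV
# search = GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]},
#                       scoring=neg_bic_scorer)
```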

Note, however, that in a GridSearchCV you will always be computing the score on the test folds, which deviates from the intention behind BIC (an in-sample criterion that penalizes model complexity instead of holding out data).
