How to improve NDCG score for a learning to rank project using LightGBM?
I am currently working on a school project that requires learning-to-rank functionality to rank documents per query. I have trained my model with the following parameters:
objective="lambdarank",
metric="ndcg",
to be used with LGBMRanker.
Initially my NDCG scores were quite high, but when I ran the predicted ranking against the correct validation set from my teacher, the NDCG score dropped considerably (from 0.78 to 0.5). I tweaked my parameters to reduce overfitting, and I have also run a series of F-score tests, mutual information tests and random forest feature importance from sklearn to select features. However, my NDCG score is still quite low, and I am finding it difficult both to estimate the true NDCG without overfitting and to improve the accuracy of my model. The current parameters I am using are:
objective="rank_xendcg",
metric="ndcg",
max_bin = 63,
learning_rate = 0.03,
num_iterations = 100,
num_leaves = 31,
path_smooth = 50,
lambda_l1 = 10,
min_gain_to_split = 10
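For reference, this is roughly how I fit the model, using the sklearn-style parameter names (X_train, y_train, the per-query group sizes and the validation split below are placeholders for my actual data):

import lightgbm as lgb

# X_train, y_train: features and relevance labels; group_train: number of
# documents per query, in row order (placeholders for the real data)
ranker = lgb.LGBMRanker(
    objective="rank_xendcg",
    n_estimators=100,        # num_iterations
    learning_rate=0.03,
    num_leaves=31,
    max_bin=63,
    path_smooth=50,
    reg_alpha=10,            # lambda_l1
    min_split_gain=10,       # min_gain_to_split
)
ranker.fit(
    X_train, y_train,
    group=group_train,
    eval_set=[(X_val, y_val)],
    eval_group=[group_val],
    eval_metric="ndcg",
    eval_at=[10],
)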
CodePudding user response:
Evidently, you have overfitted your model. You do not share with us how you initially evaluated your model and achieved 0.78 NDCG, but I hope you did everything as you should.
You also do not share much information about your data. For example, do you have enough samples? How many features do you have? Maybe you have more features than samples, which is why you are trying to perform feature selection. You could also check how different your validation set (the one your teacher provided) is from your training set. In addition, check what happens if you use this validation set as part of your training data by training the model with cross-validation: I would look at the performance across folds and the variance of those performances, as in the sketch below. If they vary a lot, the problem might stem from the data.
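A minimal sketch of that kind of group-aware cross-validation, assuming your rows are grouped/sorted by a query-id column so that folds never split a query (X, y, qid and the parameter names are placeholders):

import numpy as np
import lightgbm as lgb
from sklearn.model_selection import GroupKFold
from sklearn.metrics import ndcg_score

# X, y, qid: numpy arrays of features, relevance labels and query ids,
# with rows grouped by query id (placeholders for your data)
def cv_ndcg(X, y, qid, k=10, n_splits=5, **lgbm_params):
    fold_means = []
    for tr_idx, va_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=qid):
        # group sizes = number of documents per query, in row order
        _, tr_groups = np.unique(qid[tr_idx], return_counts=True)
        model = lgb.LGBMRanker(objective="lambdarank", **lgbm_params)
        model.fit(X[tr_idx], y[tr_idx], group=tr_groups)
        preds = model.predict(X[va_idx])
        per_query = []
        for q in np.unique(qid[va_idx]):
            mask = qid[va_idx] == q
            if mask.sum() > 1:  # NDCG needs more than one document per query
                per_query.append(ndcg_score([y[va_idx][mask]], [preds[mask]], k=k))
        fold_means.append(np.mean(per_query))
    # mean and spread across folds: a large spread points to a data problem
    return np.mean(fold_means), np.std(fold_means)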
That said, I would advise you not to perform hyper-parameter tuning manually on a single validation set. The main reason is that you will simply overfit to this validation set, and when the test set comes along your performance will not be what you anticipated.
For that reason, you can use Randomised Search with Cross-Validation after you carefully define your hyper-parameter space. sklearn has a really nice and easy-to-use implementation (RandomizedSearchCV). You can also check out other techniques such as Halving Randomised Search (HalvingRandomSearchCV), also implemented in sklearn.
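One caveat: RandomizedSearchCV is awkward to combine with LGBMRanker directly, because the group fit parameter is not re-split per fold. As a hedged alternative, here is a sketch that uses sklearn's ParameterSampler on top of the group-aware cv_ndcg helper from the sketch above (X, y, qid and the search space are placeholders):

from sklearn.model_selection import ParameterSampler

# Hypothetical search space; adapt the values to your problem
param_space = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.03, 0.1],
    "n_estimators": [100, 300, 500],
    "reg_alpha": [0.0, 1.0, 10.0],      # lambda_l1
    "min_split_gain": [0.0, 1.0, 10.0], # min_gain_to_split
}

best_score, best_params = -1.0, None
for params in ParameterSampler(param_space, n_iter=20, random_state=42):
    mean_ndcg, std_ndcg = cv_ndcg(X, y, qid, **params)  # group-aware CV from above
    if mean_ndcg > best_score:
        best_score, best_params = mean_ndcg, params

print(best_params, best_score)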
Even if you perform hyper-parameter tuning correctly, the improvement will not be as large as you might hope; tuning normally boosts performance by around 1-5%. Therefore, I would recommend checking your features instead. Maybe you can generate new ones from the current feature space, create cross-features, discard collinear features, etc., as in the sketch below.
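A small sketch of what I mean, assuming your features live in a pandas DataFrame df with one row per (query, document) pair; the 0.95 threshold and the column names bm25 and doc_length are purely illustrative:

import numpy as np
import pandas as pd

# df: DataFrame of your current features (placeholder).
# Drop one feature of each highly correlated pair.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)

# Example cross-feature: "bm25" and "doc_length" are hypothetical columns,
# shown only to illustrate combining existing features into a new one
df_reduced["bm25_per_length"] = df["bm25"] / (df["doc_length"] + 1)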