Prioritize later observations in scikit-learn models in Python

Time:11-14

Given information of the following form:

target  f3  f2  f1  date
1   3   2   1   01/02/2000
0   6   5   4   02/02/2001
1   9   8   7   04/02/2002
1   12  11  10  06/02/2003
1   15  14  13  08/02/2004
1   18  17  16  09/02/2005
0   21  20  19  11/02/2006
1   24  23  22  13/02/2007
0   27  26  25  15/02/2008
1   30  29  28  16/02/2009
1   33  32  31  18/02/2010
1   36  35  34  20/02/2011
1   39  38  37  22/02/2012
1   42  41  40  23/02/2013
1   45  44  43  25/02/2014

I know from the project domain that the real-world distribution is closer to the later observations, but I still want to learn from the earlier ones. Is there a way to prioritize later observations in a classification model?

CodePudding user response:

Yes, there is: pass sample_weight to the fit method. Many scikit-learn classifiers accept this argument; check the documentation of the classifier you are using. In your case, you would assign higher weights to the more recent observations.

There is also a short illustration for the SVM classifier in the scikit-learn example gallery.
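As a rough sketch of the idea, the snippet below builds the data from the question, derives a weight for each row from its date, and passes those weights to fit. The exponential decay, the 5-year half-life (HALF_LIFE_YEARS) and the choice of LogisticRegression are assumptions made for illustration only; any classifier whose fit accepts sample_weight and any weighting scheme that grows with recency would work the same way.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Data from the question (dates are day/month/year).
df = pd.DataFrame({
    "target": [1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1],
    "f1": list(range(1, 44, 3)),
    "f2": list(range(2, 45, 3)),
    "f3": list(range(3, 46, 3)),
    "date": [
        "01/02/2000", "02/02/2001", "04/02/2002", "06/02/2003", "08/02/2004",
        "09/02/2005", "11/02/2006", "13/02/2007", "15/02/2008", "16/02/2009",
        "18/02/2010", "20/02/2011", "22/02/2012", "23/02/2013", "25/02/2014",
    ],
})
dates = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Assumed weighting scheme: a row's weight halves for every
# HALF_LIFE_YEARS of age, measured from the most recent observation.
HALF_LIFE_YEARS = 5.0
age_years = (dates.max() - dates).dt.days / 365.25
weights = (0.5 ** (age_years / HALF_LIFE_YEARS)).to_numpy()

X = df[["f1", "f2", "f3"]].to_numpy()
y = df["target"].to_numpy()

# Older rows still contribute to the fit, but errors on recent rows cost more.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=weights)
predictions = clf.predict(X)
```

The half-life controls how quickly old observations fade: a small value makes the model rely almost entirely on recent rows, while a large value approaches an unweighted fit. It is worth tuning it on a held-out set made of the most recent data.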
