How do I create a weighted feature in a dataframe?

I have a data frame that creates a final score based on other scores stored in column values. Along with these other scored columns, there is a column that shows how long in months a player has been active.

These columns are integers scaled to the range 0-100 and are then used to create a new feature, FS (Final Score). This feature is the weighted sum of the column values. For example, S1 is multiplied by 0.15 so that it makes up 15% of the resulting feature.

What I'm trying to figure out is how to handle a player who hasn't been playing for very long (indicated by the Months column). In that case, I'd want S1 to count less towards their final score: if Months < 6, S1 should make up 10% of their score instead of 15%.

How can I make these grading weights flexible to fit such a case?

The code:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 100))

# Rescale each score column to 0-100 (the scaler is refit for every column)
df_final['S1'] = scaler.fit_transform(df_final[['S1']])
df_final['S2'] = scaler.fit_transform(df_final[['S2']])
df_final['S3'] = scaler.fit_transform(df_final[['S3']])
df_final['S4'] = scaler.fit_transform(df_final[['S4']])

s1 = df_final['S1']
s2 = df_final['S2']
s3 = df_final['S3']
s4 = df_final['S4']

# Weighted sum: the weights add up to 1.0
df_final['FS'] = (s1 * .15) + (s2 * .15) + (s3 * .50) + (s4 * .20)

The resulting df:

    S1  S2  S3  S4  Months FS
0   49  66  44  9   4      50
1   36  66  44  10  11     49
2   28  77  33  17  17     52
3   39  66  44  4   2      48
4   32  44  44  17  4      35

CodePudding user response：

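Try allocating the weights per row with np.where. Rows where Months is below 6 get the first tuple of weights, and all other rows get the second:

import numpy as np

# The four score columns S1..S4
feature_df = df.iloc[:,:4]

# Boolean mask reshaped to a column so it broadcasts against the 4 weights
weights = np.where(df.Months.lt(6).to_numpy()[...,None], (.1,.2,.3,.4), (.15,.15,.50, .20))

# Multiply each score by its row-specific weight, then sum across columns
df['FS'] = feature_df.mul(weights).sum(1)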

Output:

   S1  S2  S3  S4  Months     FS
0  49  66  44   9       4  34.90
1  36  66  44  10      11  39.30
2  28  77  33  17      17  35.65
3  39  66  44   4       2  31.90
4  32  44  44  17       4  32.00
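
To see why this works, here is a self-contained sketch that rebuilds the sample frame from the table above and inspects the shape of the weight array. The names score_cols, new_player_weights, default_weights and is_new are illustrative and not part of the original answer. The short-tenure weights are the example values used in that answer, not something the question fixes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "S1": [49, 36, 28, 39, 32],
    "S2": [66, 66, 77, 66, 44],
    "S3": [44, 44, 33, 44, 44],
    "S4": [9, 10, 17, 4, 17],
    "Months": [4, 11, 17, 2, 4],
})

score_cols = ["S1", "S2", "S3", "S4"]

# Example weight sets taken from the answer above (assumed, adjust as needed).
new_player_weights = (0.10, 0.20, 0.30, 0.40)   # used when Months < 6
default_weights = (0.15, 0.15, 0.50, 0.20)      # used otherwise

# Boolean column vector of shape (5, 1): True where the player is new.
is_new = df["Months"].lt(6).to_numpy()[:, None]

# Broadcasting (5, 1) against (4,) yields a (5, 4) array,
# i.e. one row of four weights per player.
weights = np.where(is_new, new_player_weights, default_weights)

# Element-wise multiply scores by weights, then sum across the columns.
df["FS"] = df[score_cols].mul(weights).sum(axis=1)

print(weights.shape)
print(df)
```

Because the mask is a column and each weight tuple acts as a row, np.where picks a whole row of weights per player in one vectorized step, so no Python-level loop is needed.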

CodePudding user response：

Create a scoring function with all of your conditional logic that takes in a row:

def score(row):
    if ....
        result = row['S1'] * ??

    return result

This lets you add arbitrary complexity to handle the nuances of your scoring rules.

Then apply it to each row of your df:

df['FS'] = df.apply(score, axis=1)
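
As one possible way to fill in that template, the sketch below assumes the same two weight sets as the np.where answer earlier in this thread and switches between them on Months < 6. The dictionaries NEW_PLAYER_WEIGHTS and DEFAULT_WEIGHTS are hypothetical names introduced here, and the sample rows are copied from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "S1": [49, 36, 28, 39, 32],
    "S2": [66, 66, 77, 66, 44],
    "S3": [44, 44, 33, 44, 44],
    "S4": [9, 10, 17, 4, 17],
    "Months": [4, 11, 17, 2, 4],
})

# Hypothetical weight sets, reused from the np.where answer for illustration.
NEW_PLAYER_WEIGHTS = {"S1": 0.10, "S2": 0.20, "S3": 0.30, "S4": 0.40}
DEFAULT_WEIGHTS = {"S1": 0.15, "S2": 0.15, "S3": 0.50, "S4": 0.20}


def score(row):
    """Return the weighted final score for a single row (one player)."""
    # Pick the weight set based on how long the player has been active.
    weights = NEW_PLAYER_WEIGHTS if row["Months"] < 6 else DEFAULT_WEIGHTS
    # Weighted sum over the score columns named in the chosen weight set.
    return sum(row[col] * w for col, w in weights.items())


# axis=1 passes each row to score() as a Series.
df["FS"] = df.apply(score, axis=1)
print(df)
```

A row-wise apply is easy to extend with further conditions, but it calls the Python function once per row, so on large frames the vectorized np.where approach will usually be faster.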