Give weights to rows of dataframe

I have data from distinct sources (taggers), and I want to assign a weight to each source.

import pandas as pd

# source 1
df1 = pd.DataFrame({
      'feature': ['a', 'b', 'c'],
      'label': [1, 0, 0]
      })

# source 2
df2 = pd.DataFrame({
      'feature': ['d', 'e', 'f'],
      'label': [1, 0, 1]
      })

df = pd.concat([df1, df2]) # Here I want to assign distinct weights to df1 and df2

so that when I train a model, the model takes these weights into account, i.e. a feature coming from df1 should count as more or less 'important' than one coming from df2.

clf = model()
X = df['feature']
y = df['label']
clf.fit(X, y, weight) # But here the weight is not the class_weight
                      # but a per-row weight that depends on the source

CodePudding user response：

You could assign a weight column, similarly to what @Quang suggested, but with defined weights:

weights = [3, 4]
dfs = [df1, df2]
df = pd.concat([d.assign(weight=w) for w, d in zip(weights, dfs)])

Output:

  feature  label  weight
0       a      1       3
1       b      0       3
2       c      0       3
0       d      1       4
1       e      0       4
2       f      1       4
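
If you also want to remember which source each row came from, one variant is to tag the rows with a source name and look the weight up from a dict. This is only a sketch: the names 'src1' and 'src2', the 'source' column, and the variables source_weights, frames and df_tagged are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'feature': ['a', 'b', 'c'], 'label': [1, 0, 0]})
df2 = pd.DataFrame({'feature': ['d', 'e', 'f'], 'label': [1, 0, 1]})

# Hypothetical source names and their weights
source_weights = {'src1': 3, 'src2': 4}
frames = {'src1': df1, 'src2': df2}

# Tag every row with its source name while concatenating
df_tagged = pd.concat(
    [d.assign(source=name) for name, d in frames.items()],
    ignore_index=True,  # avoid the repeated index labels 0, 1, 2, 0, 1, 2
)

# Map each source name to its weight
df_tagged['weight'] = df_tagged['source'].map(source_weights)
print(df_tagged)
```

The resulting frame has the same weight column as above, so the rest of the answer applies unchanged.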

Then you should be able to do:

clf = model()
X = df['feature']
y = df['label']
w = df['weight']
clf.fit(X, y, sample_weight=w)  # assuming a scikit-learn-style estimator whose fit() accepts sample_weight
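
For a concrete version of the snippet above, here is a minimal end-to-end sketch. It assumes scikit-learn and uses LogisticRegression as a stand-in for model(); neither is named in the question, so swap in whatever estimator you actually use, provided its fit() takes sample_weight. The string feature is one-hot encoded because the estimator needs numeric input.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df1 = pd.DataFrame({'feature': ['a', 'b', 'c'], 'label': [1, 0, 0]})
df2 = pd.DataFrame({'feature': ['d', 'e', 'f'], 'label': [1, 0, 1]})

# One weight per source, attached to every row of that source
weights = [3, 4]
dfs = [df1, df2]
df = pd.concat(
    [d.assign(weight=w) for w, d in zip(weights, dfs)],
    ignore_index=True,
)

# One-hot encode the categorical feature so the model gets numeric input
X = pd.get_dummies(df['feature'])
y = df['label']
w = df['weight']

# LogisticRegression is an assumption standing in for model()
clf = LogisticRegression()
clf.fit(X, y, sample_weight=w)  # per-row weights, not class_weight

print(clf.predict(X))
```

With sample_weight, rows from df2 contribute to the loss with weight 4 and rows from df1 with weight 3, which is different from class_weight, where the weight depends on the label rather than on the row.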