I have data from distinct sources (taggers), and I want to assign a weight to each source.
import pandas as pd

# source 1
df1 = pd.DataFrame({
    'feature': ['a', 'b', 'c'],
    'label': [1, 0, 0]
})
# source 2
df2 = pd.DataFrame({
    'feature': ['d', 'e', 'f'],
    'label': [1, 0, 1]
})
df = pd.concat([df1, df2])  # here I want to assign distinct weights to df1 and df2
so that when I train a model, it takes these weights into account, i.e. a feature coming from df1 counts differently (is more or less 'important') than one coming from df2.
clf = model()
X = df['feature']
y = df['label']
clf.fit(X, y, weight)  # the weight here is not class_weight,
                       # but a weight attached to the source of each row
You could assign a weight column, similar to what @Quang suggested, but with explicitly defined weights:
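weights = [3, 4]  # df1 rows get weight 3, df2 rows get weight 4
dfs = [df1, df2]
# tag every row of each source with that source's weight, then stack them
df = pd.concat([d.assign(weight=w) for w, d in zip(weights, dfs)])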
Output:
  feature  label  weight
0       a      1       3
1       b      0       3
2       c      0       3
0       d      1       4
1       e      0       4
2       f      1       4
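A variant that keeps each weight next to the name of its source, rather than relying on the order of two parallel lists, could look like the sketch below. The source names `source1`/`source2` and the extra `source` column are hypothetical additions; passing `ignore_index=True` to `pd.concat` gives the combined frame a fresh 0 to 5 index instead of the repeated 0, 1, 2 shown above.

```python
import pandas as pd

df1 = pd.DataFrame({'feature': ['a', 'b', 'c'], 'label': [1, 0, 0]})
df2 = pd.DataFrame({'feature': ['d', 'e', 'f'], 'label': [1, 0, 1]})

# Hypothetical source names; the weights are the same ones as above (3 and 4).
sources = {'source1': df1, 'source2': df2}
source_weights = {'source1': 3, 'source2': 4}

# Tag each row with the name of its source and stack the frames,
# renumbering the index so it is unique.
df = pd.concat(
    [d.assign(source=name) for name, d in sources.items()],
    ignore_index=True,
)

# Look up each row's weight from its source name.
df['weight'] = df['source'].map(source_weights)
print(df)
```

Changing a source's weight then only means editing `source_weights`, and the `source` column stays available if you later want to inspect or group by where each row came from.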
Then you should be able to do:
clf = model()
X = df['feature']
y = df['label']
w = df['weight']
clf.fit(X, y, sample_weight=w)  # many scikit-learn estimators accept per-row weights this way
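For a concrete, runnable version, here is a sketch that assumes scikit-learn's `LogisticRegression` stands in for `model()` (the question doesn't name a model). scikit-learn expects a 2-D numeric `X`, so the string column is one-hot encoded first; passing the 1-D `df['feature']` directly would be rejected.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

df1 = pd.DataFrame({'feature': ['a', 'b', 'c'], 'label': [1, 0, 0]})
df2 = pd.DataFrame({'feature': ['d', 'e', 'f'], 'label': [1, 0, 1]})

weights = [3, 4]
dfs = [df1, df2]
df = pd.concat([d.assign(weight=w) for w, d in zip(weights, dfs)], ignore_index=True)

# One-hot encode the categorical column; df[['feature']] (double brackets)
# keeps it as a 2-D frame, which is what the encoder expects.
encoder = OneHotEncoder(handle_unknown='ignore')
X = encoder.fit_transform(df[['feature']])
y = df['label']
w = df['weight']

clf = LogisticRegression()
# Each row's contribution to the training loss is multiplied by its weight,
# so rows from df2 (weight 4) count more than rows from df1 (weight 3).
clf.fit(X, y, sample_weight=w)

# Predict for new rows; they must go through the same encoder.
new_rows = pd.DataFrame({'feature': ['a', 'e']})
print(clf.predict(encoder.transform(new_rows)))
```

What matters for the trade-off between sources is the ratio of the weights (3 vs 4 here); for an L2-regularized model like this one, scaling every weight by the same factor mostly acts like changing the regularization strength `C`.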