I have some sample data:
df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
[1,16,9],
[4,36,16],
[1,16,9],
[2,9,8],
[3,36,15],
[2,49,16],
[4,25,14],
[5,36,17]])
I want to create two complementary columns in my df, based on X2 and X3, and include this step in the pipeline.
I am trying the following code:
def feat_comp(x):
    x1 = 100-x
    return x1
pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, 'X2',validate=False))])
X = pipe_text.fit_transform(df)
It gives me an error:
TypeError: 'str' object is not callable
How can I apply the FunctionTransformer to selected columns, and how can I use it in the pipeline?
CodePudding user response:
If I understand you correctly, you want to add a new column based on a given column, e.g. X2. You need to pass this column as an additional argument to the function using kw_args:
import pandas as pd
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
[1,16,9],
[4,36,16],
[1,16,9],
[2,9,8],
[3,36,15],
[2,49,16],
[4,25,14],
[5,36,17]])
def feat_comp(x, column):
    # Copy first so the original DataFrame is not modified in place
    x = x.copy()
    x[f'100-{column}'] = 100 - x[column]
    return x
pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, validate=False, kw_args={'column': 'X2'}))])
pipe_text.fit_transform(df)
Result:
   X1  X2  X3  100-X2
0   1  16   9      84
1   4  36  16      64
2   1  16   9      84
3   2   9   8      91
4   3  36  15      64
5   2  49  16      51
6   4  25  14      75
7   5  36  17      64
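Since the question asks for complementary columns for both X2 and X3, the same idea can be extended by passing a list of column names through kw_args. This is only a sketch: the names feat_comp_multi, columns and pipe_multi are made up for illustration and are not part of scikit-learn.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
    [1, 16, 9], [4, 36, 16], [1, 16, 9], [2, 9, 8],
    [3, 36, 15], [2, 49, 16], [4, 25, 14], [5, 36, 17]])

def feat_comp_multi(x, columns):
    # Hypothetical helper: adds one "100-<col>" column per requested column.
    # Works on a copy so the caller's DataFrame is left unchanged.
    out = x.copy()
    for column in columns:
        out[f'100-{column}'] = 100 - out[column]
    return out

pipe_multi = Pipeline([
    ('col_test', FunctionTransformer(feat_comp_multi, validate=False,
                                     kw_args={'columns': ['X2', 'X3']})),
])
result = pipe_multi.fit_transform(df)
print(result)
```

The result should keep X1, X2 and X3 and append 100-X2 and 100-X3 as new columns.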
(In your example, FunctionTransformer(feat_comp, 'X2', validate=False), the second positional argument 'X2' is taken as inverse_func, and the string 'X2' is not callable, hence the error.)
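As an alternative for the "selected columns" part of the question, the column selection can be left to ColumnTransformer, so that the function itself stays as simple as in the original attempt. This is a rough sketch rather than the only way to do it. The step names 'orig' and 'comp' and the function name complement are arbitrary choices, and with default settings the output is expected to be a plain array without column names.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
    [1, 16, 9], [4, 36, 16], [1, 16, 9], [2, 9, 8],
    [3, 36, 15], [2, 49, 16], [4, 25, 14], [5, 36, 17]])

def complement(x):
    # Receives only the columns selected by the ColumnTransformer
    return 100 - x

ct = ColumnTransformer([
    # Keep all original columns unchanged
    ('orig', 'passthrough', ['X1', 'X2', 'X3']),
    # Append the complements of X2 and X3
    ('comp', FunctionTransformer(complement, validate=False), ['X2', 'X3']),
])

pipe = Pipeline([('col_test', ct)])
out = pipe.fit_transform(df)
print(out)
```

The output columns follow the order of the transformer list: X1, X2, X3, then the complements of X2 and X3.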