FunctionTransformer & creating new columns in pipeline

Time:06-13

I have some sample data:

import pandas as pd

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
                                               [1,16,9],
                                               [4,36,16],
                                               [1,16,9],
                                               [2,9,8],
                                               [3,36,15],
                                               [2,49,16],
                                               [4,25,14],
                                               [5,36,17]])

I want to create two complementary columns in my df based on X2 and X3 and include them in the pipeline.

I am trying the following code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def feat_comp(x):
    x1 = 100 - x
    return x1

pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, 'X2',validate=False))])
X = pipe_text.fit_transform(df)

It gives me an error:

TypeError: 'str' object is not callable

How can I apply FunctionTransformer to selected columns, and how can I use it in the pipeline?

Answer:

If I understand you correctly, you want to add a new column derived from a given column, e.g. X2. You need to pass the column name to the function as an additional argument using kw_args:

import pandas as pd
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
                                               [1,16,9],
                                               [4,36,16],
                                               [1,16,9],
                                               [2,9,8],
                                               [3,36,15],
                                               [2,49,16],
                                               [4,25,14],
                                               [5,36,17]])

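def feat_comp(x, column):
    # Work on a copy so the input DataFrame (df) is not modified in place
    x = x.copy()
    # Add the complement of the selected column as a new column
    x[f'100-{column}'] = 100 - x[column]
    return x

# kw_args forwards column='X2' to feat_comp on every transform call
pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, validate=False, kw_args={'column': 'X2'}))])
pipe_text.fit_transform(df)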

Result:

   X1  X2  X3  100-X2
0   1  16   9      84
1   4  36  16      64
2   1  16   9      84
3   2   9   8      91
4   3  36  15      64
5   2  49  16      51
6   4  25  14      75
7   5  36  17      64
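The question asks for complementary columns for both X2 and X3. A minimal sketch of one way to do that with the same kw_args approach, assuming the function is changed to take a list of column names (the columns parameter is an illustrative name, not part of scikit-learn): each listed column gets its own 100-<name> column, and the input is copied so df itself is left unchanged.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Same sample data as in the question
df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
    [1, 16, 9], [4, 36, 16], [1, 16, 9], [2, 9, 8],
    [3, 36, 15], [2, 49, 16], [4, 25, 14], [5, 36, 17]])

def feat_comp(x, columns):
    # Work on a copy so the caller's DataFrame is not modified in place
    x = x.copy()
    for column in columns:
        # Complement of each selected column, named like '100-X2'
        x[f'100-{column}'] = 100 - x[column]
    return x

# 'columns' (a hypothetical parameter of feat_comp) is passed through kw_args
pipe = Pipeline([
    ('col_test', FunctionTransformer(feat_comp, validate=False,
                                     kw_args={'columns': ['X2', 'X3']})),
])
X = pipe.fit_transform(df)
print(X)
```

Passing a different list in kw_args (for example just ['X3']) changes which columns get complemented without touching the function itself.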

(In your example, FunctionTransformer(feat_comp, 'X2', validate=False) passes 'X2' as the second positional argument, which is inverse_func. The string 'X2' is not callable, hence the error.)
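To see what that positional argument actually does, here is a small sketch, assuming a recent scikit-learn (comp, bad and good are illustrative names only). Constructing the transformer does not check its arguments, so the string simply ends up stored as inverse_func. Passing everything by keyword, with a real inverse function, makes each role explicit; here 100 - x happens to be its own inverse, so inverse_transform recovers the original data.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def comp(x):
    # 100 - x is its own inverse: 100 - (100 - x) == x
    return 100 - x

# The second positional argument is inverse_func, so 'X2' is stored there.
bad = FunctionTransformer(comp, 'X2', validate=False)
print(bad.inverse_func)  # prints: X2

# Passing arguments by keyword makes each role explicit.
df = pd.DataFrame({'X2': [16, 36, 9], 'X3': [9, 16, 8]})
good = FunctionTransformer(func=comp, inverse_func=comp, validate=False)
Xt = good.fit_transform(df)           # complements: X2 -> 84, 64, 91
X_back = good.inverse_transform(Xt)   # back to the original values
print(X_back.equals(df))              # prints: True
```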
