I have a pandas DataFrame and I want to add new features to it, like this:
Say I have features X_1, X_2, X_3 and X_4. Then I want to add X_1 * X_2, X_1 * X_3, X_1 * X_4, and similarly X_2 * X_3, X_2 * X_4 and X_3 * X_4. I want to add them, not replace the original features.
How do I do that?
CodePudding user response:
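from itertools import combinations
# Multiply every pair of columns and store the product as a new column.
for c1, c2 in combinations(df.columns, r=2):
    df[f"{c1} * {c2}"] = df[c1] * df[c2]
This takes every 2-element combination (r=2) of the columns, multiplies each pair element-wise, and assigns the product to a new column, so the original columns are kept.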
Example run:
In [66]: df
Out[66]:
   x1  y1  x2  y2
0  20   5  22  10
1  25   8  27   2
In [67]: from itertools import combinations
In [68]: for c1, c2 in combinations(df.columns, r=2):
    ...:     df[f"{c1} * {c2}"] = df[c1] * df[c2]
    ...:
In [69]: df
Out[69]:
   x1  y1  x2  y2  x1 * y1  x1 * x2  x1 * y2  y1 * x2  y1 * y2  x2 * y2
0  20   5  22  10      100      440      200      110       50      220
1  25   8  27   2      200      675       50      216       16       54
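If you would rather not modify df in place, the same loop can be wrapped in a small helper. This is only a sketch, not part of the answer above: the name add_pairwise_products and the sample data are made up for illustration, and it assumes every column is numeric and has a string name.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_products(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one extra column per pair of original columns."""
    # Build all product columns first, then concatenate once.
    products = {
        f"{c1} * {c2}": df[c1] * df[c2]
        for c1, c2 in combinations(df.columns, r=2)
    }
    return pd.concat([df, pd.DataFrame(products, index=df.index)], axis=1)


# Hypothetical sample data mirroring the example run.
df = pd.DataFrame({"x1": [20, 25], "y1": [5, 8], "x2": [22, 27], "y2": [10, 2]})
result = add_pairwise_products(df)
print(result.columns.tolist())
```

The input frame is left untouched, which can be convenient when the original features are still needed elsewhere.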
Another way is via sklearn.preprocessing.PolynomialFeatures:
In [74]: df
Out[74]:
   x1  y1  x2  y2
0  20   5  22  10
1  25   8  27   2
In [75]: from sklearn.preprocessing import PolynomialFeatures
In [76]: poly = PolynomialFeatures(degree=2,
    ...:                           interaction_only=True,
    ...:                           include_bias=False)
In [77]: poly.fit_transform(df)
Out[77]:
array([[ 20.,   5.,  22.,  10., 100., 440., 200., 110.,  50., 220.],
       [ 25.,   8.,  27.,   2., 200., 675.,  50., 216.,  16.,  54.]])
In [78]: new_columns = df.columns.tolist() + [*map(" * ".join,
    ...:                                           combinations(df.columns, r=2))]
In [79]: df = pd.DataFrame(poly.fit_transform(df), columns=new_columns)
In [80]: df
Out[80]:
     x1   y1    x2    y2  x1 * y1  x1 * x2  x1 * y2  y1 * x2  y1 * y2  x2 * y2
0  20.0  5.0  22.0  10.0    100.0    440.0    200.0    110.0     50.0    220.0
1  25.0  8.0  27.0   2.0    200.0    675.0     50.0    216.0     16.0     54.0
CodePudding user response:
Let's say X_1, X_2, X_3 and X_4 are all integers. You can create a new NaN column first and then fill it with whatever you want:
df['X_1multipleX_2'] = np.nan
df['X_1multipleX_2'] = df['X_1'] * df['X_2']  # This line works on its own; the first step is optional.
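A small self-contained check that the NaN placeholder is not needed. The sample values and the name X_1_times_X_2 are invented for illustration; DataFrame.assign is shown only as a variant that returns a new frame instead of modifying df.

```python
import pandas as pd

# Hypothetical integer features.
df = pd.DataFrame({"X_1": [1, 2, 3], "X_2": [4, 5, 6]})

# Direct assignment creates the column; no NaN placeholder is required.
df["X_1multipleX_2"] = df["X_1"] * df["X_2"]

# Non-mutating variant: assign returns a new DataFrame with the extra column.
df2 = df.assign(X_1_times_X_2=df["X_1"] * df["X_2"])
print(df2)
```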