How to multiply all columns with each other

I have a pandas dataframe and I want to add to it new features, like this:

Say I have features X_1,X_2,X_3 and X_4, then I want to add X_1 * X_2, X_1 * X_3, X_1 * X_4, and similarly X_2 * X_3, X_2 * X_4 and X_3 * X_4. I want to add them, not replace the original features.

How do I do that?

CodePudding user response：

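from itertools import combinations

for c1, c2 in combinations(df.columns, r=2):
    df[f"{c1} * {c2}"] = df[c1] * df[c2]

This takes every 2-element combination of the columns, multiplies each pair, and assigns the product to a new column, so the original columns are kept.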

Example run:

In [66]: df
Out[66]:
   x1  y1  x2  y2
0  20   5  22  10
1  25   8  27   2

In [67]: from itertools import combinations

In [68]: for c1, c2 in combinations(df.columns, r=2):
    ...:     df[f"{c1} * {c2}"] = df[c1] * df[c2]
    ...:

In [69]: df
Out[69]:
   x1  y1  x2  y2  x1 * y1  x1 * x2  x1 * y2  y1 * x2  y1 * y2  x2 * y2
0  20   5  22  10      100      440      200      110       50      220
1  25   8  27   2      200      675       50      216       16       54
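If the frame has many columns, inserting the products one at a time can trigger a pandas PerformanceWarning about fragmentation. A minimal sketch of the same idea that builds all products first and attaches them in one step is shown here; the helper name add_pairwise_products is hypothetical, not part of pandas, and the sample frame just mirrors the example run.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_products(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame: the original columns plus one product per column pair.

    Hypothetical helper for illustration; it does not modify df in place.
    """
    products = {
        f"{c1} * {c2}": df[c1] * df[c2]
        for c1, c2 in combinations(df.columns, r=2)
    }
    # Attach all new columns in a single concat instead of inserting one by one.
    return pd.concat([df, pd.DataFrame(products, index=df.index)], axis=1)


df = pd.DataFrame({"x1": [20, 25], "y1": [5, 8], "x2": [22, 27], "y2": [10, 2]})
result = add_pairwise_products(df)
```

Because the helper returns a new frame, the original df is left unchanged; assign the result back to df if you want the same effect as the loop.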

Another way via sklearn.preprocessing.PolynomialFeatures:

In [74]: df
Out[74]:
   x1  y1  x2  y2
0  20   5  22  10
1  25   8  27   2

In [75]: from sklearn.preprocessing import PolynomialFeatures

In [76]: poly = PolynomialFeatures(degree=2,
                                   interaction_only=True, 
                                   include_bias=False)

In [77]: poly.fit_transform(df)
Out[77]:
array([[ 20.,   5.,  22.,  10., 100., 440., 200., 110.,  50., 220.],
       [ 25.,   8.,  27.,   2., 200., 675.,  50., 216.,  16.,  54.]])

In [78]: new_columns = df.columns.tolist() + [*map(" * ".join,
                                                   combinations(df.columns, r=2))]

In [79]: df = pd.DataFrame(poly.fit_transform(df), columns=new_columns)

In [80]: df
Out[80]:
     x1   y1    x2    y2  x1 * y1  x1 * x2  x1 * y2  y1 * x2  y1 * y2  x2 * y2
0  20.0  5.0  22.0  10.0    100.0    440.0    200.0    110.0     50.0    220.0
1  25.0  8.0  27.0   2.0    200.0    675.0     50.0    216.0     16.0     54.0

CodePudding user response：

Let's say X_1, X_2, X_3 and X_4 are all integer columns. You can create a new column and assign whatever product you want to it.

df['X_1multipleX_2'] = np.nan  # optional placeholder (needs: import numpy as np)
df['X_1multipleX_2'] = df['X_1'] * df['X_2']  # this line alone is enough