Python: Multiplying a dataframe with another dataframe

Hi, I currently have two dataframes with different shapes:

import numpy as np
import pandas as pd
df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9

df12 = pd.DataFrame(np.array([[7, 8, 9]]),
                   columns=['a', 'b', 'c'])

    a   b   c
0   7   8   9

I would like to multiply each row in df11 by the single row in df12, so the resulting dataframe should be:

df13 = pd.DataFrame(np.array([[7, 16, 27], [28, 40, 54], [49, 64, 81]]),
                   columns=['a', 'b', 'c'])

    a   b   c
0   7   16  27
1   28  40  54
2   49  64  81

CodePudding user response：

I recommend using NumPy multiplication, which broadcasts the single row of df12 across every row of df11:

df13 = pd.DataFrame(df11.to_numpy()*df12.to_numpy(), columns=df11.columns)

Or you can use the pandas mul method like this:

df11.mul({'a': 7, 'b': 8, 'c': 9})
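The dict in the mul call is typed out by hand. As a minimal sketch, assuming df12 always has exactly one row, the same multipliers can be taken from df12 directly by passing that row as a Series. The names by_numpy and by_mul are only illustrative:

```python
import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
df12 = pd.DataFrame(np.array([[7, 8, 9]]), columns=['a', 'b', 'c'])

# NumPy broadcasting: shapes (3, 3) * (1, 3) -> (3, 3), matched by position.
by_numpy = pd.DataFrame(df11.to_numpy() * df12.to_numpy(),
                        columns=df11.columns, index=df11.index)

# pandas mul: the first row of df12 is a Series indexed by 'a', 'b', 'c',
# so each column of df11 is scaled by the value with the same label.
by_mul = df11.mul(df12.iloc[0], axis='columns')

print(by_numpy)
print(by_mul)
```

Both results should match the df13 shown in the question. The NumPy version relies on the two frames having their columns in the same order, while the mul version matches columns by name.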

CodePudding user response：

One-liner

df_3 = df_1 * df_2.iloc[0]

Code

import pandas as pd

data_1 = {'a': [1, 4, 7],
          'b': [2, 5, 8],
          'c': [3, 6, 9]}
data_2 = {'a': [7], 'b': [8], 'c': [9]}
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)

df_3 = df_1 * df_2.iloc[0]
print(df_3)

Output

    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
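The one-liner works because df_2.iloc[0] is a Series indexed by column label, and pandas aligns that index with the columns of df_1 before multiplying. The sketch below illustrates the difference from positional multiplication, using a hypothetical df_2_shuffled whose columns are in a different order (not part of the original question):

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
# Hypothetical variant of df_2 with its columns in a different order.
df_2_shuffled = pd.DataFrame({'c': [9], 'a': [7], 'b': [8]})

# DataFrame * Series aligns the Series index with the DataFrame columns,
# so 'a' is still multiplied by 7, 'b' by 8 and 'c' by 9.
aligned = df_1 * df_2_shuffled.iloc[0]

# NumPy multiplies by position, so here column 'a' is scaled by 9 instead.
positional = pd.DataFrame(df_1.to_numpy() * df_2_shuffled.to_numpy(),
                          columns=df_1.columns)

print(aligned)
print(positional)
```

If the column order of the two frames is not guaranteed to match, the label-aligned form is the safer choice.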

Timings

A few timings for this input:

# Paul_O's numpy approach
25.9 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# iloc approach
172 µs ± 962 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# mozway's approach
194 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Paul_O's mul approach
308 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Making data_1 a 10000 x 3 DataFrame of random integers between 1 and 10000, we get very similar results:

# Paul_O's numpy approach
39 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# iloc approach
188 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# mozway's approach
206 µs ± 2.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# Paul_O's mul approach
312 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Of course, these are only two sets of timings for two very specific inputs on one system, so I would not draw hard conclusions from them. Still, if your problem is very similar to this one, the numpy approach appears to be the fastest. The best approach may differ in other circumstances, e.g., if the form of your input differs.
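For anyone who wants to repeat the comparison on their own machine, here is a rough sketch using the standard timeit module. It is not the exact harness used for the numbers above. The approaches dict and its labels are illustrative, and the squeeze entry assumes that "mozway's approach" refers to the squeeze answer in this thread:

```python
import timeit

import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
df12 = pd.DataFrame(np.array([[7, 8, 9]]), columns=['a', 'b', 'c'])

# Each approach from this thread wrapped as a zero-argument callable.
approaches = {
    'numpy': lambda: pd.DataFrame(df11.to_numpy() * df12.to_numpy(),
                                  columns=df11.columns),
    'iloc': lambda: df11 * df12.iloc[0],
    'squeeze': lambda: df11 * df12.squeeze(),
    'mul': lambda: df11.mul({'a': 7, 'b': 8, 'c': 9}),
}

calls = 100  # calls per repeat; raise this for steadier numbers
for name, func in approaches.items():
    # Take the best of 5 repeats and report microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    print(f"{name:>8}: {best / calls * 1e6:.1f} µs per call")
```

Absolute numbers will vary with hardware and with the pandas and NumPy versions, so only the relative ordering is worth comparing.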

CodePudding user response：

You can use squeeze:

df13 = df11*df12.squeeze()

The potential advantage is that it would perform an element-wise 2D multiplication if df12 has more than one row, because squeeze only collapses a single-row DataFrame into a Series.

output:

    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
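To see what squeeze does in each case, here is a small sketch. The frames one_row and three_rows are hypothetical stand-ins for df12, and three_rows is assumed to share its index and columns with df11:

```python
import pandas as pd

df11 = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
one_row = pd.DataFrame({'a': [7], 'b': [8], 'c': [9]})
# Hypothetical multi-row multiplier with the same index and columns as df11.
three_rows = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})

# A 1 x 3 frame collapses to a Series; a 3 x 3 frame is returned unchanged.
print(type(one_row.squeeze()).__name__)     # Series
print(type(three_rows.squeeze()).__name__)  # DataFrame

# Series case: every row of df11 is multiplied by the same values.
row_wise = df11 * one_row.squeeze()

# DataFrame case: cells are multiplied pairwise after index/column alignment.
element_wise = df11 * three_rows.squeeze()

print(row_wise)
print(element_wise)
```

Note that the DataFrame case aligns on both index and columns, so rows of df11 with no matching index label in the multiplier would come out as NaN.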