Hi, I currently have two DataFrames with different shapes:
df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
df12 = pd.DataFrame(np.array([[7, 8, 9]]),
                    columns=['a', 'b', 'c'])
   a  b  c
0  7  8  9
I would like to multiply each row of df11 element-wise by the single row of df12, so the resulting DataFrame should be:
df13 = pd.DataFrame(np.array([[7, 16, 27], [28, 40, 54], [49, 64, 81]]),
                    columns=['a', 'b', 'c'])
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
CodePudding user response:
I recommend using NumPy multiplication, which broadcasts the single row of df12 across every row of df11:
df13 = pd.DataFrame(df11.to_numpy()*df12.to_numpy(), columns=df11.columns)
Or you can use the pandas mul method, like this:
df11.mul({'a': 7, 'b': 8, 'c': 9})
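To make the broadcasting step explicit, here is a minimal sketch built on the frames from the question. The names product and df13_mul are only illustrative, and passing df12.iloc[0] to mul instead of a hand-written dict is my own variation, not part of the answer above.

```python
import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
df12 = pd.DataFrame(np.array([[7, 8, 9]]),
                    columns=['a', 'b', 'c'])

# Shapes (3, 3) * (1, 3): NumPy stretches the single row of df12
# across all three rows of df11.
product = df11.to_numpy() * df12.to_numpy()
df13 = pd.DataFrame(product, columns=df11.columns, index=df11.index)

# Same result with pandas: the row of df12 is taken as a Series and
# matched against the column labels of df11.
df13_mul = df11.mul(df12.iloc[0], axis='columns')

print(df13)
```

The NumPy route works purely by position, so it assumes both frames list their columns in the same order; the mul route matches on column labels instead.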
CodePudding user response:
One-liner
df_3 = df_1 * df_2.iloc[0]
Code
import pandas as pd
data_1 = {'a': [1, 4, 7],
          'b': [2, 5, 8],
          'c': [3, 6, 9]}
data_2 = {'a': [7], 'b': [8], 'c': [9]}
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)
df_3 = df_1 * df_2.iloc[0]
print(df_3)
Output
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
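Why the one-liner works: df_2.iloc[0] is a Series indexed by the column labels, and pandas matches that index against the columns of df_1 before multiplying. The sketch below illustrates this with the columns of df_2 deliberately shuffled, which is an assumption added for demonstration and not part of the original data.

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 4, 7],
                     'b': [2, 5, 8],
                     'c': [3, 6, 9]})
# Columns deliberately given in a different order (illustrative assumption).
df_2 = pd.DataFrame({'c': [9], 'a': [7], 'b': [8]})

row = df_2.iloc[0]   # a Series whose index is 'c', 'a', 'b'
df_3 = df_1 * row    # values are matched by label, not by position

print(df_3[['a', 'b', 'c']])
```

Each column of df_1 is still multiplied by the value carrying the same label, so column 'a' is scaled by 7 even though 'a' is not the first column of df_2.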
Timings
A few timings for this input.
# Paul_O's numpy approach
25.9 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# iloc approach
172 µs ± 962 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# mozway's approach
194 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# Paul_O's mul approach
308 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Making data_1 a 10000 x 3 DataFrame of random integers between 1 and 10000, we get very similar results.
# Paul_O's numpy approach
39 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# iloc approach
188 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# mozway's approach
206 µs ± 2.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# Paul_O's mul approach
312 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Of course, these are only two sets of timings for two very specific inputs on one system, so I would not draw hard conclusions from them. Still, if your problem is very similar to this one, the numpy approach appears to be the fastest. The best approach may differ in other circumstances, e.g., if the form of your input differs.
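If you want to repeat the comparison on your own machine, something along these lines should do. It is a rough sketch using the standard-library timeit module rather than the exact harness behind the figures above; the seed, the repeat counts and the names df_big, df_row and candidates are all my own assumptions.

```python
import timeit

import numpy as np
import pandas as pd

# 10000 x 3 frame of random integers between 1 and 10000 (seed is arbitrary).
rng = np.random.default_rng(0)
df_big = pd.DataFrame(rng.integers(1, 10001, size=(10000, 3)),
                      columns=['a', 'b', 'c'])
df_row = pd.DataFrame([[7, 8, 9]], columns=['a', 'b', 'c'])

# One callable per approach discussed in this thread.
candidates = {
    'numpy': lambda: pd.DataFrame(df_big.to_numpy() * df_row.to_numpy(),
                                  columns=df_big.columns),
    'iloc': lambda: df_big * df_row.iloc[0],
    'squeeze': lambda: df_big * df_row.squeeze(),
    'mul': lambda: df_big.mul({'a': 7, 'b': 8, 'c': 9}),
}

timings = {}
for name, func in candidates.items():
    # Best of 5 repeats of 100 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=100, repeat=5))
    timings[name] = best / 100 * 1e6
    print(f"{name:8s} {timings[name]:10.1f} us per call")
```

The absolute numbers will depend on your hardware and library versions, so treat only the relative ordering as informative, and even that with some caution.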
CodePudding user response:
You can use squeeze:
df13 = df11*df12.squeeze()
The potential advantage is that it would perform a 2D (element-wise, label-aligned) multiplication if df12 has more than one row, because squeeze only collapses a single-row frame into a Series.
output:
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
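To see the two behaviours of squeeze side by side, here is a small sketch. The frames one_row and three_rows are hypothetical examples made up for this illustration.

```python
import pandas as pd

df11 = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
one_row = pd.DataFrame({'a': [7], 'b': [8], 'c': [9]})
three_rows = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})

# A 1 x 3 frame collapses to a Series, which is then applied to every row.
squeezed_row = one_row.squeeze()
broadcast = df11 * squeezed_row

# A 3 x 3 frame has no length-1 axis, so squeeze returns it unchanged and
# the product is element-wise on matching row and column labels.
squeezed_frame = three_rows.squeeze()
elementwise = df11 * squeezed_frame

print(type(squeezed_row).__name__, type(squeezed_frame).__name__)
print(elementwise)
```

Note that in the 2D case pandas aligns on the row index as well, so rows present in only one of the two frames come out as NaN.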