My current data is organised into two data frames of the same shape. I'd then like to sum all columns into a single column after the calculation.
I am doing this using:
df = df1_kwh.multiply(df2np).sum(axis=1)
However, when I use df.shape I get a shape of "(347,)", meaning no columns, and I am then unable to add additional columns to the "sum" value column using df.insert.
What can I do to make the output of df.sum able to be manipulated by other functions?
CodePudding user response:
If you want a DataFrame, you can convert from a Series using to_frame:
df = df1_kwh.multiply(df2np).sum(axis=1).to_frame()
By default the column name is 0. To change it (for example to "sum"), use:
df = df1_kwh.multiply(df2np).sum(axis=1).to_frame('sum')
Example:
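import numpy as np
import pandas as pd
np.random.seed(0)
df1_kwh = pd.DataFrame(np.random.random(size=(5,5)))
df2np = 2
df = df1_kwh.multiply(df2np).sum(axis=1).to_frame('sum')
# insert the new column at position 0, filled with the constant 'x'
df.insert(0, 'new', 'x')
print(df)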
output:
  new       sum
0   x  5.670608
1   x  6.644717
2   x  5.770594
3   x  5.176273
4   x  6.276120
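To see why the original result could not be used with df.insert, it can help to compare the object before and after to_frame. This is only a minimal sketch: the 3x2 frame of ones and the scalar factor are made-up stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df1_kwh and df2np are not shown in the question.
df1_kwh = pd.DataFrame(np.ones((3, 2)))
df2np = 2

row_sums = df1_kwh.multiply(df2np).sum(axis=1)  # Series: one value per row
as_frame = row_sums.to_frame('sum')             # DataFrame with a single column

print(type(row_sums).__name__, row_sums.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The first shape is one-dimensional (a Series), while the second has an explicit column axis, which is what insert needs.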
CodePudding user response:
UPDATED:
If I understand your question, it is saying that df1_kwh and df2np are "two data frames of the same shape".
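Assuming they have identical column labels, your code for multiply() should work. Note that when both operands are DataFrames, multiply() aligns on both index and column labels, and the axis argument only matters when the other operand is a Series. If the column labels differ but the values line up by position, then df1_kwh.multiply(df2np.to_numpy()).sum(axis=1) should work.
Since the example in your question multiplies the two dataframes directly, I'll assume your dataframes have identical column labels.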
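As a rough illustration of how label alignment affects multiply(), the sketch below uses two small frames whose column labels deliberately differ. The labels 'c' and 'd' and the variable names are assumptions made up for this example.

```python
import pandas as pd

df1_kwh = pd.DataFrame({'a': range(5), 'b': range(5, 10)})
# Hypothetical second frame: same shape, but different column labels.
df2_other = pd.DataFrame({'c': range(10, 15), 'd': range(15, 20)})

# Label-based multiply: no column label matches, so every cell is NaN.
misaligned = df1_kwh.multiply(df2_other)

# Position-based multiply: pass the raw array so labels are ignored.
positional = df1_kwh.multiply(df2_other.to_numpy()).sum(axis=1)

print(misaligned)
print(positional)
```

With mismatched labels, the label-based product contains only missing values, so checking the column labels first is worthwhile before trusting the row sums.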
Here's a way to create a DataFrame using the result of your sum() and then use insert() to add a column:
import pandas as pd
df1_kwh = pd.DataFrame({'a':range(5), 'b':range(5, 10)})
df2np = pd.DataFrame({'a':range(10, 15), 'b':range(15, 20)})
df = df1_kwh.multiply(df2np).sum(axis=1).to_frame()
print(df)
df.insert(loc=1, column='additional_column', value='test')
print(df)
Input:
df1_kwh
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
df2np
    a   b
0  10  15
1  11  16
2  12  17
3  13  18
4  14  19
Output:
     0 additional_column
0   75              test
1  107              test
2  143              test
3  183              test
4  227              test
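The output above keeps the default column label 0. One possible variation, sketched here with the same sample data, is to name the column when calling to_frame and to append the extra column with assign() rather than insert(). This is an alternative, not something the question requires.

```python
import pandas as pd

df1_kwh = pd.DataFrame({'a': range(5), 'b': range(5, 10)})
df2np = pd.DataFrame({'a': range(10, 15), 'b': range(15, 20)})

# Name the summed column up front instead of keeping the default label 0.
df = df1_kwh.multiply(df2np).sum(axis=1).to_frame('sum')

# assign() returns a new DataFrame with the extra column appended at the end.
df = df.assign(additional_column='test')
print(df)
```

Unlike insert(), which modifies the DataFrame in place and lets you pick the position, assign() returns a new object and always appends on the right.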