Home > Mobile >  How to calculate sumproduct in pandas by column?
How to calculate sumproduct in pandas by column?

Time:05-21

I have a dataframe:

ID  2000-01 2000-02 2000-03 2000-04 2000-05 val
1   2847    2861    2875    2890    2904    94717
2   1338    1343    1348    1353    1358    70105
3   3301    3311    3321    3331    3341    60307
4   1425    1422    1419    1416    1413    79888

I want to add a new row to the table with the sumproduct formula (excel) =sumproduct(array $val, array 2000-xx). The first value in the new row is computed as 2847x94717 1338x70105 3301x60307 1425x79888 = 676373596 (in Excel terms, B2xG2 B3xG3 B4xG4 B5xG5)

Output:

ID  2000-01    2000-02   2000-03     2000-04    2000-05    val
1   2847       2861      2875        2890       2904       94717
2   1338       1343      1348        1353       1358       70105
3   3301       3311      3321        3331       3341       60307
4   1425       1422      1419        1416       1413       79888
5   676373596  678413565 680453534   682588220  684628189  

How do I go about this?

CodePudding user response:

You could do the dot product @ and merge back to the original dataframe:

df.merge(pd.DataFrame(df.iloc[:,1:-1].T @ df['val']).T, how='outer')
 
    ID    2000-01    2000-02    2000-03    2000-04    2000-05      val
0  1.0       2847       2861       2875       2890       2904  94717.0
1  2.0       1338       1343       1348       1353       1358  70105.0
2  3.0       3301       3311       3321       3331       3341  60307.0
3  4.0       1425       1422       1419       1416       1413  79888.0
4  NaN  676373596  678413565  680453534  682588220  684628189      NaN

CodePudding user response:

You can do this, assuming ID is not in the index:

 df.loc[5, :] = df.iloc[:,1:-1].mul(df['val'], axis=0).sum()

Output:

    ID      2000-01      2000-02      2000-03      2000-04      2000-05      val
0  1.0       2847.0       2861.0       2875.0       2890.0       2904.0  94717.0
1  2.0       1338.0       1343.0       1348.0       1353.0       1358.0  70105.0
2  3.0       3301.0       3311.0       3321.0       3331.0       3341.0  60307.0
3  4.0       1425.0       1422.0       1419.0       1416.0       1413.0  79888.0
5  NaN  676373596.0  678413565.0  680453534.0  682588220.0  684628189.0      NaN

Use pandas.DataFrame.mul with axis=0 then sum and let pandas intrinsic data alignment put the values in the correct column based on indexing.

CodePudding user response:

You can do:

row = [sum([v[0]*v[1] for v in list(zip(df[col],df['val']))]) for col in df.columns.drop(['ID','val'])]
row.insert(0, len(df) 1)
row.insert(len(row), 0)
df.loc[len(df) 1] = row
df.loc[len(df),'val'] = ''

CodePudding user response:

Other options for the same result

columns_to_multiply = df.columns.drop(['ID', 'val'])
df1 = df.copy()
for x in columns_to_multiply:
    df1[x] *= df1['val']
prod_sum_list = [len(df)]   df1[columns_to_multiply].sum().tolist()   [np.nan]
df.loc[len(df.index)] = prod_sum_list
df
  • Related