I have a pandas DataFrame `df`:
ID | COST | 1F | 2F | 3F | 4G |
---|---|---|---|---|---|
1 | 362 | 0 | 1 | 1 | 1 |
2 | 269 | 0 | 1 | 0 | 0 |
3 | 346 | 1 | 1 | 1 | 1 |
4 | 342 | 0 | 0 | 0 | 0 |
I have a total_cost dictionary:
total_cost = {'1F': 0.047, '2F': 0.03, '3F': 0.023, '4G': 0.025}
I want to add a TOTAL_COST column such that, for each row, wherever a 1 is present, COST is multiplied by that column's value from the total_cost dictionary and the products are summed (a worked example follows the expected output below). The DataFrame has around a million records, so what would be the most efficient way to do this?
Expected `df`:
ID | COST | 1F | 2F | 3F | 4G | TOTAL_COST |
---|---|---|---|---|---|---|
1 | 362 | 0 | 1 | 1 | 1 | 28.236 |
2 | 269 | 0 | 1 | 0 | 0 | 8.07 |
3 | 346 | 1 | 1 | 1 | 1 | 43.25 |
4 | 342 | 0 | 0 | 0 | 0 | 0 |
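For example, for ID 1 only 2F, 3F and 4G are 1, so TOTAL_COST = 362 * (0.03 + 0.023 + 0.025) = 362 * 0.078 = 28.236; for ID 4 every flag is 0, so TOTAL_COST is 0.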
CodePudding user response:
Use slicing to select only the columns that are present as keys in your dictionary, multiply by it, aggregate as a sum per row, then multiply by COST:
df['TOTAL_COST'] = df[list(total_cost)].mul(total_cost).sum(axis=1).mul(df['COST'])
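For context, here is a minimal self-contained sketch that rebuilds the example frame from the question and applies this one-liner, so you can verify the result locally:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'ID':   [1, 2, 3, 4],
    'COST': [362, 269, 346, 342],
    '1F':   [0, 0, 1, 0],
    '2F':   [1, 1, 1, 0],
    '3F':   [1, 0, 1, 0],
    '4G':   [1, 0, 1, 0],
})
total_cost = {'1F': 0.047, '2F': 0.03, '3F': 0.023, '4G': 0.025}

# Select the flag columns, weight each by its rate, sum per row, then scale by COST.
# If your pandas version does not accept a dict in mul(), pass pd.Series(total_cost) instead.
df['TOTAL_COST'] = df[list(total_cost)].mul(total_cost).sum(axis=1).mul(df['COST'])
print(df)
```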
Or, cast the dictionary to a Series (the non-matching columns will become NaN and are skipped by the sum):
df['TOTAL_COST'] = df.mul(pd.Series(total_cost)).sum(axis=1).mul(df['COST'])
Output:
ID COST 1F 2F 3F 4G TOTAL_COST
0 1 362 0 1 1 1 28.236
1 2 269 0 1 0 0 8.070
2 3 346 1 1 1 1 43.250
3 4 342 0 0 0 0 0.000
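Since the question mentions roughly a million rows: both one-liners above are fully vectorized and should handle that without trouble. If you still want to shave off some time, one option (a sketch, not benchmarked here) is to drop to NumPy and express the per-row weighted sum as a matrix-vector product:

```python
import numpy as np

# Weight vector in the same column order as the selected flag columns
cols = list(total_cost)
weights = np.array([total_cost[c] for c in cols])

# (n_rows, n_cols) @ (n_cols,) gives the per-row weighted sum; then scale by COST
df['TOTAL_COST'] = df[cols].to_numpy() @ weights * df['COST'].to_numpy()
```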