Find the column name of the highest weighted average-CodePudding

I have a DF that looks like this. In the investor columns, 1 means invested and 0 mean no. If there are more than 1 investors invested in a project, we can assume that they share the investment equally (eg: Investor A and B each invested 50000 in project "something"). I want to calculate the total amount that each investor put in and find out who invested the most.

Project	Invested Amount	Investor A	Investor B	Investor C
Something	100000	1	1	0
Another	5000000	0	0	1
Last	25000000	1	1	1

Right now I am thinking of filtering by each investor that has a 1 and then divide by them sum of all the investor column. Here is what I tried but I am still missing something:

Investor_A = df[df['Investor A'] == 1]
test = Investor_A['Invested Amount'] / (df.iloc[:,3:5].sum())

Expected output:

Investor A would have put in a total of 100000/2 25000000/3 = 8383333.33333. Investor B would have put in a total of 100000/2 25000000/3 = 8383333.33333. Investor C would have put in a total of 5000000 25000000/3 = 13333333.3333 --> Investor C invested the most amount of money.

CodePudding user response：

Call filter() on the Investor columns and divide the Invested Amount column by the row-wise sum of the Investor columns. Then multiply it by the investors columns again to get the total share of each investor for each item. Then calling sum() would find the total investment of each investor and idxmax() would fetch the name of the investor.

investors = df.filter(like='Investor')
avg_invested_amount = df['Invested Amount'] / investors.sum(1)
investment_shares = investors.mul(avg_invested_amount, axis=0)
investment_per_investor = investment_shares.sum()
investment_per_investor.idxmax()
#'Investor C'

FYI, this code can be written in 2 lines (but much less legible):

investors = df.filter(like='Investor')
investors.mul(investors.sum(1).rdiv(df['Invested Amount']), axis=0).sum().idxmax()

CodePudding user response：

@NewEnglandcottontail's way is straights forward... Another way could be this;

import pandas as pd 

df = pd.DataFrame({"Project":["Project1","Project2","Project3"],
                   "Invested Amount":[100000,5000000,25000000],
                   "Investor A":[1,0,1],
                   "Investor B":[1,0,1],
                   "Investor C":[0,1,1]})

df1 = df.copy()
col_lst = [col for col in df1.columns if "Investor" in col]
df1[col_lst] = df1.apply(lambda x: x[col_lst] * x["Invested Amount"]/sum(x[col_lst]),axis=1)

lst = df1[col_lst].sum().to_list()
print("Investor who invested maximum ammount:",col_lst[lst.index(max(lst))])

Output;

Investor who invested maximum ammount: Investor C