please advice how to get the following output:
df1 = pd.DataFrame([['1, 2', '2, 2','3, 2','1, 1', '2, 1','3, 1']])
df2 = pd.DataFrame([[1, 2, 100, 'x'], [3, 4, 200, 'y'], [5, 6, 300, 'x']])
import numpy as np
df22 = df2.rename(index = lambda x: x 1).set_axis(np.arange(1, len(df2.columns) 1), inplace=False, axis=1)
f = lambda x: df22.loc[tuple(map(int, x.split(',')))]
df = df1.applymap(f)
print (df)
Output:
0 1 2 3 4 5
0 2 4 6 1 3 5
df1 is 'address' of df2 in row, col format (1,2 is first row, second column which is 2, 2,2 is 4 3,2 is 6 etc.)
I need to add values from the 3rd and 4th columns to get something like (2*100x, 4*200y, 6*300x, 1*100x, 3*200y, 5*300x)
the output should be 5000(sum of x's and y's), 0.28 (1400/5000 - % of y's)
CodePudding user response:
It's not clear to me why you need df1 and df... Maybe your question is lacking some details?
You can compute your values directly:
df22['val'] = (df22[1] df22[2])*df22[3]
Output:
1 2 3 4 val
1 1 2 100 x 300
2 3 4 200 y 1400
3 5 6 300 x 3300
From there it's straightforward to compute the sums (total and grouped by column 4):
total = df22['val'].sum() # 5000
y_sum = df22.groupby(4).sum().loc['y', 'val'] # 1400
print(y_sum/total) # 0.28
Edit: if df1 doesn't necessarily contain all members of columns 1 and 2, you could loop through it (it's not clear in your question why df1 is a Dataframe or if it can have more than one row, therefore I flattened it):
df22['val'] = 0
for c in df1.to_numpy().flatten():
i, j = map(int, c.split(','))
df22.loc[i, 'val'] = df22.loc[i, j]*df22.loc[i, 3]
This gives you the same output as above for your example but will ignore values that are not in df1.