I have two tables like this:
import pandas as pd
import numpy as np
data1 = [[1, 2000], [2, 3000]]
df_players = pd.DataFrame(data1, columns = ['cluster', 'users'])
df_players
cluster | users |
---|---|
1 | 2000 |
2 | 3000 |
data2 = [[1, 2], [2, 4]]
df_multiplyer = pd.DataFrame(data2, columns = ['cluster', 'multiplyer'])
df_multiplyer
cluster | multiplyer |
---|---|
1 | 2 |
2 | 4 |
And I have an array
coef = np.arange(.05, 0.16, 0.05)
coef
0.05
0.1
0.15
A want to create the table as shown below with calculated field users*myltiplyer*coeff
Desired output:
cluster | num_users | multiplyer | coeffs | calculated_field |
---|---|---|---|---|
1 | 2000 | 2 | 0.05 | 200 |
1 | 2000 | 2 | 0.1 | 400 |
1 | 2000 | 2 | 0.15 | 600 |
2 | 3000 | 4 | 0.05 | 600 |
2 | 3000 | 4 | 0.1 | 1200 |
2 | 3000 | 4 | 0.15 | 1800 |
Thanks for helping me!
CodePudding user response:
Create helper DataFrame, then use merge
by cluster
and then cross join by new DataFrame
:
df1 = pd.DataFrame({'coeffs':coef, 'a':1})
df = (df_players.merge(df_multiplyer, on='cluster')
.assign(a = 1)
.merge(df1, on='a')
.drop('a', axis=1))
df['calculated_field'] = df[['users','multiplyer','coeffs']].prod(axis=1)
print (df)
cluster users multiplyer coeffs calculated_field
0 1 2000 2 0.05 200.0
1 1 2000 2 0.10 400.0
2 1 2000 2 0.15 600.0
3 2 3000 4 0.05 600.0
4 2 3000 4 0.10 1200.0
5 2 3000 4 0.15 1800.0