I want to populate values of df2['VALUE']
in a new column in df1 df1['New']
when category and location match for both dataframes.
When
df1['category'] = df2['CATEGORY']
AND df1['location'] = df2['LOCATION']
populate values of df2['VALUE']
into a new row in df1 so that using df1['v1']
I can calculate df1['calculatedfield'] = df1['v1']/df1['new']
df1
category | location | type | v1 |
---|---|---|---|
A | loc1 | 1 | 2 |
A | loc1 | 2 | 4 |
A | loc2 | 1 | 6 |
A | loc2 | 2 | 8 |
B | loc1 | 1 | 10 |
B | loc1 | 2 | 12 |
B | loc2 | 1 | 14 |
B | loc2 | 2 | 16 |
df2
CATEGORY | LOCATION | VALUE |
---|---|---|
A | loc1 | 50 |
A | loc2 | 30 |
B | loc1 | 70 |
B | loc2 | 90 |
output
category | location | type | v1 | new(df2['VALUE']) | calculatedfield(v1/new) |
---|---|---|---|---|---|
A | loc1 | 1 | 2 | 50 | 0.04 |
A | loc1 | 2 | 4 | 50 | 0.08 |
A | loc2 | 1 | 6 | 30 | 0.2 |
A | loc2 | 2 | 8 | 30 | 0.27 |
B | loc1 | 1 | 10 | 70 | 0.14 |
B | loc1 | 2 | 12 | 70 | 0.17 |
B | loc2 | 1 | 14 | 90 | 0.16 |
B | loc2 | 2 | 16 | 90 | 0.18 |
df1
has more rows than df2
, that is why I didn't go with joining the two dataframes. I need to populate df2['new']
with the values from df2['VALUE']
wherever the combination of category and location come up and regardless of the value of df2['type']
. I can't drop rows.
I tried
df1['New'] = np.where((df1['category'] == df2['CATEGORY']) & (df1['location'] == df2['location']), df2['VALUE'], None)
and this came up:
ValueError: Can only compare identically-labeled Series objects
CodePudding user response:
You can use pandas.set_index()
function to create a multiIndex dataframe for obtaining values from df2
and use the pandas.reset.index()
to reset the dataframe. Please see below my sample code. I have replicated your dataframes df1
and df2
.
code:
import pandas as pd
data1 = {'category': ['A', 'A', 'A', 'A','B', 'B', 'B', 'B'],
'location': ['loc1', 'loc1', 'loc2', 'loc2','loc1', 'loc1', 'loc2', 'loc2'],
'type': [1,2,1,2,1,2,1,2,],
'v1': [2,4,6,8,10,12,14,16]}
data2 = {'category': ['A', 'A','B', 'B'],
'location': ['loc1', 'loc2', 'loc1','loc2'],
'VALUE': [50,30,70,90]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df_temp = df1.set_index(['category','location'])
df_temp['New'] = df2.set_index(['category','location'])['VALUE']
df1 = df_temp.reset_index()
df1['Calculated_Field'] = df1['v1']/df1['New']
df1
Output:
References:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html