I have two dataframes that look similar and I want to divide one column from df1
by a column from df2
.
Some sample data is below:
dict1 = {'category': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0},
'Id': {0: 24108, 1: 24307, 2: 24307, 3: 24411, 4: 24411},
'count': {0: 3, 1: 2, 2: 33, 3: 98, 4: 33}}
df1 = pd.DataFrame(dict1)
dict2 = {'Id': {0: 24108, 1: 24307, 2: 24411},
'count': {0: 3, 1: 35, 2: 131}}
df2 = pd.DataFrame(dict2)
I am trying to create a new column in the first dataframe (df1
) called weights
by dividing df1['count']
by df2['count']
. Except for the column category
and count
in both dfs, the values are identical in the other columns.
I have the following piece of code, but I cannot seem to understand where the error is:
df1['weights'] = (df1['count']
.div(df1.merge(df2, on = 'Id', how = 'left')
['count'].to_numpy())
)
I get the following error when I run the code:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'count'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/tmp/ipykernel_354/1318629977.py in <module>
1 complete['weights'] = (complete['count']
----> 2 .div(complete.merge(totals, on = 'companyId', how = 'left')['count'].to_numpy())
3 )
Any ideas why this is happening?
CodePudding user response:
Since you end up with count_x
and count_y
after your merge, you need to specify which one you want:
df1['weights'] = (df1['count'].div(df1.merge(df2, on = 'Id', how = 'left')['count_y'].to_numpy()))