I have two dataframes. One has tens of thousands of rows. The other has hundreds of rows.
df_1(tens of thousands of rows) (Inventory)
Color | Size | Shape | rating |
---|---|---|---|
Red | 2 | pear | 82 |
Purple | 4 | heart | 85 |
Blue | 5 | oval | 99 |
Black | 2 | round | 91 |
Red | 1 | heart | 67 |
df_2(hundreds of rows) (Table of standard ratings of gems with different colors, size and shape)
| Color | Size | Shape(pear) |Shape(heart) |Shape(oval)
| standard rating |standard rating |standard rating
| -------- | -------- | -------- | -------- |----------
| Red | 1 | 85 | 85 | 85
| Red | 2 | 87 | 86 | 85
| Red | 3 | 85 | 88 | 86
| Red | 4 | 86 | 83 | 86
| Red | 5 | 85 | 84 | 86
| Blue | 1 | 90 | 90 | 90
I need to add a new column to df_1 called standard rating. It gives the standard rating from df_2 to each row in df_1.
For row 1 in df_1 which is a red gem, size 2 with pear shape. I'd like to fetch the standard rating of the same color gem, with the same size and shape from df_2 which is 87 and put it in a new column in df_1.
df_1
Color | Size | Shape | ratings | standard ratings |
---|---|---|---|---|
Red | 2 | pear | 82 | 87 |
Purple | 4 | heart | 85 | 85 |
Blue | 5 | oval | 99 | 90 |
Black | 2 | round | 91 | 92 |
Red | 1 | heart | 67 | 85 |
Any suggestions how I can accomplish it?
CodePudding user response:
You need first to merge:
df_1 = df_1.merge(df_2, on=['Color','Size'], how='left')
Then populate a new column name 'standard rating' based on the 'shape':
import numpy as np
df_1['standard rating'] = np.where(df_1['Shape']=='pear', df_1['Shape(pear) standard rating'],0)
df_1['standard rating'] = np.where(df_1['Shape']=='heart', df_1['Shape(heart) standard rating'],df_1['standard rating'])
df_1['standard rating'] = np.where(df_1['Shape']=='oval', df_1['Shape(oval) standard rating'],df_1['standard rating'])
df_1['standard rating'] = np.where(df_1['Shape']=='round', df_1['Shape(round) standard rating'],df_1['standard rating'])
CodePudding user response:
First
melt df_2
df2 = (df_2.melt(['Color', 'Size'], var_name='Shape', value_name='Standard ratings')
.assign(Shape=lambda x: x['Shape'].str.split('[()]').str[1]))
output: df2.head()
Color Size Shape Standard ratings
0 Red 1 pear 85
1 Red 2 pear 87
2 Red 3 pear 85
3 Red 4 pear 86
4 Red 5 pear 85
Second
merge df_1 and df2
df_1.merge(df2, how='left')