Home > Enterprise >  Pandas - Need to extract data from one dataframe and add to a new column in another dataframe with d
Pandas - Need to extract data from one dataframe and add to a new column in another dataframe with d

Time:11-19

I have two dataframes. One has tens of thousands of rows. The other has hundreds of rows.

df_1(tens of thousands of rows) (Inventory)

Color Size Shape rating
Red 2 pear 82
Purple 4 heart 85
Blue 5 oval 99
Black 2 round 91
Red 1 heart 67

df_2(hundreds of rows) (Table of standard ratings of gems with different colors, size and shape)

| Color    | Size     | Shape(pear)       |Shape(heart)    |Shape(oval)
                      | standard rating   |standard rating |standard rating
| -------- | -------- | --------          | --------       |----------
| Red      | 1        | 85                | 85             | 85
| Red      | 2        | 87                | 86             | 85
| Red      | 3        | 85                | 88             | 86
| Red      | 4        | 86                | 83             | 86
| Red      | 5        | 85                | 84             | 86
| Blue     | 1        | 90                | 90             | 90

I need to add a new column to df_1 called standard rating. It gives the standard rating from df_2 to each row in df_1.

For row 1 in df_1 which is a red gem, size 2 with pear shape. I'd like to fetch the standard rating of the same color gem, with the same size and shape from df_2 which is 87 and put it in a new column in df_1.

df_1

Color Size Shape ratings standard ratings
Red 2 pear 82 87
Purple 4 heart 85 85
Blue 5 oval 99 90
Black 2 round 91 92
Red 1 heart 67 85

Any suggestions how I can accomplish it?

CodePudding user response:

You need first to merge:

df_1 = df_1.merge(df_2, on=['Color','Size'], how='left')

Then populate a new column name 'standard rating' based on the 'shape':

import numpy as np
df_1['standard rating'] = np.where(df_1['Shape']=='pear', df_1['Shape(pear) standard rating'],0) 
df_1['standard rating'] = np.where(df_1['Shape']=='heart', df_1['Shape(heart) standard rating'],df_1['standard rating']) 
df_1['standard rating'] = np.where(df_1['Shape']=='oval', df_1['Shape(oval) standard rating'],df_1['standard rating']) 
df_1['standard rating'] = np.where(df_1['Shape']=='round', df_1['Shape(round) standard rating'],df_1['standard rating']) 

CodePudding user response:

First

melt df_2

df2 = (df_2.melt(['Color', 'Size'], var_name='Shape', value_name='Standard ratings')
       .assign(Shape=lambda x: x['Shape'].str.split('[()]').str[1]))

output: df2.head()

    Color   Size    Shape   Standard ratings
0   Red     1       pear    85
1   Red     2       pear    87
2   Red     3       pear    85
3   Red     4       pear    86
4   Red     5       pear    85



Second

merge df_1 and df2

df_1.merge(df2, how='left')
  • Related