I am trying to merge two dataframes so that each instance of an item ID in DF3 displays the pricing data associated with the matching ID from DF1.
DF3 (what I am trying to accomplish)
recipeID | itemID_out | qty_out | buy_price | sell_price | buy_quantity | sell_quantity | id_1_in | qty_id1 | buy_price | sell_price | buy_quantity | sell_quantity | id_2_in | qty_id2 | buy_price | sell_price | buy_quantity | sell_quantity | id_3_in | qty_id3 | buy_price | sell_price | buy_quantity | sell_quantity | id_4_in | qty_id4 | buy_price | sell_price | buy_quantity | sell_quantity | id_5_in | qty_id5 | buy_price | sell_price | buy_quantity | sell_quantity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1986 | 1 | 129 | 167 | 67267 | 21637 | 123 | 1 | 10 | 15 | 1500 | 3000 | 124 | 1 | 12 | 14 | 550 | 800 | 125 | 1 | 8 | 12 | 124 | 254 | 126 | 1 | 22 | 25 | 1251 | 890 | 127 | 1 | 64 | 72 | 12783 | 1251515 |
2 | 1987 | 1 | 1521 | 1675 | 654 | 1245 | 123 | 2 | 10 | 15 | 1500 | 3000 | ||||||||||||||||||||||||
3 | 1988 | 1 | 128376 | 131429 | 47 | 23 | 123 | 10 | 10 | 15 | 1500 | 3000 | 124 | 3 | 12 | 14 | 550 | 800 |
These are the two dataframes I am trying to merge from.
DF1: Contains 26863 rows; master list of item names, IDs, and price data. Pulled from API, new items can be added and will appear as new rows after an update request from the user.
itemID | name | buy_price | sell_price | buy_quantity | sell_quantity |
---|---|---|---|---|---|
1986 | XYZ | 129 | 167 | 67267 | 21637 |
123 | ABC | 10 | 15 | 1500 | 3000 |
124 | DEF | 12 | 14 | 550 | 800 |
DF2 (contains 12784 rows; recipes that combine from items in the master list. Pulled from API, new recipes can be added and will appear as new rows after an update request from the user.)
recipeID | itemID_out | qty_out | id_1_in | qty_id1 | id_2_in | qty_id2 | id_3_in | qty_id3 | id_4_in | qty_id4 | id_5_in | qty_id5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1986 | 1 | 123 | 1 | 124 | 1 | 125 | 1 | 126 | 1 | 127 | 1 |
2 | 1987 | 1 | 123 | 2 | ||||||||
3 | 1988 | 1 | 123 | 10 | 124 | 3 |
Recipes can contain a combination of 1 to 5 items (null values occur) that consist of IDs from DF1 and/or the itemID_out column in DF2.
The "id_#_in" columns in DF2 can contain item IDs from the "itemID_out" column, due to that recipe using the item that is being output from another recipe.
I have tried to merge it using:
pd.merge(itemlist_modified, recipelist_modified, left_on='itemID', right_on='itemID_out')
But this only ever results in a single column of ideas receiving the pricing data as intended.
I feel like I'm trying to use the wrong function for this, any help would be very much appreciated!
Thanks in advance!
CodePudding user response:
Not a pretty approach, but it first melts the ingredient table into long form and then merges it on the itemlist table
import pandas as pd
import numpy as np
itemlist_modified = pd.DataFrame({
'itemID': [1986, 123, 124],
'name': ['XYZ', 'ABC', 'DEF'],
'buy_price': [129, 10, 12],
'sell_price': [167, 15, 14],
'buy_quantity': [67267, 1500, 550],
'sell_quantity': [21637, 3000, 800],
})
recipelist_modified = pd.DataFrame({
'RecipeID': [1, 2, 3],
'itemID_out': [1986, 1987, 1988],
'qty_out': [1, 1, 1],
'id_1_in': [123, 123, 123],
'qty_id1': [1, 2, 10],
'id_2_in': [124.0, np.nan, 124.0],
'qty_id2': [1.0, np.nan, 3.0],
'id_3_in': [125.0, np.nan, np.nan],
'qty_id3': [1.0, np.nan, np.nan],
'id_4_in': [126.0, np.nan, np.nan],
'qty_id4': [1.0, np.nan, np.nan],
'id_5_in': [127.0, np.nan, np.nan],
'qty_id5': [1.0, np.nan, np.nan],
})
#columns which are not qty or input id cols
id_vars = ['RecipeID','itemID_out','qty_out']
#prepare dict to map column name to ingredient number
col_renames = {}
col_renames.update({'id_{}_in'.format(i 1):'ingr_{}'.format(i 1) for i in range(5)})
col_renames.update({'qty_id{}'.format(i 1):'ingr_{}'.format(i 1) for i in range(5)})
#melt reciplist into longform
long_recipelist = recipelist_modified.melt(
id_vars=id_vars,
var_name='ingredient',
).dropna()
#add a new column to specify whether each row is a qty or an id
long_recipelist['kind'] = np.where(long_recipelist['ingredient'].str.contains('qty'),'qty_in','id_in')
#convert ingredient names
long_recipelist['ingredient'] = long_recipelist['ingredient'].map(col_renames)
#pivot on the new ingredient column
reshape_recipe_list = long_recipelist.pivot(
index=['RecipeID','itemID_out','qty_out','ingredient'],
columns='kind',
values='value',
).reset_index()
#merge with the itemlist
priced_ingredients = pd.merge(reshape_recipe_list, itemlist_modified, left_on='id_in', right_on='itemID')
#pivot on the priced ingredients
priced_ingredients = priced_ingredients.pivot(
index = ['RecipeID','itemID_out','qty_out'],
columns = 'ingredient',
)
#flatten the hierarchical columns
priced_ingredients.columns = ["_".join(a[::-1]) for a in priced_ingredients.columns.to_flat_index()]
priced_ingredients.columns.name = ''
priced_ingredients = priced_ingredients.reset_index()
priced_ingredients
partial output: