Home > other >  merge dataframes and add price data for each instance of an item ID
merge dataframes and add price data for each instance of an item ID

Time:05-19

I am trying to merge two dataframes so that each instance of an item ID in DF3 displays the pricing data associated with the matching ID from DF1.

DF3 (what I am trying to accomplish)

recipeID itemID_out qty_out buy_price sell_price buy_quantity sell_quantity id_1_in qty_id1 buy_price sell_price buy_quantity sell_quantity id_2_in qty_id2 buy_price sell_price buy_quantity sell_quantity id_3_in qty_id3 buy_price sell_price buy_quantity sell_quantity id_4_in qty_id4 buy_price sell_price buy_quantity sell_quantity id_5_in qty_id5 buy_price sell_price buy_quantity sell_quantity
1 1986 1 129 167 67267 21637 123 1 10 15 1500 3000 124 1 12 14 550 800 125 1 8 12 124 254 126 1 22 25 1251 890 127 1 64 72 12783 1251515
2 1987 1 1521 1675 654 1245 123 2 10 15 1500 3000
3 1988 1 128376 131429 47 23 123 10 10 15 1500 3000 124 3 12 14 550 800

These are the two dataframes I am trying to merge from.

DF1: Contains 26863 rows; master list of item names, IDs, and price data. Pulled from API, new items can be added and will appear as new rows after an update request from the user.

itemID name buy_price sell_price buy_quantity sell_quantity
1986 XYZ 129 167 67267 21637
123 ABC 10 15 1500 3000
124 DEF 12 14 550 800

DF2 (contains 12784 rows; recipes that combine from items in the master list. Pulled from API, new recipes can be added and will appear as new rows after an update request from the user.)

recipeID itemID_out qty_out id_1_in qty_id1 id_2_in qty_id2 id_3_in qty_id3 id_4_in qty_id4 id_5_in qty_id5
1 1986 1 123 1 124 1 125 1 126 1 127 1
2 1987 1 123 2
3 1988 1 123 10 124 3

Recipes can contain a combination of 1 to 5 items (null values occur) that consist of IDs from DF1 and/or the itemID_out column in DF2.

The "id_#_in" columns in DF2 can contain item IDs from the "itemID_out" column, due to that recipe using the item that is being output from another recipe.

I have tried to merge it using:

pd.merge(itemlist_modified, recipelist_modified, left_on='itemID', right_on='itemID_out')

But this only ever results in a single column of ideas receiving the pricing data as intended.

I feel like I'm trying to use the wrong function for this, any help would be very much appreciated!

Thanks in advance!

CodePudding user response:

Not a pretty approach, but it first melts the ingredient table into long form and then merges it on the itemlist table

import pandas as pd
import numpy as np

itemlist_modified = pd.DataFrame({
    'itemID': [1986, 123, 124],
    'name': ['XYZ', 'ABC', 'DEF'],
    'buy_price': [129, 10, 12],
    'sell_price': [167, 15, 14],
    'buy_quantity': [67267, 1500, 550],
    'sell_quantity': [21637, 3000, 800],
})

recipelist_modified = pd.DataFrame({
    'RecipeID': [1, 2, 3],
    'itemID_out': [1986, 1987, 1988],
    'qty_out': [1, 1, 1],
    'id_1_in': [123, 123, 123],
    'qty_id1': [1, 2, 10],
    'id_2_in': [124.0, np.nan, 124.0],
    'qty_id2': [1.0, np.nan, 3.0],
    'id_3_in': [125.0, np.nan, np.nan],
    'qty_id3': [1.0, np.nan, np.nan],
    'id_4_in': [126.0, np.nan, np.nan],
    'qty_id4': [1.0, np.nan, np.nan],
    'id_5_in': [127.0, np.nan, np.nan],
    'qty_id5': [1.0, np.nan, np.nan],
})
    
#columns which are not qty or input id cols
id_vars = ['RecipeID','itemID_out','qty_out']

#prepare dict to map column name to ingredient number
col_renames = {}
col_renames.update({'id_{}_in'.format(i 1):'ingr_{}'.format(i 1) for i in range(5)})
col_renames.update({'qty_id{}'.format(i 1):'ingr_{}'.format(i 1) for i in range(5)})

#melt reciplist into longform
long_recipelist = recipelist_modified.melt(
    id_vars=id_vars,
    var_name='ingredient',
).dropna()

#add a new column to specify whether each row is a qty or an id
long_recipelist['kind'] = np.where(long_recipelist['ingredient'].str.contains('qty'),'qty_in','id_in')

#convert ingredient names
long_recipelist['ingredient'] = long_recipelist['ingredient'].map(col_renames)

#pivot on the new ingredient column
reshape_recipe_list = long_recipelist.pivot(
    index=['RecipeID','itemID_out','qty_out','ingredient'],
    columns='kind',
    values='value',
).reset_index()

#merge with the itemlist
priced_ingredients = pd.merge(reshape_recipe_list, itemlist_modified, left_on='id_in', right_on='itemID')

#pivot on the priced ingredients
priced_ingredients = priced_ingredients.pivot(
    index = ['RecipeID','itemID_out','qty_out'],
    columns = 'ingredient',
)

#flatten the hierarchical columns
priced_ingredients.columns = ["_".join(a[::-1]) for a in priced_ingredients.columns.to_flat_index()]
priced_ingredients.columns.name = ''

priced_ingredients = priced_ingredients.reset_index()

priced_ingredients partial output:

enter image description here

  • Related