I have a dataframe in Python called "Item_Table" and it looks like the below:
--------------------------
item_id | item_description
--------------------------
N4 | Steel
M3 | Oil
B1 | Water
X9 | Coal
Z5 | Plastic
--------------------------
I also have a nested orders list which looks like below. Each list inside the nested list represents one order. One order can have multiple items
orders = [[Z5], [X9, Z5], [B1, Z5, N4], [B1, X9]]
Ideally, I would like to return a nested list with the respective item_descriptions from the "Item_Table" which looks like the below nested list:
orders_descriptions = [[Plastic], [Coal, Plastic], [Water, Plastic, Steel], [Water, Coal]]
I tried a solution where I converted orders list to a dataframe and tried to merge that with item_table to get item_descriptions but because one order has multiple items, I am unable to do the join.
CodePudding user response:
You can try export the item_id
and item_description
to dictionary then loop the orders
d = df.set_index('item_id')['item_description'].to_dict()
orders_descriptions = [[d[o] for o in os] for os in orders]
print(orders_descriptions)
[['Plastic'], ['Coal', 'Plastic'], ['Water', 'Plastic', 'Steel'], ['Water', 'Coal']]
CodePudding user response:
Here's the brute force lookup method, but I think the "export to a dictionary" idea is a much better concept.
import pandas as pd
data = [
['N4','Steel'],
['M3','Oil'],
['B1','Water'],
['X9','Coal'],
['Z5','Plastic']
]
df = pd.DataFrame(data, columns=('item_id','item_description'))
print(df)
orders = [['Z5'], ['X9','Z5'], ['B1', 'Z5', 'N4'], ['B1', 'X9']]
orders_descriptions = []
for order in orders:
orders_descriptions.append( [df[df['item_id'].eq(o)]['item_description'].iloc[0] for o in order] )
print(orders_descriptions)
CodePudding user response:
Here is a slightly different answer (it might be faster for large dataframes, but still to be checked) that still makes use of pandas indexes, but it does not loop through a dictionary.
item_frame.set_index("item_id", inplace=True)
order_descriptions = [item_frame.loc[o]["item_description"].tolist() for o in orders]
And if you want to create a DataFrame with the orders in a single column:
pd.DataFrame({"order_descriptions": order_descriptions})