Home > Software design >  Get values from a dataframe based on a nested list in Python
Get values from a dataframe based on a nested list in Python

Time:05-12

I have a dataframe in Python called "Item_Table" and it looks like the below:

--------------------------
item_id | item_description
--------------------------
N4      | Steel
M3      | Oil
B1      | Water
X9      | Coal
Z5      | Plastic
--------------------------

I also have a nested orders list which looks like below. Each list inside the nested list represents one order. One order can have multiple items

orders = [[Z5], [X9, Z5], [B1, Z5, N4], [B1, X9]]

Ideally, I would like to return a nested list with the respective item_descriptions from the "Item_Table" which looks like the below nested list:

orders_descriptions = [[Plastic], [Coal, Plastic], [Water, Plastic, Steel], [Water, Coal]]

I tried a solution where I converted orders list to a dataframe and tried to merge that with item_table to get item_descriptions but because one order has multiple items, I am unable to do the join.

CodePudding user response:

You can try export the item_id and item_description to dictionary then loop the orders

d = df.set_index('item_id')['item_description'].to_dict()

orders_descriptions = [[d[o] for o in os] for os in orders]
print(orders_descriptions)

[['Plastic'], ['Coal', 'Plastic'], ['Water', 'Plastic', 'Steel'], ['Water', 'Coal']]

CodePudding user response:

Here's the brute force lookup method, but I think the "export to a dictionary" idea is a much better concept.

import pandas as pd
data = [
    ['N4','Steel'],
    ['M3','Oil'],
    ['B1','Water'],
    ['X9','Coal'],
    ['Z5','Plastic']
]
df = pd.DataFrame(data, columns=('item_id','item_description'))
print(df)

orders = [['Z5'], ['X9','Z5'], ['B1', 'Z5', 'N4'], ['B1', 'X9']]

orders_descriptions = []
for order in orders:
    orders_descriptions.append( [df[df['item_id'].eq(o)]['item_description'].iloc[0] for o in order] )

print(orders_descriptions)

CodePudding user response:

Here is a slightly different answer (it might be faster for large dataframes, but still to be checked) that still makes use of pandas indexes, but it does not loop through a dictionary.

item_frame.set_index("item_id", inplace=True)
order_descriptions = [item_frame.loc[o]["item_description"].tolist() for o in orders]

And if you want to create a DataFrame with the orders in a single column:

pd.DataFrame({"order_descriptions": order_descriptions})
  • Related