Home > Blockchain >  Update pandas cell in one dataframe from looked up value in second dataframe
Update pandas cell in one dataframe from looked up value in second dataframe

Time:10-24

Update pandas cell in one Dataframe from looked up value in second Dataframe

I have a case where I need to update a cell in one Dataframe, 'Stock', which holds records of stock on-hand, looking up its value in a second Dataframe, 'Items', which is the table of all items. This is a simplified example of the Dataframes with relevant fields.

Stock

    item_no qty
0   9H.111  101
1   9H.222  230
2   MODEL_B 136
3   9H.444  344
4   MODEL_E 505
5   9H.666  332

Items

    item_no model_no
0   9H.111  MODEL_A
1   9H.222  MODEL_B
2   9H.333  MODEL_B
3   9H.444  MODEL_C
4   9H.555  MODEL_D
5   9H.666  MODEL_E
6   9H.777  MODEL_D
7   9H.888  MODEL_F

The challenge

I have previously done this in PostgreSQL but would like to see if I can do all the processing in Pandas (I plan to link to the PostgreSQL table of items). If we look at the Stock table the item_no column should only have item numbers, see Items Dataframe (table), but sometimes the users put in the model_no instead of the item number. So in the Stock dataframe, row 2 incorrectly has the value MODEL_B.

What's needed

What is needed to be done is to:

  1. get the value MODEL_B from the item_no column in the Stock dataframe
  2. find that in the model_no column of the Items dataframe
  3. then get the value from the item_no field of the Items dataframe
  4. use that value to replace the (incorrect) model number value in the item_no column of the Stock dataframe

It gets a little more challenging... a model may have more than one part number:

1   9H.222  MODEL_B
2   9H.333  MODEL_B

In this case the 'highest' part number, in this case 9H.333, needs to be used. In SQL I use the MAX() operator.

I would like to perform this using 'set' operations in pandas (not looping), similar to running a query in SQL. So this would mean (?) joining the two dataframes on the fields stock.item_no <-> items.model_no (?) - I'm not sure how to go about it hence the question marks.

Generate Dataframes

This code will generate the dataframes discussed above.

stock = pd.DataFrame({
    'item_no': ['9H.111', '9H.222', 'MODEL_B', '9H.444', 'MODEL_E', '9H.666'],
    'qty': [101, 230, 136, 344, 505, 332],
})
items = pd.DataFrame({
    'item_no': ['9H.111', '9H.222', '9H.333', '9H.444', '9H.555', '9H.666', '9H.777', '9H.888'],
    'model_no': ['MODEL_A', 'MODEL_B', 'MODEL_B', 'MODEL_C', 'MODEL_D', 'MODEL_E', 'MODEL_D', 'MODEL_F']
})

display(stock)
display(items)

CodePudding user response:

You can use:

# keep max item_no per model_no
# convert to mapping Series
mapper = (items.sort_values(by='item_no')
               .drop_duplicates(subset='model_no', keep='last')
               .set_index('model_no')['item_no']
         )

# identify rows with model_no
m = stock['item_no'].isin(mapper.index)

# replace the values in place
stock.loc[m, 'item_no'] = stock.loc[m, 'item_no'].map(mapper)

updated stock:

  item_no  qty
0  9H.111  101
1  9H.222  230
2  9H.333  136
3  9H.444  344
4  9H.666  505
5  9H.666  332
  • Related