Get last purchase price from a pandas dataframe according to the order date

Time:10-24

# This is my purchase dataframe

import pandas as pd

data = {
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
}

df_purchase = pd.DataFrame(data)

df_purchase
     order_date    order_id   item_code    price
     2022-01-02    001        12-05        20
     2022-05-03    002        13-05        21
     2022-02-01    003        13-05        19
     2022-09-30    004        12-05        17

I want to get the last purchase price of each item according to the order_date, like below:

     order_date    order_id   item_code    price
     2022-05-03    002        13-05        21
     2022-09-30    004        12-05        17

I have tried many ways but couldn't get this result.

Answer:

Try this:

(
    df_purchase
    .sort_values(['item_code', 'order_date'])
    .groupby('item_code', as_index=False)
    .last()
)
  item_code  order_date order_id  price
0     12-05  2022-09-30      004     17
1     13-05  2022-05-03      002     21
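One caveat with .last(): GroupBy.last returns the last non-null value of each column separately, so if some prices or ids were missing it could stitch together values from different rows. A minimal sketch that keeps whole rows uses GroupBy.tail(1) instead; with the NaN-free sample data it picks the same orders. Unlike the answer above, item_code keeps its original column position here.

```python
import pandas as pd

data = {
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
}
df_purchase = pd.DataFrame(data)

# Sort so each item's most recent order comes last within its group.
# ISO 'YYYY-MM-DD' strings sort chronologically even without parsing.
res = (
    df_purchase
    .sort_values(['item_code', 'order_date'])
    .groupby('item_code')
    .tail(1)                  # whole last row per group, no per-column NaN skipping
    .reset_index(drop=True)
)
print(res)
```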

Answer:

One approach could be as follows:

res = df_purchase.sort_values('order_date').drop_duplicates(
    subset='item_code', keep='last').reset_index(drop=True)

print(res)

   order_date order_id item_code  price
0  2022-05-03      002     13-05     21
1  2022-09-30      004     12-05     17
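Sorting the raw strings works here only because the dates are ISO 'YYYY-MM-DD' strings, which sort lexicographically in chronological order. A sketch of the same drop_duplicates approach that parses the dates first, so the sort stays chronological; the explicit format string is an assumption matching the sample data, and kind='stable' keeps rows with equal dates in their original order.

```python
import pandas as pd

data = {
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
}
df_purchase = pd.DataFrame(data)

# Parse into real datetimes (format assumed to be ISO, as in the sample).
df_purchase['order_date'] = pd.to_datetime(df_purchase['order_date'], format='%Y-%m-%d')

res = (
    df_purchase
    .sort_values('order_date', kind='stable')             # oldest -> newest
    .drop_duplicates(subset='item_code', keep='last')     # newest row per item
    .reset_index(drop=True)
)
print(res)
```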

An alternative method:

  • First, convert the order_date values to datetime with pd.to_datetime. This lets you use df.groupby without having to sort the df first (a relatively costly operation).
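  • Group by item_code and call idxmax on order_date, which returns the index label of the latest order for each item. Chain Series.to_numpy and use the result to select the rows from df_purchase with df.loc. Because idxmax returns labels rather than positions, df.iloc would only work when the default RangeIndex labels happen to match the positions.
df_purchase['order_date'] = pd.to_datetime(df_purchase['order_date'])
res = df_purchase.loc[df_purchase.groupby('item_code')['order_date']
                      .idxmax().to_numpy()].reset_index(drop=True)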

print(res)

  order_date order_id item_code  price
0 2022-09-30      004     12-05     17
1 2022-05-03      002     13-05     21

Note that this returns 12-05 before 13-05, because groupby sorts the group keys, whereas the first res lists 13-05 first. So if you need that particular order, you will still need to sort, of course.
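The extra sort can be sketched as follows: select the idxmax rows by label with .loc, then sort by order_date so the result has the same row order as the drop_duplicates version. The variable name latest_labels is illustrative only.

```python
import pandas as pd

data = {
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
}
df_purchase = pd.DataFrame(data)
df_purchase['order_date'] = pd.to_datetime(df_purchase['order_date'])

# Index label of the latest order_date for each item_code (hypothetical name).
latest_labels = df_purchase.groupby('item_code')['order_date'].idxmax()

# Select by label, then sort chronologically to match the first approach.
res = (
    df_purchase.loc[latest_labels.to_numpy()]
    .sort_values('order_date')
    .reset_index(drop=True)
)
print(res)
```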
