# This is my purchase dataframe
import pandas as pd

data = {
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
}
df_purchase = pd.DataFrame(data)
df_purchase
order_date order_id item_code price
2022-01-02 001 12-05 20
2022-05-03 002 13-05 21
2022-02-01 003 13-05 19
2022-09-30 004 12-05 17
I want to get the last purchase price of each item according to order_date, like below:
order_date order_id item_code price
2022-05-03 002 13-05 21
2022-09-30 004 12-05 17
I have tried many ways but couldn't achieve the result.
CodePudding user response:
Try this:
(
df_purchase
.sort_values(['item_code', 'order_date'])
.groupby('item_code', as_index=False)
.last()
)
item_code order_date order_id price
0 12-05 2022-09-30 004 17
1 13-05 2022-05-03 002 21
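One caveat, shown here as a minimal sketch on hypothetical data (the NaN price below is not part of the question): GroupBy.last() picks the last non-null value per column, so if the most recent row has a missing value, the result can combine values from different rows. If whole rows are needed, groupby(...).tail(1) is one possible alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the most recent order for item 12-05 has no price.
df = pd.DataFrame({
    'order_date': ['2022-01-02', '2022-09-30'],
    'order_id': ['001', '004'],
    'item_code': ['12-05', '12-05'],
    'price': [20.0, np.nan],
})

ordered = df.sort_values(['item_code', 'order_date'])

# last() skips nulls column by column: order_id comes from order 004,
# but price falls back to the value from order 001.
via_last = ordered.groupby('item_code', as_index=False).last()

# tail(1) keeps the last row of each group unchanged, NaN included.
via_tail = ordered.groupby('item_code').tail(1)

print(via_last)
print(via_tail)
```

With the data in the question there are no missing values, so both variants select the same rows.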
CodePudding user response:
One approach could be as follows:
- First, use df.sort_values to sort the df on order_date.
- Next, use df.drop_duplicates on column item_code, setting the keep parameter to 'last'.
- Finally, reset the index using df.reset_index.
res = df_purchase.sort_values('order_date').drop_duplicates(
subset='item_code', keep='last').reset_index(drop=True)
print(res)
order_date order_id item_code price
0 2022-05-03 002 13-05 21
1 2022-09-30 004 12-05 17
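An alternative method could be:
- First, turn your order_date values into datetime, using pd.to_datetime. Doing so allows you to use df.groupby without having to sort the df first (a rather costly operation).
- For the groups, get idxmax, i.e. the index for the max value for each group. Chain Series.to_numpy to use the result to select from df_purchase with df.iloc. Note that idxmax returns index labels, so passing them to df.iloc works here only because df_purchase has the default RangeIndex; with any other index, select with df.loc instead.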
df_purchase['order_date'] = pd.to_datetime(df_purchase['order_date'])
res = df_purchase.iloc[df_purchase.groupby('item_code')\
['order_date'].idxmax().to_numpy()].reset_index(drop=True)
print(res)
order_date order_id item_code price
0 2022-09-30 004 12-05 17
1 2022-05-03 002 13-05 21
Note that this will get you 12-05 before 13-05, because groupby sorts the group keys, whereas the desired output lists 13-05 first. So, if you are set on that particular order in res, you will still need to sort, of course.
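A minimal sketch of the idxmax approach with the final sort added, so that the rows come back in order_date order. It selects with df.loc rather than df.iloc, since idxmax returns index labels; the variable name latest_idx is only illustrative.

```python
import pandas as pd

df_purchase = pd.DataFrame({
    'order_date': ['2022-01-02', '2022-05-03', '2022-02-01', '2022-09-30'],
    'order_id': ['001', '002', '003', '004'],
    'item_code': ['12-05', '13-05', '13-05', '12-05'],
    'price': [20, 21, 19, 17],
})
df_purchase['order_date'] = pd.to_datetime(df_purchase['order_date'])

# Index label of the most recent order for each item.
latest_idx = df_purchase.groupby('item_code')['order_date'].idxmax()

# Select those rows by label, then sort only the (small) result by date.
res = (
    df_purchase.loc[latest_idx]
    .sort_values('order_date')
    .reset_index(drop=True)
)
print(res)
```

The sort here runs on one row per item rather than on the full frame, so it should stay cheap even when df_purchase is large.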