Home > database >  Pandas groupby value and get value of max date and min date
Pandas groupby value and get value of max date and min date

Time:11-01

I have a pandas DataFrame df looking like this :

item year value
A    2010  20
A    2011  25
A    2012  32
B    2016  20
B    2019  40
B    2018  50

My goal is to be able, for each item to calculate the difference of value between each date. Then for example, I want to find for item A : 12 (32 -20 because year max is 2012 and year min is 2010) and for item B : 20 (40 - 20, because year max is 2019 and year min is 2016).

I use the following code to get, for each item, year max and year min :

df.groupby("item").agg({'year':[np.min, np.max]})

Then, I find the year min and year max for each item. However, I stuck to make what I want.

CodePudding user response:

Try sort_values by year, then you can groupby and select first for min and last for max:

g = df.sort_values('year').groupby('item')
out = g['value'].last() - g['value'].first()

Output:

item
A    12
B    20
Name: value, dtype: int64

CodePudding user response:

Use:

def fun(x):
    return x[x.index.max()] - x[x.index.min()]


res = df.set_index("year").groupby("item").agg(fun)
print(res)

Output

      value
item       
A        12
B        20

CodePudding user response:

Use loc accessor in agg to calculate value difference and also, you can also concat the first and last year in an item to give you a clear indication of the range.

df.sort_values(by=['item','year']).groupby('item').agg( year=('year', lambda x: str(x.iloc[0])  '-' str(x.iloc[-1])),value=('value', lambda x: x.iloc[-1]-x.iloc[0]))



      year    value
item                  
A     2010-2012     12
B     2016-2019     20
  • Related