Calculate value difference using min and max dates-CodePudding

I am trying to calculate value growth/decline using the minimum date and maximum date. My data currently looks like this:

    Code      Date        Value
0   A         2020-12-31  80122.0
1   A         2019-12-31  45472.0
2   A         2018-12-31  31917.0
3   A         2017-12-31  23432.0
4   B         2020-12-31      0.0

For Code A I need to keep the max (2020-12-31) and min (2017-12-31) dates as well as the values so I can calculate the difference later on. I have multiple codes and need to be able to apply the same logic to each one. Any suggestions on the best way to approach this?

Thanks

CodePudding user response：

In your case, you want to sort the date, then groupby and extract first, last:

 df.sort_values(['Code','Date']).groupby('Code')['Value'].agg(['first','last'])

Output:

        first     last
Code                  
A     23432.0  80122.0
B         0.0      0.0

CodePudding user response：

I would first sort_values then you can drop_duplicates on 'Code'. Using different logic for keep this allows you to get the first and last row (based on Date) within each 'Code', which you can then subtract to get the day difference and Value difference for each code.

df = df.sort_values(['Code', 'Date'])

(df.drop_duplicates('Code', keep='last').set_index('Code')
 - df.drop_duplicates('Code', keep='first').set_index('Code'))

#          Date    Value
#Code                   
#A    1096 days  56690.0
#B       0 days      0.0

Alternatively if you don't just need the difference and actually need the rows, then I would concat those together instead of subtracting. The main reason to avoid the .first aggregation is because it does not guarantee data come from the same rows (without specifying dropna) in the case of null values.

pd.concat([df.drop_duplicates('Code', keep='last').set_index('Code'),
           df.drop_duplicates('Code', keep='first').set_index('Code')],
          keys=['Last', 'First'], axis=1)

#           Last               First         
#           Date    Value       Date    Value
#Code                                        
#A    2020-12-31  80122.0 2017-12-31  23432.0
#B    2020-12-31      0.0 2020-12-31      0.0

CodePudding user response：

since you

need to keep the max (2020-12-31) and min (2017-12-31) dates as well as the values...

, you can try:

df = pd.DataFrame({'Code':['A','A','A','A','B'], 
                   'Date': ['2020-12-31', '2019-12-31', '2018-12-31', '2017-12-31', '2020-12-31'],
                   'Value': [80122.0, 45472.0, 31917.0, 23432.0, 0.0] 
                  }, )

df.loc[:, 'Date'] = pd.to_datetime(df.loc[:, 'Date'])

is the df mentioned:

    Code    Date    Value
0   A   2020-12-31  80122.0
1   A   2019-12-31  45472.0
2   A   2018-12-31  31917.0
3   A   2017-12-31  23432.0
4   B   2020-12-31  0.0

so another way can be:

dictionary = {}

for code in df.loc[:, 'Code'].unique():
    dictionary[code] = {'Date min': df.loc[df.loc[:, 'Code']==code,'Date'].min(),
                        'Value min': df.loc[(df.loc[:, 'Code']==code)& (df.loc[:,'Date'] == df.loc[df.loc[:, 'Code']==code,'Date'].min()), 'Value'].values[0],
                        'Date max': df.loc[df.loc[:, 'Code']==code,'Date'].max(),
                        'Value max':df.loc[(df.loc[:, 'Code']==code)&(df.loc[:,'Date'] == df.loc[df.loc[:, 'Code']==code,'Date'].max()), 'Value'].values[0]
                                     
                       }
resume = pd.DataFrame(dictionary)
resume = resume.transpose()
resume

that outputs:

    Date min    Value min   Date max    Value max
A   2017-12-31  23432.0   2020-12-31    80122.0
B   2020-12-31  0.0       2020-12-31    0.0