Calculating the difference between the first non-na value and the last na-value based on a grouped c-CodePudding

I am looking to calculate the percentage increase or decrease between the first and last non-na value for the following dataset:

Year	Company	Data
2019	X	341976.00
2020	X	1.000
2021	X	282872.00
2019	Y	NaN
2020	Y	NaN
2021	Y	NaN
2019	Z	4394.00
2020	Z	173.70
2021	Z	518478.00

As I want the relative change I would expect the formula to do something like:

(last non-na value)/(first non-na value)-1

This should return something like:

Year	Company	Data	Data
2019	X	341976.00	NaN
2020	X	1.000	NaN
2021	X	282872.00	-0.17
2019	Y	NaN	NaN
2020	Y	NaN	NaN
2021	Y	NaN	NaN
2019	Z	4394.00	NaN
2020	Z	173.70	NaN
2021	Z	518478.00	11.700

I have tried to combine groupby based on the company field with the first_valid_index but havent had any luck finding a solution. What is the most efficient way of calculating the relative change as above?

CodePudding user response：

If aggregate GroupBy.first and GroupBy.last it omit missing values, so is possible divide values and subtract 1:

s = df.groupby('Company')['Data'].agg(['last','first']).eval('last / first').sub(1)

Then found index values for last non missing values per Company:

idx = df.dropna(subset=['Data']).drop_duplicates(['Company'], keep='last').index

And mapping only matchded rows by Series.map:

df.loc[idx, 'Date'] = df.loc[idx, 'Company'].map(s)
print (df)

   Year Company      Data        Date
0  2019       X  341976.0         NaN
1  2020       X       1.0         NaN
2  2021       X  282872.0   -0.172831
3  2019       Y       NaN         NaN
4  2020       Y       NaN         NaN
5  2021       Y       NaN         NaN
6  2019       Z    4394.0         NaN
7  2020       Z     173.7         NaN
8  2021       Z  518478.0  116.996814

CodePudding user response：

To find first non-na value u can:

iterate from first to last element of column and break if u value is not np.nan,
use .dropna method on dataframe and gets 1st element from the result df.

To find last:

iterate from the last to first and (just like above),
use dropna and gets value from last row