Calculate the percentage change taking the earliest and last available value for different time seri-CodePudding

I've a time series of poverty values for different countries (countries are in rows, values over time in columns). Unfortunately I've many missing data, but I'd like to calculate the percentage change between the last available year and the earliest available year for each country in Python. So for the ZWE below that would be the pct change between 2019 and 2011 while for ZWB that would be taking 2015 and 2010. Is there a way to automate this? Maybe using iterrows?

Any help would be much appreciated. Thank you!

iso	YR2010	YR2011	YR2012	YR2013	YR2014	YR2015	YR2016	YR2017	YR2018	YR2019
ZWE	NaN	22.5	NaN	NaN	NaN	NaN	NaN	30.4	NaN	38.3
ZMB	54.7	NaN	NaN	NaN	NaN	54.4	NaN	NaN	NaN	NaN

CodePudding user response：

You can filter the related columns to work on by .filter(), then for each country (each row), drop the NaN entries by .dropna() and then divide the last (non-NaN) entry by the first (non-NaN) entry, minus 1 and multiply by 100 to get the percentages:

df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len((z:=x.dropna())) == 0 else (z.iloc[-1]/z.iloc[0] -1) * 100, axis=1)

or, if your Python version is older than Python 3.8 and does not support the assignment operator :=, use:

df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len(x.dropna()) == 0 else (x.dropna().iloc[-1]/x.dropna().iloc[0] -1) * 100, axis=1)

Result:

print(df)

   iso  YR2010  YR2011  YR2012  YR2013  YR2014  YR2015  YR2016  YR2017  YR2018  YR2019   % change
0  ZWE     NaN    22.5     NaN     NaN     NaN     NaN     NaN    30.4     NaN    38.3  70.222222
1  ZMB    54.7     NaN     NaN     NaN     NaN    54.4     NaN     NaN     NaN     NaN  -0.548446

CodePudding user response：

You can use ffill and bfill to get the last and first values respectively, then calculate the percentage change. Example:

txt ='''iso YR2010  YR2011  YR2012  YR2013  YR2014  YR2015  YR2016  YR2017  YR2018  YR2019
ZWE NaN 22.5    NaN NaN NaN NaN NaN 30.4    NaN 38.3
ZMB 54.7    NaN NaN NaN NaN 54.4    NaN NaN NaN NaN'''
df = pd.read_csv(StringIO(txt),sep='\t',index_col=0)

last_indices = df.ffill(axis=1).iloc[:,-1]
first_indices = df.bfill(axis=1).iloc[:,0]
percent_change = (last_indices - first_indices)/first_indices *100
print(percent_change)

Gives:

iso
ZWE    70.222222
ZMB    -0.548446
dtype: float64