Home > database >  Calculate the percentage change taking the earliest and last available value for different time seri
Calculate the percentage change taking the earliest and last available value for different time seri

Time:09-21

I've a time series of poverty values for different countries (countries are in rows, values over time in columns). Unfortunately I've many missing data, but I'd like to calculate the percentage change between the last available year and the earliest available year for each country in Python. So for the ZWE below that would be the pct change between 2019 and 2011 while for ZWB that would be taking 2015 and 2010. Is there a way to automate this? Maybe using iterrows?

Any help would be much appreciated. Thank you!

iso YR2010 YR2011 YR2012 YR2013 YR2014 YR2015 YR2016 YR2017 YR2018 YR2019
ZWE NaN 22.5 NaN NaN NaN NaN NaN 30.4 NaN 38.3
ZMB 54.7 NaN NaN NaN NaN 54.4 NaN NaN NaN NaN

CodePudding user response:

You can filter the related columns to work on by .filter(), then for each country (each row), drop the NaN entries by .dropna() and then divide the last (non-NaN) entry by the first (non-NaN) entry, minus 1 and multiply by 100 to get the percentages:

df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len((z:=x.dropna())) == 0 else (z.iloc[-1]/z.iloc[0] -1) * 100, axis=1)

or, if your Python version is older than Python 3.8 and does not support the assignment operator :=, use:

df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len(x.dropna()) == 0 else (x.dropna().iloc[-1]/x.dropna().iloc[0] -1) * 100, axis=1)

Result:

print(df)

   iso  YR2010  YR2011  YR2012  YR2013  YR2014  YR2015  YR2016  YR2017  YR2018  YR2019   % change
0  ZWE     NaN    22.5     NaN     NaN     NaN     NaN     NaN    30.4     NaN    38.3  70.222222
1  ZMB    54.7     NaN     NaN     NaN     NaN    54.4     NaN     NaN     NaN     NaN  -0.548446

CodePudding user response:

You can use ffill and bfill to get the last and first values respectively, then calculate the percentage change. Example:

txt ='''iso YR2010  YR2011  YR2012  YR2013  YR2014  YR2015  YR2016  YR2017  YR2018  YR2019
ZWE NaN 22.5    NaN NaN NaN NaN NaN 30.4    NaN 38.3
ZMB 54.7    NaN NaN NaN NaN 54.4    NaN NaN NaN NaN'''
df = pd.read_csv(StringIO(txt),sep='\t',index_col=0)

last_indices = df.ffill(axis=1).iloc[:,-1]
first_indices = df.bfill(axis=1).iloc[:,0]
percent_change = (last_indices - first_indices)/first_indices *100
print(percent_change)

Gives:

iso
ZWE    70.222222
ZMB    -0.548446
dtype: float64
  • Related