I've a time series of poverty values for different countries (countries are in rows, values over time in columns). Unfortunately I've many missing data, but I'd like to calculate the percentage change between the last available year and the earliest available year for each country in Python. So for the ZWE below that would be the pct change between 2019 and 2011 while for ZWB that would be taking 2015 and 2010. Is there a way to automate this? Maybe using iterrows?
Any help would be much appreciated. Thank you!
iso | YR2010 | YR2011 | YR2012 | YR2013 | YR2014 | YR2015 | YR2016 | YR2017 | YR2018 | YR2019 |
---|---|---|---|---|---|---|---|---|---|---|
ZWE | NaN | 22.5 | NaN | NaN | NaN | NaN | NaN | 30.4 | NaN | 38.3 |
ZMB | 54.7 | NaN | NaN | NaN | NaN | 54.4 | NaN | NaN | NaN | NaN |
CodePudding user response:
You can filter the related columns to work on by .filter()
, then for each country (each row), drop the NaN
entries by .dropna()
and then divide the last (non-NaN) entry by the first (non-NaN) entry, minus 1 and multiply by 100 to get the percentages:
df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len((z:=x.dropna())) == 0 else (z.iloc[-1]/z.iloc[0] -1) * 100, axis=1)
or, if your Python version is older than Python 3.8 and does not support the assignment operator :=
, use:
df['% change'] = df.filter(regex=r'YR\d ').apply(lambda x: 0 if len(x.dropna()) == 0 else (x.dropna().iloc[-1]/x.dropna().iloc[0] -1) * 100, axis=1)
Result:
print(df)
iso YR2010 YR2011 YR2012 YR2013 YR2014 YR2015 YR2016 YR2017 YR2018 YR2019 % change
0 ZWE NaN 22.5 NaN NaN NaN NaN NaN 30.4 NaN 38.3 70.222222
1 ZMB 54.7 NaN NaN NaN NaN 54.4 NaN NaN NaN NaN -0.548446
CodePudding user response:
You can use ffill
and bfill
to get the last and first values respectively, then calculate the percentage change. Example:
txt ='''iso YR2010 YR2011 YR2012 YR2013 YR2014 YR2015 YR2016 YR2017 YR2018 YR2019
ZWE NaN 22.5 NaN NaN NaN NaN NaN 30.4 NaN 38.3
ZMB 54.7 NaN NaN NaN NaN 54.4 NaN NaN NaN NaN'''
df = pd.read_csv(StringIO(txt),sep='\t',index_col=0)
last_indices = df.ffill(axis=1).iloc[:,-1]
first_indices = df.bfill(axis=1).iloc[:,0]
percent_change = (last_indices - first_indices)/first_indices *100
print(percent_change)
Gives:
iso
ZWE 70.222222
ZMB -0.548446
dtype: float64