I am consuming dataframes of demographics data on a NUTS3 level and different regions report data at different time intervals. I always want the newest data, the year varies from region to region.
Data might look like below |
region|2015|2014|2013|
AT201 | 101| 100| |
AB301 | | 123| 456|
AB302 | | | 234|
How can I combine year columns into a single column holding the newest data, like
region|newest_data|
AT201 | 101|
AB301 | 123|
AB302 | 234|
Below a link to screenshot of actual data https://i.stack.imgur.com/AUP3A.png
Any tips or pointers are much appreciated!
CodePudding user response:
Convert not years columns to index by DataFrame.set_index
, back filling missing values, select first column, rename it and convert to DataFrame
:
df = df.set_index(['region']).bfill(axis=1).iloc[:, 0].rename('newest_data').reset_index()