Home > Software design >  Pandas DataFrame keep earliest column with values
Pandas DataFrame keep earliest column with values

Time:11-03

I am consuming dataframes of demographics data on a NUTS3 level and different regions report data at different time intervals. I always want the newest data, the year varies from region to region.

Data might look like below |

region|2015|2014|2013|
AT201 | 101| 100|    |
AB301 |    | 123| 456|
AB302 |    |    | 234|

How can I combine year columns into a single column holding the newest data, like

region|newest_data|
AT201 |        101|
AB301 |        123| 
AB302 |        234|

Below a link to screenshot of actual data https://i.stack.imgur.com/AUP3A.png

Any tips or pointers are much appreciated!

CodePudding user response:

Convert not years columns to index by DataFrame.set_index, back filling missing values, select first column, rename it and convert to DataFrame:

df = df.set_index(['region']).bfill(axis=1).iloc[:, 0].rename('newest_data').reset_index()
  • Related