My data frame looks like this
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010.0 0.0 1086730.0 60300
32 Fairfax County Virginia 2011.0 0.0 1099603.0 60300
33 Fairfax County Virginia 2013.0 0.0 1130364.0 60300
34 Fairfax County Virginia 2014.0 0.0 1138123.0 60300
35 Fairfax County Virginia 2015.0 0.0 1142245.0 60300
I want to insert the missing row for year 2012 and assign it rank 7. For the values of county and city population, I want to take an average of the previous and next rows (2011 and 2013) and fill those values for the missing row.
Any pointers will be highly appreciated
EDIT 1: Expected data frame should be
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010.0 0.0 1086730.0 60300
32 Fairfax County Virginia 2011.0 0.0 1099603.0 60300
33 Fairfax County Virginia 2012.0 7.0 1114984.0 60300
34 Fairfax County Virginia 2013.0 0.0 1130364.0 60300
35 Fairfax County Virginia 2014.0 0.0 1138123.0 60300
36 Fairfax County Virginia 2015.0 0.0 1142245.0 60300
CodePudding user response:
Create a new dataframe and merge them, sort by year and interpolate missing values:
data = [['Fairfax County', 'Virginia', 2012, 7, np.NaN, np.NaN]]
out = df.append(pd.DataFrame(data, columns=df.columns)) \
.sort_values('year').interpolate()
print(out)
Output result:
>>> out
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010 0.0 1086730.0 60300.0
32 Fairfax County Virginia 2011 0.0 1099603.0 60300.0
0 Fairfax County Virginia 2012 7.0 1114983.5 60300.0
33 Fairfax County Virginia 2013 0.0 1130364.0 60300.0
34 Fairfax County Virginia 2014 0.0 1138123.0 60300.0
35 Fairfax County Virginia 2015 0.0 1142245.0 60300.0