I am trying to concord districts in my panel dataset, which covers 12 years. Some districts were newly carved out of older districts. For example, a new district, Agar Malwa, was formed from the district Shajapur in 2014. I want to add Agar Malwa's values to Shajapur's every time Agar Malwa appears from 2014 onward, so that my dataset contains only the combined Shajapur and no separate Agar Malwa rows.
My dataset looks like this:
Year | District | State | values |
---|---|---|---|
2012 | Shajapur | x | 23 |
2013 | Shajapur | x | 21 |
2014 | Shajapur | x | 16 |
2014 | Agar Malwa | x | 8 |
2015 | Shajapur | x | 17 |
2015 | Agar Malwa | x | 10 |
and so on, up to the year 2021. I need the new dataset in the following form:
Year | District | State | values |
---|---|---|---|
2012 | Shajapur | x | 23 |
2013 | Shajapur | x | 21 |
2014 | Shajapur | x | 24 |
2015 | Shajapur | x | 27 |
and so on, up to the year 2021. I have to do this for about 100 districts that were formed, renamed, or bifurcated between 2010 and 2021. Please suggest a loop or some other basic code that can be used here.
CodePudding user response:
import pandas as pd
data = {'Year': ['2012', '2013', '2014', '2014', '2015', '2015'],
        'District': ['Shajapur', 'Shajapur', 'Shajapur', 'Agar Malwa', 'Shajapur', 'Agar Malwa'],
        'State': ['x', 'x', 'x', 'x', 'x', 'x'],
        'values': [23, 21, 16, 8, 17, 10]}
df = pd.DataFrame(data)
df['District'] = df['District'].replace('Agar Malwa', 'Shajapur')  # relabel the new district as its parent
Year District State values
0 2012 Shajapur x 23
1 2013 Shajapur x 21
2 2014 Shajapur x 16
3 2014 Shajapur x 8
4 2015 Shajapur x 17
5 2015 Shajapur x 10
df = df.groupby(['Year', 'District', 'State'], as_index=False)['values'].sum()  # as_index=False keeps the keys as ordinary columns
Year District State values
0 2012 Shajapur x 23
1 2013 Shajapur x 21
2 2014 Shajapur x 24
3 2015 Shajapur x 27
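For around 100 districts, the same relabel-then-sum idea can be driven by one lookup table instead of a loop. The following is a minimal sketch, not a definitive implementation: the `CONCORDANCE` dictionary and the `concord_districts` helper are hypothetical names, and only the Agar Malwa entry comes from the question. You would fill in the remaining pairs yourself. It also assumes district names are unique within a state and that every row of a carved-out or renamed district should be folded into its parent.

```python
import pandas as pd

# Hypothetical concordance table: new (or renamed) district -> parent district.
# Only this one pair is taken from the question; add your other pairs here.
CONCORDANCE = {
    "Agar Malwa": "Shajapur",
}


def concord_districts(df, mapping):
    """Relabel districts according to `mapping`, then sum `values` per group."""
    out = df.copy()
    # Districts that are not keys of the mapping are left unchanged.
    out["District"] = out["District"].replace(mapping)
    # Sum rows that now share the same year, district and state.
    return out.groupby(["Year", "District", "State"], as_index=False)["values"].sum()


# Sample rows copied from the question.
df = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2014, 2015, 2015],
    "District": ["Shajapur", "Shajapur", "Shajapur", "Agar Malwa", "Shajapur", "Agar Malwa"],
    "State": ["x", "x", "x", "x", "x", "x"],
    "values": [23, 21, 16, 8, 17, 10],
})

combined = concord_districts(df, CONCORDANCE)
print(combined)
```

On the sample rows this gives one Shajapur row per year with values 23, 21, 24 and 27. If a district was split into several parents, a plain rename like this cannot apportion its values, so those cases would need separate handling.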