I have a dataset which looks like following-
| year | state | election year 1-yes|
|---------------|-------------|--------------------|
| 2010 | haryana | 1 |
| 2010 | haryana | 1 |
| 2010 | up | 0 |
| 2011 | goa | 1 |
| 2012 | haryana | 0 |
| 2012 | up | 1 |
| 2013 | up | 0 |
| 2013 | up | 0 |
| 2013 | haryana | 1 |
| 2015 | haryana | 0 |
| 2015 | up | 0 |
The third column tells if there is an election in that particular state in that particular year. Haryana has election in 2010, so 1, but goa and up do not have any election in 2010, so the election year is 0 for them.
I want a loop which can add another column which tells the number of years until next election. So, when there is a election in haryana in 2010, it will be 0 (as the election is scheduled in this year only) and in 2012, it will be 3 as next election is in 2015. So, the required column has a range from 0 to 4 for all the states.
The table below summarizes my desired output.
| year | state | election year 1-yes| years until next election|
|------|--------|--------------------|--------------------------|
| 2010 |haryana | 1 | 0 |
| 2010 |haryana | 1 | 0 |
| 2010 | up | 0 | 2 |
| 2011 | goa | 1 | 0 |
| 2012 |haryana | 0 | 3 |
| 2012 | up | 1 | 0 |
| 2013 | up | 0 | 4 |
| 2013 | up | 0 | 4 |
| 2013 |haryana | 0 | 2 |
| 2015 |haryana | 1 | 0 |
| 2015 | up | 0 | 2 |
I am looking for a simple code loop on year and state as my data has over 30 states and 12 years.
CodePudding user response:
Example
data = [[2010, 'haryana', 1], [2010, 'haryana', 1], [2010, 'up', 0],
[2011, 'goa', 1], [2012, 'haryana', 0], [2012, 'up', 1],
[2013, 'up', 0], [2013, 'up', 0], [2013, 'haryana', 0],
[2015, 'haryana', 1], [2015, 'up', 0]]
df = pd.DataFrame(data, columns=['year', 'state', 'election'])
df
year state election
0 2010 haryana 1
1 2010 haryana 1
2 2010 up 0
3 2011 goa 1
4 2012 haryana 0
5 2012 up 1
6 2013 up 0
7 2013 up 0
8 2013 haryana 0
9 2015 haryana 1
10 2015 up 0
Code
df1 = df.drop_duplicates().copy()
cond1 = df1['election'].eq(0)
s = (df1['election'].mask(cond1).mask(~cond1, df['year'])
.groupby(df1['state']).bfill())
df1['next'] = s.mask(~cond1, 0).mask(cond1, s.sub(df1['year']))
df1
year state election next
0 2010 haryana 1 0.0
2 2010 up 0 2.0
3 2011 goa 1 0.0
4 2012 haryana 0 3.0
5 2012 up 1 0.0
6 2013 up 0 NaN
8 2013 haryana 0 2.0
9 2015 haryana 1 0.0
10 2015 up 0 NaN
merge df and df1
df.merge(df1, how='left')
result:
year state election next
0 2010 haryana 1 0.0
1 2010 haryana 1 0.0
2 2010 up 0 2.0
3 2011 goa 1 0.0
4 2012 haryana 0 3.0
5 2012 up 1 0.0
6 2013 up 0 NaN
7 2013 up 0 NaN
8 2013 haryana 0 2.0
9 2015 haryana 1 0.0
10 2015 up 0 NaN