I have a pandas dataframe like this:
df= pd.DataFrame({'year':['2020','2030','2040','2030','2040','2030','2040'],
'group':['a', 'a', 'a', 'b', 'b', 'c', 'c'],
'values':[10,20,30,45,60,12,36]})
year group value
2020 a 10
2030 a 20
2040 a 30
2030 b 45
2040 b 60
2030 c 12
2040 c 36
I want to check if the 2020 value in each group is missing, then add the relevant row and fill it with the value from group a, i.e, see the following result:
year group value
2020 a 10
2030 a 20
2040 a 30
**2020 b 10**
2030 b 45
2040 b 60
**2020 c 10**
2030 c 12
2040 c 36
CodePudding user response:
Use DataFrame.pivot
, forward and back filling missing values and reshape back:
df1 = df.pivot('group','year','values').ffill().bfill().unstack().reset_index(name='value')
print (df1)
year group value
0 2020 a 10.0
1 2020 b 10.0
2 2020 c 10.0
3 2030 a 20.0
4 2030 b 45.0
5 2030 c 12.0
6 2040 a 30.0
7 2040 b 60.0
8 2040 c 36.0
Or:
df2 = (df.pivot('year','group','values')
.ffill(axis=1)
.bfill(axis=1)
.unstack(0)
.reset_index(name='value'))
print (df2)
group year value
0 a 2020 10.0
1 a 2030 20.0
2 a 2040 30.0
3 b 2020 10.0
4 b 2030 45.0
5 b 2040 60.0
6 c 2020 10.0
7 c 2030 12.0
8 c 2040 36.0
CodePudding user response:
You can reindex
and ffill
/bfill
per group:
# get combinations of year/group
idx = pd.MultiIndex.from_product([df['year'].unique(), df['group'].unique()], names=['year', 'group'])
# reindex and fill
out = (df
.set_index(['year', 'group'])
.reindex(idx).reset_index()
.groupby('year', group_keys=False)
.apply(lambda s: s.ffill().bfill())
)
output:
year group values
0 2020 a 10.0
1 2020 b 10.0
2 2020 c 10.0
3 2030 a 20.0
4 2030 b 45.0
5 2030 c 12.0
6 2040 a 30.0
7 2040 b 60.0
8 2040 c 36.0