How to fill missing year value with the existing one from other groups in pyton?-CodePudding

I have a pandas dataframe like this:

    df= pd.DataFrame({'year':['2020','2030','2040','2030','2040','2030','2040'], 
                      'group':['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                      'values':[10,20,30,45,60,12,36]})

          year   group  value
          2020     a      10
          2030     a      20
          2040     a      30
          2030     b      45
          2040     b      60
          2030     c      12
          2040     c      36

I want to check if the 2020 value in each group is missing, then add the relevant row and fill it with the value from group a, i.e, see the following result:

          year   group  value
          2020     a      10
          2030     a      20
          2040     a      30
        **2020     b      10**
          2030     b      45
          2040     b      60
        **2020     c      10**
          2030     c      12
          2040     c      36

CodePudding user response：

Use DataFrame.pivot, forward and back filling missing values and reshape back:

df1 = df.pivot('group','year','values').ffill().bfill().unstack().reset_index(name='value')
print (df1)
   year group  value
0  2020     a   10.0
1  2020     b   10.0
2  2020     c   10.0
3  2030     a   20.0
4  2030     b   45.0
5  2030     c   12.0
6  2040     a   30.0
7  2040     b   60.0
8  2040     c   36.0

Or:

df2 = (df.pivot('year','group','values')
         .ffill(axis=1)
         .bfill(axis=1)
         .unstack(0)
         .reset_index(name='value'))
print (df2)
  group  year  value
0     a  2020   10.0
1     a  2030   20.0
2     a  2040   30.0
3     b  2020   10.0
4     b  2030   45.0
5     b  2040   60.0
6     c  2020   10.0
7     c  2030   12.0
8     c  2040   36.0

CodePudding user response：

You can reindex and ffill/bfill per group:

# get combinations of year/group
idx = pd.MultiIndex.from_product([df['year'].unique(), df['group'].unique()], names=['year', 'group'])

# reindex and fill
out = (df
   .set_index(['year', 'group'])
   .reindex(idx).reset_index()
   .groupby('year', group_keys=False)
   .apply(lambda s: s.ffill().bfill())
)

output:

   year group  values
0  2020     a    10.0
1  2020     b    10.0
2  2020     c    10.0
3  2030     a    20.0
4  2030     b    45.0
5  2030     c    12.0
6  2040     a    30.0
7  2040     b    60.0
8  2040     c    36.0