Home > Software engineering >  Pandas interpolation adding rows by group with different ranges for each group
Pandas interpolation adding rows by group with different ranges for each group

Time:12-15

I am trying to add rows to a DataFrame interpolating values in a column by group, and fill with missing all other columns. My data looks something like this:

import pandas as pd 
import random

random.seed(42)
data = {'group':['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c' ],
        'value' : [1, 2, 5,  3, 4, 5, 7, 4, 7, 9],
        'other': random.sample(range(1, 100), 10)}

df = pd.DataFrame(data)
print(df)
  group  value  other
0     a      1     82
1     a      2     15
2     a      5      4
3     b      3     95
4     b      4     36
5     b      5     32
6     b      7     29
7     c      4     18
8     c      7     14
9     c      9     87

What I am trying to achieve is something like this:

   group  value  other
      a      1     82
      a      2     15
      a      3     NaN
      a      4     NaN
      a      5     NaN
      b      3     95
      b      4     36
      b      5     32
      b      6     NaN
      b      7     29
      c      4     18
      c      5     NaN
      c      6     NaN
      c      7     14
      c      8     NaN
      c      9     87

For example, group a has a range from 1 to 5, b from 3 to 7, and c from 4 to 9.

The issue I'm having is that each group has a different range. I found something that works assuming a single range for all groups. This could work using the global min and max and dropping extra rows in each group, but since my data is fairly large adding many rows per group quickly becomes unfeasible.

CodePudding user response:

>>> df.groupby('group').apply(lambda x: x.set_index('value').reindex(np.arange(x['value'].min(), x['value'].max()   1))).drop(columns='group').reset_index()
   group  value  other
0      a      1   82.0
1      a      2   15.0
2      a      3    NaN
3      a      4    NaN
4      a      5    4.0
5      b      3   95.0
6      b      4   36.0
7      b      5   32.0
8      b      6    NaN
9      b      7   29.0
10     c      4   18.0
11     c      5    NaN
12     c      6    NaN
13     c      7   14.0
14     c      8    NaN
15     c      9   87.0

We group on the group column and then re-index each group with the range from the min to the max of the value column

  • Related