I am trying to add rows to a DataFrame interpolating values in a column by group, and fill with missing all other columns. My data looks something like this:
import pandas as pd
import random
random.seed(42)
data = {'group':['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c' ],
'value' : [1, 2, 5, 3, 4, 5, 7, 4, 7, 9],
'other': random.sample(range(1, 100), 10)}
df = pd.DataFrame(data)
print(df)
group value other
0 a 1 82
1 a 2 15
2 a 5 4
3 b 3 95
4 b 4 36
5 b 5 32
6 b 7 29
7 c 4 18
8 c 7 14
9 c 9 87
What I am trying to achieve is something like this:
group value other
a 1 82
a 2 15
a 3 NaN
a 4 NaN
a 5 NaN
b 3 95
b 4 36
b 5 32
b 6 NaN
b 7 29
c 4 18
c 5 NaN
c 6 NaN
c 7 14
c 8 NaN
c 9 87
For example, group a
has a range from 1 to 5, b
from 3 to 7, and c
from 4 to 9.
The issue I'm having is that each group has a different range. I found something that works assuming a single range for all groups. This could work using the global min and max and dropping extra rows in each group, but since my data is fairly large adding many rows per group quickly becomes unfeasible.
CodePudding user response:
>>> df.groupby('group').apply(lambda x: x.set_index('value').reindex(np.arange(x['value'].min(), x['value'].max() 1))).drop(columns='group').reset_index()
group value other
0 a 1 82.0
1 a 2 15.0
2 a 3 NaN
3 a 4 NaN
4 a 5 4.0
5 b 3 95.0
6 b 4 36.0
7 b 5 32.0
8 b 6 NaN
9 b 7 29.0
10 c 4 18.0
11 c 5 NaN
12 c 6 NaN
13 c 7 14.0
14 c 8 NaN
15 c 9 87.0
We group on the group
column and then re-index each group with the range from the min to the max of the value
column