I have the following pandas dataframe:
import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
'val':['a','a','a','a','a','a','a','a','a','a','a']})
id time val
0 1 2 a
1 1 3 a
2 1 5 a
3 2 1 a
4 2 3 a
5 2 4 a
6 3 1 a
7 3 2 a
8 3 6 a
9 3 7 a
10 3 8 a
For each id, I would like to add a row for every missing time, where val would be 'b'. time should start from 1 and run up to the largest time observed for that id.
The resulting dataframe would look like this:
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3], 'time': [1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7,8],
'val':['b','a','a','b','a','a','b','a','a','a','a','b','b','b','a','a','a']})
id time val
0 1 1 b
1 1 2 a
2 1 3 a
3 1 4 b
4 1 5 a
5 2 1 a
6 2 2 b
7 2 3 a
8 2 4 a
9 3 1 a
10 3 2 a
11 3 3 b
12 3 4 b
13 3 5 b
14 3 6 a
15 3 7 a
16 3 8 a
Any ideas how I could do that in Python?
This answer does not work, because it does not take the grouping by id into account, nor the fact that for id == 1 I am missing time == 1.
CodePudding user response:
Set the index of the dataframe to time, then reindex the time column per id and fill the NaN values in the val column with 'b':
(
    foo
    .set_index('time').groupby('id')
    # reindex each group on 1..max(time); missing times become NaN rows
    .apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
    .drop('id', axis=1).fillna({'val': 'b'}).reset_index()
)
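As a self-contained illustration of the reindex idea above, here is a minimal sketch that applies the function to the val column only instead of the whole group frame. The helper name fill_missing_times and its fill parameter are made up for this example, and it assumes each (id, time) pair occurs at most once.

```python
import pandas as pd

foo = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'time': [2, 3, 5, 1, 3, 4, 1, 2, 6, 7, 8],
    'val': ['a'] * 11,
})

def fill_missing_times(df, fill='b'):
    """Hypothetical helper: add a row for every missing time in 1..max(time) per id."""
    filled = (
        df.set_index('time')
          .groupby('id', group_keys=True)['val']
          # reindex each group's val on 1..max(time); gaps become NaN
          .apply(lambda s: s.reindex(range(1, s.index.max() + 1)))
          .fillna(fill)
    )
    # name both index levels explicitly before turning them back into columns
    return filled.rename_axis(['id', 'time']).reset_index()

result = fill_missing_times(foo)
print(result)
```

Selecting the val column before apply means there is no id column left to drop afterwards, which may be a little less sensitive to how a given pandas version treats the grouping column inside apply.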
If you want to try something fancier, here is another solution:
(
foo.groupby('id')['time'].max()
.map(range).explode().add(1).reset_index(name='time')
.merge(foo, how='left').fillna({'val': 'b'})
)
id time val
0 1 1 b
1 1 2 a
2 1 3 a
3 1 4 b
4 1 5 a
5 2 1 a
6 2 2 b
7 2 3 a
8 2 4 a
9 3 1 a
10 3 2 a
11 3 3 b
12 3 4 b
13 3 5 b
14 3 6 a
15 3 7 a
16 3 8 a
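A third variant, not from the answer above but in the same spirit, is to build the full set of (id, time) pairs explicitly and reindex against it. This is only a sketch: the names max_time, full_index and completed are illustrative, and it assumes (id, time) pairs are unique in foo. Note that fill_value fills every non-key column of the new rows, which is fine here because val is the only one.

```python
import pandas as pd

foo = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'time': [2, 3, 5, 1, 3, 4, 1, 2, 6, 7, 8],
    'val': ['a'] * 11,
})

# largest observed time per id
max_time = foo.groupby('id')['time'].max()

# every (id, time) pair that should exist: 1..max(time) for each id
full_index = pd.MultiIndex.from_tuples(
    [(i, t) for i, m in max_time.items() for t in range(1, m + 1)],
    names=['id', 'time'],
)

completed = (
    foo.set_index(['id', 'time'])
       # rows absent from foo are created and filled with 'b'
       .reindex(full_index, fill_value='b')
       .reset_index()
)
print(completed)
```

Compared with the explode/merge solution, this keeps time as an integer column, since the values never pass through an object-dtype Series.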
CodePudding user response:
One option is with complete from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
# build a range of numbers for each group, starting from 1
new_time = {'time': lambda df: range(1, df.max() + 1)}
foo.complete(new_time, by = 'id', fill_value = 'b')
id time val
0 1 1 b
1 1 2 a
2 1 3 a
3 1 4 b
4 1 5 a
5 2 1 a
6 2 2 b
7 2 3 a
8 2 4 a
9 3 1 a
10 3 2 a
11 3 3 b
12 3 4 b
13 3 5 b
14 3 6 a
15 3 7 a
16 3 8 a