I have a data frame, I want to group the data based on the ids and then stick the values for each id together. However, since the number of the ids are not equal, I have to fill the missing values for those ids with same values of the day before (or the day after) of those ids. I previously ask this question and I got a response, however it is only for the day before. But, in my real data, since some times I don't have the first day, their values are Nan.
I have provided an example as below, for example, for id=1, id=2 there is two dates, but for id=3, there are three days.
df = pd.DataFrame()
df['id'] = [1, 1, 2,2, 3, 3, 3]
df['date'] = ['2019-01-02', '2019-01-03', '2019-01-01','2019-01-02', '2019-01-01', '2019-01-02','2019-01-03']
df['val1'] = [10, 100, 20, 30, 40, 50, 60]
df['val2'] = [30, 30, -20, -30, -40,-50, -60 ]
df['val3'] = [50, 10, 120, 300, 140, 150, 160]
The desired output is:
val0 val1 val2 val3 val4 val5 val6 val7 val8
id
1 10.0 30.0 50.0 10.0 30.0 50.0 100.0 30.0 10.0
2 20.0 -20.0 120.0 30.0 -30.0 300.0 30.0 -30.0 300.0
3 40.0 -40.0 140.0 50.0 -50.0 150.0 60.0 -60.0 160.0
Here is the response for the case where it is for the day before.
mi = pd.MultiIndex.from_product([df['id'].unique(), df['date'].unique()], names=['id', 'date'])
df = df.set_index(['id', 'date']).reindex(mi).sort_index().ffill()
df = df.groupby(level='id').apply(np.ravel).apply(pd.Series).add_prefix('val')
If you run the code, you can see that the first values for the id=1 are Nan. And also I don't want to sort the id.
Can anybody help me with that? Thanks
CodePudding user response:
One option is to pivot
the DataFrame, then use groupby
across columns to fill in the missing values:
val_cols = [c for c in df.columns if c.startswith('val')]
out = (df.pivot('id', 'date', val_cols)
.groupby(level=0, axis=1).apply(lambda x: x.ffill(axis=1).bfill(axis=1))
.sort_index(axis=1, level=1))
out.columns = [f'val{i}' for i in range(out.shape[1])]
Output:
val0 val1 val2 val3 val4 val5 val6 val7 val8
id
1 10.0 30.0 50.0 10.0 30.0 50.0 100.0 30.0 10.0
2 20.0 -20.0 120.0 30.0 -30.0 300.0 30.0 -30.0 300.0
3 40.0 -40.0 140.0 50.0 -50.0 150.0 60.0 -60.0 160.0