Suppose that I have a dataframe which can be created using code below
df = pd.DataFrame(data = {'date':['2021-01-01', '2021-01-02', '2021-01-05','2021-01-02', '2021-01-03', '2021-01-05'],
'product':['A', 'A', 'A', 'B', 'B', 'B'],
'price':[10, 20, 30, 40, 50, 60]
}
)
df['date'] = pd.to_datetime(df['date'])
I want to create an empty dataframe let's say main_df
which will contain all dates between df.date.min()
and df.date.max()
for each product
and on days where values in nan
I want to ffill
and bfill
for remaning. The resulting dataframe would be as below:
------------ --------- -------
| date | product | price |
------------ --------- -------
| 2021-01-01 | A | 10 |
| 2021-01-02 | A | 20 |
| 2021-01-03 | A | 20 |
| 2021-01-04 | A | 20 |
| 2021-01-05 | A | 30 |
| 2021-01-01 | B | 40 |
| 2021-01-02 | B | 40 |
| 2021-01-03 | B | 50 |
| 2021-01-04 | B | 50 |
| 2021-01-05 | B | 60 |
------------ --------- -------
CodePudding user response:
First
make pivot table, upsampling by asfreq
and fill null
df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
output:
product A B
date
2021-01-01 10.0 40.0
2021-01-02 20.0 40.0
2021-01-03 20.0 50.0
2021-01-04 20.0 50.0
2021-01-05 30.0 60.0
Second
stack
result and so on (include full code)
(df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
.stack().reset_index().rename(columns={0:'price'})
.sort_values('product').reset_index(drop=True))
output:
date product price
0 2021-01-01 A 10.0
1 2021-01-02 A 20.0
2 2021-01-03 A 20.0
3 2021-01-04 A 20.0
4 2021-01-05 A 30.0
5 2021-01-01 B 40.0
6 2021-01-02 B 40.0
7 2021-01-03 B 50.0
8 2021-01-04 B 50.0
9 2021-01-05 B 60.0