Is it possible to fill_forward only some columns when upsampling using Polars?
For example, whould like to fill in the missing dates in sample dataframe (see code below). 'upsample' and 'forward_fill' works beautifully and is blazing fast with a much larger dataset. The output is as expected in the table below. No issues, all as expected.
However, and the question: is it possible to exclude a column for the forward_fill, so for example it returns blanks in the 'utc_time' column instead of filling the time. I have tried listing the columns in select statement by replacing 'pl.all() with pl.col([...])', but that just removes the column that is not listed.
import polars as pl
from datetime import datetime
df = pl.DataFrame(
{
'utc_created':pl.date_range(low=datetime(2021, 12, 16), high=datetime(2021, 12, 22, 0), interval="2d"),
'utc_time':['21:12:06','21:20:06','17:51:10','03:54:49'],
'sku':[9100000801,9100000801,9100000801,9100000801],
'old':[18,17,16,15],
'new':[17,16,15,14],
'alert_type':['Inventory','Inventory','Inventory','Inventory'],
'alert_level':['Info','Info','Info','Info']
}
)
df = df.upsample(
time_column = 'utc_created',every="1d", by =('sku'))
.select(pl.all()
.forward_fill()
)
Returns:
| utc_created | utc_time | sku | old | new | alert_type | alert_level |
| 2021-12-16 00:00:00 | 21:12:06 | 9100000801 | 18 | 17 | Inventory | Info |
| 2021-12-17 00:00:00 | 21:12:06 | 9100000801 | 18 | 17 | Inventory | Info |
| 2021-12-18 00:00:00 | 21:20:06 | 9100000801 | 17 | 16 | Inventory | Info |
| 2021-12-19 00:00:00 | 21:20:06 | 9100000801 | 17 | 16 | Inventory | Info |
| 2021-12-20 00:00:00 | 17:51:10 | 9100000801 | 16 | 15 | Inventory | Info |
| 2021-12-21 00:00:00 | 17:51:10 | 9100000801 | 16 | 15 | Inventory | Info |
| 2021-12-22 00:00:00 | 03:54:49 | 9100000801 | 15 | 14 | Inventory | Info |
CodePudding user response:
You can use pl.exclude(name).forward_fill()
e.g.
.with_column(pl.exclude("utc_time").forward_fill())