Home > database >  Upsampling a polars dataframe with groupby forward fill only some columns?
Upsampling a polars dataframe with groupby forward fill only some columns?

Time:12-26

Is it possible to fill_forward only some columns when upsampling using Polars?

For example, whould like to fill in the missing dates in sample dataframe (see code below). 'upsample' and 'forward_fill' works beautifully and is blazing fast with a much larger dataset. The output is as expected in the table below. No issues, all as expected.

However, and the question: is it possible to exclude a column for the forward_fill, so for example it returns blanks in the 'utc_time' column instead of filling the time. I have tried listing the columns in select statement by replacing 'pl.all() with pl.col([...])', but that just removes the column that is not listed.

import polars as pl
from datetime import datetime

df = pl.DataFrame(  
    {  
     'utc_created':pl.date_range(low=datetime(2021, 12, 16), high=datetime(2021, 12, 22, 0), interval="2d"),  
     'utc_time':['21:12:06','21:20:06','17:51:10','03:54:49'],  
     'sku':[9100000801,9100000801,9100000801,9100000801],  
     'old':[18,17,16,15],  
     'new':[17,16,15,14],  
     'alert_type':['Inventory','Inventory','Inventory','Inventory'],  
     'alert_level':['Info','Info','Info','Info']  
    }  
    )

df = df.upsample(
        time_column = 'utc_created',every="1d", by =('sku'))
        .select(pl.all()
        .forward_fill()
        )

Returns:

| utc_created         | utc_time | sku        | old | new | alert_type | alert_level |
| 2021-12-16 00:00:00 | 21:12:06 | 9100000801 | 18  | 17  | Inventory  | Info        |
| 2021-12-17 00:00:00 | 21:12:06 | 9100000801 | 18  | 17  | Inventory  | Info        |
| 2021-12-18 00:00:00 | 21:20:06 | 9100000801 | 17  | 16  | Inventory  | Info        |
| 2021-12-19 00:00:00 | 21:20:06 | 9100000801 | 17  | 16  | Inventory  | Info        |
| 2021-12-20 00:00:00 | 17:51:10 | 9100000801 | 16  | 15  | Inventory  | Info        |
| 2021-12-21 00:00:00 | 17:51:10 | 9100000801 | 16  | 15  | Inventory  | Info        |
| 2021-12-22 00:00:00 | 03:54:49 | 9100000801 | 15  | 14  | Inventory  | Info        |

CodePudding user response:

You can use pl.exclude(name).forward_fill() e.g.

.with_column(pl.exclude("utc_time").forward_fill())
  • Related