Home > Back-end >  fill_null() values with other columns data
fill_null() values with other columns data

Time:10-25

i have a question regarding fill null values, is it possible to backfill data from other columns as in pandas?

Working pandas example on how to backfill data :

df.loc[:, ['A', 'B', 'C']] = df[['A', 'B', 'C']].fillna(
    value={'A': df['D'],
           'B': df['D'],
           'C': df['D'],
           }) 

Polars example as i tried to backfill data from column D to column A if the value is null, but it's not working:

df = pl.DataFrame(
        {"date": ["2020-01-01 00:00:00", "2020-01-07 00:00:00", "2020-01-14 00:00:00"],
        "A": [3, 4, 7],
        "B": [3, 4, 5],
        "C": [0, 1, 2],
        "D": [1, 2, 5]})
df = df.with_column(pl.col("date").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"))
date_range = df.select(pl.arange(df["date"][0], df["date"]
                        [-1]   1, step=1000*60*60*24).cast(pl.Datetime).alias("date"))
df = (date_range.join(df, on="date", how="left"))
df['D'] = df['D'].fill_null("forward")
print(df)
df[:, ['A']] = df[['A']].fill_null({
    'A': df['D']
    }
)
print(df)

Kind regards, Tom

CodePudding user response:

The fillna() method is used to fill null values in pandas.

df['D'] = df['D'].fillna(df['A'].mean())

The above code will replace null values of D column with the mean value of A column.

CodePudding user response:

In the example you show and the accomponied pandas code. A fillna doesn't fill any null values, because the other columns are also NaN. So I am going to assume that you want to fill missing values by values of another column that doesn't have missing values, but correct me if I am wrong.

import polars as pl
from polars import col

df = pl.DataFrame({
    "a": [0, 1, 2, 3, None, 5, 6, None, 8, None],
    "b": range(10),
})

out = df.with_columns([
    pl.when(col("a").is_null()).then(col("b")).otherwise(col("a")).alias("a"),
    pl.when(col("a").is_null()).then(col("b").shift(1)).otherwise(col("a")).alias("a_filled_lag"),
    pl.when(col("a").is_null()).then(col("b").mean()).otherwise(col("a")).alias("a_filled_mean")

])

print(out)

In the example above, we use a when -> then -> othwerwise expression to fill missing values by another columns values. Think about if else expressions but then on whole columns.

I gave 3 examples, one where we fill on that value, one where we fill with the lagged value, and one where we fill with the mean value of the other column.

The snippet above produces:

shape: (10, 4)
┌─────┬─────┬──────────────┬───────────────┐
│ a   ┆ b   ┆ a_filled_lag ┆ a_filled_mean │
│ --- ┆ --- ┆ ---          ┆ ---           │
│ i64 ┆ i64 ┆ i64          ┆ f64           │
╞═════╪═════╪══════════════╪═══════════════╡
│ 0000.0           │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1111             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2222             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3333             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ...          ┆ ...           │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5555             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 6666             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 7764.5           │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 8888             │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 9984.5           │
└─────┴─────┴──────────────┴───────────────┘


  • Related