Polars equivalent of pandas expression df.groupby['col1','col2']['col3'

Time:10-29

import pandas as pd

pandasdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
# Sum B per (fruits, cars) pair, then move the cars level into columns
pandasdf.groupby(["fruits", "cars"])["B"].sum().unstack()


How can I create an equivalent truth table in polars?

I would like to turn something like the grouped result below into a truth table:

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
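# Sum B per (fruits, cars) pair; the goal is to reshape this result into a truth table.
# (groupby is named group_by in newer polars releases.)
df.groupby(["fruits", "cars"]).agg(pl.col("B").sum())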

Efficiency matters because the dataset is very large (the result will be used with the apriori algorithm).

The unstack function in polars works differently; a polars alternative to pd.crosstab would also work.

CodePudding user response:

It seems like you want to do a pivot.

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)

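# One row per car, one column per fruit, each cell holding the sum of B.
# Parameter names vary by polars version (newer releases use on= and aggregate_function=).
df.pivot(values="B", index="cars", columns="fruits", aggregate_fn=pl.element().sum())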
shape: (2, 3)
┌────────┬────────┬───────┐
│ cars   ┆ banana ┆ apple │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ beetle ┆ 6      ┆ 5     │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ audi   ┆ 4      ┆ null  │
└────────┴────────┴───────┘
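
To make explicit what the pivot above computes, and how a truth table could be derived from it, here is a minimal plain-Python sketch with no library dependencies. The helper name pivot_sum and the dictionary layout are assumptions made for illustration only; they are not part of the polars API, and this sketch is not meant to be fast on a large dataset.

```python
from collections import defaultdict

# Same sample data as in the question, as plain Python lists.
fruits = ["banana", "banana", "apple", "apple", "banana"]
cars = ["beetle", "audi", "beetle", "beetle", "beetle"]
B = [5, 4, 3, 2, 1]


def pivot_sum(index, columns, values):
    """Sum values per (index, column) pair. Hypothetical helper, not a polars API."""
    table = defaultdict(dict)
    for row_key, col_key, value in zip(index, columns, values):
        table[row_key][col_key] = table[row_key].get(col_key, 0) + value
    return dict(table)


# Nested dict keyed by car, then by fruit; missing pairs are simply absent.
sums = pivot_sum(cars, fruits, B)

# Truth table: True where a (car, fruit) combination occurs at least once.
all_fruits = sorted(set(fruits))
truth = {car: {f: f in row for f in all_fruits} for car, row in sums.items()}

print(sums)
print(truth)
```

The null in the pivoted output corresponds to the (audi, apple) pair that is absent from sums, so in a DataFrame the same truth table amounts to testing each pivoted cell for non-null.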
