python-polars split dataframe into many dfs by column value using dictionary

Time:09-20

I want to split a single df into many dfs, one per unique column value, and store them in a dictionary. The code below shows how this can be done in pandas. How can I do the same in polars?

import pandas as pd

#Favorite color of 10 people
df = pd.DataFrame({"Favorite_Color":["Blue","Yellow","Black","Red","Blue","Blue","Green","Red","Red","Blue"]})
print(df)

#split df into many dfs by Favorite_Color using dict
dict_of_dfs = {key: df.loc[value] for key, value in df.groupby("Favorite_Color").groups.items()}
print(dict_of_dfs)

CodePudding user response:

Polars has a DataFrame method for this: partition_by. Use the as_dict keyword to create a dictionary of DataFrames.

df.partition_by("Favorite_Color", as_dict=True)
{'Blue': shape: (4, 1)
┌────────────────┐
│ Favorite_Color │
│ ---            │
│ str            │
╞════════════════╡
│ Blue           │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue           │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue           │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue           │
└────────────────┘,
'Yellow': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ ---            │
│ str            │
╞════════════════╡
│ Yellow         │
└────────────────┘,
'Black': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ ---            │
│ str            │
╞════════════════╡
│ Black          │
└────────────────┘,
'Red': shape: (3, 1)
┌────────────────┐
│ Favorite_Color │
│ ---            │
│ str            │
╞════════════════╡
│ Red            │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Red            │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Red            │
└────────────────┘,
'Green': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ ---            │
│ str            │
╞════════════════╡
│ Green          │
└────────────────┘}