I want to split a single DataFrame into many DataFrames, one per unique value in a column, and store them in a dictionary. The code below shows how this can be done in pandas. How can I do the same in Polars?
import pandas as pd
#Favorite color of 10 people
df = pd.DataFrame({"Favorite_Color":["Blue","Yellow","Black","Red","Blue","Blue","Green","Red","Red","Blue"]})
print(df)
#split df into many dfs by Favorite_Color using dict
dict_of_dfs={key: df.loc[value] for key, value in df.groupby(["Favorite_Color"]).groups.items()}
print(dict_of_dfs)
Answer:
Polars has a DataFrame method for this: partition_by. Use the as_dict keyword to create a dictionary of DataFrames.
df.partition_by("Favorite_Color", as_dict=True)
{'Blue': shape: (4, 1)
┌────────────────┐
│ Favorite_Color │
│ --- │
│ str │
╞════════════════╡
│ Blue │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Blue │
└────────────────┘,
'Yellow': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ --- │
│ str │
╞════════════════╡
│ Yellow │
└────────────────┘,
'Black': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ --- │
│ str │
╞════════════════╡
│ Black │
└────────────────┘,
'Red': shape: (3, 1)
┌────────────────┐
│ Favorite_Color │
│ --- │
│ str │
╞════════════════╡
│ Red │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Red │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Red │
└────────────────┘,
'Green': shape: (1, 1)
┌────────────────┐
│ Favorite_Color │
│ --- │
│ str │
╞════════════════╡
│ Green │
└────────────────┘}