I have a DataFrame:
using DataFrames
df = DataFrame(a=[1, 1, 2, 2], b=[6, 7, 8, 9])
4×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
   3 │     2      8
   4 │     2      9
Is there a canonical way to split it into a Vector{DataFrame}, with one data frame per distinct value of a? I can do:
[df[df.a .== i,:] for i in unique(df.a)]
2-element Vector{DataFrame}:
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9
but is there maybe something more elegant?
Answer:
Use:
julia> gdf = groupby(df, :a, sort=true)
GroupedDataFrame with 2 groups based on key: a
First Group (2 rows): a = 1
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
⋮
Last Group (2 rows): a = 2
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9
(You could omit sort=true, but then the order of the groups is not guaranteed; sorting ensures that the groups come out in ascending order of the grouping key.)
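A minimal sketch of what sort=true changes, assuming DataFrames.jl 1.x, where sort=false keeps groups in their order of first appearance. The data frame df2 and the helper key_order are hypothetical, introduced only for this illustration:

```julia
using DataFrames

# Hypothetical input whose key column a is not already sorted.
df2 = DataFrame(a=[2, 1, 2, 1], b=[8, 6, 9, 7])

# Hypothetical helper: the key value of each group, in the order the groups come out.
key_order(gdf) = [first(sdf.a) for sdf in gdf]

# sort=true: groups ordered by ascending key value.
sorted_keys = key_order(groupby(df2, :a, sort=true))
println(sorted_keys)      # [1, 2]

# sort=false: groups in order of first appearance in df2 (DataFrames.jl 1.x behavior).
unsorted_keys = key_order(groupby(df2, :a, sort=false))
println(unsorted_keys)    # [2, 1]
```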
Then you can work with this object as if it were a vector:
julia> gdf[1]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
julia> gdf[2]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9
Indexing does not copy any data: each group is returned as a SubDataFrame, i.e. a view into your original data frame.
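A minimal sketch of what "view" means in practice, using the same df as above: the group's parent is the original data frame, a group can also be looked up by its key value with a NamedTuple such as (a=2,), and writing through the view changes df itself:

```julia
using DataFrames

df = DataFrame(a=[1, 1, 2, 2], b=[6, 7, 8, 9])
gdf = groupby(df, :a, sort=true)

# Positional indexing returns a SubDataFrame view, not a copy.
sdf = gdf[1]
println(sdf isa SubDataFrame)   # true
println(parent(sdf) === df)     # true: the view points at df itself

# A group can also be looked up by the value of the grouping key; this is a view as well.
g2 = gdf[(a=2,)]
println(g2.b)                   # [8, 9]

# Because sdf is a view, writing into it modifies the original data frame.
sdf[1, :b] = 60
println(df.b)                   # [60, 7, 8, 9]
```

Only a non-key column is modified here; changing values of the grouping column through a view would leave gdf out of sync with df.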
If you really want a Vector{DataFrame} (i.e. you want copies of all groups), do:
julia> collect(DataFrame, gdf)
2-element Vector{DataFrame}:
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9
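If you prefer to spell the copy out, a comprehension that calls the DataFrame constructor on each group should produce the same kind of Vector{DataFrame}. This is a minimal sketch (an alternative spelling, not the call used above) that also checks the copies no longer share memory with df:

```julia
using DataFrames

df = DataFrame(a=[1, 1, 2, 2], b=[6, 7, 8, 9])
gdf = groupby(df, :a, sort=true)

# DataFrame(sdf) copies the columns of each SubDataFrame group.
dfs = [DataFrame(sdf) for sdf in gdf]
println(dfs isa Vector{DataFrame})   # true

# Mutating a copy leaves the original data frame unchanged.
dfs[1][1, :b] = 100
println(df.b)                        # [6, 7, 8, 9]
```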